I am reading a 200 MB CSV file with 1,344,953 rows, filtering it, and writing the results into 18 other files, all within 25 seconds, but someone in my class has done it in under 8 seconds.
Here is my code:
<?php
$sid = array(
    "188" => "188", "203" => "203", "206" => "206", "209" => "209",
    "213" => "213", "215" => "215", "228" => "228", "270" => "270",
    "271" => "271", "375" => "375", "395" => "395", "452" => "452",
    "447" => "447", "459" => "459", "463" => "463", "481" => "481",
    "500" => "500", "501" => "501"
);
$Separator = ',';
// Remove output files left over from a previous run
array_map('unlink', glob("data_*.csv"));

$fp = fopen("fragment_org.csv", "r");
while (($line = stream_get_line($fp, 0, "\n")) !== false) {
    // The source file is semicolon-delimited
    $data = explode(";", $line);
    $siteID = $data[4];
    $ts     = $data[0];
    $nox    = $data[1];
    $no2    = $data[2];
    $no     = $data[3];
    $pm10   = $data[5];
    $nvpm10 = $data[6];
    $vpm10  = $data[7];
    $nvpm25 = $data[8];
    $pm25   = $data[9];
    $vpm25  = $data[10];
    $co     = $data[11];
    $o3     = $data[12];
    $so2    = $data[13];
    $loc    = $data[17];
    // Column 18 holds "lat,long" in a single field
    $latlong_ext = explode(",", $data[18]);
    $lat  = $latlong_ext[0];
    $long = $latlong_ext[1];
    $wrr = array($siteID, $ts, $nox, $no2, $no, $pm10, $nvpm10, $vpm10,
                 $nvpm25, $pm25, $vpm25, $co, $o3, $so2, $loc, $lat, $long);
    // Keep only the 18 site IDs listed in $sid
    if (isset($sid[$siteID])) {
        $fip = fopen('data_' . $siteID . '.csv', 'a');
        // Skip rows where both NOx and CO are empty
        if ($nox != "" || $co != "") {
            fputs($fip, implode($Separator, $wrr) . PHP_EOL);
        }
        fclose($fip);
    }
}
fclose($fp);
Most of the time is spent writing to the multiple output files, i.e. on $fip, because the script runs noticeably faster when I write everything to a single file instead.
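To confirm that the per-row fopen()/fclose() is what hurts, here is a small standalone sketch (not my actual script; the file names bench_a.csv / bench_b.csv and the row count are arbitrary placeholders) comparing the two write strategies:

<?php
$rows = 100000;
$line = "some,row,data" . PHP_EOL;

// Strategy 1: open and close the file for every row (what my script does)
$t0 = microtime(true);
for ($i = 0; $i < $rows; $i++) {
    $fh = fopen('bench_a.csv', 'a');
    fputs($fh, $line);
    fclose($fh);
}
printf("open/close per row: %.2fs" . PHP_EOL, microtime(true) - $t0);

// Strategy 2: open once, write all rows, close once
$t0 = microtime(true);
$fh = fopen('bench_b.csv', 'a');
for ($i = 0; $i < $rows; $i++) {
    fputs($fh, $line);
}
fclose($fh);
printf("single handle: %.2fs" . PHP_EOL, microtime(true) - $t0);

// Clean up the benchmark files
unlink('bench_a.csv');
unlink('bench_b.csv');

On my understanding, strategy 1 pays for an open/close system-call pair (plus a buffer flush) on every row, which is exactly the overhead the single-handle version avoids.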
Can someone kindly guide me on how to optimize it further? Someone told me, "Simply read a line and write it out to a specific file. Avoid opening and closing lots of files repeatedly (very time consuming)," but I thought I was already doing that.
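If I understand that advice correctly, it means opening each of the 18 output handles once, before the read loop, and closing them only after the whole input is processed, instead of calling fopen()/fclose() for every matching row the way my script does. Here is a minimal standalone sketch of my understanding (the column reordering from my script is omitted for brevity; $data[1] and $data[11] are the NOx and CO fields):

<?php
// Same site IDs as in my script above
$sid = array("188", "203", "206", "209", "213", "215", "228", "270", "271",
             "375", "395", "452", "447", "459", "463", "481", "500", "501");

// Remove output files from a previous run, as in my script
array_map('unlink', glob("data_*.csv"));

// Open one append handle per site ID, once, before reading anything
$handles = array();
foreach ($sid as $id) {
    $handles[$id] = fopen('data_' . $id . '.csv', 'a');
}

$fp = fopen("fragment_org.csv", "r");
while (($line = stream_get_line($fp, 0, "\n")) !== false) {
    $data = explode(";", $line);
    $siteID = $data[4];
    // The handle map doubles as the site-ID filter
    if (isset($handles[$siteID]) && ($data[1] != "" || $data[11] != "")) {
        fputs($handles[$siteID], implode(',', $data) . PHP_EOL);
    }
}
fclose($fp);

// Close each output file exactly once, at the very end
foreach ($handles as $fh) {
    fclose($fh);
}

Is that what was meant? With only 18 site IDs the handle map stays far below any file-descriptor limit, so keeping them all open for the duration seems safe.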
Here is what the CSV file I am reading looks like:
First 5 lines from CSV
Thanks for your time and for looking into this.