2

I didn't expect this script (throw-away) to be leaking and I haven't figured out what the culprit is. Can you spot anything? Although this is throw-away code, I'm concerned that I'll repeat this in the future. I've never had to manage memory in PHP, but with the number of rows in the db, it's blowing up my php instance (already upped the memory to 1Gb).

The california table is especially larger than the others (currently 2.2m rows, less as I delete duplicate rows). I get a memory error on line 31 ($row = mysql_fetch_assoc($res))

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocat e 24 bytes) in C:\Documents and Settings\R\My Documents\My Webpages\cdiac\cdiac_ dup.php on line 31

PHP 5.3.0, mysql 5.1.36. part of a wamp install.

here's the entire code. the purpose of this script is to delete duplicate entries (data was acquired into segmented tables, which was far faster at the time, but now I have to merge those tables.)

what's causing it? something I'm overlooking? or do I just need to watch the memory size and call garbage collection manually when it gets big?

<?php

define('DBSERVER', 'localhost');
define('DBNAME', '---');
define('DBUSERNAME', '---');
define('DBPASSWORD', '---');

$dblink = mysql_connect(DBSERVER, DBUSERNAME, DBPASSWORD);
mysql_select_db(DBNAME, $dblink);


$state = "AL";
//if (isset($_GET['state'])) $state=mysql_real_escape_string($_GET['state']); 
if (isset($argv[1]) ) $state = $argv[1];

echo "Scanning $state\n\n";


// interate through listing of a state to check for duplicate entries (same station_id, year, month, day)
$DBTABLE = "cdiac_data_". $state;
$query = "select * from $DBTABLE ";
$query .= " order by station_id, year, month, day ";

$res = mysql_query($query) or die ("could not run query '$query': " . mysql_errno() . " " . mysql_error());

$last = "";
$prev_row;
$i = 1;
$counter = 0;
echo ".\n";
while ($row = mysql_fetch_assoc($res)) {  
  $current = $row["station_id"] . "_" . $row["year"] . "_" . sprintf("%02d",$row["month"]) . "_" . sprintf("%02d",$row["day"]);
  echo str_repeat(chr(8), 80) . "$i  $current ";
  if ($last == $current) {
    //echo implode(', ', $row) . "\n";

    // merge $row and $prev_row
    // data_id  station_id, state_abbrev, year, month,  day,  TMIN, TMIN_flags, TMAX, TMAX_flags, PRCP, PRCP_flags, SNOW, SNOW_flags, SNWD, SNWD_flags

    printf("%-13s %8s %8s\n", "data_id:", $prev_row["data_id"], $row["data_id"]);
    if ($prev_row["data_id"] == $row["data_id"]) echo " + ";

    $set = "";
    if (!$prev_row["TMIN"] && $row["TMIN"])  $set .= "TMIN = " . $row["TMIN"] . ", ";
    if (!$prev_row["TMIN_flags"] && $row["TMIN_flags"])   $set .= "TMIN_flags = '" . $row["TMIN_flags"] . "', ";
    if (!$prev_row["TMAX"] && $row["TMAX"])   $set .= "TMAX = " . $row["TMAX"] . ", ";
    if (!$prev_row["TMAX_flags"] && $row["TMAX_flags"])   $set .= "TMAX_flags = '" . $row["TMAX_flags"] . "', ";
    if (!$prev_row["PRCP"] && $row["PRCP"])   $set .= "PRCP = " . $row["PRCP"] . ", ";
    if (!$prev_row["PRCP_flags"] && $row["PRCP_flags"])   $set .= "PRCP_flags = '" . $row["PRCP_flags"] . "', ";
    if (!$prev_row["SNOW"] && $row["SNOW"])   $set .= "SNOW = " . $row["SNOW"] . ", ";
    if (!$prev_row["SNOW_flags"] && $row["SNOW_flags"])   $set .= "SNOW_flags = '" . $row["SNOW_flags"] . "', ";
    if (!$prev_row["SNWD"] && $row["SNWD"])   $set .= "SNWD = " . $row["SNWD"] . ", ";
    if (!$prev_row["SNWD_flags"] && $row["SNWD_flags"])   $set .= "SNWD_flags = '" . $row["SNWD_flags"] . "', ";

    $delete = "";
    $update = "";
    if ($set = substr_replace( $set, "", -2 )) $update = "UPDATE $DBTABLE SET $set WHERE data_id=".$prev_row["data_id"]." and year=".$row["year"]." and month=".$row["month"]." and day=".$row["day"].";\n";
    if ($row["data_id"] != $prev_row["data_id"]) $delete = "delete from $DBTABLE where data_id=".$row["data_id"]." and year=".$row["year"]." and month=".$row["month"]." and day=".$row["day"].";\n\n";

    if ($update) {
      $r = mysql_query($update) or die ("could not run query '$update' \n".mysql_error());
    }
    if ($delete) {
      $r = mysql_query($delete) or die ("could not run query '$delete' \n".mysql_error());
    }    

    //if ($counter++ > 5) exit(0);
  }
  else {
    $last = $current;
    unset($prev_row);
    //copy $row to $prev_row
    foreach ($row as $key => $val) $prev_row[$key] = $val;
  }

  $i++;
}

    echo "\n\nDONE\n"; 
?>
fbas
  • 1,554
  • 2
  • 15
  • 26
  • though it gets smaller as I delete things, my California table has 2,200,000+ rows. I run out of memory (set at 1Gb) around 1.7m rows. – fbas Aug 27 '11 at 23:55
  • You should use one of the modern mysql extensions `MySQLi` or `PDO_MYSQL`. – KingCrunch Aug 28 '11 at 00:44
  • @kingcruch, I'm sure you're right. do you have any preference or defining reason for using one over the other? – fbas Aug 28 '11 at 00:51
  • 1
    @fbas: mysqli is lower level and PDO is higher. mysqli allows more access to MySQL-specific features. PDO has a better interface and supports other DBs. See the question ["mysqli or PDO - what are the pros and cons?"](http://stackoverflow.com/questions/13569/mysqli-or-pdo-what-are-the-pros-and-cons). – outis Aug 28 '11 at 01:38

3 Answers3

3

I would try two things:

1) Instead of running the UPDATE and DELETE queries inside the loop using mysql_query, save them in a text file, to execute later. For example: file_put_contents('queries.sql', $update, FILE_APPEND );

2) Instead of doing everything inside the while ($row = mysql_fetch_assoc($res)) loop, first grab all SELECT query results, then close database connection freeing all database resources, including the query result. Only after this, perform the loop process.

If you run out of memory while storing the database results in one array, you can try saving the results in a temporary file instead (one record per line / FILE_APPEND), and then use this file in the loop (reading one line per record, using fgets function).

J. Bruni
  • 19,262
  • 11
  • 70
  • 90
  • I agree with J. Bruni, plus the last "else" block really smells. Why don't you just do `$prev_row = $row` instead of that foreach? I know this is not the cause of the memory leak anyway. – Sebastián Grignoli Aug 28 '11 at 00:25
  • @sebastian, I originally used $prev_row = $row, but when I got memory errors, that was the first thing I changed, wondering if that was the reason for the leak - so I was making a copy of the row rather than reference. It has no effect. – fbas Aug 28 '11 at 00:47
2

Work smarter, not harder:

SELECT station_id, year, month FROM table
    GROUP BY station_id, year, month
    HAVING COUNT(*) > 1

That'll get you all the station_id/year/month tuples that appear in the table more than once. Assuming that most of your data is not duplicates, that'll save you a lot of memory, since now you just have to go through these tuples and fix up the rows matching them.

duskwuff -inactive-
  • 171,163
  • 27
  • 219
  • 269
  • I tried something similar to that. Took forever. I cancelled it. – fbas Aug 28 '11 at 04:03
  • even with appropriate indexes? – glglgl Aug 28 '11 at 06:10
  • sorry, didn't realize my answer sounded so dismissive. your solution works. and doesn't gobble up memory because the result set is appropriately small. I'm still wondering if there's a way to release the memory glommed in the loop, but your query is good. – fbas Aug 28 '11 at 12:45
  • If you're using PHP 5.2, try upgrading to 5.3. It's got much better memory management. – duskwuff -inactive- Aug 28 '11 at 18:01
  • it's 5.3.0, and I figure it's just an issue of having a large result set. it won't release it's memory. I guess the solution is to make smaller LIMIT queries of the larger set. this can be an issue if I'm editing rows in such a way that they will be re-ordered in subsequent queries. – fbas Aug 28 '11 at 22:42
0

I found this when trying to trace down a memory use problem on a script of mine. Having solved the issue for mine I thought it worth adding a reply here for the next person who comes along with the same issue.

I was using mysqli, but much the same applies for mysql.

The problem I found was the queries not freeing their results. The solution was to use mysqli_free_result() after executing the update and delete queries. But more importantly on the mysqli_query for the loop I used the extra parameter of *MYSQLI_USE_RESULT* . There are side effects of this, so use a separate connection for the update and delete queries.

Kickstart
  • 21,106
  • 2
  • 17
  • 32