
I need to perform a batch insert in MySQL/MariaDB, but since the data is dynamic I need to build the SQL query on the fly. In a few steps:

  • I should find whether the current row already exists in the table - this is the first SELECT inside the loop.
  • Right now I have 1454 rows but will have to insert around 150k later, so a batch query is better than 150k single INSERTs, one per loop iteration.
  • If the record already exists I should update it; if it doesn't, I should insert it. I don't care about the UPDATE yet, and the code you're seeing only handles the INSERT.

So here is what I am doing:

// Get values from Csv file as an array of values
$data = convertCsvToArray($fileName);
echo "DEBUG count(data): ", count($data), "\n";

$i = 0;
$sqlInsert = "INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) ";

// Processing on each row of data
foreach ($data as $row) {
    $sql = "SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='{$row['Id']}'";
    echo "DEBUG: ", $sql, "\n";
    $rs = $conn->query($sql);

    if ($rs === false) {
        echo 'Wrong SQL: '.$sql.' Error: '.$conn->error, E_USER_ERROR;
    } else {
        $rows_returned = $rs->num_rows;

        $veeva_rep_id = "'".$conn->real_escape_string($row['Id'])."'";
        $first = "'".$conn->real_escape_string(ucfirst(strtolower($row['FirstName'])))."'";
        $last = "'".$conn->real_escape_string(ucfirst(strtolower($row['LastName'])))."'";
        $email = "'".$conn->real_escape_string($row['Email'])."'";
        $username = "'".$conn->real_escape_string($row['Username'])."'";
        $display_name = "'".$conn->real_escape_string(
                ucfirst(strtolower($row['FirstName'])).' '.ucfirst(strtolower($row['LastName']))
            )."'";

        // VALUES should be added only if the row doesn't exist
        if ($rows_returned === 0) {

            // VALUES should be appended until they reach 1000
            while ($i % 1000 !== 0) {
                $sqlInsert .= "VALUES($veeva_rep_id,$first,$last,$email,$username,NOW(),NOW(),$display_name,'VEEVA','https://pdone.s3.amazonaws.com/avatar/default_avatar.png',NOW(),NOW())";
                ++$i;
            }

            // QUERY should be output to console to see if it's right or something is wrong
            echo "DEBUG: ", $sqlInsert, "\n";

            // QUERY should be executed if there are 1000 VALUES ready to add as a batch

            /*
            $rs = $conn->query($sqlInsert);

            if ($rs === false) {
                echo 'Wrong SQL: '.$sqlInsert.' Error: '.$conn->error, E_USER_ERROR;
            }
            */
        } else {
            // UPDATE
            echo "UPDATE";
        }
    }
}

But this line of code: echo "DEBUG: ", $sql, "\n"; is not outputting anything to the console. I must be doing something wrong, but I can't find what. Can anyone help me build the proper batch query and execute it after every 1000 appended values?

Proper output should be:

DEBUG count(data): 1454
DEBUG: SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='00580000008ReolAAC'
DEBUG: SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='005800000039SIWAA2'
....
DEBUG: INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) VALUES(...), VALUES(...), VALUES(...)

Obtained result:

DEBUG count(data): 1454
DEBUG: SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='00580000008RGg6AAG'
DEBUG: INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt)
DEBUG: SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='00580000008RQ4CAAW'
DEBUG: INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt)
.... // until it reaches 1454 results

The table is empty, so it should never go through the ELSE condition (the UPDATE one).

EDIT

With help from the answer, this is how the code looks now:

$data = convertCsvToArray($fileName);
echo "DEBUG count(data): ", count($data), "\n";

$i = 1;
$sqlInsert = "INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) VALUES";

foreach ($data as $row) {
    $sql = "SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='{$row['Id']}'";
    $rs = $conn->query($sql);

    if ($rs === false) {
        echo 'Wrong SQL: '.$sql.' Error: '.$conn->error, E_USER_ERROR;
    } else {
        $rows_returned = $rs->num_rows;

        $veeva_rep_id = "'".$conn->real_escape_string($row['Id'])."'";
        $first = "'".$conn->real_escape_string(ucfirst(strtolower($row['FirstName'])))."'";
        $last = "'".$conn->real_escape_string(ucfirst(strtolower($row['LastName'])))."'";
        $email = "'".$conn->real_escape_string($row['Email'])."'";
        $username = "'".$conn->real_escape_string($row['Username'])."'";
        $display_name = "'".$conn->real_escape_string(
                ucfirst(strtolower($row['FirstName'])).' '.ucfirst(strtolower($row['LastName']))
            )."'";

        if ($rows_returned === 0) {
            if ($i % 1000 === 0) {
                file_put_contents("output.log", $sqlInsert."\n", FILE_APPEND);
                $sqlInsert = "INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) VALUES";
            } else {
                $sqlInsert .= "($veeva_rep_id,$first,$last,$email,$username,NOW(),NOW(),$display_name,'VEEVA','https://pdone.s3.amazonaws.com/avatar/default_avatar.png',NOW(),NOW()), ";
            }

            $i++;
        } else {
            echo "UPDATE";
        }
    }
}

But it is still buggy because:

  • I get a first, empty INSERT query: INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) VALUES
  • I get a second INSERT query with 1000 VALUES() groups appended, but what happened to the rest? The remaining 454?

Can anyone give me another tip? Help?

  • Maybe `if ($rows_returned === 0)` returns `false` and you're running into your `else` statement... which is empty. Just to check, add an echo there too. – PKeidel Jun 25 '15 at 13:51
  • @PKeidel there are no rows in the table, so the result should be `0` in all cases, but anyway I've added one and it didn't go through the ELSE condition – ReynierPM Jun 25 '15 at 13:55
  • I just noticed that there are more problems... in the 3rd row `foreach ($data as $row) {` => There is no variable $data – PKeidel Jun 25 '15 at 13:59
  • So you are reading from and inserting to the same table? I think it might be useful to take a step back and actually talk about what you are trying to achieve here. Your approach of querying in a loop like this is generally considered an anti-pattern and may indicate that there is a better overall approach to take like `INSERT INTO ... SELECT` or similar. – Mike Brant Jun 25 '15 at 14:02
  • And you're overwriting the value of `$sql`. What do you want to insert into the table? You can't fetch data from one table and, when there is no data, insert some from the same table. There has to be another data source for the data that should be inserted. – PKeidel Jun 25 '15 at 14:02
  • @MikeBrant I have edited the OP, take a look and tell me if it is better now – ReynierPM Jun 25 '15 at 14:05
  • @PKeidel I have fixed the `$sql` overwrite; I didn't notice it until a few minutes ago, and it is already fixed, as I said – ReynierPM Jun 25 '15 at 14:07

3 Answers


Since it looks like you are trying to load data from a CSV file, you might want to consider the LOAD DATA INFILE functionality, which is designed specifically for this purpose.

Here is a link to the documentation: https://dev.mysql.com/doc/refman/5.6/en/load-data.html
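
A minimal sketch of what that could look like from PHP, assuming the CSV has a header row, is comma-separated, and supplies the columns in the order listed below (the file path and the column mapping are illustrative assumptions, not taken from the question):

// Assumes $conn is the mysqli connection from the question, opened with
// local-infile enabled (MYSQLI_OPT_LOCAL_INFILE set before connecting).
$sql = "LOAD DATA LOCAL INFILE '/path/to/reps.csv'
        INTO TABLE reps
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
        (veeva_rep_id, @first, @last, email, username)
        SET first        = @first,
            last         = @last,
            display_name = CONCAT(@first, ' ', @last),
            lastLoginAt  = NOW(),
            lastSyncAt   = NOW(),
            rep_type     = 'VEEVA',
            avatar_url   = 'https://pdone.s3.amazonaws.com/avatar/default_avatar.png',
            createdAt    = NOW(),
            updatedAt    = NOW()";

if ($conn->query($sql) === false) {
    echo 'LOAD DATA failed: ', $conn->error, "\n";
}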

Mike Brant
  • I could use that and I think it would be better and faster, but what happens if a required field is empty? Let's say the CSV file has 4 columns, all of them required, and 1454 rows, but row 101 has **NULL** in one of its values; will the import keep running, or will it fail because of the constraint on the DB side? – ReynierPM Jun 26 '15 at 12:31
  • So, after reading the docs I answered myself: `With LOAD DATA INFILE, data-interpretation and duplicate-key errors terminate the operation.`, `With LOAD DATA LOCAL INFILE, data-interpretation and duplicate-key errors become warnings and the operation continues because the server has no way to stop transmission of the file in the middle of the operation. For duplicate-key errors, this is the same as if IGNORE is specified` so perhaps I should use `LOAD DATA LOCAL INFILE`, but I would like to know about the failed rows: where do those "warnings" go? The MySQL log? (see the sketch below) – ReynierPM Jun 26 '15 at 12:34
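
On the warnings question: they are not written to a log by default; they stay attached to the client session and can be read right after the statement runs. A minimal mysqli sketch, assuming the same $conn connection as in the question:

// SHOW WARNINGS returns the Level/Code/Message rows produced by the most
// recent statement in this session (e.g. the LOAD DATA LOCAL INFILE above).
if ($conn->warning_count > 0) {
    $res = $conn->query("SHOW WARNINGS");
    while ($w = $res->fetch_assoc()) {
        echo $w['Level'], ' (', $w['Code'], '): ', $w['Message'], "\n";
    }
}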

You should have something like:

// Try fetching data from table 1

// If there is no record available, then fetch some data from table 2
// and insert that data into table 1

You just wrote

$sql = "INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) ";

// Processing on each row of data
foreach ($data as $row) {

But an INSERT does not select any data, and second... you didn't run a SELECT, so where does $data come from?

Update: use `if ($i % 1000 === 0) {` instead of `while ($i % 1000 !== 0) {`:

$i         = 0;
$sqlInsert = "INSERT INTO reps(veeva_rep_id,first,last,email,...) ";

// Processing on each row of data
foreach ($data as $row) {
    $sql = "SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='{$row['Id']}'";
    echo "DEBUG: ", $sql, "\n";
    $rs = $conn->query($sql);

    if ($rs === false) {
        echo 'Wrong SQL: '.$sql.' Error: '.$conn->error, E_USER_ERROR;
    } else {

        $veeva_rep_id = ...;
        $first = ...;
        $last = ...;
        $email = ...;
        // ...

        // VALUES should be added only if the row doesn't exist
        if($rs->num_rows == 0) {
            // Insert some data
            $i++;

            if ($i % 1000 === 0) {
                echo "DEBUG: ", $sqlInsert, "\n";
                // execSql($sqlInsert);
                $sqlInsert = "INSERT INTO reps(veeva_rep_id,first,last,email,...) "; // reset
            } else {
                $sqlInsert .= "VALUES($veeva_rep_id,$first,$last,$email,...) ";
            }
        } else {
            echo "UPDATE";
        }
    }
}
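
Filling in the placeholders, a minimal self-contained sketch of this pattern - one that keeps the multi-row syntax valid (a single VALUES keyword followed by comma-separated groups), appends the current row before checking the batch size, and flushes the last partial batch after the loop (the "remaining 454" from the question) - could look like the following; the flushBatch() helper and the $batchSize variable are illustrative additions, not part of the original answer:

// A sketch only: assumes $conn (mysqli), $fileName and convertCsvToArray() from the question.
$data = convertCsvToArray($fileName);
echo "DEBUG count(data): ", count($data), "\n";

$insertPrefix = "INSERT INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,"
              . "lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) VALUES ";
$values    = array();  // "(...)" groups collected for the current batch
$batchSize = 1000;

// Hypothetical helper (not in the original answer): runs one multi-row INSERT.
function flushBatch(mysqli $conn, $prefix, array $values)
{
    if (empty($values)) {
        return;
    }
    $sql = $prefix . implode(', ', $values);
    if ($conn->query($sql) === false) {
        echo 'Wrong SQL: ', $sql, ' Error: ', $conn->error, "\n";
    }
}

foreach ($data as $row) {
    $sql = "SELECT id,lastSyncAt FROM reps WHERE veeva_rep_id='"
         . $conn->real_escape_string($row['Id']) . "'";
    $rs = $conn->query($sql);

    if ($rs === false) {
        echo 'Wrong SQL: ', $sql, ' Error: ', $conn->error, "\n";
        continue;
    }

    if ($rs->num_rows > 0) {
        echo "UPDATE\n";           // existing rows: to be handled later
        continue;
    }

    $veeva_rep_id = "'" . $conn->real_escape_string($row['Id']) . "'";
    $first        = "'" . $conn->real_escape_string(ucfirst(strtolower($row['FirstName']))) . "'";
    $last         = "'" . $conn->real_escape_string(ucfirst(strtolower($row['LastName']))) . "'";
    $email        = "'" . $conn->real_escape_string($row['Email']) . "'";
    $username     = "'" . $conn->real_escape_string($row['Username']) . "'";
    $display_name = "'" . $conn->real_escape_string(
        ucfirst(strtolower($row['FirstName'])) . ' ' . ucfirst(strtolower($row['LastName']))
    ) . "'";

    // Append this row first, then flush when the batch is full,
    // so nothing is skipped at the 1000-row boundary.
    $values[] = "($veeva_rep_id,$first,$last,$email,$username,NOW(),NOW(),$display_name,"
              . "'VEEVA','https://pdone.s3.amazonaws.com/avatar/default_avatar.png',NOW(),NOW())";

    if (count($values) === $batchSize) {
        flushBatch($conn, $insertPrefix, $values);
        $values = array();
    }
}

// Flush the last partial batch (e.g. the remaining 454 rows).
flushBatch($conn, $insertPrefix, $values);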
PKeidel
  • There is no table1 and inserting into table2; take a look at the OP, `$data` comes from a CSV file – ReynierPM Jun 25 '15 at 14:07
  • I've edited the OP and added those lines in order to make the post clearer ;) – ReynierPM Jun 25 '15 at 14:09
  • Nope, it is not working, I'm getting the wrong output; if you like I can provide the .csv file and the convertToArray function so you can try it on your own, could you? – ReynierPM Jun 25 '15 at 14:33
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/81540/discussion-between-reynierpm-and-pkeidel). – ReynierPM Jun 25 '15 at 14:35

Consider using INSERT IGNORE INTO table so that the "does the record already exist?" check happens in MySQL itself (see How to 'insert if not exists' in MySQL?). If you haven't already done so, make veeva_rep_id a PRIMARY key so that the INSERT IGNORE will work.
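
For example, assuming reps does not already have an index on that column (the schema is not shown in the question), a UNIQUE key is enough for INSERT IGNORE to detect duplicates; the index name below is made up:

// Hypothetical index name; uses the mysqli connection from the question.
// A PRIMARY KEY would also work, but the table appears to already have an id column.
$conn->query("ALTER TABLE reps ADD UNIQUE KEY uq_veeva_rep_id (veeva_rep_id)");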

Also check out PDO for transactions, prepared statements and dynamically generated queries (see PDO Prepared Inserts multiple rows in single query).

<?php
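// Assumes $db is an already-open PDO connection and $data comes from convertCsvToArray().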

$sql = 'INSERT IGNORE INTO reps(veeva_rep_id,first,last,email,username,lastLoginAt,lastSyncAt,display_name,rep_type,avatar_url,createdAt,updatedAt) VALUES ';

$insertQuery = array();
$insertData = array();
$n = 0; // counter used to make the named placeholders unique for each row

/*

assuming the array from the csv is like this

$data = array(
    0 => array('name' => 'Robert', 'value' => 'some value'),
    1 => array('name' => 'Louise', 'value' => 'another value')
);
*/

foreach ($data as $row) {
    $insertQuery[] = '(:veeva_rep_id' . $n . ', :first' . $n . ', :last' . $n . ', :email' . $n . ', :username' . $n . ', :lastLoginAt' . $n . ', :lastSyncAt' . $n . ', :display_name' . $n . ', :rep_type' . $n . ', :avatar_url' . $n . ', :createdAt' . $n . ', :updatedAt' . $n . ')';
    $insertData['veeva_rep_id' . $n] = $row['name'];
    $insertData['first' . $n] = $row['value'];
    $insertData['last' . $n] = $row['name'];
    $insertData['email' . $n] = $row['value'];
    $insertData['username' . $n] = $row['name'];
    $insertData['lastLoginAt' . $n] = $row['value'];
    $insertData['lastSyncAt' . $n] = $row['value'];
    $insertData['display_name' . $n] = $row['name'];
    $insertData['rep_type' . $n] = $row['value'];
    $insertData['avatar_url' . $n] = $row['value'];
    $insertData['createdAt' . $n] = $row['name'];
    $insertData['updatedAt' . $n] = $row['value'];

    $n++;
}

$db->beginTransaction();

if (!empty($insertQuery)) { // run whatever was collected; chunk $insertQuery if you need groups of 1000
    $sql .= implode(', ', $insertQuery);

    $stmt = $db->prepare($sql);
    $stmt->execute($insertData);
}

$db->commit();

print $sql . PHP_EOL;

let me know if it helps.

Alex Andrei
  • I have read both threads you left me and I would like to use [this](http://stackoverflow.com/a/4559320/719427) answer, but I am confused about the variables the author uses there. What does `$dataVals` mean in the example code? Is it like `$data` in yours? Would it be `$data` in my code in the OP? – ReynierPM Jun 26 '15 at 13:04
  • You mention batches of 1K inserts; how many of these batches do you have per time interval? If, for example, you have no more than 1 batch per minute, the gain of a bulk insert over one-by-one inserts is minimal, especially if you use transactions, since the "batching" will be done by the driver anyway. – Alex Andrei Jun 26 '15 at 13:22
  • I am not sure how much time an INSERT takes, but it is not too much for 1454 rows; however, I am executing the same code on a set of 150k rows read from a CSV file and converted to an array through the `convertCsvToArray()` function, and that is the time I am trying to reduce, because it takes almost two and a half hours to perform all the INSERTs. – ReynierPM Jun 26 '15 at 13:31
  • For the record this is how `var_export($data)` array looks like: `0 => array ('Id' => '0018000001EJderAAD','Title__c' => '','FirstName' => 'Dorila','LastName' => 'Mauney','Suffix_vod__c' => '','NPI_vod__c' => '1891832606','LastModifiedDate' => '2015-06-26T04:21:49.000Z',), 1 => array ('Id' => '0018000001EJdesAAD','Title__c' => '','FirstName' => 'Tan','LastName' => 'Nguyen','Suffix_vod__c' => '','NPI_vod__c' => '1639106107','LastModifiedDate' => '2015-06-26T04:21:06.000Z',), ...,` – ReynierPM Jun 26 '15 at 13:44