
I have a MySQL table with an auto_increment primary key.

I have 500 CSV files, each about 3 GB, with the bulk of the data in one column.

Currently I'm loading the files into mysql using:

#!/bin/bash
for file in /files/*.csv
do
    mysql -e "LOAD DATA LOCAL INFILE '$file' INTO TABLE myTable FIELDS TERMINATED BY ','
    ENCLOSED BY '\"' ESCAPED BY '\"' IGNORE 1 LINES" -u user -ppass
done

Are there any ways to improve performance? Maybe removing the primary key while inserting and then adding it afterwards? Or is there a way to insert in parallel instead of one file at a time?
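To make the second question concrete, this is roughly what I mean by loading in parallel: an untested sketch using xargs -P, with the same placeholder table, credentials and delimiters as the script above.

#!/bin/bash
# Untested sketch: run up to 4 LOAD DATA sessions at once with xargs -P.
ls /files/*.csv | xargs -P 4 -I {} \
    mysql -u user -ppass -e "LOAD DATA LOCAL INFILE '{}' INTO TABLE myTable
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"' ESCAPED BY '\"' IGNORE 1 LINES"

(Whether the server actually benefits from this, given it is one table on one disk, is part of what I'm asking.)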

d-_-b
  • You could try looping through each line and executing an insert for each line. It may or may not perform better than a one-off load, but it may be worth a go. – Raman Sailopal Jul 26 '17 at 14:36
  • Take a look at https://stackoverflow.com/questions/2463602/mysql-load-data-infile-acceleration – Ulises André Fierro Jul 26 '17 at 16:44
  • Also, as far as inserting in parallel, I don't think it would help much, since the processing steps will be the same as far as the engine goes, i.e. the same workload going through the same channel. But I could be wrong. – Ulises André Fierro Jul 26 '17 at 16:47

1 Answer


The new MySQL Shell, as of version 8.0.17, has a parallel bulk loader for CSV, TSV, and JSON files.
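For illustration, an untested sketch of driving that utility (util.importTable) from a shell loop like yours. It requires MySQL Shell >= 8.0.17 and local_infile enabled on the server; the schema name "myDatabase", the thread count, and the option values mirroring your LOAD DATA clause are assumptions you would adjust:

#!/bin/bash
# Sketch: load each file with MySQL Shell's parallel table import utility,
# which splits the file into chunks and loads them on several threads.
for file in /files/*.csv
do
    mysqlsh --js user@localhost <<EOF
util.importTable("$file", {
    schema: "myDatabase",
    table: "myTable",
    fieldsTerminatedBy: ",",
    fieldsEnclosedBy: '"',
    fieldsEscapedBy: '"',
    skipRows: 1,   // same as IGNORE 1 LINES
    threads: 8     // parallel chunks per file
});
EOF
done

As written it will prompt for the password on every iteration; storing credentials with the Shell's credential helper (or a login path) avoids that.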

Dave Stokes
  • Can you elaborate? Maybe a link to what exactly it is and how to use it? – tomer.z Aug 09 '19 at 09:04
  • See https://elephantdolphin.blogspot.com/2019/08/parallel-table-importer-in-mysql-shell.html for an example of using the new parallel bulk loader utility in the MySQL Shell – Dave Stokes Aug 09 '19 at 14:11