
Sometimes I have to re-import data for a project, which means reading about 3.6 million rows into a MySQL table (currently InnoDB, but I am not limited to this engine). LOAD DATA INFILE has proved to be the fastest solution, but it comes with a tradeoff:

- When importing without keys, the import itself takes about 45 seconds, but the key creation takes ages (it has already been running for 20 minutes...).
- Importing with the keys already on the table makes the import much slower.

There are keys over 3 fields of the table, referencing numeric fields. Is there any way to accelerate this?

Another issue: when I terminate the process that started a slow query, the query keeps running on the database. Is there any way to terminate the query without restarting mysqld?

Thanks a lot, DBa

DBa
  • Each row is about 60 to 100 bytes, with 5 to 8 fields. Nothing really large; it's the sheer quantity that makes the whole thing so slow. – DBa Mar 17 '10 at 16:14

3 Answers


If you're using InnoDB and bulk loading, here are a few tips:

Sort your CSV file into the primary key order of the target table. Remember that InnoDB uses clustered primary keys, so the data will load faster if it is already sorted!
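The pre-sorting step could be sketched in Python roughly as follows. This is only an illustration under assumptions that are not in the original post: the primary key is taken to be a single integer in the first CSV column, the file has no header row, and it is small enough to sort in memory. The function name `sort_csv_by_key` is made up for this example.

```python
import csv
import io


def sort_csv_by_key(src, dst, key_index=0):
    """Copy CSV rows from src to dst, sorted by the integer column key_index.

    Assumes the key column holds plain integers and that the whole file
    fits in memory.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    rows = list(reader)  # the whole file is held in memory here
    rows.sort(key=lambda row: int(row[key_index]))
    writer.writerows(rows)


# Small in-memory demonstration; real use would pass open file objects.
unsorted_csv = io.StringIO("3,c\n1,a\n2,b\n")
sorted_csv = io.StringIO()
sort_csv_by_key(unsorted_csv, sorted_csv)
print(sorted_csv.getvalue())
```

For files too large to hold in memory, an external sort (for example the Unix `sort` utility) would probably be a better fit than this sketch.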

A typical LOAD DATA INFILE sequence I use:

truncate <table>;

set autocommit = 0;

load data infile <path> into table <table>...

commit;

Other optimisations you can use to boost load times:

set unique_checks = 0;
set foreign_key_checks = 0;
set sql_log_bin=0;
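Put together, the load sequence and the session settings above might look like the script generated below. This is a hedged sketch, not the answerer's exact procedure: the table name `import_data`, the path `/tmp/data.csv`, and the `FIELDS`/`LINES` clauses are assumptions for illustration, and the trailing `SET ... = 1` statements assume the settings were enabled beforehand (the original snippets do not show restoring them).

```python
def build_load_script(table, csv_path):
    """Return the SQL statements for one bulk load as a single string.

    The table name, the file path, and the comma/newline delimiters are
    illustrative assumptions; adjust them to the real data.
    """
    statements = [
        # Relax checks and binary logging for this session only.
        "SET unique_checks = 0",
        "SET foreign_key_checks = 0",
        "SET sql_log_bin = 0",
        f"TRUNCATE TABLE {table}",
        # Load everything in one transaction instead of per-statement commits.
        "SET autocommit = 0",
        f"LOAD DATA INFILE '{csv_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'",
        "COMMIT",
        # Assumption: these settings were on before the load, so turn them back on.
        "SET autocommit = 1",
        "SET unique_checks = 1",
        "SET foreign_key_checks = 1",
        "SET sql_log_bin = 1",
    ]
    return ";\n".join(statements) + ";\n"


script = build_load_script("import_data", "/tmp/data.csv")
print(script)
```

The output can be saved to a file and fed to the `mysql` client. Note that changing `sql_log_bin` typically requires elevated privileges, and with it disabled the load is not written to the binary log, so replicas will not see it.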

Split the CSV file into smaller chunks.
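Splitting can be done with any tool; a minimal Python sketch is shown here. It assumes one record per line (no quoted fields containing newlines) and no header row, and the name `split_lines` is invented for this example.

```python
import io
from itertools import islice


def split_lines(src, chunk_size):
    """Yield lists of at most chunk_size lines read from the file object src."""
    while True:
        chunk = list(islice(src, chunk_size))
        if not chunk:
            return
        yield chunk


# In-memory demonstration with five rows split into chunks of two.
demo = io.StringIO("1,a\n2,b\n3,c\n4,d\n5,e\n")
chunks = list(split_lines(demo, 2))
print([len(chunk) for chunk in chunks])  # prints [2, 2, 1]
```

In real use each chunk would be written to its own file and loaded with a separate LOAD DATA INFILE statement. If the source file is already sorted by primary key, the chunks stay in key order as well.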

Typical import stats I have observed during bulk loads:

3.5 - 6.5 million rows imported per min
210 - 400 million rows per hour
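As a rough back-of-the-envelope check, those rates can be applied to the 3.6 million rows from the question. This is plain arithmetic, not a benchmark; actual times depend on hardware, row size, and indexes.

```python
def load_seconds(rows, rows_per_minute):
    """Estimated load time in seconds at a constant import rate."""
    return rows / rows_per_minute * 60


rows = 3_600_000  # row count mentioned in the question
slow = load_seconds(rows, 3_500_000)  # lower observed rate
fast = load_seconds(rows, 6_500_000)  # upper observed rate
print(f"{fast:.0f} to {slow:.0f} seconds")  # prints "33 to 62 seconds"
```

For reference, 3.5 million rows per minute is 210 million per hour, and 6.5 million per minute is 390 million per hour, so the upper hourly figure above appears to be rounded.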
Praveen R
Jon Black
  • Disabling unique_checks already improved the performance, as well as sorting by primary key. Thanks! – DBa Apr 12 '10 at 15:02
  • 10 year old solution still holds true. I went from 400k/min to 7M/min with the suggested optimization. – Devin May 23 '20 at 21:50
  • March 2021: I am using the Parallel Import Utility (multi-threaded LOAD INFILE) util.importTable, which was introduced to MySQL in version 8.0.17. I had all the optimizations listed here except sql_log_bin=0. Adding sql_log_bin=0 cut the load time for an indexed 1.1 GB file with 3.1 million rows from 6:40 to 6:19, a 21 second improvement. – wistlo Mar 05 '21 at 18:06

This blog post is almost 3 years old, but it's still relevant and has some good suggestions for optimizing the performance of "LOAD DATA INFILE":

http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/

Ike Walker

InnoDB is a pretty good engine, but it relies heavily on being tuned. For one thing, if your inserts are not in order of increasing primary key, InnoDB can take a bit longer than MyISAM. This can usually be overcome by setting a higher innodb_buffer_pool_size. My suggestion is to set it to 60-70% of your total RAM on a dedicated MySQL machine.
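To make the 60-70% rule concrete, here is a small sketch that turns a RAM size into a candidate setting. The 16 GB figure is a made-up example and the helper name is hypothetical; the right value still depends on what else runs on the machine.

```python
def buffer_pool_range_mb(total_ram_gb, low=0.60, high=0.70):
    """Return (low, high) buffer pool sizes in MB for the given RAM in GB."""
    total_mb = total_ram_gb * 1024
    return int(total_mb * low), int(total_mb * high)


# Hypothetical dedicated MySQL server with 16 GB of RAM.
low_mb, high_mb = buffer_pool_range_mb(16)
print(f"innodb_buffer_pool_size = {low_mb}M  # up to about {high_mb}M")
```

The printed line has the form used in the `[mysqld]` section of a MySQL option file, where size values accept a suffix such as `M` or `G`.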