
I am not that experienced with SQL and I was wondering if there is a more efficient way to accomplish what I am doing.

I am using Python with SQLAlchemy to import very large data sets from CSV into a SQL table. I only want to import certain columns from the CSV, so I wrote one function that grabs the column names from the CSV header and another that filters out the ones I don't need (conveniently, the unwanted ones are at the end of the column list). Then a for loop builds and executes an INSERT INTO statement for each row in the reader, using the filtered column names (they are the same in the CSV and the SQL table) and that row's values, i.e.:

row1: INSERT INTO table (c1, c2, c3) VALUES (v1, v2, v3)
row2: INSERT INTO table (c1, c2, c3) VALUES (v4, v5, v6)
row3: INSERT INTO table (c1, c2, c3) VALUES (v7, v8, v9)
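In code, the loop is roughly this sketch (the connection string, table, and column names are placeholders for my real ones):

import csv
from sqlalchemy import create_engine, text

# Placeholder connection string -- mine points at the real database.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
KEEP = ["c1", "c2", "c3"]  # the columns I want, filtered from the CSV header

with open("data.csv", newline="") as f, engine.begin() as conn:
    reader = csv.DictReader(f)
    for row in reader:
        # one INSERT statement executed per CSV row
        conn.execute(
            text("INSERT INTO mytable (c1, c2, c3) VALUES (:c1, :c2, :c3)"),
            {col: row[col] for col in KEEP},
        )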

My CSV files have around 15k lines, so you can imagine this takes quite a while.

Is there a way to consolidate the queries so that inserting these values takes less time?

1 Answer


If you can get the file onto the server, LOAD DATA INFILE can parse the CSV for you. If not, use a multi-insert (a single INSERT statement that carries many rows of VALUES).
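For the multi-insert route, here is a minimal sketch with SQLAlchemy; the driver, table, column names, and chunk size are illustrative. The idea is to buffer rows and execute each chunk in one call, and the common MySQL drivers (PyMySQL, mysqlclient) send such an executemany as one multi-row INSERT instead of one query per row:

import csv
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
KEEP = ["c1", "c2", "c3"]
CHUNK = 1000                       # rows sent per statement

insert = text("INSERT INTO mytable (c1, c2, c3) VALUES (:c1, :c2, :c3)")

with open("data.csv", newline="") as f, engine.begin() as conn:
    reader = csv.DictReader(f)
    batch = []
    for row in reader:
        batch.append({col: row[col] for col in KEEP})
        if len(batch) == CHUNK:
            conn.execute(insert, batch)   # executemany: one round trip per chunk
            batch = []
    if batch:                             # flush the leftover rows
        conn.execute(insert, batch)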

LOAD DATA INFILE is the faster of the two, but it works best when the file sits directly on the MySQL server itself. If it doesn't, you can enable LOCAL mode (LOAD DATA LOCAL INFILE) so the client sends the file over the connection.
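Here is a minimal sketch of the LOAD DATA LOCAL INFILE route, again with placeholder connection string, path, table, and column names, assuming the PyMySQL driver and a server that permits local_infile. The @skip user variables are one way to discard the unwanted trailing columns:

from sqlalchemy import create_engine, text

# The server also has to allow local_infile for the LOCAL variant to work.
engine = create_engine(
    "mysql+pymysql://user:password@localhost/mydb",
    connect_args={"local_infile": True},  # let the client send a local file
)

load = text(
    "LOAD DATA LOCAL INFILE '/path/to/data.csv' "
    "INTO TABLE mytable "
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 LINES "               # skip the CSV header row
    "(c1, c2, c3, @skip1, @skip2)"  # @vars swallow the trailing columns you don't want
)

with engine.begin() as conn:
    conn.execute(load)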

tadman
  • Would multi-insert be faster than what I currently do? – steven chen Oct 15 '19 at 23:38
  • It's often a factor of *N* faster, where *N* is how many rows you insert at once. In other words, your limit is queries per second, not rows per second. – tadman Oct 16 '19 at 00:35