
I am trying to import a CSV file into a MySQL database, and I wrote a Groovy script that does it using Hibernate. The problem is that the CSV file is huge: about 48 million lines. At first I opened a session and committed a transaction for each line, but that was far too slow; judging by the rate of rows added after one day of running the script, importing everything would take more than a month. I suspected the bottleneck was opening a session and committing a transaction for every line, so I then tried the opposite extreme: open a single session, read every line, instantiate an entity for each one, save it in the session, and after reading all 48 million lines and storing all 48 million entities in the session, commit one single transaction with all those instances. That didn't work either. Is there any way to import this CSV file into MySQL much faster, or should it really take that long?
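Simplified, the first version of my loop looked something like this (a sketch only; MyEntity stands in for my real mapped class and the Hibernate configuration details are omitted):

import org.hibernate.Session
import org.hibernate.Transaction
import org.hibernate.cfg.Configuration

def sessionFactory = new Configuration().configure().buildSessionFactory()

// One session and one transaction per CSV line -- the slow version described above.
new File( 'data.csv' ).eachLine { line ->
  Session session = sessionFactory.openSession()
  Transaction tx = session.beginTransaction()
  session.save( new MyEntity( line.split( /,/ ) ) ) // MyEntity is a placeholder
  tx.commit()
  session.close()
}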

pedrobisp

3 Answers


I recommend that you do not use Java at all and instead use MySQL's LOAD DATA INFILE 'c:/tmp/discounts.csv' ...

See http://www.mysqltutorial.org/import-csv-file-mysql-table/
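Filled out, that statement might look like the following. This is only a sketch based on the linked tutorial: the table name discounts, its columns, and the delimiter settings are assumptions, so adapt them to your schema.

LOAD DATA INFILE 'c:/tmp/discounts.csv'
INTO TABLE discounts              -- hypothetical target table
FIELDS TERMINATED BY ','          -- column delimiter in the CSV
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;                    -- skip the header row, if there is one

For 48 million rows this runs inside the server itself and skips the network and ORM overhead entirely, which is why it tends to be orders of magnitude faster than row-by-row inserts.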

Scary Wombat

Hibernate in a script? You must be kidding! :)

I would stick with groovy.sql.Sql, because it's simple and flexible to use and doesn't carry Hibernate's overhead of mapping and session flushing.

Your script might look like the following:

import groovy.sql.Sql

Sql db = new Sql(...) // e.g. via Sql.newInstance( jdbcUrl, user, password, driverClass )
db.connection.autoCommit = false // assumes a single wrapped connection; the manual commit() below needs autocommit off
int flushCounter = 0
new File( 'my.csv' ).splitEachLine( /;/ ){ parts -> // or a full-blown CSV parser here
  db.executeInsert 'insert into mytable (id,a,b,c) values (null,?,?,?)', parts
  flushCounter++
  if( 0 == flushCounter % 1000 ) db.commit() // flush every 1000 records
}
db.commit() // commit the remainder
db.close()
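If you want to squeeze out more, groovy.sql.Sql also offers withBatch, which batches the inserts at the JDBC level instead of issuing them one by one. A minimal sketch, assuming a MySQL JDBC URL and the same hypothetical table:

import groovy.sql.Sql

// Connection details are placeholders. For MySQL, rewriteBatchedStatements=true
// lets the driver rewrite each batch into a multi-row insert, which is much faster.
def db = Sql.newInstance( 'jdbc:mysql://localhost/mydb?rewriteBatchedStatements=true',
                          'user', 'pass', 'com.mysql.jdbc.Driver' )
db.withBatch( 1000, 'insert into mytable (id,a,b,c) values (null,?,?,?)' ) { stmt ->
  new File( 'my.csv' ).splitEachLine( /;/ ){ parts ->
    stmt.addBatch( parts )
  }
}
db.close()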
injecteer

I found a very interesting link with a solution to your problem. Maybe it will be useful: http://chriseiffel.com/everything-linux/how-to-import-a-large-csv-file-to-mysql/