51

I have a file of about 30000 lines of data that I want to load into a sqlite3 database. Is there a faster way than generating insert statements for each line of data?

The data is space-delimited and maps directly to an sqlite3 table. Is there any sort of bulk insert method for adding volume data to a database?

Has anyone devised some deviously wonderful way of doing this if it's not built in?

I should preface this by asking, is there a C++ way to do it from the API?

Alexander Farber
scubabbl

12 Answers

59
  • Wrap all INSERTs in a transaction; even if there's a single user, it's far faster.
  • Use prepared statements (a C++ sketch of both points follows).
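
For example, a minimal sketch in C++ using SQLite's C API; the table mytable(col1, col2) and its integer columns are placeholders, and error checking is omitted for brevity:

#include <sqlite3.h>
#include <utility>
#include <vector>

// Insert many (col1, col2) pairs using one transaction and one prepared statement.
void bulk_insert(sqlite3 *db, const std::vector<std::pair<int, int>> &rows)
{
    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

    sqlite3_stmt *stmt = NULL;
    sqlite3_prepare_v2(db, "INSERT INTO mytable (col1, col2) VALUES (?, ?)",
                       -1, &stmt, NULL);

    for (const auto &row : rows) {
        sqlite3_bind_int(stmt, 1, row.first);
        sqlite3_bind_int(stmt, 2, row.second);
        sqlite3_step(stmt);   // execute the insert
        sqlite3_reset(stmt);  // reuse the compiled statement for the next row
    }

    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
}
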
Javier
40

You want to use the .import command. For example:

$ cat demotab.txt
44      92
35      94
43      94
195     49
66      28
135     93
135     91
67      84
135     94

$ echo "create table mytable (col1 int, col2 int);" | sqlite3 foo.sqlite
$ echo ".import demotab.txt mytable"  | sqlite3 foo.sqlite

$ sqlite3 foo.sqlite
-- Loading resources from /Users/ramanujan/.sqliterc
SQLite version 3.6.6.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from mytable;
col1    col2
44      92
35      94
43      94
195     49
66      28
135     93
135     91
67      84
135     94

Note that this bulk-loading command is not SQL but a dot-command of SQLite's shell. As such, the syntax looks odd here because we're passing it via echo to the interactive command-line interpreter, sqlite3.

In PostgreSQL the equivalent is COPY FROM: http://www.postgresql.org/docs/8.1/static/sql-copy.html

In MySQL it is LOAD DATA LOCAL INFILE: http://dev.mysql.com/doc/refman/5.1/en/load-data.html

One last thing: remember to be careful with the value of .separator. That is a very common gotcha when doing bulk inserts.

sqlite> .show
     echo: off
  explain: off
  headers: on
     mode: list
nullvalue: ""
   output: stdout
separator: "\t"
    width:

You should explicitly set the separator to a space, tab, or comma before doing .import (for the space-delimited data in the question, that means running `.separator " "` first).

Elliot Cameron
ramanujan
  • 2
    This is great, and very fast. 20 minutes reduced to 3 seconds. – Gazzer Mar 12 '11 at 14:42
  • does this work for tables which have auto increment primary keys? I tried using a NULL in the file for an auto increment column but it throws an error. – Aditya Naidu Jan 30 '12 at 19:05
  • 2
    Looking at the code for SQLite's shell.c, .import is just using a prepared statement inside a transaction. – dlanod Aug 09 '12 at 01:53
  • sqlite has a strange behaviour with \t on the command line; you must give a REAL tab to the -separator option. On the command line, use Control-V to insert a real TAB. – The Demz Dec 02 '13 at 13:03
23

You can also try tweaking a few parameters to get extra speed out of it. Specifically, you probably want PRAGMA synchronous = OFF;.

  • 24
    pragma synchronous = OFF is a bad idea - it'll hardly impact performance at all for bulk inserts, and your DB will be corrupted on a power failure. A much better idea is to wrap your inserts in a transaction. – Eamon Nerbonne Aug 31 '09 at 12:39
  • 14
    Wrapping the INSERTs in a TRANSACTION and using PRAGMA journal_mode = MEMORY; will prevent the INSERTs from hitting the disk until the end of the transaction. – Ted Mar 17 '10 at 23:04
  • 4
    Beware that MEMORY will corrupt db on a power failure – Anders Rune Jensen Jun 06 '11 at 22:38
  • 3
    PRAGMA journal_mode = WAL; will allow many writers at a time, and you could eventually use threads to write the data. Note that with Write-Ahead Logging activated, the DB is not corrupted after power failure. – Amine Zaine May 15 '16 at 02:06
21
  • Increase PRAGMA cache_size to a much larger number. This will increase the number of pages cached in memory. NOTE: cache_size is a per-connection setting.
  • Wrap all inserts into a single transaction rather than one transaction per row.
  • Use compiled SQL statements to do the inserts.
  • Finally, as already mentioned, if you are willing to forgo full ACID compliance, set PRAGMA synchronous = OFF; (a short sketch of setting these pragmas from C++ follows).
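
A rough sketch of applying such settings from C++ through the C API; the pragma values here are illustrative, not tuned:

#include <sqlite3.h>

// Apply bulk-load-friendly settings on an open connection.
// Note: cache_size applies only to this connection, and
// synchronous = OFF trades crash safety for speed.
void tune_for_bulk_load(sqlite3 *db)
{
    sqlite3_exec(db, "PRAGMA cache_size = 100000", NULL, NULL, NULL);
    sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, NULL);
}
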
paxos1977
  • 1
    `PRAGMA default_cache_size` is now [deprecated](https://sqlite.org/pragma.html#pragma_default_cache_size) – david Mar 19 '19 at 19:39
  • 1
    `cache_size` can be used instead of the deprecated `default_cache_size`. However, `cache_size` is for a single connection. – golmschenk May 30 '20 at 17:12
18

I've tested some pragmas proposed in the answers here:

  • synchronous = OFF
  • journal_mode = WAL
  • journal_mode = OFF
  • locking_mode = EXCLUSIVE
  • synchronous = OFF + locking_mode = EXCLUSIVE + journal_mode = OFF

Here are my numbers for different numbers of inserts per transaction:

Increasing the batch size gives you a real performance boost, while turning off the journal, synchronization, or acquiring an exclusive lock gives an insignificant gain. Points around ~110k show how random background load can affect your database performance.

Also, it is worth mentioning that journal_mode=WAL is a good alternative to the defaults. It gives some gain, but does not reduce reliability.

C# Code.

astef
  • 1
    One thing I noticed in one of my projects is that a batch should be restricted to a single table if at all possible. If you are inside a transaction and updating table a then table b in a loop, this will run much much slower than looping twice, once for table a then again for table b. – skamradt Feb 19 '21 at 21:42
12

RE: "Is there a faster way that generating insert statements for each line of data?"

First: Cut it down to 2 SQL statements by making use of Sqlite3's Virtual table API e.g.

create virtual table vtYourDataset using yourModule;
-- Bulk insert
insert into yourTargetTable (x, y, z)
select x, y, z from vtYourDataset;

The idea here is that you implement a C interface that reads your source data set and presents it to SQLite as a virtual table, and then you do a SQL copy from the source to the target table in one go. It sounds harder than it really is, and I've measured huge speed improvements this way.

Second: Make use of the other advice provided here, i.e. the pragma settings and the use of a transaction.

Third: Perhaps see if you can do away with some of the indexes on the target table. That way sqlite will have fewer indexes to update for each row inserted.

Hannes de Jager
6

There is no way to bulk insert, but there is a way to write large chunks to memory, then commit them to the database. For the C/C++ API, just do:

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

...(INSERT statements)

sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);

Assuming db is your database pointer.

scott
4

A good compromise is to wrap your INSERTs between the BEGIN; and END; keywords, i.e.:

BEGIN;
INSERT INTO table VALUES ();
INSERT INTO table VALUES ();
...
END;
Flavien Volken
3

I found this to be a good mix for a one-shot long import.

.echo ON

.read create_table_without_pk.sql

PRAGMA cache_size = 400000;
PRAGMA synchronous = OFF;
PRAGMA journal_mode = OFF;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA count_changes = OFF;
PRAGMA temp_store = MEMORY;
PRAGMA auto_vacuum = NONE;

.separator "\t"
.import a_tab_seprated_table.txt mytable

BEGIN;
.read add_indexes.sql
COMMIT;

.exit

source: http://erictheturtle.blogspot.be/2009/05/fastest-bulk-import-into-sqlite.html

some additional info: http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/

maazza
3

Depending on the size of the data and the amount of RAM available, one of the best performance gains will occur by setting sqlite to use an all-in-memory database rather than writing to disk.

For in-memory databases, pass NULL as the filename argument to sqlite3_open and make sure that TEMP_STORE is defined appropriately (opening the special filename ":memory:" also gives an in-memory database).

(All of the above text is excerpted from my own answer to a separate sqlite-related question)

pestophagous
  • 1
    The link points to an incomplete document. There is less information than one would hope for, – Richard Oct 09 '09 at 22:23
1

If you are just inserting once, I may have a dirty trick for you.

The idea is simple: first insert into an in-memory database, then back it up and finally restore it into your original database file.

I wrote the detailed steps at my blog. :)
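
The blog's detailed steps aren't reproduced here, but a rough C++ sketch of that approach using SQLite's online backup API (the disk path and the table-filling step are placeholders) could look like this:

#include <sqlite3.h>

// Fill an in-memory database first, then copy it into a file on disk.
void load_via_memory(const char *disk_path)
{
    sqlite3 *mem_db = NULL;
    sqlite3_open(":memory:", &mem_db);

    // ... create the table and run all the INSERTs against mem_db ...

    sqlite3 *file_db = NULL;
    sqlite3_open(disk_path, &file_db);

    // Copy the whole in-memory database into the file database.
    sqlite3_backup *bak = sqlite3_backup_init(file_db, "main", mem_db, "main");
    if (bak) {
        sqlite3_backup_step(bak, -1);   // -1 copies all remaining pages
        sqlite3_backup_finish(bak);
    }

    sqlite3_close(file_db);
    sqlite3_close(mem_db);
}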

circle
0

I do a bulk insert with this method:

import sqlite3

conn = sqlite3.connect("your_database.db")  # path to your SQLite file (placeholder)

colnames = ['col1', 'col2', 'col3']
nrcols = len(colnames)
qmarks = ",".join(["?" for i in range(nrcols)])
stmt = "INSERT INTO tablename VALUES(" + qmarks + ")"
vals = [[val11, val12, val13], [val21, val22, val23], ..., [valn1, valn2, valn3]]
conn.executemany(stmt, vals)
conn.commit()

  • colnames must be in the order of the column names in the table.
  • vals is a list of db rows; each row must have the same length and contain the values in the correct order.
  • Note that we use executemany, not execute.