I am importing millions of rows from a tab-separated file into SQLite, and .import (with .mode tabs) is very slow. I have three indexes, so the slowness probably comes from the indexing. But first I would like to check whether .import groups lots (or all) of the rows into a single commit. I was unable to find documentation on how .import works. Does anyone know? If the indexes are the problem (I had that issue before with MySQL), how can I disable them and reindex at the end of the .import?
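For what it's worth, SQLite has no switch to temporarily disable an index, but dropping it before the load and recreating it afterwards has the same effect. A minimal sketch using Python's sqlite3 module (table and index names taken from the schema below; the in-memory database and the CREATE TABLE are only there to make the snippet self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use the real .db file in practice
conn.execute("""CREATE TABLE ensembl_vf_b36 (
    variation_name varchar(20), chr varchar(4), start integer,
    end integer, strand varchar(5), allele_string varchar(3),
    map_weight varchar(2), flags varchar(50),
    validation_status varchar(100), consequence_type varchar(50))""")

# 1. make sure the index is absent while the rows are loaded
conn.execute("DROP INDEX IF EXISTS pos_vf_b36_idx")
# 2. ... bulk-load the rows here ...
# 3. build the index once, at the end
conn.execute("CREATE INDEX pos_vf_b36_idx ON ensembl_vf_b36 (chr, start, end)")
conn.commit()
```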
[Update 1]
Following @sixfeetsix's comment.
My schema is:
CREATE TABLE ensembl_vf_b36 (
    variation_name varchar(20),
    chr varchar(4),
    start integer,
    end integer,
    strand varchar(5),
    allele_string varchar(3),
    map_weight varchar(2),
    flags varchar(50),
    validation_status varchar(100),
    consequence_type varchar(50)
);
CREATE INDEX pos_vf_b36_idx on ensembl_vf_b36 (chr, start, end);
data:
rs35701516 NT_113875 352 352 1 G/A 2 NULL NULL INTERGENIC
rs12090193 NT_113875 566 566 1 G/A 2 NULL NULL INTERGENIC
rs35448845 NT_113875 758 758 1 A/C 2 NULL NULL INTERGENIC
rs17274850 NT_113875 1758 1758 1 G/A 2 genotyped cluster,freq INTERGENIC
There are 15,608,032 entries in this table.
And these are the stats
$ time sqlite3 -separator ' ' test_import.db '.import variations_build_36_ens-54.tab ensembl_vf_b36'
real 29m27.643s
user 4m14.176s
sys 0m15.204s
[Update 2]
@sixfeetsix has a good answer, and if you are reading this you may also be interested in:
Faster bulk inserts in sqlite3?
Sqlite3: Disabling primary key index while inserting?
[Update 3] Solution: from 30 min -> 4 min
Even with all the optimisations (see accepted answer) the import still takes almost 30 minutes, but if the indexes are skipped during the import and added at the end, the total time drops to 4 minutes:
-- importing without indexes:
real 2m22.274s
user 1m38.836s
sys 0m4.850s
-- adding indexes
$ time sqlite3 ensembl-test-b36.db < add_indexes-b36.sql
real 2m18.344s
user 1m26.264s
sys 0m6.422s
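The effect in Update 3 can be reproduced at small scale with a toy harness like the one below (a hypothetical helper, not from the post): it loads the same rows once with the index already in place, and once creating the index only after the load. On a few thousand in-memory rows the gap is small; on 15M disk-based rows it becomes the 30 min -> 4 min difference reported above.

```python
import sqlite3
import time

def timed_load(rows, index_first):
    """Insert rows into a scratch table; build the index before or after."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (chr TEXT, start INTEGER, end INTEGER)")
    start = time.perf_counter()
    if index_first:
        conn.execute("CREATE INDEX t_idx ON t (chr, start, end)")
    with conn:  # single commit for the whole batch
        conn.executemany("INSERT INTO t VALUES (?,?,?)", rows)
    if not index_first:
        conn.execute("CREATE INDEX t_idx ON t (chr, start, end)")
    elapsed = time.perf_counter() - start
    n = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, n

rows = [("NT_113875", i, i) for i in range(50_000)]
with_index = timed_load(rows, index_first=True)
without_index = timed_load(rows, index_first=False)
```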