340

2020 update, about 11 years after the question was posted and later closed, preventing newer answers.

Almost everything written here is obsolete. Once upon a time sqlite was limited by memory capacity or to 2 GB of storage (32-bit) or other popular numbers... well, that was a long time ago.

Official limitations are listed here. Practically speaking, sqlite is likely to work as long as there is storage available. It works well with datasets larger than memory; it was originally created when memory was thin, and handling data larger than memory was an important design point from the start.

There is absolutely no issue with storing 100 GB of data. It could probably store a TB just fine, but eventually that's the point where you need to question whether SQLite is the best tool for the job, and you probably want features of a full-fledged database (remote clients, concurrent writes, read-only replicas, sharding, etc...).


Original:

I know that sqlite doesn't perform well with extremely large database files even when they are supported (there used to be a comment on the sqlite website stating that if you need file sizes above 1 GB you may want to consider using an enterprise RDBMS; I can't find it anymore, it might be related to an older version of sqlite).

However, for my purposes I'd like to get an idea of how bad it really is before I consider other solutions.

I'm talking about sqlite data files in the multi-gigabyte range, from 2GB onwards. Anyone have any experience with this? Any tips/ideas?

user5994461
Snazzer
  • Using threading (connection per thread) might help only for reading - http://stackoverflow.com/a/24029046/743263 – malkia Jun 04 '14 at 04:32
  • Hooking http://softwareengineering.stackexchange.com/q/332069/24257 and https://wiki.mozilla.org/Performance/Avoid_SQLite_In_Your_Next_Firefox_Feature#How_to_Store_Your_Data – Pacerier Jan 30 '17 at 12:18
  • Year 2016: I have a 5 GB database that runs on SQLite with no problems. I installed the exact same dataset on Postgres. SQLite ran a complex query in 2.7 ms, Postgres in 2.5 ms. I ended up on Postgres for the easier regex access and better index features, but I was impressed by SQLite and could have used it as well. – Paulb Apr 06 '17 at 10:57
  • 2020: I updated the question. Everything here is in dire need of an update after 11 years of being closed, blocking answers and edits. Editing the question itself might not follow Stack Overflow rules, but better this way than leaving stale information to mislead the next generation of developers. – user5994461 Oct 01 '20 at 09:42
  • @Paulb Do you remember approximately the hardware specifications of the machine you were using? Thanks. – nishanthshanmugham Mar 27 '21 at 21:10
  • It was an i7 with 16 GB of RAM. – Paulb Mar 30 '21 at 20:06

9 Answers

253

So I did some tests with sqlite for very large files, and came to some conclusions (at least for my specific application).

The tests involved a single sqlite file with either a single table or multiple tables. Each table had about 8 columns, almost all integers, and 4 indices.

The idea was to insert enough data until sqlite files were about 50GB.
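
For reference, a rough sketch of that kind of test setup (Python's built-in sqlite3 module; the table and column names are placeholders, not the actual schema I used):

    import sqlite3

    conn = sqlite3.connect("huge_test.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS events (
        a INTEGER, b INTEGER, c INTEGER, d INTEGER,
        e INTEGER, f INTEGER, g INTEGER, h INTEGER)""")
    # Four indices, roughly matching the setup described above.
    for col in ("a", "b", "c", "d"):
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{col} ON events({col})")

    def insert_batch(rows):
        """Insert one batch of 8-integer tuples inside a single transaction."""
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

    # Keep feeding batches until the file grows to the target size (~50GB).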

Single Table

I tried to insert multiple rows into a sqlite file with just one table. When the file was about 7GB (sorry I can't be specific about row counts) insertions were taking far too long. I had estimated that my test to insert all my data would take 24 hours or so, but it did not complete even after 48 hours.

This leads me to conclude that a single, very large sqlite table will have issues with insertions, and probably other operations as well.

I guess this is no surprise: as the table gets larger, inserting and updating all the indices takes longer.

Multiple Tables

I then tried splitting the data by time over several tables, one table per day. The data from the original single table was split across ~700 tables.

This setup had no problems with insertion; it did not take longer as time progressed, since a new table was created for every day.
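
A minimal sketch of that per-day scheme (the naming convention and schema are illustrative, not the code I actually used):

    import sqlite3
    from datetime import date

    conn = sqlite3.connect("huge_test.db")

    def table_for(day: date) -> str:
        """One table per day, e.g. events_20090501."""
        return "events_" + day.strftime("%Y%m%d")

    def insert_rows(day: date, rows):
        name = table_for(day)
        conn.execute(f"""CREATE TABLE IF NOT EXISTS {name} (
            a INTEGER, b INTEGER, c INTEGER, d INTEGER,
            e INTEGER, f INTEGER, g INTEGER, h INTEGER)""")
        with conn:
            conn.executemany(
                f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)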

Vacuum Issues

As i_like_caffeine pointed out, the VACUUM command becomes more of a problem the larger the sqlite file is. As more inserts/deletes are done, the file becomes increasingly fragmented on disk, so the goal is to periodically VACUUM to optimize the file and recover file space.

However, as the documentation points out, a full copy of the database is made to do a vacuum, which takes a very long time to complete. So, the smaller the database, the faster this operation will finish.
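
For completeness, a full vacuum is a single statement; keep in mind it needs roughly as much free disk space as the database itself while it rebuilds the file:

    import sqlite3

    conn = sqlite3.connect("huge_test.db")
    # Rewrites the whole database into a new, defragmented file; can take a long time.
    conn.execute("VACUUM")
    conn.close()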

Conclusions

For my specific application, I'll probably be splitting out data over several db files, one per day, to get the best of both vacuum performance and insertion/delete speed.

This complicates queries, but for me, it's a worthwhile tradeoff to be able to index this much data. An additional advantage is that I can just delete a whole db file to drop a day's worth of data (a common operation for my application).
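
Roughly, the per-day file layout looks like this (the file naming is illustrative only):

    import os
    import sqlite3
    from datetime import date

    def db_path(day: date) -> str:
        """One database file per day, e.g. data/events-2009-05-01.db."""
        return f"data/events-{day.isoformat()}.db"

    def drop_day(day: date):
        """Dropping a whole day's worth of data is just deleting its file."""
        os.remove(db_path(day))

    def query_day(day: date, sql, params=()):
        conn = sqlite3.connect(db_path(day))
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            conn.close()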

I'd probably have to monitor table size per file as well to see when the speed will become a problem.

It's too bad that there doesn't seem to be an incremental vacuum method other than auto vacuum. I can't use it because my goal for vacuuming is to defragment the file (file space isn't a big deal), which auto vacuum does not do. In fact, the documentation states it may make fragmentation worse, so I have to resort to periodically doing a full vacuum on the file.
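
If reclaiming file space (rather than defragmenting) is all you need, incremental vacuum can be enabled like this; as noted above it only releases free pages, it does not defragment (a sketch, not part of my actual setup):

    import sqlite3

    conn = sqlite3.connect("huge_test.db")
    # auto_vacuum must be set before any tables exist, or be followed by a full VACUUM.
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    conn.execute("VACUUM")
    # Later: release up to 1000 pages from the freelist without a full rebuild.
    conn.execute("PRAGMA incremental_vacuum(1000)")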

Snazzer
  • Very useful info. Pure speculation, but I wonder if the new backup API can be used to create a non-fragmented version of your database on a daily basis, and avoid the need to run a VACUUM. – eodonohoe May 03 '09 at 16:36
  • I'm curious, were all your INSERTs in a transaction? – Paul Lefebvre May 13 '09 at 23:18
  • Yes, inserts were done in batches of 10000 messages per transaction. – Snazzer May 14 '09 at 15:17
  • What filesystem did you use? If ext{2,3,4}, what was the data= setting, was journaling enabled? Besides io patterns, the way sqlite flushes to disk may be significant. – Tobu Feb 22 '11 at 23:07
  • I was testing mainly on windows, so can't comment on the behavior on linux. – Snazzer Mar 09 '11 at 03:59
  • If you still have it, can you post the code/sql you used? –  Apr 12 '12 at 20:07
  • Unfortunately the code is pretty tied to the software where I work, so it's hard to tease that out. I think my description should be enough to replicate it though... let me know if you need additional information. – Snazzer Apr 12 '12 at 23:59
  • Interesting... Have you tried benchmarking using multiple databases (they can be attached, up to 62) with 1 table each against the multiple table approach? – Alix Axel Jun 06 '13 at 15:37
  • It is normal that INSERTs slow down the more data there is _if_ there is an index defined on the table. – Davor Josipovic Jun 15 '17 at 15:17

178

We are using DBs of 50 GB+ on our platform. No complaints, it works great. Make sure you are doing everything right! Are you using prepared statements? (SQLite 3.7.3)

  1. Transactions
  2. Prepared statements (see the sketch below)
  3. Apply these settings (right after you create the DB)

    PRAGMA main.page_size = 4096;
    PRAGMA main.cache_size=10000;
    PRAGMA main.locking_mode=EXCLUSIVE;
    PRAGMA main.synchronous=NORMAL;
    PRAGMA main.journal_mode=WAL;
    PRAGMA main.cache_size=5000;
    

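For illustration only (not our production code), points 1-3 look roughly like this from Python's built-in sqlite3 module; "prepared statements" here means parameterized queries that SQLite compiles once and reuses, and the table name is made up for the example:

    import sqlite3

    conn = sqlite3.connect("big.db")

    # 3. Apply the PRAGMAs right after creating/opening the DB
    #    (page_size only takes effect on a brand-new or freshly vacuumed database).
    conn.executescript("""
        PRAGMA main.page_size = 4096;
        PRAGMA main.cache_size = 10000;
        PRAGMA main.locking_mode = EXCLUSIVE;
        PRAGMA main.synchronous = NORMAL;
        PRAGMA main.journal_mode = WAL;
    """)
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")

    rows = [(i, f"name-{i}") for i in range(100_000)]

    # 1. One transaction around the whole batch.
    # 2. A parameterized ("prepared") statement reused for every row.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)", rows)
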
Hope this will help others; it works great here.

takrl
Alex
  • Recently tested with dbs in the 160GB range, works great as well. – Snazzer Jul 13 '11 at 21:43
  • Also `PRAGMA main.temp_store = MEMORY;`. – Vikrant Chaudhary Oct 23 '11 at 14:40
  • @Alex, why are there two `PRAGMA main.cache_size` settings? – Jack Nov 01 '11 at 16:04
  • @VikrantChaudhary and Alex: Won't changing the `page_size` mess with the optimizations described on http://www.sqlite.org/pragma.html#pragma_page_size? – Alix Axel May 07 '13 at 23:29
  • @VikrantChaudhary: `temp_store = MEMORY` should be the default in most setups. – Alix Axel May 13 '13 at 12:55
  • Don't just blindly apply these optimizations. In particular synchronous=NORMAL is not crash-safe. I.e., a process crash at the right time can corrupt your database even in the absence of disk failures. http://www.sqlite.org/pragma.html#pragma_synchronous – mpm Feb 17 '14 at 20:27
  • @Alex can you please explain those values and the difference between them and the default ones? – 4m1nh4j1 Jul 13 '14 at 13:05
  • How many tables do you folks have in your 50G and 160G databases @Snazzer and @Alex? – Spade Sep 25 '14 at 04:56
  • I like SO answers to be concise - read the fine manual for the details, guys! – Lester Cheung Nov 08 '19 at 22:22

66

I've created SQLite databases up to 3.5GB in size with no noticeable performance issues. If I remember correctly, I think SQLite2 might have had some lower limits, but I don't think SQLite3 has any such issues.

According to the SQLite Limits page, the maximum size of each database page is 32K, and the maximum number of pages in a database is 1024^3. So by my math that comes out to 32 terabytes as the maximum size. I think you'll hit your file system's limits before hitting SQLite's!
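
In case anyone wants to check that arithmetic (using the limits as quoted above; current versions of SQLite advertise even higher ones):

    # 32 KiB pages x 1024^3 pages = 2^45 bytes = 32 TiB
    page_size = 32 * 1024
    page_count = 1024 ** 3
    print(page_size * page_count / 2 ** 40)  # -> 32.0 (TiB)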

Paul Lefebvre
  • Depending on what operations you are performing: try deleting 3000 rows in an 8 GB sqlite database, it takes enough time for you to brew a nice pot of french press, lol – benjaminz Jun 28 '17 at 15:28
  • @benjaminz, you must be doing it wrong. If you wrap deletion of 3k rows in one transaction, it should be almost instant. I made this mistake myself: deleting 10k rows one by one took 30 min. But once I wrapped all delete statements into one transaction, it took 5s. – mvp Aug 05 '19 at 16:03

59

Much of the reason that it took more than 48 hours to do your inserts is your indexes. It is much faster to:

  1. Drop all indexes
  2. Do all inserts
  3. Create the indexes again
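
A sketch of that pattern (index, table and column names are placeholders):

    import sqlite3

    conn = sqlite3.connect("huge_test.db")
    rows = []  # placeholder: your data source goes here

    # 1 - Drop all indexes
    for idx in ("idx_a", "idx_b", "idx_c", "idx_d"):
        conn.execute(f"DROP INDEX IF EXISTS {idx}")

    # 2 - Do all inserts (batched in one or more large transactions)
    with conn:
        conn.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

    # 3 - Create the indexes again, once, over the fully loaded table
    for col in ("a", "b", "c", "d"):
        conn.execute(f"CREATE INDEX idx_{col} ON events({col})")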

user352992
  • That's well known... but for a long-running process you're not going to periodically drop your indexes to rebuild them, especially when you're going to be querying them to do work. That is the approach being taken, though, when the sqlite db has to be rebuilt from scratch: the indexes are created after all the inserts are done. – Snazzer May 28 '10 at 17:22
  • @Snazzer in a similar situation we used an "accumulator" table: once per day we would then move the accumulated rows from the accumulator table to the main table within a single transaction. Where needed, a view took care of presenting both tables as a single table. – CAFxX Oct 14 '12 at 07:05
  • Another option is to keep the indexes, but pre-sort the data in index order before you insert it. – Steven Kryskalla Feb 19 '14 at 23:42
  • @StevenKryskalla how does that compare to dropping the indexes and recreating them? Any links you know of that have benchmarked it? – mcmillab Feb 06 '19 at 05:29
  • @mcmillab This was years ago so I don't remember all the details or the benchmark stats, but thinking intuitively, inserting N randomly ordered elements into an index will take O(N log N) time, while inserting N sorted elements will take O(N) time. – Steven Kryskalla Feb 06 '19 at 19:54

35

Besides the usual recommendations:

  1. Drop indexes for bulk inserts.
  2. Batch inserts/updates in large transactions.
  3. Tune your buffer cache / disable the journal with PRAGMAs.
  4. Use a 64-bit machine (to be able to use lots of cache™).
  5. [added July 2014] Use common table expressions (CTEs) instead of running multiple SQL queries! Requires SQLite release 3.8.3 (see the sketch after this list).
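
For point 5, a small made-up example of a CTE replacing what would otherwise be two separate queries (the events/host schema is purely illustrative):

    import sqlite3

    conn = sqlite3.connect("big.db")
    # One statement computes per-host totals and filters on them, instead of
    # running a GROUP BY query and then a second query over its results.
    rows = conn.execute("""
        WITH totals AS (
            SELECT host, COUNT(*) AS n
            FROM events
            GROUP BY host
        )
        SELECT host, n FROM totals WHERE n > 1000
    """).fetchall()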

I have learnt the following from my experience with SQLite3:

  1. For maximum insert speed, don't use a schema with any column constraints. (Note that you can't add constraints later with ALTER TABLE; you would have to recreate the table.)
  2. Optimize your schema to store what you need. Sometimes this means breaking down tables and/or even compressing/transforming your data before inserting it into the database. A great example is storing IP addresses as (long) integers (see the sketch after this list).
  3. One table per db file - to minimize lock contention. (Use ATTACH DATABASE if you want to have a single connection object.)
  4. SQLite can store different types of data in the same column (dynamic typing); use that to your advantage.
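
Quick sketches of points 2 and 3 (the table names and the second database file are made up for the example):

    import ipaddress
    import sqlite3

    # Point 2: store IPv4 addresses as plain integers instead of strings.
    ip_as_int = int(ipaddress.IPv4Address("192.168.1.10"))  # -> 3232235786

    conn = sqlite3.connect("hits.db")
    conn.execute("CREATE TABLE IF NOT EXISTS hits (ip INTEGER, ts INTEGER)")
    with conn:
        conn.execute("INSERT INTO hits VALUES (?, ?)", (ip_as_int, 1234567890))

    # Point 3: one table per db file, but still a single connection object via ATTACH.
    conn.execute("ATTACH DATABASE 'metrics.db' AS metrics")
    conn.execute("CREATE TABLE IF NOT EXISTS metrics.samples (ip INTEGER, value REAL)")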

Question/comment welcome. ;-)

Lester Cheung
  • How much of an impact do you get from 'one table per db file'? Sounds interesting. Do you think it would matter much if your db only has 3 tables and is being built from scratch? – Martin Velez Aug 15 '12 at 07:25
  • @martin hate to say it but the answer is *it depends*. The idea is to partition the data into manageable sizes. In my use case I gather data from different hosts and do reporting on the data after the fact, so this approach worked well. Partitioning by date/time as suggested by others should work well for data that spans a long period of time, I would imagine. – Lester Cheung Nov 06 '12 at 08:41
  • @Lester Cheung: Regarding your second #1: It is my understanding from the docs and personal experience that to this day, SQLite3 does not support adding constraints with ALTER TABLE after the table's creation. The only way to add or remove constraints from existing table rows is to create a new table with the desired characteristics and copy over all the rows, which is likely to be much slower than inserting once with constraints. – Mumbleskates Dec 20 '15 at 00:00
  • @Widdershins you are absolutely right - ALTER TABLE in SQLite does not allow adding constraints. I don't know what I was smoking - will update the answer - thanks. – Lester Cheung Jan 25 '16 at 14:40
  • None of those suggestions have anything to do with using humongous SQLite db files. Was the question edited since this answer was submitted? – A. Rager Apr 04 '16 at 03:46

11

I have a 7GB SQLite database. Performing a particular query with an inner join takes 2.6s. In order to speed this up I tried adding indexes. Depending on which index(es) I added, sometimes the query went down to 0.1s and sometimes it went UP to as much as 7s. I think the problem in my case was that if a column is highly duplicated then adding an index degrades performance :(
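
If you hit something like this, SQLite's ANALYZE and EXPLAIN QUERY PLAN can show whether the planner is actually using (or misusing) an index; a sketch with made-up table names:

    import sqlite3

    conn = sqlite3.connect("big.db")

    # Hypothetical low-cardinality column: 'status' has only a handful of distinct values.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_status ON orders(status)")

    # ANALYZE gathers statistics so the planner can tell a selective index from a poor one.
    conn.execute("ANALYZE")

    # EXPLAIN QUERY PLAN shows which indexes a given query would actually use.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM orders JOIN customers USING (customer_id) "
        "WHERE status = ?", ("shipped",)).fetchall()
    print(plan)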

Mike Oxynormas
  • Why would a column with many duplicates degrade performance (serious question)? – Martin Velez Aug 15 '12 at 07:28
  • a column with low cardinality is harder to index: http://stackoverflow.com/questions/2113181/does-it-make-sense-to-use-an-index-that-will-have-a-low-cardinality – metrix Jan 02 '14 at 21:32

9

I think the main complaints about sqlite scaling are:

  1. Single process write.
  2. No mirroring.
  3. No replication.

Unknown

9

There used to be a statement in the SQLite documentation that the practical size limit of a database file was a few dozen GB. That was mostly due to the need for SQLite to "allocate a bitmap of dirty pages" whenever you started a transaction, which required 256 bytes of RAM for each MB in the database. Inserting into a 50 GB DB file would thus require roughly (50 × 2^10 MB) × (2^8 bytes/MB) = 50 × 2^18 bytes ≈ 13 MB of RAM just for that bitmap.

But as of recent versions of SQLite, this is no longer needed. Read more here.

Alix Axel

8

I've experienced problems with large sqlite files when using the vacuum command.

I haven't tried the auto_vacuum feature yet. If you expect to be updating and deleting data often then this is worth looking at.

eodonohoe