6

For querying an sqlite table based on a list of IDs (i.e. distinct primary keys) I am using following statement (example based on the Chinook Database):

SELECT * FROM Customer WHERE CustomerId IN (1,2,3,8,20,35)

However, my actual list of IDs might become rather large (>1000). Thus, I was wondering if this approach using the IN statement is the most efficient or if there is a better/optimized way to query an sqlite table based on a list of primary keys.

CL.
  • 158,085
  • 15
  • 181
  • 214
Johannes
  • 862
  • 10
  • 25

3 Answers3

1

If the number of elements in the IN is large enough, SQLite constructs a temporary index for them. This is likely to be more efficient than creating a temporary table manually.

The length of the IN list is limited only be the maximum length of an SQL statement, and by memory.

CL.
  • 158,085
  • 15
  • 181
  • 214
  • Can you link to where this is documented, the automatic usage of temp tables for large IN queries? My experience with SQLite on Android, I ran into the SQLITE_MAX_VARIABLE_NUMBER, which is 999. – Jeff Lockhart Sep 13 '18 at 22:52
  • @JeffLockhart Your Android might have an older SQLite version. The only documentation for such details is the source code. – CL. Sep 14 '18 at 06:51
  • Interesting. If you could possibly indicate approximately where in the source this optimization occurs, I might be able to narrow it to the version it was introduced in and compare with the versions running on the various Android platform versions: https://stackoverflow.com/questions/2421189/version-of-sqlite-used-in-android It was a few years ago I initially experienced this, so I'd be curious to know if the behavior has changed. – Jeff Lockhart Sep 14 '18 at 18:46
  • 1
    @JeffLockhart The IN(…) contents are parsed with `exprlist(A)` in `parse.y`. The temp table is generated in the last part of `sqlite3FindInIndex()` in `expr.c`. Anyway, if you are using parameters, then you cannot avoid running into SQLITE_MAX_VARIABLE_NUMBER. – CL. Sep 15 '18 at 06:59
  • Thanks for the info. Exploring the source, it's described in `expr.c` that: `Whether this is an 'x IN(SELECT...)' or an 'x IN()' expression it is handled the same way. An ephemeral table is filled with index keys representing the results from the SELECT or the .` – Jeff Lockhart Sep 17 '18 at 07:34
  • Additionally: `If no preexisting index is available for the IN clause and IN_INDEX_NOOP is an allowed reply and the RHS of the IN operator is a list, not a subquery and the RHS is not constant or has two or fewer terms, then it is not worth creating an ephemeral table to evaluate the IN operator so return IN_INDEX_NOOP.` So it looks like 3 or more elements gets the temp table optimization automatically. – Jeff Lockhart Sep 17 '18 at 07:36
  • At least the documentation of this behavior was added in this check-in: https://www.sqlite.org/src/info/0f6c9b05e688e281 So it's likely been since at least v3 and would have been in every version of Android. That's helpful to know the `SQLITE_MAX_VARIABLE_NUMBER` limit doesn't affect this optimization. – Jeff Lockhart Sep 17 '18 at 07:39
0

Because the statement you wrote does not include any instructions to SQLite about how to find the rows you want the concept of "optimizing" doesn't really exist -- there's nothing to optimize. The job of planning the best algorithm to retrieve the data belongs to the SQLite query optimizer.

Some databases do have idiosyncrasies in their query optimizers which can lead to performance issues but I wouldn't expect SQLite to have any trouble finding the correct algorithm for this simple query, even with lots of values in the IN list. I would only worry about trying to guide the query optimizer to another execution plan if and when you find that there's a performance problem.

Larry Lustig
  • 46,058
  • 13
  • 95
  • 143
0

SQLite Optimizer Overview

IN (expression-list) does use an index if available.

Beyond that, I can't glean any guarantees from it, so the following is subject to a performance measaurement.

Axis 1: how to pass the expression-list

  • hardocde as string. Overhead for int-to-string conversion and string-to-int parsing
  • bind parameters (i.e. the statement is ... WHERE CustomerID in (?,?,?,?,?,?,?,?,?,?....), which is easier to build from a predefined string than hardcoded values). Prevents int → string → int conversion, but the default limit for number of parameters is 999. This can be increased by SQLITE_LIMIT_VARIABLE_NUMBER, but might lead to excessive allocations.
  • Temporary table. Possibly less efficient than any of the above methods after the statement is prepared, but that doesn't help if most time is spent preparing the statement

Axis 2: Statement optimization

If the same expression-list is used in multiple queries against changing CustomerIDs, one of the following may help:

  • reusing a prepared statement with hardcoded values (i.e. don't pass 1001 parameters)
  • create a temporary table for the CustomerIDs with index (so the index is created once, not on the fly for every query)

If the expression-list is different with every query, ist is probably best to let SQLite do its job. The following might be an improvement

  • create a temp table for the expression-list
  • bulk-insert expression-list elements using union all
  • use a sub query

(from my experience with SQLite, I'd expect it to be on par or slightly worse)


Axis 3 Ask Richard

the sqlite mailing list (yeah I know, that technology even older than rotary phones!) is pretty active with often excellent advise, including from the author of SQLite. 90% chance someone will dismiss you ass "Measure before asking suhc a question!", 10% chance someone gives you detailed insight.

Community
  • 1
  • 1
peterchen
  • 38,919
  • 19
  • 95
  • 176