
There have been a number of questions about speeding up INSERTs into an SQL database, such as this, this, this, this and (my favorite) this. Many of the questions masquerade as language-dependent, but they generally reduce to:

What general techniques will speed up INSERTS into my [specific DBMS] SQL database from my [specific language] program?

  • This also highly depends on the DBMS being used. Which is it? – a_horse_with_no_name Feb 12 '14 at 21:22
  • Agreed. Editing title to make it clear this attempts to be DBMS agnostic... – fearless_fool Feb 12 '14 at 21:25
  • Hi moderators: I humbly request that you reconsider the "too broad" hold. I've re-worded the question to make it more specific (ironically, by asking for "general techniques") that speed up inserts. There really are not that many answers to this specific question (the answer enumerates them), so this question isn't overly broad as defined by the SO guidelines. Thanks. – fearless_fool Feb 14 '14 at 04:16
  • P.S. One moderator pointed out that the question didn't show any effort in the way of code or results. Since this was posed as a Q&A (http://stackoverflow.com/help/self-answer), all the effort was intentionally put in the answer, not the question. – fearless_fool Feb 19 '14 at 18:02

1 Answer


The biggest inefficiencies during INSERTs arise from two sources: each record inserted in its own transaction incurs per-transaction overhead, and the ORM adapter between your language and the database isn't always particularly efficient.

This answer uses Ruby On Rails as its example language, but the techniques shown here apply to almost any high-level language with an interface to an underlying database.

Conclusion (TL;DR)

Inserting 500 records at a time via either of the bulk insert methods listed below can give you speedups in excess of 20x. With tuning, that could become even higher.

The Tests

Let's start with an in-memory array of 123,001 records. (These were derived from the "trips.txt" file in the City of Toronto Transportation Datasets.)

dataset = Utilities.load_csv(Rails.root.join("datasets", "trips.txt"))

Baseline: One-at-a-time inserts

In Ruby On Rails, you can do:

dataset.each {|record| Trip.create!(record) }

which essentially translates into 123,000 individual SQL calls of the form:

INSERT INTO "trips" (<column_names>) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)  [<column name/value pairs>]
(123000 more times...)

Speedup 1: Wrapped in a transaction

This is almost the same, but wraps the inner loop in a single SQL transaction:

begin transaction
INSERT INTO "trips" (<column_names>) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)  [<column name/value pairs>]
(123000 more times...)
commit transaction
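In Rails, the wrapper itself is a one-liner via the model's `transaction` method. The plain-Ruby helper below is only a sketch of what this produces at the SQL level; `transactional_sql` is a hypothetical name, not part of the benchmark code:

```ruby
# Rails version (requires ActiveRecord):
#   Trip.transaction do
#     dataset.each { |record| Trip.create!(record) }
#   end

# Plain-Ruby sketch: wrap a batch of INSERT statements in one transaction,
# so the database commits once instead of once per row.
def transactional_sql(statements)
  (["BEGIN TRANSACTION;"] + statements + ["COMMIT;"]).join("\n")
end

puts transactional_sql([
  "INSERT INTO trips (route_id) VALUES ('1');",
  "INSERT INTO trips (route_id) VALUES ('2');"
])
```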

Speedup 2: Bulk INSERT version A

This one breaks the original array into batches of 500 (to stay within SQLite's limits and keep the statement strings to a manageable size), where each statement has the form:

INSERT INTO trips (comma_separated_column_names) VALUES
    (comma_separated_values_for_row_1),
    (comma_separated_values_for_row_2),
    ...
    (comma_separated_values_for_row_500);
... repeated 246 times
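One way to generate these batched statements in plain Ruby is sketched below. The helper name is illustrative, and the quoting is deliberately naive; in real code, use your database driver's bind parameters or escaping:

```ruby
# Build one multi-row INSERT statement per batch of rows.
# NOTE: naive single-quote doubling for illustration only --
# use bind parameters or your driver's escaping in production.
def bulk_insert_sql(table, columns, rows, batch_size = 500)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map do |row|
      "(" + row.map { |v| "'#{v.to_s.gsub("'", "''")}'" }.join(", ") + ")"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES\n  " +
      values.join(",\n  ") + ";"
  end
end

stmts = bulk_insert_sql("trips", %w[route_id service_id], [["1", "A"], ["2", "B"]])
puts stmts.first
```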

Speedup 3: Bulk INSERT version B

Some DBMSs don't support the syntax of Bulk INSERT version A (notably older versions of SQLite). The following form is functionally identical and supported by many databases (though not strictly SQL-92 compliant):

     INSERT INTO trips (comma_separated_column_names)
          SELECT comma_separated_values_for_row_1
UNION ALL SELECT comma_separated_values_for_row_2
...
UNION ALL SELECT comma_separated_values_for_row_500
...repeated 246 times
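The same batching sketch, adapted to the UNION ALL form (again, the helper name and naive quoting are illustrative only):

```ruby
# Build one INSERT ... SELECT ... UNION ALL SELECT ... statement per batch.
# NOTE: naive quoting for illustration only -- prefer bind parameters.
def union_all_insert_sql(table, columns, rows, batch_size = 500)
  rows.each_slice(batch_size).map do |batch|
    selects = batch.map do |row|
      "SELECT " + row.map { |v| "'#{v.to_s.gsub("'", "''")}'" }.join(", ")
    end
    "INSERT INTO #{table} (#{columns.join(', ')})\n" +
      selects.join("\nUNION ALL ") + ";"
  end
end

puts union_all_insert_sql("trips", %w[route_id service_id], [["1", "A"], ["2", "B"]]).first
```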

Speedup 4: DBM-specific methods

As pointed out by Bill Karwin (https://stackoverflow.com/users/20860/bill-karwin), one can

take advantage of vendor-specific bulk-loading commands, for example LOAD DATA INFILE for MySQL, COPY for PostgreSQL, or SQL*Loader for Oracle, etc. Each brand has their own command or tool, so there's no vendor-neutral way of doing this, but these methods often have an order of magnitude better performance, so they should not be overlooked.

While these are not general techniques, they will be useful in specific cases. We didn't benchmark any of these in the tests below.
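As a sketch of the MySQL case (the helper names and file path are illustrative, not from the benchmark code): dump the records to CSV with Ruby's standard `csv` library, then issue a `LOAD DATA INFILE` statement against that file:

```ruby
require "csv"

# Write the in-memory rows out as a CSV file.
def write_csv(path, rows)
  CSV.open(path, "w") { |csv| rows.each { |row| csv << row } }
end

# Build MySQL's bulk-load statement for that file.
def load_data_infile_sql(path, table, columns)
  "LOAD DATA LOCAL INFILE '#{path}' INTO TABLE #{table} " \
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " \
    "(#{columns.join(', ')});"
end

puts load_data_infile_sql("/tmp/trips.csv", "trips", %w[route_id service_id])
```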

Relative Speedups

We tested the various techniques shown above against MySQL, PostgreSQL and SQLite. The numbers below show the speed of each method relative to the baseline case. The first number is the relative user + system time; the number in (parentheses) is the relative elapsed time.

(Note: I have chosen NOT to display absolute times because this is not a discussion about which database is fastest -- there are too many variables to make sensible claims about that. If nudged, I'll post the code on GitHub so you can run your own tests and draw your own conclusions.)

MySQL

  • Wrapped In A Transaction: 1.2x (1.4x)
  • Bulk INSERT version A: 24.3x (19.0x)
  • Bulk INSERT version B: 24.3x (17.1x)

PostgreSQL

  • Wrapped In A Transaction: 1.2x (1.6x)
  • Bulk INSERT version A: 27.2x (16.7x)
  • Bulk INSERT version B: 27.2x (13.9x)

SQLite

  • Wrapped In A Transaction: 1.6x (2.4x)
  • Bulk INSERT version A: 25.8x (24.7x)
  • Bulk INSERT version B: 24.1x (34.1x)

Testing notes

Processing environment: 2.66 GHz Intel Core i7, 8GB 1067 MHz DDR3
Operating System: OS X v 10.9.5 
Ruby version: 2.0.0 (64 bit) 
Rails version: 4.0.2 
MySQL: Server version: 5.1.49  
PostgreSQL: psql (9.3.1, server 8.3.14) 
SQLite: SQLite version 3.7.12 2012-04-03 19:43:07
  • Speedup 2 is actually SQL compliant whereas Speedup 3 isn't. The SQL standard doesn't allow for `SELECT`s without a `FROM` clause (and the `insert` with a multi-row `values` clause *is* specified in the SQL standard) – a_horse_with_no_name Feb 12 '14 at 21:23
  • Dang! Are you sure? I pored over the BNF for SQL-92 in http://savage.net.au/SQL/sql-92.bnf.html until my eyeballs bled just to make sure it was compliant. Can you point me to a counter example? – fearless_fool Feb 12 '14 at 21:28
  • A `<query specification>` is `SELECT` followed by a `<table expression>`, which requires a `FROM` clause – a_horse_with_no_name Feb 12 '14 at 21:37
  • @a_horse_with_no_name - ah, you're correct. Previous versions of SQLite didn't support Speedup 3, which is why I'd been using Speedup 2. If you know of a syntax that's more broadly supported, I'll gladly swap it in. Otherwise, I might simply nuke Speedup 2. What do you suggest? – fearless_fool Feb 12 '14 at 21:40
  • Both variants are supported by several DBMS (some support both, some only one and some neither). I'd suggest you simply remove the reference to "SQL-98 compliant" and mention that not every DBMS supports everything – a_horse_with_no_name Feb 12 '14 at 21:44
  • Speedup 4 might be to take advantage of vendor-specific bulk-loading commands, for example `LOAD DATA INFILE` for MySQL, `COPY` for PostgreSQL, or `SQL*Loader` for Oracle, etc. Each brand has their own command or tool, so there's no vendor-neutral way of doing this, but these methods often have an order of magnitude better performance, so they should not be overlooked. – Bill Karwin Feb 12 '14 at 21:50
  • Bill Karwin: I invite you to excerpt your comment as an alternate answer -- you'll probably get some points for it! :) – fearless_fool Feb 12 '14 at 22:21
  • @a_horse_with_no_name: references to sql standards removed. thanks! – fearless_fool Feb 12 '14 at 22:31
  • @fearless_fool, the question has been put on hold, so I can no longer add an answer. Feel free to add my comment as 'Speedup 4' in your own answer above. – Bill Karwin Feb 14 '14 at 00:14
  • @BillKarwin: let me see if I can get the question un-held first. If that doesn't work, I'll add your Speedup 4 suggestion. – fearless_fool Feb 14 '14 at 17:51