The biggest inefficiencies during INSERTs arise from two sources: each individual data transaction takes time, and the ORM adapter between your language and the database isn't always particularly efficient.
This answer uses Ruby on Rails as its example language, but the techniques shown here apply to almost any high-level language with an interface to an underlying database.
Conclusion (TL;DR)
Inserting 500 records at a time via either of the bulk INSERT methods listed below can give you speedups in excess of 20x. With tuning, the speedup could be even higher.
The Tests
Let's start with an in-memory array of 123,001 records. (These were derived from the "trips.txt" file in the City of Toronto Transportation Datasets.)
dataset = Utilities.load_csv(Rails.root.join("datasets", "trips.txt"))
Baseline: One-at-a-time inserts
In Ruby on Rails, you can do:
dataset.each { |record| Trip.create!(record) }
which essentially translates into 123,001 individual SQL calls of the form:
INSERT INTO "trips" (<column_names>) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) [<column name/value pairs>]
(123,000 more times...)
Speedup 1: Wrapped in a transaction
This is almost the same, but wraps the inner loop in a single SQL transaction:
begin transaction
INSERT INTO "trips" (<column_names>) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) [<column name/value pairs>]
(123000 more times...)
commit transaction
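In Rails, the wrapping itself is just ActiveRecord::Base.transaction, with the same loop as the baseline inside the block:

ActiveRecord::Base.transaction do
  dataset.each { |record| Trip.create!(record) }
end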
Speedup 2: Bulk INSERT version A
This one breaks the original array into batches of 500 (to stay within SQLite's limits and potential string-buffer sizes), where each INSERT statement has the form:
INSERT INTO trips (comma_separated_column_names) VALUES
(comma_separated_values_for_row_1),
(comma_separated_values_for_row_2),
...
(comma_separated_values_for_row_500);
... repeated 247 times (246 full batches of 500, plus one final batch for the leftover record)
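Here's one way the batching might look in Ruby -- a sketch, assuming each record in dataset is a hash keyed by column name (as loaded above), with conn.quote escaping each value before it's interpolated into the SQL string:

conn    = ActiveRecord::Base.connection
columns = dataset.first.keys

dataset.each_slice(500) do |batch|
  # Build one "(v1, v2, ...)" tuple per record, with each value
  # quoted by the adapter to guard against malformed input.
  values = batch.map do |record|
    "(" + columns.map { |col| conn.quote(record[col]) }.join(", ") + ")"
  end.join(",\n")
  conn.execute("INSERT INTO trips (#{columns.join(', ')}) VALUES\n#{values}")
end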
Speedup 3: Bulk INSERT version B
Some DBMSs don't support the multi-row VALUES syntax of Bulk INSERT version A (notably SQLite before version 3.7.11). The following form is functionally identical and supported by many databases, though it isn't strictly SQL-92 compliant:
INSERT INTO trips (comma_separated_column_names)
SELECT comma_separated_values_for_row_1
UNION ALL SELECT comma_separated_values_for_row_2
...
UNION ALL SELECT comma_separated_values_for_row_500
...repeated 247 times, as before
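The Ruby sketch is nearly identical to version A's; only the SQL assembly changes (same assumptions about dataset as above):

conn    = ActiveRecord::Base.connection
columns = dataset.first.keys

dataset.each_slice(500) do |batch|
  # One "SELECT v1, v2, ..." per record, glued together with UNION ALL.
  selects = batch.map do |record|
    "SELECT " + columns.map { |col| conn.quote(record[col]) }.join(", ")
  end.join("\nUNION ALL ")
  conn.execute("INSERT INTO trips (#{columns.join(', ')})\n#{selects}")
end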
Speedup 4: DBM-specific methods
As pointed out by Bill Karwin (https://stackoverflow.com/users/20860/bill-karwin), you can take advantage of vendor-specific bulk-loading commands, for example LOAD DATA INFILE for MySQL, COPY for PostgreSQL, or SQL*Loader for Oracle. Each brand has its own command or tool, so there's no vendor-neutral way of doing this, but these methods often deliver an order of magnitude better performance, so they should not be overlooked.
While these are not general techniques, they can be invaluable in specific cases. We didn't benchmark any of them in the tests below.
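For what it's worth, here is an unbenchmarked sketch of the PostgreSQL case, streaming rows through COPY via the pg gem's raw connection (same dataset assumptions as the sketches above):

require "csv"

columns = dataset.first.keys
raw     = ActiveRecord::Base.connection.raw_connection

# COPY ... FROM STDIN reads CSV rows directly off the wire,
# bypassing per-row INSERT parsing entirely.
raw.copy_data("COPY trips (#{columns.join(', ')}) FROM STDIN WITH CSV") do
  dataset.each do |record|
    raw.put_copy_data(CSV.generate_line(columns.map { |col| record[col] }))
  end
end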
Relative Speedups
We tested the techniques shown above against MySQL, PostgreSQL, and SQLite. The numbers below show the speed of each method relative to the baseline case. The first number is the relative user + system time; the number in parentheses is the relative elapsed (wall-clock) time.
(Note: I have chosen NOT to display absolute times because this is not a discussion about which database is fastest -- there are too many variables to make sensible claims about that. If nudged, I'll post the code on GitHub so you can run your own tests and draw your own conclusions.)
MySQL
- Wrapped In A Transaction: 1.2x (1.4x)
- Bulk INSERT version A: 24.3x (19.0x)
- Bulk INSERT version B: 24.3x (17.1x)
PostgreSQL
- Wrapped In A Transaction: 1.2x (1.6x)
- Bulk INSERT version A: 27.2x (16.7x)
- Bulk INSERT version B: 27.2x (13.9x)
SQLite
- Wrapped In A Transaction: 1.6x (2.4x)
- Bulk INSERT version A: 25.8x (24.7x)
- Bulk INSERT version B: 24.1x (34.1x)
Testing notes
- Processing environment: 2.66 GHz Intel Core i7, 8 GB 1067 MHz DDR3
- Operating system: OS X 10.9.5
- Ruby version: 2.0.0 (64-bit)
- Rails version: 4.0.2
- MySQL: server 5.1.49
- PostgreSQL: psql 9.3.1, server 8.3.14
- SQLite: 3.7.12 (2012-04-03)