psycopg2: insert multiple rows with one query

Question

I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:

INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);

The only way I know is

args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)

but I want some simpler way.

score 246 · Answer 1 · edited Mar 09 '17 at 11:40

246

I built a program that inserts multiple lines to a server that was located in another city.

I found out that using this method was about 10 times faster than executemany. In my case tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)

and 2 minutes when using this method:

cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)

edited Mar 09 '17 at 11:40

Alex Riley

132,653
39
224
205

answered Apr 13 '12 at 19:53

ant32

2,607
1
11
4

16

Still very relevant almost two years later. An experience today suggests that as the number of rows you want to push increases, the better it is to use the `execute` strategy. I saw speedup of around 100x thanks to this! – Rob Watts Jan 22 '14 at 21:16
1

Interesting...I wonder how much this depends on the type of data being inserted, or if you have any roles or triggers on your tables. I'm inserting a mix of numeric, string, and date data into tables with roles performing foreign key checks, and for me, `executemany()` is 10 seconds faster. Curious. :/ – dmn Jul 29 '14 at 18:18
4

Perhaps `executemany` runs a commit after each insert. If you instead wrap the whole thing in a transaction, maybe that would expedite things? – Richard Apr 12 '15 at 17:06
4

Just confirmed this improvement myself. From what I've read psycopg2's `executemany` doesn't do anything optimal, just loops and does many `execute` statements. Using this method, a 700 row insert to a remote server went from 60s to <2s. – Nelson Apr 27 '15 at 23:22
You would think executemany would do things more efficiently but looks like MySQL suffers from this executemany slowdown issue. Can a patch be submitted or executemany does this for a reason? – ThinkCode Mar 17 '16 at 02:34
1

If `autocommit` in the connection object is set to `True`, this might be slowing down `cur.executemany()` unnecessarily. – andrew Mar 25 '17 at 18:41
6

Maybe I'm being paranoid, but concatenating the query with a `+` seems like it could open up to sql injection, I feel like @Clodoaldo Neto `execute_values()` solution is safer. – Will Munn Jan 17 '18 at 11:55
Is this safe from SQL injection? What if my values come from an untrusted source? – pdowling Mar 29 '18 at 09:35
3

Batch preparation of inserts is really so much faster, thanks a lot for this hint. Want to add: I had a Value Error for ','join because mogrify returns byte and not string (and I was storing results of mogrifiy in a list first). I used .decode on the mogrify result and this solved the problem. – Alex Jun 19 '18 at 12:17
1

The docs http://initd.org/psycopg/docs/extras.html#fast-execution-helpers seem to say that `executemany` is not a great solution. `execute_values` or `copy_from` are better options – aydow Jul 02 '18 at 23:49
39

in case someone encounters the following error: [TypeError: sequence item 0: expected str instance, bytes found] run this command instead [args_str = ','.join(cur.mogrify("(%s,%s)", x).decode("utf-8") for x in tup)] – mrt Sep 03 '18 at 23:08
@WillMunn my GWAPT made me more paranoid but there's no site for SQLi there since all inputs are sanitized. You're right in the sense that it's best to practice good habits using the safe methods though. – Jonathan Neufeld Feb 11 '21 at 00:47

Clodoaldo Neto · Answer 2 · 2017-05-18T12:20:25.470

186

New execute_values method in Psycopg 2.7:

data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
    cursor, insert_query, data, template=None, page_size=100
)

The pythonic way of doing it in Psycopg 2.6:

data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)

Explanation: If the data to be inserted is given as a list of tuples like in

data = [(1,'x'), (2,'y')]

then it is already in the exact required format as

the values syntax of the insert clause expects a list of records as in

insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg adapts a Python tuple to a Postgresql record.

The only necessary work is to provide a records list template to be filled by psycopg

# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))

and place it in the insert query

insert_query = 'insert into t (a, b) values {}'.format(records_list_template)

Printing the insert_query outputs

insert into t (a, b) values %s,%s

Now to the usual Psycopg arguments substitution

cursor.execute(insert_query, data)

Or just testing what will be sent to the server

print (cursor.mogrify(insert_query, data).decode('utf8'))

Output:

insert into t (a, b) values (1, 'x'),(2, 'y')

edited May 18 '17 at 12:20

answered Jun 22 '15 at 16:50

Clodoaldo Neto

98,807
21
191
235

1

How does the performance of this method compare with cur.copy_from? – Michael Goldshteyn Mar 03 '16 at 22:05
1

[Here's a gist with a benchmark](https://gist.github.com/jsheedy/efa9a69926a754bebf0e9078fd085df6). copy_from scales to about 6.5X faster on my machine with 10M records. – Joseph Sheedy Jul 13 '16 at 00:28
Looks nice - I think you have a stray , at the end of your initial definition of insert_query (unless you were trying to make it a tuple?) and are missing a s after the % for %s also in the initial definition of insert_query. – deadcode May 18 '17 at 01:22
2

using `execute_values` I was able to get my system running at 1k records a minute up to 128k records a minute – Conrad.Dean Nov 15 '18 at 15:17
1

My inserts didn't get registered properly until I called `connection.commit()` after the `execute_values(...)`. – Philipp Aug 18 '20 at 15:24
1

@Phillipp that's normal with every execute statement, unless you're in auto commmit mode. – Chris Sep 03 '20 at 10:17
Using execute_values we got results ~20% faster compared to the same page size using execute_batch. – AKHIL MATHEW Apr 27 '21 at 11:37

Antoine Dusséaux · Answer 3 · 2018-11-24T09:12:01.650

Update with psycopg2 2.7:

The classic executemany() is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com

This implementation was added to psycopg2 in version 2.7 and is called execute_values():

from psycopg2.extras import execute_values
execute_values(cur,
    "INSERT INTO test (id, v1, v2) VALUES %s",
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)])

Previous Answer:

To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.

@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instance.

So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8'):

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)

Or by using bytes (with b'' or b"") only:

args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)

Joseph Sheedy · Answer 4 · 2016-09-30T20:25:25.147

31

cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be

args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))

For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.

edited Sep 30 '16 at 20:25

answered Jun 09 '15 at 01:06

Joseph Sheedy

5,326
2
26
29

3

[Here is a benchmark](https://gist.github.com/jsheedy/efa9a69926a754bebf0e9078fd085df6) comparing copy_from/IteratorFile with a query builder solution. copy_from scales to about 6.5X faster on my machine with 10M records. – Joseph Sheedy Sep 30 '16 at 17:49
3

do you have to dick around with escaping strings and timestamps etc? – CpILL Jun 13 '17 at 09:35
Yes, you'll have to make sure you have a well formed TSV records. – Joseph Sheedy Jun 14 '17 at 02:51

score 27 · Answer 5 · edited Oct 02 '15 at 01:22

27

A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):

A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:

namedict = ({"first_name":"Joshua", "last_name":"Drake"},
            {"first_name":"Steven", "last_name":"Foo"},
            {"first_name":"David", "last_name":"Bar"})

You could easily insert all three rows within the dictionary by using:

cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)

It doesn't save much code, but it definitively looks better.

edited Oct 02 '15 at 01:22

Eric Leschinski

123,728
82
382
321

answered Dec 14 '11 at 11:06

ptrn

3,992
4
26
29

41

This will run many individual `INSERT` statements. Useful, but not the same as a single multi-`VALUE`d insert. – Craig Ringer Apr 09 '13 at 02:26
And in the same document it is written The cur.executemany statement will automatically iterate through the dictionary and execute the INSERT query for each row. – sp1rs Jan 21 '21 at 05:09

J.J · Answer 6 · 2016-11-24T01:47:10.843

All of these techniques are called 'Extended Inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psychopg2's executemany() and all the other methods listed in this thread (which i tried before coming to this answer).

Here's some code which doesnt use cur.mogrify and is nice and simply to get your head around:

valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns.
sqlrows = []
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
        # row == [1, 'a', 'yolo', ... ]
        sqlrows += row
        if ( len(sqlrows)/len(valueSQL) ) % rowsPerInsert == 0:
                # sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
                insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*rowsPerInsert)
                cur.execute(insertSQL, sqlrows)
                con.commit()
                sqlrows = []
insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*len(sqlrows))
cur.execute(insertSQL, sqlrows)
con.commit()

But it should be noted that if you can use copy_from(), you should use copy_from ;)

Bringing up from the dead, but what happens in the situation of the last few rows? I assume you actually run that final clause again on the last remaining rows, in the case you have an even number of rows? — mcpeterson, Nov 23 '16 at 23:03
Correct, sorry I must have forgotten to do that when i wrote the example - thats pretty stupid of me. Not doing so wouldn't have given people an error, which makes me worried how many people copy/pasted the solution and went about their business..... Anyway, very grateful mcpeterson - thank you! — J.J, Nov 24 '16 at 01:49

score 4 · Answer 7 · answered Jul 27 '17 at 01:29

4

I've been using ant32's answer above for several years. However I've found that is thorws an error in python 3 because mogrify returns a byte string.

Converting explicitly to bytse strings is a simple solution for making code python 3 compatible.

args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup) 
cur.execute(b"INSERT INTO table VALUES " + args_str)

answered Jul 27 '17 at 01:29

jprockbelly

1,372
12
25

An easier option would be to decode `cur.mogrify()` – Hemil Aug 14 '20 at 12:36

score 2 · Answer 8 · edited Jun 05 '20 at 02:43

The cursor.copyfrom solution as provided by @jopseph.sheedy (https://stackoverflow.com/users/958118/joseph-sheedy) above (https://stackoverflow.com/a/30721460/11100064) is indeed lightning fast.

However, the example he gives are not generically usable for a record with any number of fields and it took me while to figure out how to use it correctly.

The IteratorFile needs to be instantiated with tab-separated fields like this (r is a list of dicts where each dict is a record):

    f = IteratorFile("{0}\t{1}\t{2}\t{3}\t{4}".format(r["id"],
        r["type"],
        r["item"],
        r["month"],
        r["revenue"]) for r in records)

To generalise for an arbitrary number of fields we will first create a line string with the correct amount of tabs and field placeholders : "{}\t{}\t{}....\t{}" and then use .format() to fill in the field values : *list(r.values())) for r in records:

        line = "\t".join(["{}"] * len(records[0]))

        f = IteratorFile(line.format(*list(r.values())) for r in records)

complete function in gist here.

score 1 · Answer 9 · answered Oct 18 '15 at 11:04

Another nice and efficient approach - is to pass rows for insertion as 1 argument, which is array of json objects.

E.g. you passing argument:

[ {id: 18, score: 1}, { id: 19, score: 5} ]

It is array, which may contain any amount of objects inside. Then your SQL looks like:

INSERT INTO links (parent_id, child_id, score) 
SELECT 123, (r->>'id')::int, (r->>'score')::int 
FROM unnest($1::json[]) as r

Notice: Your postgress must be new enough, to support json

Jeff Widman · Answer 10 · 2017-10-19T02:47:56.650

1

If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:

rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)
if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)

edited Oct 19 '17 at 02:47

answered Apr 27 '16 at 02:48

Jeff Widman

16,338
10
59
80

Under the hood SQLAlchemy uses psychopg2's executemany() for calls like this and therefore this answer will have severe performance issues for large queries. See the execute method http://docs.sqlalchemy.org/en/latest/orm/session_api.html. – sage88 Jan 05 '17 at 22:48
2

I don't think that's the case. It's been a bit since I looked at this, but IIRC, this is actually building a single insertion statement in the `insert_query` line. Then, `session.execute()` is just calling psycopg2's `execute()` statement with a single massive string. So the "trick" is building the entire insertion statement object first. I am using this to insert 200,000 rows at a time and saw massive performance increases using this code compared to the normal `executemany()`. – Jeff Widman Jan 06 '17 at 01:13
1

The SQLAlchemy doc you linked to has a section that shows exactly how this works and even says: "It is essential to note that passing multiple values is NOT the same as using traditional executemany() form". So it is explicitly calling out that this works. – Jeff Widman Jan 06 '17 at 01:23
1

I stand corrected. I didn't notice your usage of the values() method (without it SQLAlchemy just does executemany). I would say edit the answer to include a link to that doc so that I can change my vote, but obviously you've already included it. Perhaps mention that this is not the same thing as calling an insert() with execute() with a list of dicts? – sage88 Jan 06 '17 at 06:15
how does it perform compared to execute_values? – MrR Apr 30 '19 at 01:26

gerardw · Answer 11 · 2020-04-06T11:09:45.987

1

execute_batch has been added to psycopg2 since this question was posted.

It is slower than execute_values but simpler to use.

edited Apr 06 '20 at 11:09

answered Feb 22 '20 at 14:59

gerardw

4,437
33
33

3

See other comments. psycopg2's method `execute_values` is **faster** than `execute_batch` – Fierr Apr 05 '20 at 14:54

score 1 · Answer 12 · answered May 22 '20 at 15:19

executemany accept array of tuples

https://www.postgresqltutorial.com/postgresql-python/insert/

    """ array of tuples """
    vendor_list = [(value1,)]

    """ insert multiple vendors into the vendors table  """
    sql = "INSERT INTO vendors(vendor_name) VALUES(%s)"
    conn = None
    try:
        # read database configuration
        params = config()
        # connect to the PostgreSQL database
        conn = psycopg2.connect(**params)
        # create a new cursor
        cur = conn.cursor()
        # execute the INSERT statement
        cur.executemany(sql,vendor_list)
        # commit the changes to the database
        conn.commit()
        # close communication with the database
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()

score -1 · Answer 13 · answered Nov 16 '15 at 15:21

If you want to insert multiple rows within one insert statemens (assuming you are not using ORM) the easiest way so far for me would be to use list of dictionaries. Here is an example:

 t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
      {'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
      {'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]

conn.execute("insert into campaign_dates
             (id, start_date, end_date, campaignid) 
              values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);",
             t)

As you can see only one query will be executed:

INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT

Showing logging from the sqlalchemy engine is NOT a demonstration of only running a single query, it just means that the sqlalchemy engine ran one command. Under the hood this is using psychopg2's executemany which is very inefficient. See the execute method http://docs.sqlalchemy.org/en/latest/orm/session_api.html. — sage88, Jan 05 '17 at 23:06

score -4 · Answer 14 · answered Jul 16 '15 at 10:41

-4

Using aiopg - The snippet below works perfectly fine

    # items = [10, 11, 12, 13]
    # group = 1
    tup = [(gid, pid) for pid in items]
    args_str = ",".join([str(s) for s in tup])
    # insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
    yield from cur.execute("INSERT INTO group VALUES " + args_str)

answered Jul 16 '15 at 10:41

Nihal Sharma

2,065
7
34
55

11

This method is not safe from SQL injections. As psycopg2 documentation states ([that aiopg2 links to](https://aiopg.readthedocs.org/en/stable/core.html#aiopg.Connection)): ['Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.'](http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters) – Michał Pawłowski Sep 10 '15 at 16:13

score -5 · Answer 15 · answered Jan 02 '18 at 02:01

Finally in SQLalchemy1.2 version, this new implementation is added to use psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True like:

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    use_batch_mode=True)

http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109

Then someone would have to use SQLalchmey won't bother to try different combinations of sqla and psycopg2 and direct SQL together..

psycopg2: insert multiple rows with one query

15 Answers15

Linked