Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.

What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?

Ondra Žižka
gpilotino
  • See also: http://stackoverflow.com/questions/5269590/why-doesnt-this-rule-prevent-duplicate-key-violations – Dave Jarvis Apr 03 '12 at 18:16
  • See also: http://stackoverflow.com/q/1109061/330315 and http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates – a_horse_with_no_name Aug 31 '15 at 06:41
  • as of 9.5, it's possible natively: http://stackoverflow.com/a/34639631/4418 – warren Apr 07 '16 at 18:39
  • Emulating MySQL: `ON DUPLICATE KEY UPDATE` on PgSQL 9.5 is still somewhat impossible, because PgSQL `ON CLAUSE` equivalent requires you to provide the constraint name, whilst MySQL could capture any constraint without the need of defining it. This prevents me from "emulating" this feature without rewriting queries. – NeverEndingQueue Oct 13 '18 at 11:55

11 Answers

With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):

INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")

9.5 brings support for "UPSERT" operations. INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.

...

Further example of new syntax:

INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1) 
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
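The DO NOTHING variant (the closest equivalent to MySQL's INSERT IGNORE) follows the same pattern; a minimal sketch against the same example table:

```sql
-- Skip any row whose username already exists instead of raising an error
INSERT INTO user_logins (username, logins)
VALUES ('Naomi', 1), ('James', 1)
ON CONFLICT (username) DO NOTHING;
```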
warren
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!


Building on Bill Karwin's answer, to spell out what a rule-based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):

CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
  WHERE EXISTS(SELECT 1 FROM my_table 
                WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
  DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";

Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.

EoghanM
  • @sema you mean if `another_schema.my_table` contains duplicates according to the constraints of `my_table`? – EoghanM Feb 10 '14 at 17:43
  • @EoghanM I tested the rule in postgresql 9.3 and could still insert duplicates with multiple row insert statements like e.g. INSERT INTO "my_table" (a,b),(a,b); (Assuming that row (a,b) did not exist in "my_table" yet.) – sema Feb 12 '14 at 08:39
  • @sema, gotcha - that must mean the rule is executed at the start over all the data to be inserted, and not re-executed after each row is inserted. One approach would be to insert your data into a temporary table first, which doesn't have any constraints, and then do `INSERT INTO "my_table" SELECT DISTINCT ON (pk_col_1, pk_col_2) * FROM the_tmp_table;` – EoghanM Feb 12 '14 at 13:09
  • @EoghanM Another approach is to relax duplicate constraints temporarily and to accept duplicates on insert, but remove duplicates afterwards with `DELETE FROM my_table WHERE ctid IN (SELECT ctid FROM (SELECT ctid,ROW_NUMBER() OVER (PARTITION BY pk_col_1,pk_col_2) AS rn FROM my_table) AS dups WHERE dups.rn > 1);` – sema Feb 13 '14 at 09:52
  • I'm having the problem described by @sema. If I do an insert (a, b), (a, b), it throws an error. Is there a way to suppress the errors, also in this case? – Diogo Melo Jul 25 '14 at 19:59
  • I tried this rule and the runtime seems to be quadratic on the number of rows I'm inserting. Any ideas why this is the case? Seems to be a bug as a naive implementation should work in linear time ( O(n*m) where n are the rows in the insert and m the rows already in the table) – oerpli Apr 08 '16 at 10:09
  • This works for me! A problem is you have to ensure there is no duplicate tuples in the bulk data you are going to insert, or you will get the 'duplicate key violation' error all the same. To achieve this, you still need to modify the select clause with keyword distinct: insert into my_table (col1, col2) select distinct col3, col4 from another_table where ... – snowfox Apr 25 '16 at 03:37
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:

INSERT INTO target_table (field_one, field_two, field_three ) 
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;

For those of us on an earlier version, this left join (anti-join) will work instead:

INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table 
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
hanmari
  • The second approach doesn't work when making a big insert in an concurrent environment. You get a `Unique violation: 7 ERROR: duplicate key value violates unique constraint` when `target_table` had another row inserted into it *while* this query was being executed, if their keys, indeed, duplicate one another. I believe that locking `target_table` will help, but concurrency will obviously suffer. – G. Kashtanov May 22 '18 at 06:32
  • `ON CONFLICT (field_one) DO NOTHING` is the best part of the answer. – Abel Callejo Jul 10 '19 at 06:21
  • Props for the ON CONFLIT DO NOTHING. Loved it – D. Melo Aug 20 '20 at 14:45
Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.

You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a retry loop to handle the rare race condition where another transaction inserts the same key between your UPDATE and INSERT.

There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.

That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.

This works for single-row, or few-row, values. If you're dealing with large amounts of rows, for example from a subquery, you're best off splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect, of course - no need to write your main filter twice).
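The wrapped, loop-based version is spelled out in the documentation linked above (adapted here from its example 40-2, which uses a hypothetical table `db` with key column `a` and value column `b`):

```sql
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key;
        -- if someone else inserts the same key concurrently,
        -- we get a unique-key failure and loop to try the UPDATE again
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```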

Gili
Magnus Hagander
  • "If you're dealing with large amounts of rows" that's exactly my case. I want to bulk update/insert rows and with mysql i can do this with only ONE query without any looping. Now I wonder if this is possible with postgresql too: to use just one query to bulk update OR insert. You say: "you're best of splitting it into two queries, one for INSERT and one for UPDATE" but how can I do an insert which does not throw errors on duplicate keys ? (ie. "INSERT IGNORE") – gpilotino Jun 18 '09 at 09:41
  • Magnus meant that you use a query like this: "start transaction; create temporary table temporary_table as select * from test where false; copy temporary_table from 'data_file.csv'; lock table test; update test set data=temporary_table.data from temporary_table where test.id=temporary_table.id; insert into test select * from temporary_table where id not in (select id from test) as a" – Tometzky Jun 18 '09 at 11:32
  • *Update:* with PostgreSQL 9.5 this is now as simple as `INSERT ... ON CONFLICT DO NOTHING;`. See also answer http://stackoverflow.com/a/34639631/2091700. – Alphaaa Apr 07 '16 at 13:02
  • Important, SQL-standard `MERGE` is *not* a concurrency-safe upsert, unless you take a `LOCK TABLE` first. People use it that way, but it's wrong. – Craig Ringer Sep 21 '17 at 13:01
  • With v9.5 it's now a 'native' feature, so please check @Alphaaa 's comment (just advertising the comment that advertises the answer) – Camilo Delvasto Apr 06 '18 at 18:44
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.

INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
  ('935',' Citroën Brazil','Citroën'),
  ('ABC', 'Toyota', 'Toyota'),
  ('ZOM',' OM','OM')
  ) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
  WHERE NOT EXISTS (
    --ignore anything that has already been inserted
    SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
Keyo
INSERT INTO mytable(col1,col2) 
    SELECT 'val1','val2' 
    WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
ssuperczynski
user2342158
  • What is the impact of multiple transactions all trying to do the same thing? Is it possible that between the where not exists executing and the insert executing some other transaction does insert a row? And if Postgres can prevent that, then isn't postgres introducing a point of synchronization across all those transactions when they hit this? – Καrτhικ Feb 07 '14 at 16:30
  • This does not work with multiple transactions, because the newly added data is not visible to the other transactions. – Dave Johansen Jun 09 '15 at 15:20
Looks like PostgreSQL supports a schema object called a rule.

http://www.postgresql.org/docs/current/static/rules-update.html

You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.

I haven't tried this myself, so I can't speak from experience or offer an example.
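For illustration only, an ON DUPLICATE KEY UPDATE-style rule might look roughly like this (hypothetical table and column names, untested per the caveat above):

```sql
-- When an INSERT would collide with an existing primary key,
-- rewrite it into an UPDATE of the existing row instead
CREATE RULE my_table_on_duplicate_update AS ON INSERT TO my_table
  WHERE EXISTS (SELECT 1 FROM my_table WHERE pk_col = NEW.pk_col)
  DO INSTEAD
    UPDATE my_table SET data_col = NEW.data_col
    WHERE pk_col = NEW.pk_col;
```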

Bill Karwin
  • if i understood well these rules are triggers that get executed every time a statement is called. what if i want to apply the rule for only one query ? i have to create the rule then immediately destroy it ? (what about race conditions ?) – gpilotino Jun 17 '09 at 22:21
  • Yes, I'd have the same questions as well. The rule mechanism is the closest thing I could find in PostgreSQL to MySQL's INSERT IGNORE or ON DUPLICATE KEY UPDATE. If we google for "postgresql on duplicate key update" you find other folks recommending the Rule mechanism, even though a Rule would apply to any INSERT, not just on an ad hoc basis. – Bill Karwin Jun 17 '09 at 23:41
  • PostgreSQL supports transactional DDL, which means that if you create a rule and drop it within a single transaction, the rule will never have been visible outside of (and therefore will never have had any effect outside of) that transaction. – cdhowie Jun 29 '15 at 20:05
As @hanmari mentioned in his comment, when inserting into a Postgres table, ON CONFLICT (..) DO NOTHING is the best approach to avoid inserting duplicate data:

query = "INSERT INTO db_table_name(column_name)
         VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"

The ON CONFLICT line allows the INSERT statement to still insert the non-conflicting rows of data. The query-and-values code is an example of inserting data from an Excel file into a Postgres table. I have constraints on a Postgres table I use to make sure the ID field is unique. Instead of running a delete on duplicate rows, I add a line of SQL that restarts the ID numbering at 1. Example:

q = 'ALTER SEQUENCE db_table_name_id_column_seq RESTART WITH 1'

(The sequence name above assumes the default `<table>_<column>_seq` naming that Postgres gives a serial column's sequence.)

If my data has an ID field, I do not use it as the primary/serial ID; instead, I create a separate ID column and set it to serial. I hope this information is helpful to everyone. *I have no college degree in software development/coding. Everything I know in coding, I study on my own.

Bradmage
Yankeeownz
This solution avoids using rules:

BEGIN
   INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION 
   WHEN unique_violation THEN
     UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;

but it has a performance drawback (see PostgreSQL.org):

A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
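Note that BEGIN ... EXCEPTION blocks only exist in PL/pgSQL, so this has to run inside a function or (on 9.0+) a DO block; a sketch using the same hypothetical table:

```sql
DO $$
BEGIN
   INSERT INTO tableA (unique_column, c2, c3) VALUES (1, 2, 3);
EXCEPTION WHEN unique_violation THEN
   -- the row already existed, so update it instead
   UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END
$$;
```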

NumberFour
For bulk loads, you can always delete the rows before the insert. Deleting a row that doesn't exist doesn't cause an error, so it's safely skipped.
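A sketch of that pattern, assuming a hypothetical staging table `import_table` with the same key column, run inside one transaction so readers never see the rows disappear:

```sql
BEGIN;
-- Remove any existing rows that would collide with the incoming batch
DELETE FROM my_table
WHERE id IN (SELECT id FROM import_table);
INSERT INTO my_table SELECT * FROM import_table;
COMMIT;
```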

David Noriega
  • This approach will be quite prone to strange race conditions, I wouldn't recommend it... – Steven Schlansker Nov 28 '11 at 22:33
  • +1 This is easy and generic. If used with care this can actually be a simple solution. – Wouter van Nifterick Oct 23 '12 at 14:11
  • It will also not work when the existing data has been altered post-insert (but not on the duplicate key) and we want to keep the updates. This is the scenario when there's SQL scripts that are written for a number of slightly different systems, like db updates that run on production, QA, dev and test systems. – Hanno Fietz Nov 23 '12 at 13:35
  • Foreign key can be a no-problem if you create them with `DEFERRABLE INITIALLY DEFERRED` flags. – temoto Mar 22 '13 at 17:49
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:

DO
$do$
BEGIN
   PERFORM id
   FROM whatever_table;
   -- (as written this only tests that the table is non-empty;
   --  add a WHERE clause to test for a specific row instead)

   IF NOT FOUND THEN
      -- INSERT stuff
   END IF;
END
$do$;