10

For a table like this one:

CREATE TABLE Users(
    id SERIAL PRIMARY KEY,
    name TEXT UNIQUE
);

What would be the correct one-query insert for the following operation:

Given a user name, insert a new record and return the new id. But if the name already exists, just return the id.

I am aware of the new syntax within PostgreSQL 9.5 for ON CONFLICT(column) DO UPDATE/NOTHING, but I can't figure out how, if at all, it can help, given that I need the id to be returned.

It seems that RETURNING id and ON CONFLICT do not belong together.

vitaly-t
  • 20,421
  • 5
  • 85
  • 117
  • This logic really should be in the database? – Krismorte Mar 18 '16 at 12:13
  • @ErwinBrandstetter this seems more complicated than I thought. I was trying to figure out whether to go for a one-query solution within a Node.js application, or just to stick with `SELECT id FROM users WHERE name='text'`, check if found, and if not then do `INSERT...`, which is 2 queries, but now it seems way simpler than the one-query option. I didn't expect it to be this awkward. I was hoping the new `ON CONFLICT` would be of help, but alas, it ain't. Thanks for the links! – vitaly-t Mar 18 '16 at 12:16
  • The matter is hugely complex as soon as concurrent write access comes into play. I'll add another answer. – Erwin Brandstetter Mar 18 '16 at 16:46
  • @ErwinBrandstetter it would seem that the simplest solution is to use a transaction and do two separate queries - select+insert. That's what I'm using at the moment. – vitaly-t Mar 18 '16 at 16:51
  • I think I found a better solution. – Erwin Brandstetter Mar 18 '16 at 17:35
  • Does this answer your question? [SELECT or INSERT a row in one command](https://stackoverflow.com/questions/6722344/select-or-insert-a-row-in-one-command) – Bergi Dec 03 '19 at 21:23

2 Answers2

10

The UPSERT implementation is hugely complex to be safe against concurrent write access. Take a look at this Postgres Wiki that served as log during initial development. The Postgres hackers decided not to include "excluded" rows in the RETURNING clause for the first release in Postgres 9.5. They might build something in for the next release.

This is the crucial statement in the manual to explain your situation:

The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned. For example, if a row was locked but not updated because an ON CONFLICT DO UPDATE ... WHERE clause condition was not satisfied, the row will not be returned.

Bold emphasis mine.

For a single row to insert:

Without concurrent write load on the same table

WITH ins AS (
   INSERT INTO users(name)
   VALUES ('new_usr_name')         -- input value
   ON     CONFLICT(name) DO NOTHING
   RETURNING users.id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM users          -- 2nd SELECT never executed if INSERT successful
WHERE  name = 'new_usr_name'  -- input value a 2nd time
LIMIT  1;

With possible concurrent write load on the table

Consider this instead (for single row INSERT):

To insert a set of rows:

All three with very detailed explanation.

Erwin Brandstetter
  • 479,275
  • 111
  • 893
  • 1,042
  • Do you see any problem in the earlier answer given by Clodoaldo Neto? What I like about it, it doesn't require PostgreSQL 9.5 to work. – vitaly-t Mar 18 '16 at 18:05
  • @vitaly-t: It's a solid statement, but does not cover the mentioned race condition. It's also probably more expensive. You might add the `LIMIT 1` from my statement for performance. – Erwin Brandstetter Mar 18 '16 at 18:14
  • Thank you for your answer. My previous research was formulated here: https://github.com/vitaly-t/pg-promise/blob/master/examples/select-insert.md, and I hoped that I could improve it with a single-query alternative. But now given all the complexity of it, I'm not really sure if it can be considered an improvement. But at least it is good to know that it is possible. – vitaly-t Mar 18 '16 at 18:19
  • 1
    @vitaly-t: You might add `FOR SHARE` or `FOR KEY SHARE` to the `EXISTS` subselect in Clodoaldo's answer to cover the race condition in older versions. Or compare to `ON CONFLICT ... DO NOTHING` to compare apples to apples. Also, looking at your implementation on github, I would consider the [server-side function detailed in my linked answer](http://stackoverflow.com/a/15950324/939860) which is superior to multiple round trips to the db server in your approach. – Erwin Brandstetter Mar 18 '16 at 18:34
  • 1
    I would post it as a comment but it would be too long so I reformed my answer. – Clodoaldo Neto Mar 18 '16 at 19:30
  • @vitaly-t: Note the updates. A possible corner case wasn't covered. – Erwin Brandstetter May 07 '20 at 14:05
3

For a single row insert and no update:

with i as (
    insert into users (name)
    select 'the name'
    where not exists (
        select 1
        from users
        where name = 'the name'
    )
    returning id
)
select id
from users
where name = 'the name'

union all

select id from i

The manual about the primary and the with subqueries parts:

The primary query and the WITH queries are all (notionally) executed at the same time

Although that sounds to me "same snapshot" I'm not sure since I don't know what notionally means in that context.

But there is also:

The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot

If I understand correctly that same snapshot bit prevents a race condition. But again I'm not sure if by all the statements it refers only to the statements in the with subqueries excluding the main query. To avoid any doubt move the select in the previous query to a with subquery:

with s as (
    select id
    from users
    where name = 'the name'
), i as (
    insert into users (name)
    select 'the name'
    where not exists (select 1 from s)
    returning id
)
select id from s
union all
select id from i
Clodoaldo Neto
  • 98,807
  • 21
  • 191
  • 235
  • The fact that CTEs are based on the same snapshot cannot prevent race conditions. One example: Imagine two transactions starting at virtually the same moment. Both find that 'the name' does not exist yet and try to `INSERT`. You get a unique violation. There are good reasons for the new `INSERT ... ON CONFLICT ...` – Erwin Brandstetter Mar 19 '16 at 01:53
  • @ErwinBrandstetter I'm not sure what you describe is accurate when you use the word `transaction`, which implies the use of `begin` and `commit`. In such a case the second transaction would be locked till the first one has finished, by the very definition of transactions. – vitaly-t Mar 19 '16 at 03:25
  • @Erwin: Now I get where your mindset is at. As I see it those two transactions are just that: transactions. One will fail with a proper unique violation message to be catch by and dealt with by the application. It is not like a misleading result was produced. I think that trying to avoid the second transaction from failing is over engineering. Not a significant gain compared to the added complexity. – Clodoaldo Neto Mar 19 '16 at 11:41
  • @Clodoaldo: That's what UPSERT (or INSERT or SELECT) is about: to avoid the error and get the desired result in one step. And the case I mentioned is just one of several possible complications. – Erwin Brandstetter Mar 19 '16 at 17:26
  • 1
    @vitaly-t: In Postgres, *every* DML statement runs inside a transaction. Most clients run in autocommit mode. `BEGIN` and `COMMIT` are optional to wrap multiple statements in one transaction. Transactions are not locked. Resources are locked - which can block transactions. Locks are acquired along the way and released at the end of a transaction. Concurrent transactions can run the same `SELECT` and find the same row "missing". Locks on missing rows are impossible in Postgres. More than one might try to `INSERT` the same key and only one would succeed with a unique index. – Erwin Brandstetter Mar 19 '16 at 17:37
  • 1
    @ErwinBrandstetter thank you, I can see now that use of transactions in this case doesn't help at all, and so I corrected my example here: https://github.com/vitaly-t/pg-promise/blob/master/examples/select-insert.md – vitaly-t Mar 19 '16 at 18:47