Handling race conditions in PostgreSQL

Question

I have several workers, each holding its own connection to PostgreSQL. The workers manipulate different tables.

The workers handle parallel requests from outside the system. One of the tables being accessed is the table of users. When some information arrives, I first need to ensure there is a record for the user in the table. If there is no record, I want to create one first.

I'm using the following idiom:

if [user does not exist] then [create user]

The code of [user does not exist] is:

SELECT id FROM myschema.users WHERE userId='xyz'

and I test whether any row is returned.

The (simplified) code of [create user] is:

INSERT INTO myschema.users VALUES ('xyz')

When my system handles parallel streams of different information concerning the same user, I often get the PostgreSQL error:

Key (id)=(xyz) already exists

It happens because the SELECT command returns no rows, then another worker creates the user, and my worker attempts to do the same, resulting in a textbook concurrency error.

According to the PostgreSQL documentation, by default, whenever I implicitly start a transaction, the table stays locked until I commit. I'm not using autocommit, and I commit only in blocks, e.g. after the whole if-else block.

Admittedly, I could put the if-else logic into SQL directly, but that would not solve my locking problem in general. I assumed that a "winner takes it all" paradigm would apply: the first worker to execute the SELECT command would own the locks until it calls COMMIT.

I've read many related questions here on SO, but I'm still not sure what the right solution is. Should I lock tables explicitly, since the implicit locking does not work? How can I ensure that only a single worker owns a table at a time?

I'm pretty sure this is not the right solution, but the following approach worked for us: `User.transaction { User.update_all({userId: user_id}, {userId: user_id}); User.create!(userId: 'xyz') unless User.exists?(userId: 'xyz') }`. The first "fake" update command locks the row (if it exists), the next one creates a new row (unless it exists). We also set some custom transaction isolation level as far as I remember, and we used mysql, not postgresql. That's all I remember. — DNNX, Jan 30 '14 at 18:27
Take a look at http://stackoverflow.com/questions/17267417/how-do-i-do-an-upsert-merge-insert-on-duplicate-update-in-postgresql and its links. — Craig Ringer, Jan 31 '14 at 00:40

maja · Accepted Answer · 2014-01-30T18:34:51.333

13

You need to pay attention to the transaction isolation level. It should be set to "SERIALIZABLE".

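The reason is phantom reads: the transaction doesn't lock the whole table. A lock can only cover rows that already exist, and a plain SELECT in PostgreSQL takes no row locks at all.

So if another transaction inserts a new row, nothing has been locked that would stop it, and the error appears.

Serializable avoids this. Note that in PostgreSQL 9.1 and later it does not block other transactions; instead it detects conflicting concurrent transactions and aborts one of them with a serialization failure, which the application has to retry.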

You can do this via

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

The documentation: http://www.postgresql.org/docs/9.1/static/transaction-iso.html
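
As a rough sketch of how the check-then-insert from the question might run under this isolation level (the table, column, and value are taken from the question; the retry handling is assumed to live in the application and is only described in comments):

```sql
-- Sketch only: statements are copied from the question.
BEGIN;
-- Must be issued before the first query of the transaction.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT id FROM myschema.users WHERE userId = 'xyz';

-- The application runs the INSERT only if the SELECT returned no rows.
INSERT INTO myschema.users VALUES ('xyz');

COMMIT;
-- If any statement, including COMMIT, fails with SQLSTATE 40001
-- (serialization_failure), issue ROLLBACK and rerun the whole
-- transaction from BEGIN.
```

The retry has to repeat the SELECT as well, because the outcome of the check may differ on the second attempt.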

If you want to learn more about this topic, I highly recommend this video: http://www.youtube.com/watch?v=zz-Xbqp0g0A

edited Jan 30 '14 at 18:34

answered Jan 30 '14 at 18:28


1

While `SERIALIZABLE` transaction isolation is a clean solution, it is also rather *expensive*. And you need to be prepared for serialization failures and retry in this case. Here is a cheaper (and also clean) alternative for the "INSERT or SELECT" problem: http://stackoverflow.com/questions/15939902/is-select-or-insert-in-a-function-prone-to-race-conditions/15950324#15950324 – Erwin Brandstetter Jan 20 '15 at 11:41
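
One widely used way to get "INSERT or SELECT" without raising the isolation level is sketched below. It is not necessarily the approach in the linked answer. It assumes PostgreSQL 9.5 or later (where `ON CONFLICT` is available), the default READ COMMITTED isolation level, and the unique constraint that produces the "already exists" error in the question:

```sql
-- Assumption: PostgreSQL 9.5+ and a unique constraint on the user key.
-- If a concurrent transaction is inserting the same key, this statement
-- waits for it to finish and then skips the insert instead of raising
-- an error.
INSERT INTO myschema.users VALUES ('xyz')
ON CONFLICT DO NOTHING;

-- A separate statement takes a fresh snapshot under READ COMMITTED,
-- so it sees the row regardless of which worker created it.
SELECT id FROM myschema.users WHERE userId = 'xyz';
```

With this ordering the worker never needs the separate existence check, so there is no window between the check and the insert.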

score 9 · Answer 2 · answered Jan 31 '14 at 01:03

Actually, after some experimenting with ISOLATION LEVEL SERIALIZABLE as proposed by @maja, I've discovered a much simpler mechanism:

PERFORM pg_advisory_lock(id);
...
-- do something that others must wait for
...
PERFORM pg_advisory_unlock(id);

where id is a BIGINT value which I may choose arbitrarily according to my application's logic.

This gave me both the power and the flexibility I was looking for.
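
A minimal sketch of how this could be applied to the user-creation case from the question, using plain SQL instead of PL/pgSQL (`PERFORM` is only available inside PL/pgSQL; elsewhere the function is called with `SELECT`). The key `4242` is hypothetical, and the sketch uses the transaction-level variant `pg_advisory_xact_lock` so that the lock is released automatically. It assumes the default READ COMMITTED isolation level:

```sql
BEGIN;

-- 4242 is a hypothetical key. Every worker must map the same user
-- (or the same critical section) to the same BIGINT value.
-- Other workers requesting this key block here until the holder's
-- transaction ends.
SELECT pg_advisory_xact_lock(4242);

-- Under READ COMMITTED this SELECT sees rows committed by the
-- previous lock holder.
SELECT id FROM myschema.users WHERE userId = 'xyz';

-- The application runs the INSERT only if the SELECT returned no rows.
INSERT INTO myschema.users VALUES ('xyz');

COMMIT;  -- the transaction-level advisory lock is released here
```

Advisory locks are cooperative: they protect the critical section only if every code path that creates users takes the same lock first.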
