I have several workers, each holding its own connection to PostgreSQL. The workers manipulate with different tables.
The workers handle parallel requests from outside the system. One of the tables being accessed is the table of users. When some information comes, I first need to ensure there is a record for the user in the table. If there is no record, I wish to create one at first.
I'm using the following idiom:
if [user does not exist] then [create user]
The code of [user does not exist]
is:
SELECT id FROM myschema.users WHERE userId='xyz'
and I test whether any row is returned.
The (simplified) code of [create user]
is:
INSERT INTO myschema.users VALUES ('xyz')
When my system handles parallel streams of different information concerning the same user, I often get PostgreSQL error:
Key (id)=(xyz) already exists
It happens because the SELECT
command returns no rows, then another worker creates the user, any my worker attempts to do the same, resulting in exemplary concurrency error.
According to PostgreSQL documentation, by default, whenever I implicitly start a transaction, the table becomes locked for as long as I don't commit it. I'm not using autocommit and I only commit the transaction in blocks, e.g. after the whole if-else
block.
Indeed, I could put the if-else
stuff into SQL directly, but it does not solve my problem of locking in general. I was supposing that "the winner takes it all" paradigm will work, and that the first worker which manages to execute the SELECT
command will own the locks until it calls COMMIT
.
I've read many different topics here at SO, but I'm still not sure what the right solution is. Should I use explicit locking of tables, because the implicit locking does not work? How can I ensure that only single worker owns a table at time?