8

It's well known that Model.find_or_create_by(X) actually does:

  1. select by X
  2. if nothing found -> create by X
  3. return a record (found or created)

and there may be race condition between steps 1 and 2. To avoid a duplication of X in the database one should use an unique index on the set of fields of X. But if you apply an unique index then one of competing transactions would fail with exception (when trying to create a copy of X).

How can I implement 'a safe version' of #find_or_create_by which would never raise any exception and always work as expected?

Erwin Brandstetter
  • 479,275
  • 111
  • 893
  • 1,042
907th
  • 771
  • 5
  • 18
  • How about using find_or_create_by: http://guides.rubyonrails.org/active_record_querying.html#find-or-build-a-new-object ? – Sam Peacey Jan 15 '13 at 06:09
  • How it'll help to avoid problem with data duplication? – 907th Jan 15 '13 at 06:15
  • 1
    Well, unless I misunderstood the question, you can do something like `find_or_create_by_user_id_and_title(user.id, "My Uploads")` and this will return an existing record if there is one, or create a new one if there isn't. – Sam Peacey Jan 15 '13 at 06:27

3 Answers3

8

The answer is in the doc

Whether that is a problem or not depends on the logic of the application, but in the particular case in which rows have a UNIQUE constraint an exception may be raised, just retry:

begin
  CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
  retry
end

Solution 1

You could implement the following in your model(s), or in a Concern if you need to stay DRY

def self.find_or_create_by(*)
  super
rescue ActiveRecord::RecordNotUnique
  retry
end

Usage: Model.find_or_create_by(X)


Solution 2

Or if you don't want to overwrite find_or_create_by, you can add the following to your model(s)

def self.safe_find_or_create_by(*args, &block)
  find_or_create_by *args, &block
rescue ActiveRecord::RecordNotUnique
  retry
end

Usage: Model.safe_find_or_create_by(X)

Benj
  • 12,503
  • 1
  • 34
  • 69
3

It's the recurring problem of "SELECT-or-INSERT", closely related to the popular UPSERT problem. The upcoming Postgres 9.5 supplies the new INSERT .. ON CONFLICT DO NOTHING | UPDATE to provide clean solutions for each.

Implementation for Postgres 9.4

For now, I suggest this bullet-proof implementation using two server-side plpgsql functions. Only the helper-function for the INSERT implements the more expensive error-trapping, and that's only called if the SELECT does not succeed.

This never raises an exception due to a unique violation and always returns a row.

Assumptions:

  • Assuming a table named tbl with a column x of data type text. Adapt to your case accordingly.

  • x is defined UNIQUE or PRIMARY KEY.

  • You want to return the whole row from the underlying table (return a record (found or created)).

  • In many cases the row is already there. (Does not have to be the majority of cases, SELECT is a lot cheaper than INSERT.) Else it may be more efficient to try the INSERT first.

Helper function:

CREATE OR REPLACE FUNCTION f_insert_x(_x text)
  RETURNS SETOF tbl AS
$func$
BEGIN
   RETURN QUERY
   INSERT INTO tbl(x) VALUES (_x) RETURNING *;

EXCEPTION WHEN UNIQUE_VIOLATION THEN  -- catch exception, no row is returned
   -- do nothing
END
$func$ LANGUAGE plpgsql;

Main function:

CREATE OR REPLACE FUNCTION f_x(_x text)
  RETURNS SETOF tbl AS
$func$
BEGIN
   LOOP
      RETURN QUERY
      SELECT * FROM tbl WHERE x = _x
      UNION  ALL
      SELECT * FROM f_insert_x(_x)  -- only executed if x not found
      LIMIT  1;

      EXIT WHEN FOUND;       -- else keep looping
   END LOOP;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_x('foo');

SQL Fiddle demo.

The function is based on what I have worked out in this related answer:

Detailed explanation and links there.

We could also create a generic function with polymorphic return type and dynamic SQL to work for any given column and table (but that's beyond the scope of this question):

Basics for UPSERT in this related answer by Craig Ringer:

Community
  • 1
  • 1
Erwin Brandstetter
  • 479,275
  • 111
  • 893
  • 1,042
1

There is a method called find_or_create_by in rails

This link will help you to understand it better

But personally I prefer to have a find first and if nothing found then create, (I think it has more control)

Ex:

user = User.find(params[:id])
#User.create(#attributes) unless user

HTH

sameera207
  • 16,117
  • 18
  • 81
  • 143
  • Suppose you have two processes doing find_or_create in parallel: you'll get two separate records in DB... The question is how to avoid such situation? – 907th Jan 15 '13 at 09:07
  • same approach you could use with some locking which you can pass to active record , according to your requirements. read more here http://guides.rubyonrails.org/active_record_querying.html#locking-records-for-update – sameera207 Jan 15 '13 at 09:42
  • Transactions was the first thing I think about. But seems like it's not the cure. It aims to lock existing records, not ones that just created in this or other transaction. Later I found solution with REPLACE statement for MySQL and INSERT trigger for PostgreSQL. I'd like to know is there other solutions – 907th Jan 15 '13 at 12:18