12

I want to do the SELECT / INSERT version of an UPSERT. Below is a template of the existing code:

// CREATE TABLE Table (RowID INT NOT NULL IDENTITY(1,1), RowValue VARCHAR(50))

IF NOT EXISTS (SELECT * FROM Table WHERE RowValue = @VALUE)
BEGIN
   INSERT Table VALUES (@Value)
   SELECT @id = SCOPEIDENTITY()
END
ELSE
   SELECT @id = RowID FROM Table WHERE RowValue = @VALUE)

The query will be called from many concurrent sessions. My performance tests show that it will consistently throw primary key violations under a specific load.

Is there a high-concurrency method for this query that will allow it to maintain performance while still avoiding the insertion of data that already exists?

8kb
  • 10,143
  • 7
  • 37
  • 49
  • 1
    This is similar to: http://stackoverflow.com/questions/13540/insert-update-stored-proc-on-sql-server/193876#193876 – Sam Saffron Aug 29 '10 at 07:15
  • I agree it's similar, but I would say the difference is that there is no need for an update, just an insert or select statement. In your answer, you use SERIALIZABLE as a joint hint. Would that be your recommendation for the query above on the SELECT statement? – 8kb Aug 29 '10 at 07:20
  • Yerp an insert with the serializable hint, potentially in a transaction should do the trick. I can try to write an answer but this iPad makes me sound brash :( – Sam Saffron Aug 29 '10 at 07:25
  • Will putting the query into a transaction with a serializable hint hurt performance? There is other processing that has to happen after the query using the @id value. – 8kb Aug 29 '10 at 07:29
  • 3
    Mostly inserts, or mostly updates? – gbn Aug 29 '10 at 08:37
  • 2
    @gbn ... I think you mean mostly inserts or selects ... which is key to being able to optimise this .. if insert is rare, then you can start with a select, and then do a transactionally sound insert if you miss followed by a select just in case the insert did not insert. The way to optimise is highly dependent on the access pattern. Im not sure if MERGE is what you want here. – Sam Saffron Aug 29 '10 at 09:29
  • @Sam Saffron: oops, yes I do. I meant "mostly inserts or mostly selects" – gbn Aug 29 '10 at 09:56

3 Answers3

16

You can use LOCKs to make things SERIALIZABLE but this reduces concurrency. Why not try the common condition first ("mostly insert or mostly select") followed by safe handling of "remedial" action? That is, the "JFDI" pattern...

Mostly INSERTs expected (ball park 70-80%+):

Just try to insert. If it fails, the row has already been created. No need to worry about concurrency because the TRY/CATCH deals with duplicates for you.

BEGIN TRY
   INSERT Table VALUES (@Value)
   SELECT @id = SCOPE_IDENTITY()
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 2627
      RAISERROR etc
    ELSE -- only error was a dupe insert so must already have a row to select
      SELECT @id = RowID FROM Table WHERE RowValue = @VALUE
END CATCH

Mostly SELECTs:

Similar, but try to get data first. No data = INSERT needed. Again, if 2 concurrent calls try to INSERT because they both found the row missing the TRY/CATCH handles.

BEGIN TRY
   SELECT @id = RowID FROM Table WHERE RowValue = @VALUE
   IF @@ROWCOUNT = 0
   BEGIN
       INSERT Table VALUES (@Value)
       SELECT @id = SCOPE_IDENTITY()
   END
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 2627
      RAISERROR etc
    ELSE
      SELECT @id = RowID FROM Table WHERE RowValue = @VALUE
END CATCH

The 2nd one appear to repeat itself, but it's highly concurrent. Locks would achieve the same but at the expense of concurrency...

Edit:

Why not to use MERGE...

If you use the OUTPUT clause it will only return what is updated. So you need a dummy UPDATE to generate the INSERTED table for the OUTPUT clause. If you have to do dummy updates with many calls (as implied by OP) that is a lot of log writes just to be able to use MERGE.

pim
  • 10,145
  • 4
  • 59
  • 61
gbn
  • 394,550
  • 75
  • 549
  • 647
  • @gbn - why would you use your 2nd suggestion (for high concurrency) instead of the MERGE command (assuming the @8kb is using sql 2008+) ? – Pure.Krome Aug 29 '10 at 11:30
  • @Pure.Krome: because it's insert/select, not insert/update. YOu'll end up with a dummy UPDATE to use OUTPUT. Great. – gbn Aug 29 '10 at 11:38
  • @gbn - so we cannot check the SCOPE_IDENTITY to grab the _last_ / _most recent_ identity that was inserted .. assuming an insert happened? if it didn't, then u can use @@Rowcount to check that the update worked... ?? – Pure.Krome Aug 29 '10 at 11:57
  • @Pure.Krome: it's *not* an UPDATE – gbn Aug 29 '10 at 12:03
  • @gbn - becuase merge does update then insert, you're saying that, because the posted said ONLY insert, then merge shouldn't be used .. because it will never update and SHOULDN'T update? (i think i've gotcha now) ... – Pure.Krome Aug 29 '10 at 12:15
  • @Pure.Krome: yes. MERGE is not appropriate in this situation – gbn Aug 29 '10 at 12:26
  • @gbn - yes, there are mainly selects. And my problem with MERGE was trying to figure how to determine if the record was inserted or not (and still do the select). But this solution solves the problem. Thanks! – 8kb Aug 29 '10 at 16:47
  • Some interesting and scary things about MERGE https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/ – systempuntoout Jul 08 '15 at 09:44
1
// CREATE TABLE Table (RowID INT NOT NULL IDENTITY(1,1), RowValue VARCHAR(50))

-- be sure to have a non-clustered unique index on RowValue and RowID as your clustered index.

IF EXISTS (SELECT * FROM Table WHERE RowValue = @VALUE)
   SELECT @id = RowID FROM Table WHERE RowValue = @VALUE
ELSE BEGIN
   INSERT Table VALUES (@Value)
   SELECT @id = SCOPEIDENTITY()
END
Mehul Mistri
  • 14,844
  • 14
  • 66
  • 94
dpostman
  • 11
  • 1
0

As always, gbn's answer is correct and ultimately lead me to where I needed to be. However, I found a particular edge case that wasn't covered by his approach. That being a 2601 error which identifies a Unique Index Violation.

To compensate for this, I've modified his code as follow

...
declare @errornumber int = ERROR_NUMBER()
if @errornumber <> 2627 and @errornumber <> 2601
...

Hopefully this helps someone!

pim
  • 10,145
  • 4
  • 59
  • 61