1

There have been various similar questions, but they either referred to a too specific DB or assumed unsorted data.

In my case, the SQL should be portable if possible. The index column in question is a clustered PK containing a timestamp.

The timestamp is 99% of the time larger than previously inserted value. On rare occasions however, it can be smaller, or collide with an existing value.

I'm currently using this code to insert new values:

IF NOT EXISTS (SELECT * FROM Foo WHERE [Timestamp] = @ts)
BEGIN
    INSERT INTO Foo ([Timestamp]) VALUES (@ts);
END
ELSE
BEGIN
    INSERT INTO Foo ([Timestamp])
    SELECT MAX(t1.[Timestamp]) - 1
    FROM Foo t1
    WHERE t1.[Timestamp] < @ts
      AND NOT EXISTS (SELECT * FROM Foo t2
                      WHERE t2.[Timestamp] = t1.[Timestamp] - 1);
END;

If the timestamp value is not yet used, just insert it. Otherwise, find the closest free value below it using the NOT EXISTS check.

I am a novice when it comes to databases, so I'm not sure if there is a better way. I'm open for any ideas to make the code simpler and/or faster (around 100-1000 insertions per second), or to use a different approach altogether.

Edit: Thank you for your comments and answers so far.

To explain the nature of my case: the timestamp is the only value ever used to sort the data, and minor inconsistencies can be neglected. There are no FK relationships.

However, I agree that my approach is flawed, and its flaws outweigh the reasons for using it in the first place. If I understand correctly, a simple way to fix the design is to have a regular, autoincremented PK column in combination with the known (and renamed) timestamp column, which will be clustered.

From a performance POV, I don't see how this could be worse than the initial approach. It also simplifies the code a lot.
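For what it's worth, the redesign described above might be sketched like this (SQL Server syntax as an assumption; the names `Id`, `EventTime`, and `IX_Foo_EventTime` are illustrative, not from the original schema):

```sql
-- Sketch: surrogate identity PK, clustered index on the renamed
-- timestamp column. Duplicate timestamps are simply allowed.
CREATE TABLE Foo (
    Id        BIGINT IDENTITY(1,1) NOT NULL,
    EventTime DATETIME2 NOT NULL,
    CONSTRAINT PK_Foo PRIMARY KEY NONCLUSTERED (Id)
);

-- Non-unique clustered index keeps rows physically ordered by time.
CREATE CLUSTERED INDEX IX_Foo_EventTime ON Foo (EventTime);

-- Inserts then become trivial:
-- INSERT INTO Foo (EventTime) VALUES (@ts);
```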

mafu
  • 28,708
  • 38
  • 138
  • 232
  • 1
    I think you are on a seriously wrong path. First of all, why do you need your timestamps to all be different? If two things happen at the same time, their timestamps must be the same, too. Second, it's generally not recommended to use a real-world value for a PK (clustered or not). Use an autoincrement ID and store your timestamp just as it is. (You may consider using an index on the timestamp field.) – Dercsár Jan 25 '11 at 16:17
  • 1
    Are you sure that timestamp is the appropriate primary key for your data? If it can be a duplicate sometimes, it probably isn't. If it is, then the reason why may help to answer the question. Or: is this fundamentally an issue of identity (i.e. you use that difference in timestamp to tell the difference between two records - almost certainly a bad idea if they are sometimes duplicates) or an issue of efficient data retrieval? If the latter, choosing a different primary key and then changing the index to something more friendly to one-sided updates might work better. – dataduck Jan 25 '11 at 16:20

4 Answers

4

This method is a prescription for disaster. In the first place, you will have race conditions that will annoy users when their inserts fail. Even worse, if you are adding to another table using that value as the foreign key and the whole thing is not in one transaction, you may be adding child data to the wrong record.

Further, looking for the lowest unused value is a recipe for further data-integrity messes if you have not properly set up foreign key relationships and you delete a record without deleting all of its child records. Now you have just joined records to the new record that don't belong with it.

This manual method is flawed and unreliable. All the major databases have a way to create an autogenerated value. Use that instead; its problems have already been worked out and tested.

Timestamp, by the way, is a SQL Server reserved word and should never be used as a field name.

HLGEM
  • 88,902
  • 11
  • 105
  • 171
3

One idea would be to add a surrogate identity/autonumber/sequence key, so the primary key becomes (timestamp, newkey).

This way, you preserve row order and uniqueness without any extra code.
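A sketch of what that composite key could look like (SQL Server syntax assumed; the names are illustrative):

```sql
CREATE TABLE Foo (
    EventTime DATETIME2 NOT NULL,         -- the timestamp from the question
    SeqId     INT IDENTITY(1,1) NOT NULL, -- surrogate tie-breaker
    CONSTRAINT PK_Foo PRIMARY KEY CLUSTERED (EventTime, SeqId)
);
```

Colliding timestamps then differ only in `SeqId`, which also records their insertion order.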

To make the code in the question safe, you'd need to fiddle with lock granularity and concurrency hints, or use TRY/CATCH to retry with the alternate value (SQL Server specific, which removes portability). Under heavy load, however, you'd have to keep retrying, because the alternate value may itself already exist.
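The TRY/CATCH retry mentioned above could be sketched roughly like this (T-SQL only, hence not portable; 2601/2627 are SQL Server's duplicate-key error numbers, and `THROW` requires SQL Server 2012 or later):

```sql
-- Sketch: keep decrementing @ts until an insert succeeds.
DECLARE @done BIT = 0;
WHILE @done = 0
BEGIN
    BEGIN TRY
        INSERT INTO Foo ([Timestamp]) VALUES (@ts);
        SET @done = 1;
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (2601, 2627)  -- duplicate key
            SET @ts = @ts - 1;             -- fall back to the next lower slot
        ELSE
            THROW;                         -- rethrow anything else
    END CATCH
END;
```

Note that under heavy contention this loop can spin for many iterations, which is exactly the drawback just described.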

gbn
  • 394,550
  • 75
  • 549
  • 647
3

If you can't guarantee that your PK values are unique, then it's not a good PK candidate. Especially if it's a timestamp - I'm sure Goldman Sachs would love it if their high-frequency trading programs could cause collisions on an insert and get inserted 1 microsecond earlier because the system fiddles with the timestamp of their trade.

Since you can't guarantee uniqueness of the timestamps, a better choice would be to use a plain-jane auto-increment int/bigint column, which takes care of the collision problem, gives you a nice method of getting insertion order, and you can still sort on the timestamp field to get a nice straight timeline if need be.

Marc B
  • 340,537
  • 37
  • 382
  • 468
0

A Timestamp as a key? Really? Every time a row is updated, its timestamp is modified. The SQL Server timestamp data type is intended for use in versioning rows. It is not the same as the ANSI/ISO SQL timestamp, which is the equivalent of SQL Server's datetime data type.

As far as "sorting" on a timestamp column goes: the only thing guaranteed with a timestamp is that every time a row is inserted or updated it gets a new timestamp value, and that value is a unique 8-octet binary value, different from any value previously assigned to the row. There is no guarantee that the value has any correlation to the system clock.

Nicholas Carey
  • 60,260
  • 12
  • 84
  • 126