We have a seemingly trivial problem:
- assign a uniqueidentifier to any externalId
- never overwrite a uniqueidentifier once it is assigned; just return the existing one
Imagine a table:
ExternalId | Guid
--------------------------------
some1 | accf-0334-dfdf-....
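To make the required semantics concrete, here is a minimal in-memory sketch in Python. The names `IdMap` and `get_or_assign` are hypothetical, and a plain dict stands in for the real store, so it illustrates only the contract (assign once, then always return the same value), not the scale.

```python
import uuid


class IdMap:
    """In-memory stand-in for the ExternalId -> Guid table (hypothetical)."""

    def __init__(self):
        self._guids = {}

    def get_or_assign(self, external_id: str) -> uuid.UUID:
        # Return the existing Guid if one was already assigned.
        existing = self._guids.get(external_id)
        if existing is not None:
            return existing
        # Otherwise assign a fresh random Guid and remember it.
        new_guid = uuid.uuid4()
        self._guids[external_id] = new_guid
        return new_guid
```

Calling `get_or_assign("some1")` twice returns the same Guid both times; the lookup before the write is the step that becomes expensive at scale.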
Now, the twist is the scale: we want billions of externalIds mapped like this, and we need to assign these identifiers fast (thousands per second).
We started off with a simple SQL Server table, but it did not perform well. We moved the same schema to a Cassandra ColumnFamily. The writes are super fast and it's sharded, but before writing we have to read (to make sure the externalId has not been assigned already), so we hit the read seek I/O limit again.
Hashing (deriving the uniqueidentifier from the externalId) is unfortunately not possible, as we already have hundreds of millions assigned. Caching is problematic because in most cases we are assigning a brand new externalId, so it would not be in the cache (or the database) at all.
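For readers unsure what "hashing" means here, this is a rough Python sketch, assuming it refers to a name-based (version 5) UUID derived from the externalId. The `NAMESPACE` value and the `hashed_guid` helper are made up for illustration. It shows why the approach needs no read for new ids, and why it cannot reproduce Guids that were already issued at random.

```python
import uuid

# Hypothetical namespace; any fixed UUID would serve.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def hashed_guid(external_id: str) -> uuid.UUID:
    # Name-based (SHA-1) UUID: the same input always yields the same output,
    # so no lookup is needed to know what an externalId maps to.
    return uuid.uuid5(NAMESPACE, external_id)


# Stand-in for a Guid that was assigned randomly before hashing was considered.
already_assigned = uuid.uuid4()
derived = hashed_guid("some1")

# The derived value is stable, but it does not match the historical one.
stable = derived == hashed_guid("some1")
matches_history = derived == already_assigned
```

Here `stable` is true and `matches_history` is false, which is the conflict with the existing assignments.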
Does anybody have suggestions for a solution here?