0

If I needed to unqiuely identify rows in a T-SQL database table I would set them as:

[id] INT NOT NULL IDENTITY(1,1) PRIMARY KEY

But if I need ID to be unique among several databases (on different computers) I go with:

[uid] UNIQUEIDENTIFIER NOT NULL

and then set an index on that column:

CREATE INDEX index_name ON [TableName] ([uid]);

So I was wondering how much of a performance hit would the database take by changing INT to UNIQUEIDENTIFIER?

ahmd0
  • 14,832
  • 29
  • 117
  • 218
  • 3
    Don't do it. Not only will your indexes be 10 times as wide as they should be, think about all the space you're wasting in related tables pointing back to the GUID. Please read: http://sqlskills.com/BLOGS/PAUL/post/Clustered-or-nonclustered-index-on-a-random-GUID.aspx / http://sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx – Aaron Bertrand Feb 24 '12 at 00:41
  • No, I won't set it as a primary key, that would still be an INT with autoincrement. The GUID field will be only an index. Is that still bad? (Again I care most about performance and then size.) – ahmd0 Feb 24 '12 at 00:44
  • 1
    Yes, it is. How many bytes is that, compared to INT (4) or BIGINT (8)? Also how are you generating the GUID values? Your indexes will be heavily fragmented unless you use `NEWSEQUENTIALID` and that can still be very heavy on page splits. It wastes a lot of space and your joins are going to suffer as well. Forget that the title of those blog posts mention "primary key" and read them in their entirety anyway (as well as comments). I can find you many more blog posts with similar evidence if you like... – Aaron Bertrand Feb 24 '12 at 00:46
  • They are generated in ASP.NET logic and inserted into the DB. No built-in SQL stuff. – ahmd0 Feb 24 '12 at 00:49
  • So they definitely won't be sequential in any way. This is even less desirable than `NEWSEQUENTIALID`. – Aaron Bertrand Feb 24 '12 at 00:50
  • So aside from the size consideration, you're saying that I will take a big performance hit with GUIDs, right? – ahmd0 Feb 24 '12 at 00:52
  • 2
    I can guarantee it. Well, unless you are only saving the data to the database once and never querying, repeating the data in related tables, etc... – Aaron Bertrand Feb 24 '12 at 01:03
  • The only way to know for sure is to test it. Fill the table as you normally would with both methods and then do typical selects against the table in both scenarios. Measure the impact by doing a set statistics io on; and set statistics time on. I can't say by way of percentage, but my money is on the guids being slower. It may be acceptably slower for you though. – Ben Thul Feb 24 '12 at 02:07
  • @AaronBertrand I don't understand 10 times as wide. A GUID is 2 to the power 128 and and Int 32 is 2 to the power 32. Is that not 4 times as wide? – paparazzo Feb 24 '12 at 02:38
  • @BalamBalam sorry, apparently my sarcasm/exaggeration wasn't obvious. I guess I should have stuck with "a lot wider." – Aaron Bertrand Feb 24 '12 at 03:28
  • @Ben Thul Thanks, but testing might be an expensive proposition as well. My question here was actually for those people who tried it out (and not just up for someone's ideological speculation or sarcasm.) I'll post a result if I ever make a comprehensive test ... – ahmd0 Feb 24 '12 at 05:03
  • The only sarcasm was about 10x. Sorry I should have said 4x but does that really mean you should write off *all* of the information I've tried to provide you with here as speculation? Very gracious of you. – Aaron Bertrand Feb 24 '12 at 06:08
  • @BenThul You are not going to find people that did it and tested because we don't need data to know it is a big hit. Random data as a PK (or any index) will cause fragmentation. Fragmentation is bad - it slow inserts and searches. Fragmentation happens faster than you would guess. – paparazzo Feb 24 '12 at 14:52
  • @BalamBalam: My point was whether the slowdown was acceptable for his scenario. The only way to know that is to put numbers to it. Also, the way you stated your comment implies that you're assuming that the table will be clustered by the GUID. The OP has said that this isn't the case, so only the non-clustered index on the GUID will get fragmented. – Ben Thul Feb 24 '12 at 22:51
  • @BenThul No my comment does not imply PK - random data on ANY index will cause fragmentation. The question does state changing Int to UNIQUEIDENTIFIER. You imply to the OP can simple simply fill the table and measure the impact in a single test. It is pretty clear from the discussion this is an active table and not a load then use. A single load comparison is not a valid measure of production use. Put a months load and it and then you are starting to get some real numbers. – paparazzo Feb 24 '12 at 23:55
  • Guys, I appreciate you taking time to give your input on this. Aaron Bertrand, I didn't mean to come off too harsh on you. What amazes me though, is what is the purpose of adding a new data type (UNIQUEIDENTIFIER) when it has some a deplorable performance? – ahmd0 Feb 25 '12 at 00:47
  • If you need globally unique identifier it serves a purpose but as a key it is a costly key. That is why iden and NEWSEQUENTIALID are available to you. You will not find any DB vendor recommending GUID as you first choice for a PK. – paparazzo Feb 25 '12 at 19:14
  • 1
    @BalamBalam: "Testing" implies throwing typical load at it. Again, the mere existence of fragmentation doesn't mean that it won't work for the OP's situation. Zealotry for or against a particular solution doesn't get you anything. There's a trade-off in everything. – Ben Thul Feb 25 '12 at 19:40
  • There are excellent answers to this question here: http://stackoverflow.com/a/9727426/441729 and here: http://stackoverflow.com/a/1151733/441729 – grahamesd Mar 19 '15 at 17:53

1 Answers1

5

GUID is not the only option for surrogate values to be unique across databases. Two options come to mind immediately:

  1. Create your own hashing. Use a BIGINT and set the identity seed on each database, starting with, say, 1000000000 on database A, 2000000000 on database B, etc. This supports many, many databases.
  2. Have a central sequence generator that just bumps up a central identity column and gives the new id back to be used in the insert. Scale and concurrency might be more of a factor here, but only at the high end I'd suspect.

In SQL Server 2012, you will be able to implement (2) much easier using SEQUENCE.

Aaron Bertrand
  • 242,666
  • 35
  • 420
  • 451
  • Thanks. The reason I chose GUID is because of the ASP.NET logic that is already written for them. It would very "Expensive" to rewrite asp.net part. So if I go with BIGINTs instead I will need 2 to account for a single GUID. How would I do that? – ahmd0 Feb 24 '12 at 00:51
  • 2
    Why would you need two bigints? As for expensive, why didn't you ask this question before investing in guids? :-) – Aaron Bertrand Feb 24 '12 at 01:03