6

I have a parent table and child table where the columns that join them together are the UNIQUEIDENTIFIER type.

The child table has a clustered index on the column that joins it to the parent table (its PK, which is also clustered).

I have created a copy of both of these tables but changed the relationship columns to be INTs instead, have rebuilt the indexes so that they are essentially the same structure and can be queried in the same way.

When I query for a known 20 records from the parent table, pulling in all the related records from the child tables, I get identical query costs across both, i.e. 50/50 cost for the batches.

If this is true, then my giant project to change all of the tables like this appears to be pointless, other than speeding up inserts. Can anyone provide any light on the situation?


EDIT:

The question is not about which is more efficient, but why is the query execution plan showing both queries as having the same cost?

cjk
  • 43,338
  • 9
  • 74
  • 109
  • Please see http://www.google.com/search?q=site%3Astackoverflow.com+int+vs+UNIQUEIDENTIFIER – Dirk Vollmar Mar 19 '10 at 09:45
  • Related: http://stackoverflow.com/questions/1151625/int-vs-unique-identifier-for-id-field-in-database, http://stackoverflow.com/questions/504905/is-it-better-to-use-an-uniqueidentifierguid-or-a-bigint-for-an-identity-column, http://stackoverflow.com/questions/1080817/sql-server-guid-from-active-directory-vs-int – ЯegDwight Mar 19 '10 at 09:51
  • @divo - a lot of those links say that yes, int is more efficient than GUID, (as do the current anwwers) but my question is why am I not seeing that in a fairly straightforward example? – cjk Mar 19 '10 at 09:52
  • Because it makes no difference? I mean, seriously - how muc hdata do you use for testing? Small developer db - you will not be able to measure it. 100 million rows? guess that is visible. Do you have a good test case? – TomTom Mar 19 '10 at 10:08
  • @ck: definitely you could see difference. You need to go for load testing. – Saar Mar 19 '10 at 10:08

3 Answers3

4

Much more efficient.

Int is much smaller. This means you get much smaller indices, which means you get much better memory use and load time for index access. It depends a lot, though, on how large your tables are and what you do with them.

ЯegDwight
  • 23,615
  • 10
  • 43
  • 51
TomTom
  • 1
  • 9
  • 78
  • 143
  • Doesn't really answer the question, I want to know why the execution doesn't support this theory. – cjk Mar 19 '10 at 09:53
  • 2
    It does. You dont really describe any execution. "When I query for a known 20 records from the parent table". That is ridiculous slmall - 20 records, what you expect to see? How large is the parent table? – TomTom Mar 19 '10 at 10:09
  • fair enough, I'll try and bump up the volumes. I was jsut hoping the execution plan would show that the theory of using ints would mean less work, even with smaller volumes. Thanks for the info. – cjk Mar 19 '10 at 13:57
4

Seek-in a key in a clustered index is basically the same on a 4 bytes key, a 16 bytes key, or 160 bytes key. The cost of comparing the slots with the predicate is just noise in the overall cost of query (execution preparation, preparing execution context, opening the rowsets, locating the pages etc), even when no IO is involved.

While no one will argue that GUIDs and INT are on equal footing, comparing just 20 seeks will not reveal the differences. One thing you can measure immediately is space: a saving of 12 bytes per row and per non-leaf page on clustered index, plus 12 bytes on every leaf page on non-clustered indexes will add up over millions of rows and tens of tables and indexes. Less space means less IO, better memory cache performance, better goodness overall, and that can be measured, but you need to measure real loads, not a puny 20 rows seek.

Under lab conditions you will be able to measure the difference in raw speed between seeking an INT or a GUID, but that shouldn't be your focus. The argument of INT vs. GUID is not drivan by something like 5% performance gain in a seek, is driven by space savings and by guid randomness leading to fragmentation, both very easy to measure metrics that make a solid case for INT on their own grounds, no need to bring in the seek performance argument.

Remus Rusanu
  • 273,340
  • 38
  • 408
  • 539
  • Thanks, that covers a lot of useful information. I've been able to estimate the space savings, but in putting together a business case I need to show how the performance will improve, as disk and even memory are relatively cheap. I was hoping to show the execution of one query against the other would indicate x% improvement. – cjk Mar 22 '10 at 08:33
1

On top of what Remus said, using GUID for clustered indexes is going to lead to tremendous fragmentation of them in most cases, affecting the performance of the queries in terms of IO. This happens when you don't use sequentially generated guids which I suppose is mostly the case when an application generates guid outside of database. To create sequential guid ('bigger' than the previously generated in database) you have to use function newsequentialid()

Comparison of cost of two plans in one batch is not accurate in all cases. The cost is estimated amongst others on number of IO operations needed to execute the query. In small databases, difference between INT and GUID will not change IO significantly enough to show the difference in execution plans.

Piotr Rodak
  • 1,555
  • 1
  • 9
  • 7