Jelle Bens

Guid vs Int

Rather interesting article on when tho use a guid or an int as a Primary Key:

http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

Summary:

IDENTITY

PROS

small storage footprint;
optimal join / index performance (e.g. for time range queries, most rows recently inserted will be on a limited number of pages);
highly useful for data warehousing;
native data type of the OS, and easy to work with in all languages;
easy to debug (where userid = 457);
generated automatically (retrieved through SCOPE_IDENTITY() rather than assigned);
not updateable (though some consider this a disadvantage, strangely enough).

CONS

cannot be reliably "predicted" by applications â€” can only be retrieved after the INSERT;
need a complex scheme in multi-server environments, since IDENTITY is not allowed in some forms of replication;

can be duplicated, if not explicitly set to PRIMARY KEY:

CREATE TABLE splunge(
splungeID INT IDENTITY(1,1),
foo CHAR(1)
)
GO
SET NOCOUNT ON
INSERT splunge(foo) VALUES('a')
INSERT splunge(foo) VALUES('b')
INSERT splunge(foo) VALUES('c')
GO
DBCC CHECKIDENT('splunge', RESEED, 2)
GO
INSERT splunge(foo) VALUES('d')
GO
SELECT * FROM splunge
GO
DROP TABLE splunge
GO

if part of the clustered index on the table, this can create an insert hot-spot;
proprietary and not directly portable;
only unique within a single table;
gaps can occur (e.g. with a rolled back transaction), and this can cause chicken little-style alarms.

GUID()

PROS

since they are {more or less} guaranteed to be unique, multiple tables/databases/instances/servers/networks/data centers can generate them independently, then merged without clashes;
required for some forms of replication;
can be generated outside the database (e.g. by an application);
distributed values prevent hot-spot (as long as you don't cluster this column, which can lead to abnormally high fragmentation).

CONS

the wider datatype leads to a drop in index performance (if clustered, each insert almost guaranteed to 'dirty' a different page), and an increase in storage requirements;
cumbersome to debug (where userid = {BAE7DF4-DDF-3RG-5TY3E3RF456AS10});
updateable (need to propogate changes, or prevent the activity altogether);
sensitive to time rollbacks in certain environments (e.g. daylight savings time rollbacks);
GROUP BY and other set operations often require CAST/CONVERT;
not all languages and environments directly support GUIDs;
there is no statement like SCOPE_GUID() to determine the value that was generated, e.g. by NEWID();
there are display issues to consider, such as response.write issues (see Article #2358), and also consider cases where you display a large amount of rows as checkboxes or other form elements with IDs - your page just got a whole lot bigger, and you have to handle the IDs in a custom way because JavaScript can't deal with objects that have dashes in their name/id.

Jelle Bens

Tuesday, April 03, 2007

No comments:

Post a Comment