Q: I’ve learned that my clustering key (i.e., the columns on which I defined my clustered index) should be unique, narrow, static, and ever-increasing. However, my clustering key is on a GUID. Although a GUID is unique, static, and relatively narrow, I’d like to change my clustering key, and therefore change my clustered index definition. How can I change the definition of a clustered index?
A: This question is much more complex than it seems, and the process you follow is going to depend on whether the clustered index is enforcing a primary key constraint. In SQL Server 2000, the DROP_EXISTING clause was added to let you change the definition of the clustered index without causing all of the nonclustered indexes to be rebuilt twice. The first rebuild happens because when you drop a clustered index, the table reverts to being a heap, so all of the lookup references in the nonclustered indexes must be changed from the clustering key to the row identifier (RID), as I described in the answer to the previous question. The second rebuild happens because when you build the clustered index again, all of the nonclustered indexes must be changed to use the new clustering key.
To reduce this obvious churn on the nonclustered indexes (along with the associated table locking and transaction log generation), SQL Server 2000 included the DROP_EXISTING clause so that the clustering key could be changed and the nonclustered indexes would need to be rebuilt only once (to use the new clustering key).
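As a minimal sketch of that clause, assume a hypothetical table dbo.Orders whose clustered index CL_Orders was created with a CREATE INDEX statement (not by a constraint) on a GUID column, and which also has an ever-increasing OrderID column to cluster on instead. All of these names are illustrative, and the WITH (...) option syntax shown is the SQL Server 2005 and later form.

```sql
-- Hypothetical names: dbo.Orders, CL_Orders, OrderID.
-- The index name must match the existing index; only its definition changes.
-- This applies only because CL_Orders was created with CREATE INDEX,
-- not as part of a PRIMARY KEY or UNIQUE constraint.
CREATE UNIQUE CLUSTERED INDEX CL_Orders
ON dbo.Orders (OrderID)
WITH (DROP_EXISTING = ON);
```

Because the clustered index is replaced in a single statement, the table never passes through the heap state, so the nonclustered indexes only need to pick up the new clustering key once.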
However, the bad news is that the DROP_EXISTING clause can be used to change only indexes that aren’t enforcing a primary key or unique key constraint (i.e., only indexes created using a CREATE INDEX statement). And, in many cases, when GUIDs are used as the primary key, the primary key constraint definition might have been created without specifying the index type. When the index type isn’t specified, SQL Server defaults to creating a clustered index to enforce the primary key. You can choose to enforce the primary key with a nonclustered index by explicitly stating the index type at definition, but the default index type is a clustered index if one doesn’t already exist. (Note that if a clustered index already exists and the index type isn’t specified, SQL Server will still allow the primary key to be created; it will be enforced using a nonclustered index.)
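The difference between the default and the explicit index type can be sketched with two hypothetical tables (the table, column, and constraint names below are made up for illustration). The final query uses the sys.indexes catalog view, which assumes SQL Server 2005 or later.

```sql
-- Index type not specified: the primary key is enforced by a
-- clustered index on the GUID column, because none exists yet.
CREATE TABLE dbo.CustomersDefault
(
    CustomerID   uniqueidentifier NOT NULL
        CONSTRAINT PK_CustomersDefault PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

-- Index type stated explicitly: the primary key is enforced by a
-- nonclustered index, leaving the clustered index free for another key.
CREATE TABLE dbo.CustomersExplicit
(
    CustomerID   uniqueidentifier NOT NULL
        CONSTRAINT PK_CustomersExplicit PRIMARY KEY NONCLUSTERED,
    CustomerName nvarchar(100) NOT NULL
);

-- Check which index type each primary key ended up with.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.type_desc
FROM sys.indexes AS i
WHERE i.is_primary_key = 1
  AND i.object_id IN (OBJECT_ID(N'dbo.CustomersDefault'),
                      OBJECT_ID(N'dbo.CustomersExplicit'));
```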
Clustering on a key such as a GUID can result in a lot of fragmentation. However, the level of fragmentation also depends on how the GUIDs are being generated. Often, GUIDs are generated at the client or at the server using a function (either the newid() function or the newsequentialid() function). Generating GUIDs at the client or with the newid() function causes random inserts into the structure, which is ordered by these GUIDs because they’re the clustering key. Because of the performance problems caused by the resulting fragmentation, you might want to change your clustering key or even just change the function (if the GUIDs are generated on the server side). If the GUID is being generated using a DEFAULT constraint, you might have the option to change the function behind the constraint from the newid() function to the newsequentialid() function. Although the newsequentialid() function doesn’t guarantee perfect contiguity or a gap-free sequence, it generally creates a value greater than any it has previously generated. (Note that there are cases in which the base value is regenerated. For example, if the server is restarted, a new starting value, which might be lower than the current value, will be generated.) Even with these exceptions, the fragmentation within this clustered index will be drastically reduced.
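Swapping the function behind a DEFAULT constraint might look like the following sketch. The table, column, and constraint names are hypothetical, and the change only affects rows inserted without an explicit GUID value. GUIDs supplied by the client are not affected.

```sql
-- Hypothetical names: dbo.Customers, CustomerID, DF_Customers_CustomerID.
-- A DEFAULT constraint can't be altered in place, so drop it and re-add it.
ALTER TABLE dbo.Customers
    DROP CONSTRAINT DF_Customers_CustomerID;

-- NEWSEQUENTIALID() is only allowed in a DEFAULT constraint
-- on a uniqueidentifier column.
ALTER TABLE dbo.Customers
    ADD CONSTRAINT DF_Customers_CustomerID
    DEFAULT NEWSEQUENTIALID() FOR CustomerID;
```

Fragmentation that already exists stays in place until the index is rebuilt; the new default only changes where future rows land.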
So, if you still want to change the definition of the clustered index and the clustered index is being used to enforce your table’s primary key, it’s not going to be a simple process. This process should be done when users aren’t allowed to connect to the database; otherwise, data integrity problems can occur. Additionally, if you’re changing the clustering key to use a different column or columns, you’ll also need to remember to recreate your primary key so that it’s enforced by a nonclustered index instead. Here’s the process to follow to change the definition of a clustered index:
Listing 1: Code to Generate the ALTER INDEX Statements
The ROLLBACK AFTER n clause at the end of the ALTER DATABASE statement lets you terminate user connections and put the database into a restricted state for modifications. As for automating the disabling of foreign key constraints, I leveraged some of the code from sp_fkeys and significantly altered it to generate the DISABLE command (similarly to how we did this in step 1 for disabling nonclustered indexes), which Listing 2 shows.
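A rough sketch of such an ALTER DATABASE statement follows. The database name SalesDB, the choice of RESTRICTED_USER as the restricted state, and the 5-second delay are all assumptions made for illustration, not details taken from the original steps.

```sql
-- Hypothetical database name; RESTRICTED_USER and the 5-second
-- delay are illustrative choices.
-- Open transactions from other connections are rolled back after
-- 5 seconds, and only members of db_owner, dbcreator, or sysadmin
-- can connect afterward.
ALTER DATABASE SalesDB
SET RESTRICTED_USER
WITH ROLLBACK AFTER 5;

-- ... make the index and constraint changes here ...

-- Return the database to normal use when the work is done.
ALTER DATABASE SalesDB
SET MULTI_USER;
```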
Listing 2: Code to Generate the DISABLE Command
Use the column for DISABLE_STATEMENTS to disable the foreign key constraints, and keep the remaining information handy because you’ll need it to reenable and recheck the data, as well as verify the foreign key constraints after you’ve recreated the primary key as a unique nonclustered index.
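One way to produce such a result set is sketched below. This is not the sp_fkeys-based code described above but a simpler stand-in that queries the catalog views (SQL Server 2005 or later). The target table dbo.Customers and the ENABLE_STATEMENTS column name are hypothetical.

```sql
-- Hypothetical target table: dbo.Customers (the table whose primary
-- key is being changed). Returns one row per foreign key that
-- references it, with statements to disable and later recheck it.
SELECT
    fk.name AS constraint_name,
    QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) AS referencing_table,
    N'ALTER TABLE ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
        + N' NOCHECK CONSTRAINT ' + QUOTENAME(fk.name) + N';'
        AS DISABLE_STATEMENTS,
    -- WITH CHECK revalidates existing rows when the constraint is reenabled.
    N'ALTER TABLE ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
        + N' WITH CHECK CHECK CONSTRAINT ' + QUOTENAME(fk.name) + N';'
        AS ENABLE_STATEMENTS
FROM sys.foreign_keys AS fk
JOIN sys.tables  AS t ON t.object_id = fk.parent_object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.Customers');
```

Save the full output before running anything, and try the generated statements on a test copy of the database first.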
Although this sounds like a complicated process, you can analyze it, review it, and script much of the code to minimize errors. The end result is that no matter what your clustering key is, it can be changed. Why you might want to change the clustering key is a whole other can of worms that I don’t have space to go into in this answer, but keep following the Kimberly & Paul: SQL Server Questions Answered blog and I’ll open that can, and many more, in the future!