Last month, I described how to use the new SQL Server 2005 sys.dm_db_index_physical_stats() function to detect fragmentation. This month, I'll tell you how SQL Server can most efficiently remove that fragmentation.
Keep in mind that fragmentation isn't always a bad thing. The biggest performance penalty from having fragmented data arises when an application needs to perform an ordered scan on that data. The more the logical order differs from the physical order, the greater the cost of scanning the data. When an application needs only one or a few rows of data, it doesn't matter whether the table or index data is in logical order, is physically contiguous, or is spread all over the disk in totally random locations. So long as you have a good index to find the rows in which the application is interested, SQL Server can efficiently find one or a few rows, wherever they happen to be located physically.
Defragmentation is designed to remove logical fragmentation from an index's leaf level while keeping the index online and as available as possible. When defragmenting an index, SQL Server needs to take only an intent-shared (IS) lock on the index B-tree (unlike rebuilding a clustered index, for which SQL Server needs an exclusive lock on the entire table). SQL Server 2005 uses the ALTER INDEX command's REORGANIZE option to initiate defragmentation. (Therefore, I'll be using the terms defragment and reorganize interchangeably.) The general form of the command to remove fragmentation is
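A sketch of the general syntax, simplified from SQL Server 2005 Books Online (bracketed items are optional):

```sql
ALTER INDEX { index_name | ALL }
    ON <object>
    REORGANIZE
    [ PARTITION = partition_number ]
    [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
```

You can name a single index or specify ALL to reorganize every index on the table or view.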
As you can see, SQL Server 2005 makes some enhancements to the ALTER INDEX command's functionality. SQL Server 2005 supports partitioned indexes, so you can choose to defragment just one partition; the default is to defragment all partitions. Specifying a partition number when the index isn't partitioned will result in an error. You can also control whether the defragmentation will affect LOB data.
Defragmenting an index takes place in two phases. The first phase is a compaction phase; the second phase involves the rearranging of data pages so that the logical and physical order of the pages match. Most of SQL Server 2005's internal improvements to the defragmentation operation occur in the compaction phase.
Fill 'Er Up
Every index is created with an initial fullness percentage, called the fillfactor. You can specify this value in the CREATE INDEX command or have SQL Server use the default value (specified as a server-wide configuration option). The fillfactor value is the percentage to which each leaf-level page should be filled. If the initial value is 25, the leaf level will use four times as many pages as it would if the value were 100 (a value that would put four times more data on each leaf page). The initial fillfactor value is stored with the index metadata, so SQL Server can inspect this value during defragmentation. SQL Server will attempt to reestablish the initial fillfactor if that value is greater than the pages' current fullness. Putting more rows on each page and increasing each page's fullness compacts the data, so SQL Server might end up removing pages from the index after defragmentation is complete. (If the pages are currently fuller than the initial fillfactor, SQL Server would have to add more pages to reestablish the initial value, something that doesn't happen during defragmentation.)
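For example, you might create an index with an explicit fillfactor and later reorganize it; the table and index names here are hypothetical:

```sql
-- Create an index whose leaf pages are initially filled only 70 percent full.
CREATE NONCLUSTERED INDEX idx_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    WITH (FILLFACTOR = 70);

-- A later reorganization will try to re-establish the stored 70 percent
-- fillfactor if the pages have since become less full than that.
ALTER INDEX idx_Orders_OrderDate ON dbo.Orders REORGANIZE;
```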
The compaction algorithm inspects adjacent pages (in logical order) to determine whether there is room to move rows from the second page to the first. SQL Server 2005 makes this process even more efficient by looking at a sliding window of eight logically consecutive pages and determining whether rows from any of the eight pages can be moved to other pages to completely empty a page.
How Large Is LOB?
As you saw earlier, the ALTER INDEX command provides the LOB_COMPACTION option to compact LOB pages; the default is ON. The LOB data types are image, text, ntext, varchar(max), nvarchar(max), varbinary(max), and xml. Reorganizing a specified clustered index compacts all LOB columns in the clustered index. Reorganizing a nonclustered index compacts all LOB nonkey columns (called included columns) in the index. If you specify LOB_COMPACTION when no LOB columns are present, SQL Server 2005 simply ignores the request.
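The option is specified in the WITH clause; the index and table names below are hypothetical:

```sql
-- Reorganize a clustered index and compact its LOB columns (ON is the default).
ALTER INDEX idx_Docs_Clustered ON dbo.Documents
    REORGANIZE WITH (LOB_COMPACTION = ON);

-- Skip LOB compaction when only row-data fragmentation matters.
ALTER INDEX idx_Docs_Clustered ON dbo.Documents
    REORGANIZE WITH (LOB_COMPACTION = OFF);
```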
In SQL Server 2000, the only way to compact LOBs in a table was to unload and reload the LOB data. This process involves copying the data to an external storage area, setting the LOB column to NULL, and then copying the LOB data back into the table. SQL Server 2005's LOB compaction makes this process much more efficient by finding low-density extents (i.e., extents that are less than 75 percent utilized). SQL Server 2005 moves pages out of these low-density uniform extents and places the data from those pages into unused pages in other uniform extents already allocated to the LOB allocation unit. This functionality allows better use of disk space that would otherwise be wasted on low-density LOB extents. SQL Server 2005 allocates no new extents during either the compaction phase or the defragmentation phase that follows.
The second phase of the reorganization operation moves data to new pages in the row-data allocation unit. The goal is for the data's logical order to match physical order. SQL Server 2005 keeps the index online because this phase processes only two pages at a time, in an operation similar to a bubble sort. Consider an index on a column of datetime data. Monday's data logically precedes Tuesday's data, which precedes Wednesday's data, which precedes Thursday's data. If Monday's data is on page 88, Tuesday's is on page 50, Wednesday's is on page 100, and Thursday's is on page 77, the physical and logical ordering doesn't match in the slightest and we have logical fragmentation. When defragmenting this index, SQL Server 2005 will determine the first physical page belonging to the leaf level (page 50) and the first logical page in the leaf level (page 88, which holds Monday's data), and will swap the data on those two pages, using one additional new page as a temporary storage area. After this first swap, the first logical page, holding Monday's data, is now on the lowest numbered physical page (page 50). After each page swap, SQL Server releases all locks and latches and saves the key of the last moved page. The next iteration of the algorithm uses the saved key to find the next logical page (Tuesday's data, which is now on page 88). The next physical page is 77, which holds Thursday's data, so another swap is made to place Tuesday's data on page 77 and Thursday's on page 88. This process continues until no more swaps need to be made.
SQL Server 2005 Books Online (BOL) provides the recommendations in Table 1 for when to reorganize an index versus when to rebuild it, based on the amount of fragmentation found. Notice there is no lower limit for when to use the REORGANIZE option. Obviously, you won't reorganize if the value is 0, but neither should you defragment your index when the value is so low that the benefit of reorganizing isn't worth the cost (see the sidebar "To Defrag or Not To Defrag?" for an explanation of the potential cost). You'll need to run some tests to determine how much fragmentation negatively affects your application's performance and when defragmentation is warranted.
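A query along these lines can combine last month's sys.dm_db_index_physical_stats() function with the commonly cited BOL thresholds (roughly, reorganize up to about 30 percent fragmentation, rebuild above that); treat the cutoffs as starting points for your own testing, not rules:

```sql
-- Report leaf-level logical fragmentation for every index in the current
-- database and suggest an action based on the BOL guidelines.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.avg_fragmentation_in_percent,
       CASE
           WHEN s.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
           WHEN s.avg_fragmentation_in_percent > 5  THEN 'REORGANIZE'
           ELSE 'none'
       END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.index_id > 0;   -- skip heaps
```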
Keeping an Eye on Things
You need to be aware of some restrictions to invoking the REORGANIZE option. Certainly, if you've disabled an index (a new SQL Server 2005 feature), it can't be defragmented. Also, because the process of removing fragmentation needs to work on individual pages, you'll get an error if you try to reorganize an index for which the ALLOW_PAGE_LOCKS option is set to OFF. You can't reorganize an index when a concurrent online index build exists on that index or when another process is concurrently reorganizing the index.
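You can see the page-lock restriction for yourself with a sequence like the following (index and table names are hypothetical):

```sql
-- Disable page-level locking on the index.
ALTER INDEX idx_Orders_OrderDate ON dbo.Orders
    SET (ALLOW_PAGE_LOCKS = OFF);

-- This REORGANIZE attempt now fails with an error, because
-- reorganization works page by page and needs page locks.
ALTER INDEX idx_Orders_OrderDate ON dbo.Orders REORGANIZE;

-- Re-enable page locks before reorganizing.
ALTER INDEX idx_Orders_OrderDate ON dbo.Orders
    SET (ALLOW_PAGE_LOCKS = ON);
```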
SQL Server 2005 doesn't return a report at the completion of the reorganization, as SQL Server 2000 does. (The report is a row of output that contains the number of pages in the table, the number of pages moved, and the number of pages removed.) However, you can observe the progress of each index's reorganization by watching the percent_complete and estimated_completion_time columns in the sys.dm_exec_requests Dynamic Management View (DMV). The sys.dm_exec_requests view is one of the DMVs that you should use as a replacement for the SQL Server 2000 sysprocesses table. The value in the percent_complete column reports the percentage completed in one index's reorganization; the estimated_completion_time column shows the estimated time (in milliseconds) required to complete the remainder of the index reorganization. If you're reorganizing multiple indexes in the same command, you might see these values go up and down as each index is defragmented in turn. (Both columns also keep track of progress for commands other than index defragmenting. These commands include the DBCC CHECK commands, DBCC SHRINKDATABASE, DBCC SHRINKFILE, BACKUP/RESTORE, and ROLLBACK. In a future article, I'll go into these progress-report values in more detail and explain how to make use of all the DMVs that contain data similar to the sysprocesses table.)
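From a second connection, a query like this one lets you watch a long-running reorganization; the session ID 55 is a hypothetical example standing in for the SPID running the ALTER INDEX command:

```sql
-- Monitor a reorganization running in another session.
SELECT session_id,
       command,
       percent_complete,            -- percent of the current index finished
       estimated_completion_time    -- estimated remaining time, in milliseconds
FROM sys.dm_exec_requests
WHERE session_id = 55;              -- the session running ALTER INDEX ... REORGANIZE
```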
Get It Together
The internal process of removing fragmentation in SQL Server 2005 is similar to the process in SQL Server 2000. However, the command for invoking the defragmentation operation (the ALTER INDEX command with the REORGANIZE option) is different. This syntax presents options that aren't available in earlier SQL Server versions, such as the option to remove fragmentation from LOB data columns. SQL Server 2005 also provides more options for keeping track of the defragmentation's progress, using the new sys.dm_exec_requests DMV.
Once again, keep in mind that fragmentation isn't always bad (it's rarely good, but not good isn't the same thing as bad in this case). You can compare it to a messy desk on which you nevertheless know where everything is. When you're looking for one particular paper, you can find it quickly, but finding all the papers relating to a project you're working on will take much longer. In that situation, a nicely organized desk is a better option.
Kalen Delaney (firstname.lastname@example.org) is a principal mentor of Solid Quality Learning and provides SQL Server training and consulting to clients around the world. Her most recent book is Inside Microsoft SQL Server 2000 (Microsoft Press).