Editor's Note: SQL Server Magazine welcomes feedback about the magazine. Send your comments to firstname.lastname@example.org. Please include your full name, email address, and daytime phone number with your letter. We edit all letters and replies for style, length, and clarity.
Feng Shui and Page Splitting
I enjoyed Morris Lewis's "Feng Shui and Database Design" (March 2002, InstantDoc ID 23707) and have a question about selecting a clustered index so that inserts are scattered throughout the table, as Lewis suggested. For huge tables with a large transaction volume, wouldn't this cause substantial page splitting and, thus, substantial disk I/O? I was considering going without a clustered index and just using nonclustered indexes on the custid and orderdate columns or using a composite index.
Page splitting is less of a concern with 8K pages than it was with SQL Server 6.5's 2K pages because both the page being split and the newly allocated page will be roughly 50 percent empty. Remember that page splits occur only for inserts and updates that cause a row to move from one page to another. In choosing the field on which to index, you need to ensure that inserts won't cluster in one particular area of the database file. If they don't cluster in a certain area, eventually you'll reach a point where data pages will have enough space for a new row most of the time. Even if inserts do cluster in a certain area of the file, you'll still get to a point where pages will have some free space.
Also, because of how SQL Server 2000 and 7.0 allocate pages, the statistical trend will be to allocate new pages near the split, starting with the extent in which the full page resides. The only time this won't occur is if your database application rarely deletes enough rows on any given page to free up enough space. In that case, the newly allocated page will probably be physically located far away from the full page. How this will affect your I/O depends on how many files you have in your filegroup and where those files reside. The solution is to periodically rebuild your clustered index to let SQL Server move the rows to physically contiguous pages. You can also create your indexes with a fill factor of, say, 50 percent. But remember that the lower the fill factor, the more pages you need for your data. More pages translates into more I/O and more memory in the data cache when SQL Server needs to query the table.
Going back to the feng shui concept, page splits cause only a short-term blockage of data flow. Because of the allocation maps that SQL Server 2000 and 7.0 use, finding a new page takes only microseconds on most machines, and transferring 4K of data (at most) to the new page can be done in one or two machine instructions, depending on the CPU. The only real long-term side effect is that the number of pages allocated to the table will grow and most of the pages will probably be less than 100 percent full. If your disk controller is already impeding data flow, page splits will exacerbate the problem.
You should, therefore, focus more on what page splits do to your overall I/O performance and less on the effects of splitting the page itself. Whether the page splits are excessive depends on whether the inserted rows cluster in a certain area, on where the files in your filegroups reside, how big the table is in relation to the data cache, and how much capacity your disk controller has. Measure the success of your clustered index on how it impacts I/O over time. If your application is like most, you'll see disk activity reach a steady state.