How record DELETEs can cause index fragmentation

Question: Our team has been discussing causes of index fragmentation and someone argued that DELETE operations can cause fragmentation just like INSERT and UPDATE operations. This doesn’t make sense to me – can you explain how this is possible?

Answer: Record deletion can indeed cause fragmentation.

There are two kinds of fragmentation described by SQL Server Books Online: logical fragmentation and low page density. Some people have other names for these, including external, internal, and physical fragmentation. I always discourage people from using those names, as they are not referenced or explained in Books Online (and I wrote the Books Online content around fragmentation :-)

Logical fragmentation is where the data file page containing the next set of key values in an index is not the next physically adjacent page. A variant of this is extent fragmentation, where the adjacency is at the extent (8 contiguous data file pages) level rather than the page level. Both of these lead to reduced range-scan efficiency through reduced read-ahead I/O size.
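
As an aside, here's a minimal sketch of how you can check logical fragmentation with the sys.dm_db_index_physical_stats DMV (the database and table names here are just placeholders):

USE MyDatabase;  -- placeholder database name
GO
SELECT index_id, index_type_desc,
    avg_fragmentation_in_percent,  -- logical fragmentation
    fragment_count, page_count
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.MyTable'), NULL, NULL, 'LIMITED');

The LIMITED mode only scans the upper levels of the index, so it's the cheapest way to get the logical fragmentation number.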

Low page density is when there is empty space on the data file pages in an index, caused by page split operations, record deletions, or a record size that allows only a few records to fit on a page, forcing empty space (e.g. a 5,000-byte clustered index record, where only one record can fit per 8KB page, leaving roughly 3,000 bytes of empty, wasted space per page). This leads to reduced data density: extra space required on disk to store the data, wasted buffer pool memory, and more I/Os required to read the same amount of data. You can read more about the performance implications in my in-depth blog post here.
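
To put rough numbers on that record-size example, here's a hypothetical demo (all the object names are mine) of a clustered index where only one ~5,000-byte record fits per 8KB page, capping page density at around 62% no matter what you do:

-- One ~5,000-byte record per 8KB page (8,096 bytes usable),
-- wasting roughly 3,000 bytes per page
CREATE TABLE dbo.BigRows (
    RowID   INT IDENTITY PRIMARY KEY CLUSTERED,
    Padding CHAR (4996) NOT NULL DEFAULT 'x');
GO
INSERT INTO dbo.BigRows DEFAULT VALUES;
GO 1000  -- GO <n> repeats the batch in SSMS/sqlcmd

-- Expect avg_page_space_used_in_percent of roughly 62%
SELECT page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.BigRows'), NULL, NULL, 'DETAILED');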

As I mentioned above, record deletion can cause low page density, which is a form of index fragmentation. Consider the case where a table has a clustered index with an INT IDENTITY key. New records will always be inserted with a key value higher than all previously inserted records, so the record insertion point will be (conceptually) at the right-hand side of the index structure.

If records are randomly deleted from the table, this will cause free-space 'holes' throughout the index structure. Because the index key means all new records are inserted at the right-hand side of the index, the free-space holes will never be filled by newly inserted records. Over time, with many deletes, the amount of free space in the index may become significant, leading to the classic low data density issues I described previously. This will necessitate removing the free space with an ALTER INDEX … REORGANIZE or ALTER INDEX … REBUILD operation, as the sketch below shows.
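
Here's a hedged end-to-end sketch of that pattern (again, the object names are hypothetical): fill a table whose clustering key is an ever-increasing IDENTITY, delete every other row, observe the density drop, and then rebuild:

-- ~100-byte rows, so roughly 70 rows fit per 8KB page
CREATE TABLE dbo.DeleteDemo (
    RowID  INT IDENTITY PRIMARY KEY CLUSTERED,
    Filler CHAR (100) NOT NULL DEFAULT 'x');
GO
INSERT INTO dbo.DeleteDemo DEFAULT VALUES;
GO 10000

-- Delete every other row: each page keeps about half its rows, and the
-- holes are never refilled because new rows always insert at the
-- right-hand side of the index
DELETE FROM dbo.DeleteDemo WHERE RowID % 2 = 0;

-- Expect avg_page_space_used_in_percent to have dropped to roughly half
SELECT page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.DeleteDemo'), NULL, NULL, 'DETAILED');

-- Reclaim the wasted space
ALTER INDEX ALL ON dbo.DeleteDemo REBUILD;
-- or, less invasively: ALTER INDEX ALL ON dbo.DeleteDemo REORGANIZE;

(Note that deleted records are initially 'ghosted' rather than physically removed, so you may need to wait a few seconds for ghost cleanup before the density numbers settle.)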

To summarize: all DML operations (INSERT, UPDATE, and DELETE) can cause index fragmentation, if you consider both logical fragmentation and low page density, and if the index structure lends itself to one kind or the other occurring.

Discuss this Blog Entry (2 comments)

on Feb 14, 2014

Thanks! This information is exactly what I was looking for. But I still have one more question:

What do you advise for a table with an IDENTITY PRIMARY KEY index: delete unused rows, or mark them with some kind of "deleted" BOOLEAN field?

If we use the DELETE command, it will cause index fragmentation, but we can defragment the index at any time by using a REBUILD operation and get a "fresh" high-speed table.

If use "deleted" boolean field -- it will NOT cause an index fragmentation, but database size will be increased and increased, and increased ... with unusable rows. Is it a problem?

Thanks!

Roman.

on Feb 14, 2014

Hey Roman - that's really up to you. You're either choosing continued growth where you'll never get the space back, or gradually decreasing page density where you'll periodically need to reclaim the space. Totally up to you which you choose. Thanks
