Q: Our index maintenance plan is about as simple as it gets: We rebuild all of our indexes every night. The problem is that it’s taking longer and longer to complete, and the log backups are huge every morning. What can we do better?
This plan becomes a problem for several reasons. An index rebuild operation always builds a new index, which means extra space is required to build the new index before the old one is dropped; the rebuild consumes a lot of CPU and I/O resources, which can overload the system if there isn't a designated maintenance window; and backups become an issue.
If you’re using the FULL recovery model, the entire index rebuild operation is fully logged, which means the transaction log file must be at least as large as the index being rebuilt. It also means the next transaction log backup will essentially contain the entire index. This has knock-on effects for log shipping (the log backup must be copied to the log shipping secondary servers and restored, essentially replaying the index rebuild), for transactional replication (the transaction log must be scanned by the replication log reader agent job, even though the rebuild itself won’t be replicated), and for database mirroring (the transaction log generated by the index rebuild must be transmitted to the mirror server).
In the BULK_LOGGED recovery model, although the amount of transaction log generated is a lot less, the next transaction log backup will still contain roughly the same amount of data as if the operation were performed in the FULL recovery model because it must also back up the data extents changed by the minimally logged operation.
What’s worse is that an index rebuild operation will always build a new index, even if there’s minimal or no fragmentation. So yes, I’d say that it definitely isn’t optimal to rebuild all indexes every night.
Many people move to a strategy in which fragmentation is analyzed every night and only the fragmented indexes are processed. This is a step in the right direction, but running the sys.dm_db_index_physical_stats function is also very I/O intensive. (For an in-depth explanation of this function, see my blog post “Inside sys.dm_db_index_physical_stats”.)
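One way to keep the analysis cost down is to use the function's LIMITED scan mode, which reads only the pages above the leaf level rather than every leaf page. The sketch below assumes a placeholder database and table (MyDB, dbo.MyTable); adapt the names to your environment.

```sql
-- Check fragmentation using LIMITED mode, which does far less I/O
-- than the SAMPLED or DETAILED scan modes.
-- (MyDB and dbo.MyTable are placeholder names.)
SELECT
    OBJECT_NAME (ips.object_id)       AS table_name,
    i.name                            AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats (
        DB_ID (N'MyDB'),
        OBJECT_ID (N'dbo.MyTable'),
        NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
    AND i.index_id = ips.index_id;
```

Filtering on page_count is also worthwhile: fragmentation in very small indexes (a few dozen pages or fewer) rarely matters enough to act on.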
I prefer to analyze fragmentation over a week or two and then have index maintenance driven by a table containing the objects and indexes that are worth analyzing to see if fragmentation needs to be taken care of. This method cuts down on the resources required to do the analysis phase and can drastically reduce the length of time the index maintenance takes to run.
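The table-driven approach described above could be sketched as follows; this is a hypothetical illustration, not a complete framework, and all object names are placeholders.

```sql
-- Control table listing only the indexes that the analysis period
-- showed are worth checking each night.
CREATE TABLE dbo.IndexMaintenanceList (
    object_id int NOT NULL,
    index_id  int NOT NULL,
    CONSTRAINT PK_IndexMaintenanceList PRIMARY KEY (object_id, index_id)
);

-- Nightly job: analyze only the listed indexes rather than the
-- whole database, using the cheap LIMITED scan mode.
SELECT
    iml.object_id,
    iml.index_id,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM dbo.IndexMaintenanceList AS iml
CROSS APPLY sys.dm_db_index_physical_stats (
    DB_ID (), iml.object_id, iml.index_id, NULL, 'LIMITED') AS ips;
```

The control table is populated (and periodically revisited) from the week-or-two analysis phase, so the expensive fragmentation scan is confined to indexes known to be worth the cost.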
Once the analysis phase is working, I would then carefully move away from just using ALTER INDEX REBUILD and start including ALTER INDEX REORGANIZE operations. An index reorganize addresses only the fragmentation that exists, so it can be much more efficient for a lightly fragmented index than simply rebuilding the whole thing. (In fact, this is one of the reasons I wrote its predecessor—DBCC INDEXDEFRAG—for SQL Server 2000.)
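The two operations look like this; the table and index names are placeholders.

```sql
-- A lightly fragmented index can be reorganized: only the existing
-- fragmentation is addressed, the operation is always online, and it
-- can be interrupted without losing the work done so far.
ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable REORGANIZE;

-- A heavily fragmented index is usually better rebuilt, which
-- constructs a brand-new copy of the index.
ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable REBUILD;
```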
Most people base the decision of which method to use on the value of the avg_fragmentation_in_percent column from the sys.dm_db_index_physical_stats function, using the numbers I came up with for SQL Server Books Online as the thresholds. (See my blog post “Where do the Books Online index fragmentation thresholds come from?” for more information.)
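Threshold-driven logic along these lines is a common pattern. The commonly cited Books Online guidance is to do nothing below roughly 5% fragmentation, reorganize between about 5% and 30%, and rebuild above about 30%; treat those as starting points, not hard rules. The names below are placeholders.

```sql
-- Decide between REORGANIZE and REBUILD based on measured
-- fragmentation of a single index (index_id = 1, the clustered index).
DECLARE @frag float;

SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.MyTable'), 1, NULL, 'LIMITED')
WHERE index_level = 0;   -- leaf level only

IF @frag >= 30
    ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable REBUILD;
ELSE IF @frag >= 5
    ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable REORGANIZE;
-- else: fragmentation is low enough to leave the index alone
```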
This is just a quick overview of how to go about performing index maintenance—you also need to consider statistics maintenance. An index rebuild always rebuilds all the index column statistics with the equivalent of a full scan, whereas an index reorganize doesn’t touch statistics. Make sure your statistics maintenance takes the appropriate action based on whether you chose to reorganize, rebuild, or do nothing for each index. I would definitely encourage you to move to a more targeted approach to index maintenance rather than the sledgehammer method.
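In practice that means following a reorganize with an explicit statistics update, while skipping the index statistics after a full rebuild (which has already refreshed them with a full scan). The names below are placeholders, and the right sample rate depends on your data.

```sql
-- After ALTER INDEX ... REORGANIZE, refresh the statistics the
-- reorganize did not touch. A rebuild would make this redundant
-- for the index's own statistics.
UPDATE STATISTICS dbo.MyTable IX_MyTable_MyColumn;
```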