As we all know I love EMC products. I've been using them for years at a variety of companies and have gotten the performance that I need from them.
Recently I was shown a whitepaper from EMC titled "Backup and Recovery for Microsoft SQL Server Using EMC Data Domain Deduplication Storage Systems". Over all the whitepaper isn't bad giving information about how to setup EMC's Data Domain product to be used with Microsoft SQL Server so that the backups can be taken from the SQL database and written to the Data Domain VTL and deduplicated.
However, if you were to scroll down to Appendix A you'll find some rather interesting information. If you don't want to download the PDF, I'll summarize here. The Appendix starts off OK talking about how indexes become fragmented and that hurts performance, and that the indexes are defragmented to fix the SQL Server performance. Under the heading "Addressing the Challenge" EMC writes:
[quote]EMC recommends addressing these challenges with a balanced approach. For instance, instead of defragmenting all indexes based on a schedule, consider defragmentation based on thresholds. Additionally EMC recommends the use of index keys that are less prone to fragmentation in the first place.[/quote]
The statement above starts out really strong and well thought out. And hopefully, that is what most of us are doing anyway, using either a third party product, one of the many free scripts out there, or something that you've written yourself. But that last statement tells me that whoever wrote this white paper doesn't really grasp database design all that well. As database designers we would love to be able to create all our indexes based on ordered values which only increase so that we wouldn't have to worry about index fragmentation. However, we work in the real world where we need to index based on things like Last Name or First Name, or Address, or Tax Id / Social Security Number, etc. Things that aren't values which are likely to simply increment. Typically the clustered indexes will be built on a sequential ID number so there's no worry about that becoming all that fragmented, but all the other indexes (which can easily be bigger than the database) sorry, those won't be sorted by that ID number, pretty much ever (yes there are exceptions).
If onlyand data deduplication could actually go hand in hand. Until then...