In Search of Duplicate Indexes on Your Tables

Make your SQL Server indexes more useful

Downloads
98357.zip

Executive Summary:

I often see too many indexes on a given table that turn out to be redundant or less useful than they might seem to be at first. Use this trick to identify duplicate or redundant indexes on the tables in your databases.

I often see too many indexes on a given table that turn out to be redundant or less useful than they might seem to be at first. If you have more indexes than you really need, those indexes can be potentially harmful in terms of performance during Inserts, Updates, and Deletes, and they can increase the possibility of deadlocks.

I want to explore some of those reasons and show you how to identify the candidates for removal. To do so, I’ll use the DupeIndex script, which will generate a small report when I run it against my copy of the Adventureworks database. To highlight areas relevant to this discussion, I’ve included a few additional indexes in the database. For a little background about how to look for missing indexes, how to use a SQL Server DMV to examine the statistics of your existing indexes, and how to set a handle on which ones are in use, see the Learning Path.

Is It That Simple?


Take a look at Table 1, which shows the second portion of a report that outlines potential duplicate indexes. The report simply observes all indexes for each table and determines whether any have leading columns that are the same. Of course, matching leading columns don’t always mean that a certain index is duplicate or useless. Often, a compound index can have the same leading column as another index but be more selective. Or, the index might be a covering index—that is, an index that includes all the columns necessary to satisfy both the WHERE clause and the SELECT list, and negates the need to access the table itself.

Now, let’s examine a couple situations in which the indexes might be redundant. If you look at the indexes for the Vendor table, you can see that the VendorID column is the Primary Key (PK) and has a Clustered Index. That tells you that the index will never return more than one row if you include the VendorID in the WHERE clause with an equality expression; the fact that it’s clustered means that there’ll never be an additional lookup to satisfy the columns in the SELECT list. Therefore, it’s highly unlikely that you’d ever need the index on VendorID and AccountNumber because the engine can filter via the AccountNumber after it finds the row via the VendorID.

An exception is possible if the combination of VendorID and AccountNumber has been declared as a Unique Constraint. In that case, the combination is serving to enforce uniqueness even though it might never be used in a lookup. Even if this index was useful in our case, we certainly don’t need two of them. As you can see in Table 1, there are two identical non-clustered indexes on VendorID and AccountNumber—clearly unnecessary. Truly duplicate indexes provide no advantages whatsoever and will only add to the overhead associated with changes to the table and index maintenance. Another redundancy—this time with only a single column in the index—is on the BillOfMaterials table and the Unit- MeasureCode column. Again, these two indexes are true duplicates and only one of them might be necessary.

What about the indexes on the CultureID column of the Culture table? One is a clustered index and the other is non-clustered. The clustered index also happens to be the Primary Key constraint. Many people believe that a situation such as this is warranted, that you need an actual index that’s searchable in addition to the constraint. This myth is long-propagated. Constraints such as Primary Key and Unique create an index behind the scenes to enforce the constraints and are usable in searches, just like any other index.

Creating an index on the same expression as the Primary Key constraint is the most common concern I see related to duplicate indexes. The second most common is creating a nonclustered index on the same expression as the clustered index. If you have a clustered index already defined on a particular column, there’s no need for a non-clustered index on the same column.

Put It in Reverse


Table 2 is a sample of what you might see in the Reverse Indexes section of the report. In a reverse index, the columns of two indexes on a table are the same but in reverse order in the index expression. In this case, there are two columns—VendorID and AccountNumber—on the Vendor table. If all the queries that referenced either of these columns in the WHERE clause also included the other, these would be true reverse duplicate indexes. If both columns are specified, the order in the WHERE clause or index expression isn’t a factor because the engine is smart enough to arrange the lookup to match the index expression.

However, if you specified only one of the columns, it would need to be the first column in the index expression in order to do a seek. In this particular example, it might be best to create two single-column indexes on VendorID and AccountNumber instead of compound indexes. Or, if you frequently query on two columns together and sometimes on just VendorID, you can have a single index on VendorID, AccountNumber to satisfy both queries.

You’re the Judge


Remember that this simple report can’t replace a proper understanding of your data model and table usage. It can merely outline potential candidates for removal; you must be the final judge and apply good reasoning before taking action.

 

Discuss this Article 4

sm8680
on May 2, 2008
I'm having a problem finding the code and/or the report. Please advise. Thanks
AnneG_editor
on May 2, 2008
Sm8680, I'm checking into the missing-script problem and will get it resolved ASAP. Thanks for letting us know about it. Anne Grubb, Web site editor, SQL Server Magazine
gunneyk
on May 15, 2008
I just wanted to comment on one statement that I made in the original article. >>>If you have a clustered index already defined on a particular column, there’s no need for a non-clustered index on the same column.<<< While this is true for most cases as with most things in SQL Server there are exceptions to the rule. Keep in mind that the output of the report is meant to be used as a guide for potential duplicates. You must as always use good judgement and do what is proper for your enviornment.
AnneG_editor
on May 2, 2008
The code download link is up now (under DOWNLOAD THE CODE at the top of the article).

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.