Understanding how SQL Server's query optimizer makes decisions
DBAs and SQL Server developers who are new to SQL Server tuning are often puzzled by why the query optimizer in SQL Server would prefer to scan an entire table instead of using a nonclustered index that seems to fit the query perfectly. To explain this mystery, I'll use the following query, which retrieves data from the SalesOrderDetail table in the AdventureWorks database:
FROM sales.SalesOrderDetail
WHERE ProductID = 870
If you have any version of the AdventureWorks sample database, you can follow along. If you don't, you can download it from CodePlex. (See also, "Performance Tip: Find Your Most Expensive Queries.")
The Demonstration
To begin, click the Include Actual Execution Plan button on the toolbar in SQL Server Management Studio (SSMS). This will allow you to see the execution plan chosen by the optimizer.
Next, type the following command in the query window:
SET STATISTICS IO ON;
GO
SELECT LineTotal
FROM sales.SalesOrderDetail
WHERE ProductID = 870;
Note that you might need to change the database name (AdventureWorks2008R2) to the version of the database you're using.
Be sure to include the SET STATISTICS IO ON statement. It'll cause SSMS to display the query cost (in logical page reads) on the Messages tab in the results pane. If you're unfamiliar with the SET STATISTICS IO ON statement or the Include Actual Execution Plan option, see the article "SQL Server Query Tuning." This article also explains why logical page reads are the best metric to use to measure query cost.
In its entirety, the SalesOrderDetail table has 121,317 rows of data contained in 1,240 8KB data pages. It has a nonclustered index on ProductID and a multi-column clustered index on SalesOrderID and SalesOrderDetailID. With the value of 870 chosen for the ProductID filter, this query will return 4,688 rows or about 4 percent of the total number of rows in the table.
In light of this information, you'd expect the query engine to execute a seek on the ProductID index. However, if you execute this query and examine the Execution Plan tab in the results pane, you'll find that instead of doing a seek on the ProductID index as you expected, the optimizer preferred a full scan of the clustered index. It preferred to read all the data contained in the table rather than just the 4 percent of the rows where the ProductID equals 870. Looking at the Messages tab in the results pane will confirm that the query is reading all 1,240 data pages in the table.
Is it a mistake? No, it's not a mistake. That's the optimum query plan. If you don't believe it, change the query slightly to force the use of the ProductID index:
FROM sales.SalesOrderDetail
WITH (index=IX_SalesOrderDetail_ProductID)
WHERE ProductID = 870
After you execute this query, look at the Messages tab. You'll see that SQL Server had to read 14,377 data pages to return the same result. Forcing the "right" index was roughly 10 times more expensive than the clustered index scan chosen by the optimizer.
The Tipping Point
The key to this mystery is understanding a crucial difference between clustered and nonclustered indexes. Most DBAs and SQL Server developers know that the clustered index contains the data itself in the leaf level of the index, whereas nonclustered indexes contain pointers to the pages containing the data. However, based on my experiences in the field, it seems to me that the implications of that difference aren't widely understood.
At the end of a seek or scan on a clustered index, the data has been retrieved and nothing further needs to happen. At the end of a seek or scan on a nonclustered index, the query engine has only pointers to the actual data. The data itself has yet to be returned.
When a nonclustered index is used, the data is retrieved one row at a time with a series of single-page I/O operations called Row ID (RID) lookups. A page returned by a RID lookup might contain several qualifying rows. If so, it won't be read just once. It'll be read several times, once for each qualifying row.
The bottom line is that a nonclustered index seek will typically use at least as many page reads as the number of rows returned and, depending on various other conditions, perhaps many more. The point at which the number of page reads required by the RID lookups exceeds the total number of data pages in the table is the point at which a clustered index scan becomes less expensive than the nonclustered index seek. This point is called the tipping point. When the tipping point is exceeded, the optimizer will usually prefer a scan of the clustered index instead of a nonclustered index seek.
It's important to know that nonclustered covering indexes don't use RID lookups, so they don't have this problem. For this reason, they can be lifesavers in some situations. Covering indexes contain all the data required by a specific query or set of queries. The data doesn't need to be retrieved from the base table because all the necessary data is contained in the index itself. If you aren't familiar with covering indexes, you might want to check out the article "SQL Server Tuning - The Covering Index."

