Last month, I told you about changes to the query optimizer's behavior in SQL Server 2000 and 7.0 and the huge increase in the optimizer's complexity. The optimizer can evaluate so many more possible plans than in previous releases that there's no guarantee that the optimizer will always find the best plan; the optimizer's goal is to find a plan that's "good enough." This month, let's look at a couple of the new query-processing techniques that the optimizer can include in a query plan.
Will You Join Me?
Microsoft added the first new technique to the SQL Server 7.0 initial release. Before SQL Server 7.0, a general rule was that SQL Server could use only one index per table for any given query (with one exception, which I'll tell you about in a later article). The optimizer's job was to find which index SQL Server could traverse during query execution to find the desired result rows with the fewest page accesses. After comparing the cost (number of page accesses) to the number of pages in the table, the optimizer might decide that scanning the whole table would be cheaper than using any of the available indexes.
Because of changes in the way that SQL Server stores indexes internally, the optimizer in SQL Server 2000 and 7.0 checks the possibility of using more than one index on the same table to process a query. Let's look at an example, then I'll tell you some details about the new processing techniques. The script in Listing 1 creates a copy of the Northwind database's Order Details table, which I call OrderDetails (no space in the name). The script then builds two nonclustered indexes on OrderDetails.
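Listing 1 itself isn't reproduced here, but a minimal sketch of what it does might look like the following. The table name comes from the article; the index names are my assumptions, not the listing's actual identifiers:

```sql
-- Hypothetical sketch of Listing 1: copy Order Details into a new
-- table whose name contains no space, then index two columns
USE Northwind
GO
SELECT * INTO OrderDetails FROM [Order Details]
GO
CREATE NONCLUSTERED INDEX idx_Quantity ON OrderDetails (Quantity)
CREATE NONCLUSTERED INDEX idx_ProductID ON OrderDetails (ProductID)
GO
```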
Each of the queries in Listing 2 contains a search expression for a key column in one of the indexes. The first query has a search argument on the Quantity column, and the second query has a search argument on ProductID. If you execute Listing 1's script, then look at the query plans for the two queries in Listing 2, you'll see that neither of the plans uses the nonclustered index on the column in the WHERE clause; both query plans use a table scan.
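Listing 2 isn't shown here; based on the search arguments described above and the values used in the combined query later in this article, the two queries are presumably along these lines (the SELECT list and literal values are my assumptions):

```sql
-- Presumed form of Listing 2's two queries: one search argument
-- on Quantity, one on ProductID
SELECT * FROM OrderDetails WHERE Quantity = 24

SELECT * FROM OrderDetails WHERE ProductID = 10
```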
If you run the command SET STATISTICS IO ON, then execute Listing 2's queries, you'll see that the queries each take 10 logical reads—one for each page in the table. The first query returns 58 rows. If the optimizer had decided to use the nonclustered index on Quantity, SQL Server would have had to perform 58 bookmark lookup operations, a much higher cost than the 10 logical reads of the table scan. The second query returns 33 rows, so it, too, would have cost more than 10 logical reads if the optimizer had decided to access the nonclustered index and perform bookmark lookups.
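To reproduce the I/O measurement yourself, you can wrap the queries in a STATISTICS IO session like this (the query text is my assumption, as noted earlier):

```sql
-- Measure page accesses for a query; the Messages pane reports
-- the logical reads (10 for the table scan, per the article)
SET STATISTICS IO ON
GO
SELECT * FROM OrderDetails WHERE Quantity = 24
GO
SET STATISTICS IO OFF
GO
```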
Now look at the following query, which combines the two search arguments from Listing 2's queries:
SELECT * FROM OrderDetails
WHERE ProductID = 10
AND Quantity = 24
This query returns three rows at a cost of seven logical reads, which is less than the cost of a table scan. The query plan looks like SQL Server is performing a JOIN operation, even though it's accessing only one table. SQL Server performs this SELECT operation by first using the index on ProductID to find all the rows that meet the first condition and storing information from the ProductID index in an internal worktable. SQL Server uses the index on Quantity to find all the rows that meet the second condition, then stores information from that index in another worktable. SQL Server needs to join the two worktables to find the rows that exist in both worktables, which is why a JOIN operator appears in the query plan.
For SQL Server to be able to join tables, they must have some column in common. As you know, besides holding an index key value, each row in a nonclustered index's leaf level has a pointer (aka a bookmark) telling SQL Server where to find the data row. So, SQL Server can "join" the two worktables on this bookmark. In fact, if you look at the graphical showplan or enable SET SHOWPLAN_TEXT ON before running the query, you'll see that the bookmark is the join predicate.
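If you want to see the intersection plan even in cases where the optimizer wouldn't choose it on its own, you can name more than one index in an index hint. This is a sketch only; the index names idx_ProductID and idx_Quantity are hypothetical (use whatever names your version of Listing 1 created), and forcing hints is for experimentation, not production:

```sql
-- Force an index-intersection plan by naming both nonclustered
-- indexes in the table hint (hypothetical index names)
SELECT *
FROM OrderDetails WITH (INDEX (idx_ProductID, idx_Quantity))
WHERE ProductID = 10
  AND Quantity = 24
```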
Microsoft developers were able to add the technique of using multiple indexes on one table partly because of another SQL Server 7.0 addition that lets SQL Server process JOIN operations in several new ways: the Hash Match operator for joins. I'll discuss hashing next month.
If a query has more than two search arguments, each supported by its own nonclustered index, the query optimizer can decide to use more than two indexes on the same table. In the small, simple Northwind database, it's hard to find an example of a query that gets this kind of plan, but you might see such a plan in more complex production databases.
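For example, with a third nonclustered index in place, say on OrderID, a query of the following shape could, in a large enough table, get a plan that intersects all three indexes. The index and the literal values here are purely illustrative:

```sql
-- Hypothetical three-way intersection candidate, assuming a third
-- nonclustered index exists on OrderID
SELECT *
FROM OrderDetails
WHERE ProductID = 10
  AND Quantity = 24
  AND OrderID < 10400
```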
A Little Knowledge
The second new query-processing technique that the optimizer uses is one that I mentioned last time—it took me by surprise when I noticed it. In "Time for a Tune-Up," August 2001 and "The Big Cover-Up," September 2001, I discussed a situation in which all the columns a query needs are part of a nonclustered index's keys. Such a query is called a covered query, and the index is a covering index. In those articles, I showed this example:
SELECT DISTINCT customerID
FROM Orders
WHERE customerID LIKE 'C%'
The Northwind database has a nonclustered index on customerID, so every customerID value is in that index's leaf level. The optimizer can determine that scanning the nonclustered index's leaf level is much more efficient than scanning the entire table because the leaf level of a nonclustered index has substantially fewer pages than the table.
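You can confirm that the plan touches only the index by looking at the textual showplan. This sketch assumes the covered query runs against the Orders table:

```sql
-- Display the plan instead of running the query; the output shows
-- an Index Scan on the nonclustered index, with no step that
-- accesses the table's data pages
SET SHOWPLAN_TEXT ON
GO
SELECT DISTINCT customerID
FROM Orders
WHERE customerID LIKE 'C%'
GO
SET SHOWPLAN_TEXT OFF
GO
```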
The ability to use a covering index has been available since the first version of SQL Server. However, through SQL Server 7.0 Service Pack 1 (SP1), the only time SQL Server would scan a nonclustered index's leaf level was when the index was also a covering index. Before SQL Server 7.0 SP2, a query plan would never show that SQL Server was performing both a nonclustered index scan and a bookmark-lookup operation. As I mentioned last month, using the bookmarks at the leaf level to access many data pages after examining many leaf-level nonclustered index rows soon becomes more expensive than scanning the entire table.
But what if SQL Server needs to access only a few data pages? SQL Server 7.0 SP2 introduced a new query-processing technique that takes advantage of statistics on nonindexed columns. To see this technique in action, you need a table larger than the original 2155-row Order Details table. Listing 3's code copies the Northwind Order Details table into OrderDetails three more times, resulting in an OrderDetails table of 8620 rows and 35 data pages. The code also changes the UnitPrice of one row, then builds a composite nonclustered index on OrderID and UnitPrice.
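Listing 3 isn't reproduced here, but its effect might be sketched as follows. The index name and the choice of which row to update are my assumptions; SET ROWCOUNT simply limits the UPDATE to a single row:

```sql
-- Hypothetical sketch of Listing 3: quadruple the row count,
-- change the UnitPrice of exactly one row, and build the
-- composite nonclustered index
INSERT INTO OrderDetails SELECT * FROM [Order Details]
INSERT INTO OrderDetails SELECT * FROM [Order Details]
INSERT INTO OrderDetails SELECT * FROM [Order Details]
GO
SET ROWCOUNT 1
UPDATE OrderDetails SET UnitPrice = 1.99   -- affects one row only
SET ROWCOUNT 0
GO
CREATE NONCLUSTERED INDEX idx_OrderID_UnitPrice
ON OrderDetails (OrderID, UnitPrice)
GO
```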
Now copy this SELECT statement into Query Analyzer and look at the query plan:
SELECT * FROM OrderDetails
WHERE UnitPrice = 1.99
The optimizer scans the leaf level of the composite index, then performs a bookmark-lookup operation. The query is looking for rows with a UnitPrice value of 1.99, and only one such row exists. However, because UnitPrice isn't the leftmost column of the index, SQL Server can't perform an index seek to access the row for that unit price directly in the nonclustered index. Before SQL Server 7.0 SP2, the only alternative strategy was to scan the entire table. But in later releases, the query optimizer can use the statistics on the UnitPrice column to determine that only a few rows meet the search condition (in this case, exactly one row). The chosen plan, which scans the nonclustered index leaf level and then accesses the data page by using a bookmark lookup for that one row, is cheaper than scanning the entire table. If you turn on STATISTICS IO, you'll see that this plan needs only 28 page accesses, whereas a table scan would have to read all 35 pages in the table. This difference might not seem substantial, but for tables of millions or hundreds of millions of rows, the difference could be huge and noticeably affect performance.
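If AUTO_CREATE_STATISTICS is on (the default), compiling the query creates column statistics on UnitPrice, and those statistics are where the optimizer's row estimate comes from. You can list a table's statistics objects like this; the auto-created statistics name is system-generated, so yours will differ from the placeholder shown:

```sql
-- List all statistics on the table, including auto-created
-- column statistics such as one on UnitPrice
EXEC sp_helpstats 'OrderDetails'

-- Then examine a particular statistics object by name, e.g.:
-- DBCC SHOW_STATISTICS ('OrderDetails', '_WA_Sys_UnitPrice_...')
```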
Keep in mind that you don't need to do anything for SQL Server to take advantage of these enhancements. They're just a few examples of how Microsoft continues to improve SQL Server.