The latest service pack could introduce undocumented internal optimizer changes
After an uncertain start, SQL Server 2005 SP3 is now available. First the uncertainty had to do with the fact that with all the attention focused on Ask for a Service Pack and You Shall Receive), and the company didn’t release a SQL Server 2005 SP3 beta until the end of October. Additional uncertainty was generated by the fact that some of the initial SQL Server 2005 SP3 links posted on the MSDN website were truncated, so clicking them generated errors. However, you can download SQL Server 2005 SP3 from http://www.microsoft.com/downloads/details.aspx?FamilyID=ae7387c3-348c-4faa-8ae5-949fdfbe59c4&displaylang=en. Make sure to check the supporting documents listed on that page, such as the ReadMe file and the What’s New documentation.
Even if you read all the supporting documents you might still be confused, especially if you’ve previously applied hotfixes or recent Cumulative Updates (CUs) for SQL Server 2005 SP2 (CU12 or CU11). The most succinct explanation of the relationship between SQL Server 2005 SP3 and the various CUs, including SQL Server 2005 SP3 CU1, which will be released in a week, can be found on Aaron Bertrand’s blog. You can find a list of bugs that have been fixed in SQL Server 2005 SP3 at http://support.microsoft.com/?kbid=955706.
The purpose of a service pack is to provide a tested intermediate-level upgrade that contains fixes to bugs encountered since the last version release. Service packs don’t typically add new features to the product, although they might provide enhanced behavior to features that were new in the most recent version but weren’t completely ready by the release to manufacturing date. However, Microsoft doesn’t always abide by this definition of service pack, and there have been SQL Server service packs that have added new features, new columns to existing metadata objects, or even entirely new metadata views or functions.
Another area in which SQL Server service packs offer enhancements is in the hidden internal engine behavior. The engineers at Microsoft are constantly trying to improve the optimizer by giving it more capabilities to explore possible query plans and come up with better performing plans for your queries. Most people would think adding these capabilities is a good thing because they like their queries to run fast and they don’t mind that Microsoft doesn’t list internal optimizer changes in the What’s New documentation for each service pack. However, there are some of us who teach classes and give presentations on query tuning that expect the optimizer to behave a certain way and aren’t 100 percent thrilled when these little changes sneak in.
For example, I came across one of these behavior changes when I was demonstrating in a SQL Server 2000 class that there were two ways a nonclustered index could be used. SQL Server could perform an index seek and follow the index tree to the nonclustered leaf, and from there find a bookmark into the base table to retrieve the rest of the information requested. You can see an example of this behavior in the following SQL Server 2005 AdventureWorks database query:
SELECT * FROM Sales.SalesOrderDetail<br>WHERE ProductID = 838
There’s a nonclustered index on SalesOrderDetail.ProductID. The optimizer determined that few enough rows would have a value of 838 that it would be an efficient query plan to find all the bookmarks for those rows in the nonclustered index, and then find the corresponding rows in the base table.
Or, if the query was what we call a "covered query," SQL Server could just scan the leaf level of the nonclustered index and get all the necessary information. The following is an example of this behavior:
SELECT SalesOrderID, count(*)<br>FROM Sales.SalesOrderDetail<br>GROUP BY SalesOrderID
The nonclustered index leaf level contains every single ProductID value, so SQL Server can find the counts of all those values by scanning the nonclustered index.
Those plans (i.e., seek through a nonclustered index to find a few specific rows or scan the entire leaf level and never touch the data) used to be the only two choices. So I was surprised to find, in the middle of a class right after a new service pack for SQL Server 2000 had been installed, that a query I was demonstrating gave me a completely different plan.
SELECT * FROM Sales.SalesOrderDetail<br>WHERE SalesOrderDetailID < 100
For the preceding query, the optimizer realizes that the clustered index keys are included in every nonclustered index row and that the second column of the clustered index is SalesOrderDetailID. There’s no index specifically on that column, but it’s part of the leaf level of the ProductID index. If SQL Server scans that index, it can find all the rows with the requested values for SalesOrderDetailID, and then it can find the corresponding rows in the table to retrieve the rest of the columns. It turns out SQL Server can do this only if it has column statistics on the SalesOrderDetailID column so that it knows that very few rows will be returned. If you turn off auto update statistics or change 100 to a much larger value, you most likely will get a different plan.
This difference with the query plan is actually a very nice improvement in the optimizer that I wouldn’t want to have taken out of the product. Although I don’t like to be surprised by new behavior while presenting a demo in class, I’ll trade a little bit of embarrassment for the knowledge that SQL Server is just getting better and better with each new service pack. I wonder which of my example queries will work differently now?