In Optimizing a Suboptimal Query Plan (InstantDoc #94775) I described a query for which the default plan that the optimizer produced was slower than one that was forced with a hint. Recently I received comments about the article from Will Alber from the UK who attended one of my classes. Will had very interesting findings about the scenario that lead to a more broad discussion about optimization and caching in general. Will found a certain pattern in my sample data that affected the efficiency of the plan, and his findings are quite intriguing. I’ll first briefly describe the scenario in the aforementioned article, then introduce Will’s findings, and then discuss the implications.

Scenario in Article Optimizing a Suboptimal Query Plan (InstantDoc #94775)

The aforementioned article described the following scenario:

You have a table T1 with columns col1 (INT), col2 (INT), and filler (CHAR(200)). The table has a clustered index on col1 and a nonclustered index on col2. The table is populated with 1,000,000 rows using the following code:

SET NOCOUNT ON;

IF DB_ID('testdb') IS NULL

  CREATE DATABASE testdb;

GO

USE testdb;

GO

 

-- Create the function fn_split

IF OBJECT_ID('dbo.fn_nums', 'IF') IS NOT NULL

  DROP FUNCTION dbo.fn_nums;

GO

 

CREATE FUNCTION dbo.fn_nums(@n AS INT)

   RETURNS TABLE AS RETURN

 

WITH

C0 AS(SELECT 0 AS const UNION ALL SELECT 0),

C1 AS(SELECT 0 AS const FROM C0 AS A, C0 AS B),

C2 AS(SELECT 0 AS const FROM C1 AS A, C1 AS B),

C3 AS(SELECT 0 AS const FROM C2 AS A, C2 AS B),

C4 AS(SELECT 0 AS const FROM C3 AS A, C3 AS B),

C5 AS(SELECT 0 AS const FROM C4 AS A, C4 AS B),

C6 AS(SELECT 0 AS const FROM C5 AS A, C5 AS B)

SELECT TOP(@n) ROW_NUMBER() OVER

   (ORDER BY const) AS n

FROM C6;

GO

 

-- Create the table T1

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL

  DROP TABLE dbo.T1;

GO

CREATE TABLE dbo.T1

(

  col1 INT NOT NULL,

  col2 INT NOT NULL,

  filler CHAR(200) NOT NULL

);

 

CREATE CLUSTERED INDEX

   idx_cl_col1 ON dbo.T1(col1);

CREATE NONCLUSTERED INDEX

   idx_nc_col2 ON dbo.T1(col2);

GO

 

-- Populate T1 with 1,000,000 rows

DECLARE @numrows AS INT;

SET @numrows = 1000000;

 

TRUNCATE TABLE dbo.T1;

 

INSERT INTO dbo.T1 WITH (TABLOCK) (col1, col2, filler)

  SELECT n, 1 + ABS(CHECKSUM(n))

      % 1000000000, 'a'

  FROM dbo.fn_nums(@numrows);

GO

In the article I discussed the optimization of a query that returns all rows from the table sorted by col2, starting with cold cache:

-- First, clear the data cache

USE testdb;

CHECKPOINT;

DBCC DROPCLEANBUFFERS;

GO

-- run with results discarded in order not to include the time it takes

-- to produce the output

SELECT * FROM dbo.T1 ORDER BY col2;

I explained that in the default plan, the optimizer chose to perform a full Clustered Index Scan Ordered:False, and then applied a Sort operator to sort the rows by col2. The data was scanned only once, but the sort operation was quite expensive. I showed that if you specified the FASTFIRSTROW hint (similar to FAST 1) the optimizer created a plan where it scanned the nonclustered index in order, and for each entry did a key lookup of the data row (clustered index seek). There was no need to sort the data since the entries in the nonclustered on col2 were already sorted, but this plan required much more I/O. On my test machine (basic machine with a dual core processor, single hard drive, 4GB RAM) the nondefault plan performed better than the default plan.

Will’s Findings—Scenario has Specialized Sample Data

Will observed that my sample data was—as it turned out by mistake—specialized. Observe in the population script above that col1 is assigned with the value n from an auxiliary table of numbers, and col2 is assigned with the expression 1 + ABS(CHECKSUM(n)). My intent was that the values in the two columns would not have any similar patterns, but forgot that CHECKSUM(<int_value>) is actually equal to <int_val>, hence all values in col2 were 1 greater than the col1 value in the same row. This resulted in an alignment in the way the values in the two columns increment. When populating the table with random values in col2 (e.g., with the expression 1 + ABS(CHECKSUM(newid())) % 1000000), such that there won’t be any alignment in the way col1 and col2 values increment, the default plan suddenly performs better than the nondefault one in a cold cache scenario which was the one tested in the article.

Cause and Implications of Findings

Remember that in the nondefault plan (the one with the hint) the nonclustered index is scanned in order, and for each entry the plan does a key lookup which is a seek in the clustered index. Due to the alignment between the way the keys in the nonclustered index on col2 and the keys in the clustered index on col1 increment, the key lookups end up being done in clustered index key order, resulting in optimized physical read activity. When lookups are done in clustered index key order SQL Server does fewer physical reads compared to when there is no such alignment and the key lookups are done in random order. When the cache is warm to begin with, the nondefault plan performs better than the default one. The reason is that logical reads are very cheap, and a strategy with a larger number of reads from cache but no sorting is faster than a strategy with a smaller number of reads but with sorting.

So when there’s no alignment between the way the values in col1 and col2 increment, if you run the query against cold cache, the lookups are done in random order, resulting in more physical reads. With cold cache as a starting point, the nondefault plan ends up performing worse than the default one. Of course the results may vary in different disk systems with different numbers of spindles. Still, with warm cache, the nondefault plan performs better than the default one for the aforementioned reasons.

Conclusion

One of the important lessons from this exercise is that you need to give a lot of attention to the sample data that you’re using when testing query performance. You need to make sure that the sample data adequately represents the patterns you expect in production. Extra care should be made not to have any specialized patterns that might be handled in a special way by the optimizer, unless such are the patterns in production, of course.

These findings are very interesting in the broader sense as well. It would be nice if the optimizer took into consideration things like ratios between cached vs. noncached data and optimized accordingly. It’s quite interesting to see that the caching scenario can impact which plan would be more efficient. I don’t think that the optimizer takes such things into consideration nowadays, but hope that the optimization team does think about it for future versions of the product.

 

Cheers,

BG