Sometimes you find yourself using a tool designed to solve one problem helpful in solving other problems as well. Recently I found by accident that the new RECOMPILE query hint in SQL Server 2005 can be helpful in solving parameter sniffing problems.

Related: The Pros and Cons of Parameter Sniffing

I’ll first describe the original intended use of the RECOMPILE query hint, then I’ll describe the parameter sniffing problem and the traditional solutions for the problem, and then I’ll explain how you can use the RECOMPILE query hint to solve parameter sniffing problems.

RECOMPILE Query Hint

The RECOMPILE query hint was originally designed to give you more granular control over what would be treated as a recompilation unit. Instead of specifying the RECOMPILE procedure option, you can now specify a RECOMPILE hint at the query level.

Related: Conditional Recompile

As to why you might want to have a procedure or query recompile every time you run it, this has to do with situations where it might be inefficient to reuse previously cached execution plans. Suppose you have a stored procedure that accepts input parameters that are used in the procedure’s queries. Suppose also that users invoking the stored procedure provide inputs with varying selectivity such that for different inputs different plans are considered optimal. In such a case, SQL Server’s default behavior of reusing previously cashed execution plans might render bad performance; this happens when the procedure ends up reusing a previously cached plan that is not optimal for the current input parameter values. In SQL Server 2000 you could specify the RECOMPILE option only at the procedure level:

CREATE PROC proc_name input_parameters WITH RECOMPILE AS …

This means that even if you have multiple queries within the stored procedure, all of them get reoptimized every time you run the procedure. But what if some of the queries in the stored procedure can benefit from reuse of previously cached plans while the other queries in the procedure need to be reoptimized every time you run the procedure? In SQL Server 2000 you didn’t have this level of control; the only way to solve the problem was to separate the queries that need to be reoptimized every time to a different procedure created with the RECOMPILE option, and invoke that procedure from the original procedure. SQL Server 2005 supports statement-level recompilations (as opposed to supporting only batch-level recompilations in SQL Server 2000). So now you can specify the RECOMPILE option as a query hint and have only that query recompile. So suppose you have five queries in a stored procedure, and two of them (say Query2 and Query5) need to be reoptimized every time the procedure is run. You can specify the RECOMPILE query hint only in those two queries, and allow the other three queries to reuse previously cached plans:

CREATE PROC proc_name input_parameters AS
Query1
Query2 OPTION(RECOMPILE)
Query3
Query4
Query5 OPTION(RECOMPILE)
GO

Parameter Sniffing Problem

As for the parameter sniffing problem; this problem has to do with the fact that when SQL Server optimizes a query that refers to local variables (as opposed to stored procedure’s parameters), the optimizer doesn’t know what the values in the local variables are. So it has to use a hard coded guess, which might render an inefficient plan. Take the following procedure for example:

USE Northwind;
GO
IF OBJECT_ID('dbo.usp_getorders', 'P') IS NOT NULL
  DROP PROC dbo.usp_getorders;
GO
CREATE PROC dbo.usp_getorders
  @days_back AS INT
AS

DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt;
GO

Turn on the STATISTICS PROFILE option (or the graphical actual execution plan), and run the procedure with the input value 0, meaning you are after all orders placed 0 days back from the maximum date:

SET STATISTICS PROFILE ON;
EXEC dbo.usp_getorders @days_back = 0;

Ignore the execution plan produced for the first query in the procedure (obtaining the maximum order date). Observe the execution plan produced for the main query in the procedure (the second query). Here’s the output of STATISTICS PROFILE (abbreviated to focus on the important parts):

Rows EstimatedRows StmtText
---- ------------- --------
4    249           |--Clustered Index Scan
                      (OBJECT:(\[PK_Orders\]),
                       WHERE:(\[OrderDate\]>=\[@dt\]))

The input parameter value is very selective so naturally you would expect the optimizer to use the index created on the OrderDate column, but it doesn’t; instead the optimizer opts for a table scan (Clustered Index Scan). The reason is that at query optimization time the optimizer did know what the value in the local variable @dt was, so it used a hard coded guess of 30 percent selectivity. Notice that the estimated number of rows was 249, and 249 divided by 830 (total number of rows in the table) is 30 percent. In practice, the query was very selective; only 4 rows were returned, so obviously you got a suboptimal plan (table scan) compared to using the index on OrderDate.

OPTIMIZE FOR Hint

SQL Server 2005 introduces a new query hint called OPTIMIZE FOR that allows you to specify a constant that you want the optimizer to use instead of the variable when optimizing the query. For example, if you know that users typically need to query recent orders, you can specify a selective value for optimization like so:

ALTER PROC dbo.usp_getorders
  @days_back AS INT
AS

DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt
OPTION(OPTIMIZE FOR (@dt = '99991231'));
GO

EXEC dbo.usp_getorders @days_back = 0;

Rows EstimatedRows StmtText
---- ------------- --------
4    1             |--Nested Loops...
4    1               |--Index Seek
                        (OBJECT:(\[OrderDate\]),
                         SEEK:(\[OrderDate\] >= \[@dt\])
                         ORDERED FORWARD)
4    1               |--Clustered Index Seek
                        (OBJECT:(\[PK_Orders\]),
                         SEEK:(\[OrderID\]=\[Orders\].\[OrderID\])
                         LOOKUP ORDERED FORWARD)

As you can see, you got a good plan for a selective filter. The estimated number of qualifying rows based on the specified input was 1, so the optimizer naturally decided to use the index on OrderDate.

But what if different users can specify inputs with varying selectivity, such that for invocations of the procedure with selective inputs the optimal plan would be to use the index on OrderDate, and for invocations with non-selective inputs the optimal plan would be to use a table scan? If you hadn’t used a local variable rather simply referred directly to the stored procedure’s parameter in the query the solution would have been simple; use the RECOMPILE query hint. But you need to refer to a local variable in the query, and there’s no one constant that is adequate for all scenarios that you can specify in the OPTIMIZE FOR hint.

Using the RECOMPILE Query Hint to Solve Parameter Sniffing Problems

Now to the main point that I want to make…

Recently by accident I tried something that solved the problem. I specified the RECOMPILE query hint in a query that referred to a local variable, and got an optimal plan. First I was surprised; how does the optimizer know what the value in the local variable was? But then it made perfect sense. Apparently when you specify the RECOMPILE query hint, the optimizer IS aware of the values of local variables in the query. The reason this makes perfect sense is that SQL Server 2005 supports statement level recompilations. With the RECOMPILE query hint, the query gets optimized AFTER the rest of the code was already run, including the assignments of values to variables.

To see this in action, alter the procedure’s definition to include the RECOMPILE query hint in the query:

ALTER PROC dbo.usp_getorders
  @days_back AS INT
AS

DECLARE @dt AS DATETIME;
SET @dt = (SELECT MAX(OrderDate) FROM dbo.Orders) - @days_back;

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM dbo.Orders
WHERE OrderDate >= @dt
OPTION(RECOMPILE);
GO

Run it with a selective input, and you will see in the plan that the index on OrderDate is used:

EXEC dbo.usp_getorders @days_back = 0;

Rows EstimatedRows StmtText
---- ------------- --------
4    4             |--Nested Loops...
4    4               |--Index Seek
                        (OBJECT:(\[OrderDate\]),
                         SEEK:(\[OrderDate\] >= \[@dt\])
                         ORDERED FORWARD)
4    1               |--Clustered Index Seek
                        (OBJECT:(\[PK_Orders\]),
                         SEEK:(\[OrderID\]=\[Orders\].\[OrderID\])
                         LOOKUP ORDERED FORWARD)

Run it with a non-selective input, and you will get a table scan:

EXEC dbo.usp_getorders @days_back = 365;

Rows EstimatedRows StmtText
---- ------------- --------
550  549.3333      |--Clustered Index Scan
                      (OBJECT:(\[PK_Orders\]),
                       WHERE:(\[OrderDate\]>=\[@dt\]))

You can also observe that the optimizer makes much better selectivity estimations; hence you get an optimal plan for each case.

Conclusion

I hope that your conclusion from these findings will not be to always use the RECOMPILE query hint to solve parameter sniffing problems. When a query in your stored procedure refers to a local variable, and you know that the variable will have similar selectivity in different invocations of the procedure, it’s better to use the OPTIMIZE FOR query hint; this way you can benefit from reusing previously cached plans. But if with each invocation of the procedure the local variable ends up with different selectivity, and the query can benefit from reoptimization upon each invocation, now you have a solution; you can use the RECOMPILE query hint and enjoy optimal plans even though the query refers to local variables, at the cost of recompilations.

Learn more: Do You Need a SQL Server Query Hint?

--
BG