Don't be reluctant, just be careful
One of the most fundamental aspects of T-SQL querying is filtering data. Besides logically limiting the rows that a query returns, filters can affect a query's performance. A filter is one of the most important query elements that SQL Server's optimizer examines to determine whether to use an index and to determine the access method to apply in the execution plan.
Some T-SQL programming tasks might seem basic when all you care about is writing code that returns the correct result. However, when aiming at good performance, the solution might be far from basic. Such is the case when you need to write a stored procedure that supports dynamic filters—that is, when your stored procedure accepts multiple input arguments and is supposed to determine which columns to filter, if at all, based on that input.
Some of the most frequently implemented solutions for filtering data in stored procedures have serious performance problems. So, let's look at two common implementations, discuss those implementations' performance problems, then explore a better implementation. But first, let's look at the data and tables I use in the examples that follow.
About the Examples
For my examples, I use the Orders table, which you can create and populate by running the script in Listing 1. This script creates four indexes to support queries that filter the orders by order ID, order date, customer ID, and employee ID. I added an extra column to the Orders table called filler, which consumes 2000 bytes. There are two reasons for adding this column. First, without it, the table is too small to demonstrate performance problems. Second, it demonstrates the use of noncovering indexes because I don't include the filler column in the indexes but include it in the queries.
Suppose you get a task to write a stored procedure that accepts four optional arguments: @OrderID, @OrderDate, @CustomerID, and @EmployeeID. To focus on the performance aspects of the different implementations of filters, the request is simplistic: Return all rows from the Orders table filtered by the given inputs. If a value isn't supplied for an argument, your query shouldn't filter the corresponding column. For example, if you execute the stored procedure without specifying any input arguments, the stored procedure should return all orders. If you specify the @OrderDate = '19970101' argument and no input values for the other arguments, the stored procedure should return all orders placed on January 1, 1997.
In practice, filtering requirements for stored procedures that implement dynamic filters are usually sophisticated (e.g., support LIKE predicates, include range filters), but for simplicity's sake, my examples deal only with equality operators. Also, when you use dynamic execution in production code, you need to perform input validation and protect your environment from SQL injection. For the sake of brevity, my examples don't deal with these aspects.
The stored procedure should be able to use indexes efficiently and reuse execution plans when adequate. Discussing efficient use of indexes and reuse of execution plans requires a lot of background and is outside the scope of this article. I'll assume that you already have this background; if not, check out Kalen Delaney's Inside SQL Server columns for details.
Common Problematic Implementations
Let's look at two filter implementations commonly used in stored procedures. These implementations and their performance problems are as follows:
Static query that uses the filter (column = @Parameter OR @parameter IS NULL). The script in Listing 2 creates the first version of the usp_GetOrders stored procedure that I'll discuss. This stored procedure uses static code with the following expression in the WHERE clause for each input parameter:
column = @Parameter OR @parameter IS NULL
All input arguments have a default value of NULL, which means that specifying inputs for the arguments is optional. If an input value isn't specified for a parameter, the parameter will be NULL. The @parameter IS NULL part of the expression always yields TRUE regardless of the result of the column = @Parameter part of the expression. Hence, when the @parameter is NULL, the whole expression always yields TRUE and doesn't filter out any rows.
If an input value is provided for a parameter, the @parameter IS NULL part of the expression always yields FALSE. Thus, the outcome of the column = @Parameter part of the expression determines whether a row is returned.
You can use variations of the code in Listing 3 to test the stored procedure. Feel free to play with the arguments. For example, you might invoke the procedure without specifying any arguments, specify more than one argument, and change the values. In your tests, examine the execution plans and STATISTICS IO, and query the master.dbo.syscacheobjects table to check whether plans are reused. Run DBCC FREEPROCCACHE when you want to clear all execution plans from the cache.
Disappointingly, regardless of which input values you specify when you invoke the stored procedure after clearing the cache, the same inefficient plan is generated. Figure 1 shows a sample execution plan. At this point, a discussion about whether the execution plan is reused (and it is reused in this case) is moot because the plan is inefficient and this problem must be resolved first.
Static query that uses the filter (column = COALESCE(@Parameter, column)). The script in Listing 4 creates a second version of the usp_GetOrders stored procedure. This script again invokes a static query. The following form of a logical expression is specified for each input argument in the filter:
column = COALESCE(@Parameter, column)
The trick here is that the expression compares the column with the parameter value when the parameter isn't NULL. When the parameter is NULL, the expression compares the column with itself—and such a comparison always yields TRUE, assuming the column was defined as NOT NULL.
Logically, this solution produces correct results. However, this implementation also suffers from inefficient plans. Regardless of the input you specify, you always get a table scan, as Figure 2 shows. But all hope isn't lost.
A Better Implementation
The script in Listing 5 creates a third version of the usp_GetOrders stored procedure. This script relies on dynamic execution rather than static code. The stored procedure constructs the query string dynamically within the @sql variable. For each input argument, the code concatenates a filter expression to the WHERE clause when the input isn't NULL.
The first logical expression that appears in the WHERE clause is 1=1. This trick saves you from having to figure out whether the WHERE clause should be specified at all (in case no input is specified) and whether to prefix a filter expression with the AND operator. The optimizer simply ignores the 1=1 expression because it has no effect on the query.
The third version of the usp_GetOrders stored procedure executes the sp_executesql system procedure to invoke the query that was constructed dynamically within the @sql variable. Because the invoked code contains only relevant filters with no special manipulation, efficient plans are generated as Figure 3 shows.
The dynamic code uses its own input arguments. These arguments are declared in the second argument of sp_executesql and assigned with values in the section following the parameters declaration section. In the dynamic code, you should use input arguments rather than concatenating literals. Repeatedly invoking the dynamic code using the same list of parameters reuses the execution plan. Thus, only one plan will be created for each distinct list of parameters, even when the input values differ.
For example, suppose you invoke the stored procedure first by specifying a filter for only the order date:
EXEC dbo.usp_GetOrders @OrderDate = '19970101';
In the generated plan, the clustered index on the order date is used efficiently. If you invoke the procedure again with an order date filter, even with a different value such as
EXEC dbo.usp_GetOrders @OrderDate = '19970102';
SQL Server reuses the plan. The first time you invoke the procedure with a filter for the order ID, with code such as
EXEC dbo.usp_GetOrders @OrderID = 10248;
SQL Server generates a plan in which the order ID index is used efficiently. If you invoke the procedure again with a filter for the order ID
EXEC dbo.usp_GetOrders @OrderID = 10249;
SQL Server reuses the plan.
Besides examining an execution plan for efficiency, you can check whether a plan is reused by querying the syscacheobjects table. When you run the query
SELECT * FROM master.dbo.syscacheobjects WHERE sql NOT LIKE '%syscacheobjects%' AND sql NOT LIKE '%sp_executesql%';
you'll notice that repetitive invocations of the stored procedure with the same parameter list don't add execution plans but rather just increment the plan's usecounts value.
Be Careful of Rules of Thumb
Many T-SQL programmers are reluctant to use dynamic execution, probably because experts keep warning them against it. Sure, there are downsides and dangers, including:
- The user executing the stored procedure must be granted direct permissions for the code invoked dynamically.
- There's a risk of SQL injection.
- The dynamic code is invoked in a separate batch and is optimized separately.
- The code is usually lengthy, nonstructured, and hard to follow.
However, I usually try to not blindly adopt rules of thumb like never use dynamic execution. In this case, dynamic execution isn't so evil. As long as you're aware of the limitations, know how to protect your environment against security risks, and are willing to compromise some aspects of your solution, you can enjoy some significant benefits. In this example, all the static solutions suffer from performance problems because inefficient plans are generated. The solution that uses dynamic execution, however, produces efficient plans and even reuses the plans for the same parameter list.