One of SQL Server's most highly touted benefits has always been its ability to use stored procedures. SQL Server can save and reuse execution plans, so once you've compiled and stored a procedure, you can save time during subsequent executions. This ability can be quite useful in SQL Server 7.0 and later, in which the time needed to optimize and compile procedures can be a measurable percentage of the total execution time and frequent recompilations can greatly reduce overall system performance. Using stored procedures has other benefits. The process lets you modularize or encapsulate code. And stored procedures can reduce network traffic: Only the call to execute the procedure needs to be sent to the back-end SQL Server, rather than all the code that makes up the procedure definition. However, reusing saved plans isn't always the best choice.
This month, let's examine a few important terms, examine how and why to keep track of compilation, and find out how to tell when a plan is—or isn't—being reused. (Next time, we'll go into how to decide whether recompiling is the right choice.)
Compile vs. Optimize
You might be accustomed to seeing the terms optimize and compile used interchangeably. However, when we talk about the cost of compiling or recompiling a query, we're actually talking about the cost of optimization, which is only a subset of compilation.
Optimization is the most time-consuming, and therefore the most expensive, part of the compilation process. When SQL Server compiles queries, procedures, or batches, it translates them into an internal form that it can more easily execute. This form isn't a machine-readable form (like machine code), but it's close. Every T-SQL statement—including the data definition statements you use to create or alter objects, your variable declarations, your flow control statements (such as IF and WHILE), and your data-access and -manipulation statements—must be compiled.
The optimizer is the part of the SQL Server engine that determines the best way to carry out your commands. The optimizer decides which indexes (if any) to use, which join algorithm to use for each pair of tables in a query, in which order to join your tables, and how to process grouping operations. In general, the optimization process applies to only four statements: SELECT, INSERT, UPDATE, and DELETE. These four statements are the only candidates for optimization because they're the only ones for which more than one possible plan exists.
Compile vs. Recompile
When we talk about avoiding recompilation, we're really talking about avoiding the reoptimization of our batches. The unit of compilation is the batch or stored procedure, so when SQL Server needs to recompile a statement, it actually recompiles the entire batch or procedure. (SQL Server 2005 will sometimes be able to perform partial compiles; I'll discuss that feature in a future article.) For now, let's look at the compilation and recompilation of stored procedures.
In "SET Options and Recompilation," February 2001, I discussed a situation in which SQL Server will recompile a previously compiled procedure: when the values of certain SET options have changed. SQL Server also automatically recompiles stored procedures when certain changes occur to the tables referenced in the stored procedure; these changes include updating of statistics, dropping an index, or other metadata changes. You can force SQL Server to recompile a stored procedure on its next execution by running the procedure sp_recompile. In addition, if you pass sp_recompile a table name instead of a procedure name, SQL Server will recompile every procedure that references the given table the next time the procedure is executed.
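Both forms of the call are short; here's a sketch (the procedure and table names are illustrative placeholders):

```sql
-- Mark one procedure for recompilation on its next execution
EXEC sp_recompile 'getorders'

-- Or pass a table name instead: every procedure that references
-- the table recompiles the next time it runs
EXEC sp_recompile 'orders'
```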
Reusable vs. Re-Entrant
In SQL Server 6.x, stored procedure plans were reusable but not re-entrant (i.e., capable of being used simultaneously by multiple processes). When a process was executing a procedure, any other processes that tried to run the same procedure got a completely new plan. This meant that you could have multiple identical plans in cache, thus wasting memory. But starting in SQL Server 7.0, Microsoft divided the in-memory data structures that SQL Server uses to maintain plan information into two parts. The compiled plan contains the plan's actual steps and is both reusable and re-entrant; the execution context contains specific information (e.g., specific values for passed parameters, local variables, internal worktables) that a process needs when executing the procedure. Multiple simultaneous executions of a procedure require multiple execution context structures. After those executions are completed, subsequent calls to execute the procedure can reuse an existing execution context's data structures, but each execution context will be re-initialized so that previous parameter values and local variables no longer exist.
Tracking Plan Reuse
A wonderful tool is available to keep track of plan reuse on an object-by-object basis: the pseudo system table syscacheobjects. (A pseudo system table doesn't exist on disk; when queried, the table materializes from information that SQL Server maintains in its memory cache.) This table contains some useful columns for detecting when SQL Server has reused a stored procedure plan.
- The cacheobjtype column indicates whether the plan is the compiled plan or the execution context (which syscacheobjects lists as executable plan).
- The bucketid column is the value that lets SQL Server quickly find a plan for a procedure in cache. Multiple entries for a procedure, including all applicable compiled plans and executable plans, have the same bucketid value.
- The objtype column indicates the type of object associated with the cached plan. SQL Server 7.0 and later can cache plans for ad hoc queries, prepared queries, views, and other objects, in addition to plans for stored procedures. Syscacheobjects lists procedure plans as Proc.
- The usecounts column tells how often a particular plan has been used.
- The sql column contains the text of the executed SQL statement. In the case of a stored procedure, this is the procedure name.
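Pulled together, a query along the following lines surfaces those columns (this is a sketch, not the article's actual Web Listing; note that syscacheobjects lives in the master database):

```sql
-- List cached stored procedure plans and how often each has been reused
SELECT bucketid, cacheobjtype, objtype, usecounts, sql
FROM master..syscacheobjects
WHERE objtype = 'Proc'
ORDER BY bucketid
```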
Let's run through a few examples that illustrate how to determine whether a plan is being reused. If you want to experiment with similar scripts, be sure to run them on a test server.
First, you need to clear SQL Server's memory of existing plans. You can run DBCC FREEPROCCACHE, which removes all plans from all databases on your SQL Server test system, or you can clear out the plans for a specific database by running DBCC FLUSHPROCINDB and passing a parameter specifying the database ID number. The database ID can't be a variable or a function, but you can use dynamic execution to programmatically specify the database ID of any database. The code that Web Listing 1 (which you can download at the top of the page) shows creates a new database called testdb, copies into testdb the orders table from the Northwind sample database, then creates a stored procedure that returns a subset of the rows in the orders table, based on the value of a parameter. Finally, the script removes all the procedure plans in testdb.
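Because the parameter can't be a variable or a function call, one workaround is to build the DBCC command as a string and execute it dynamically; a sketch, assuming the testdb database from Web Listing 1:

```sql
-- Clear cached plans for testdb only; DB_ID() can't be passed
-- to DBCC FLUSHPROCINDB directly, so build the command as a string
DECLARE @cmd varchar(100)
SELECT @cmd = 'DBCC FLUSHPROCINDB(' +
       CONVERT(varchar(10), DB_ID('testdb')) + ')'
EXEC (@cmd)
```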
Next, call the stored procedure:
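For example (the customer ID here is illustrative; any value from the copied orders table will do):

```sql
EXEC getorders 'ALFKI'
```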
Then, run the code that Web Listing 2 shows, to examine the syscacheobjects rows for the procedure's plans. Web Figure 1 (which you can download at the top of the page) shows the results.
If you call the procedure a second time, using a different parameter:
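for instance (again, any other customer ID works):

```sql
EXEC getorders 'CHOPS'
```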
then run the code in Web Listing 2, the only change is that the usecounts value for the executable plan is now 2 instead of 1. Each time you run the procedure and then run the code in Web Listing 2, the usecounts value increases: The plan is reused, but no recompilation takes place.
You can change certain SET options to force SQL Server to compile a new plan. For example, if you change the value of CONCAT_NULL_YIELDS_NULL to OFF, run the procedure again,
EXEC getorders 'HANAR'
then run the code in Web Listing 2, you'll see a new pair of syscacheobjects rows for a whole new plan, based on the changed SET options, as Web Figure 2 shows. For future executions, any session that sets CONCAT_NULL_YIELDS_NULL to ON will use the second executable plan, and any session that sets it to OFF will use the first executable plan. SQL Server can tell the difference by inspecting the setopts column in syscacheobjects. This column is a bitmap, with one bit for each SET option for which a change in the value requires a new plan. The setopts value is 4347 when CONCAT_NULL_YIELDS_NULL is set to ON and is 4339 when CONCAT_NULL_YIELDS_NULL is set to OFF. (The bit for this particular SET option is the 8 bit.)
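You can verify which bit the option controls by masking the setopts bitmap yourself; a sketch (the & bitwise AND operator isolates the 8 bit, so 4347 yields 8 and 4339 yields 0):

```sql
-- The 8 bit in setopts tracks CONCAT_NULL_YIELDS_NULL
SELECT cacheobjtype, setopts, setopts & 8 AS concat_null_bit
FROM master..syscacheobjects
WHERE sql = 'getorders'
```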
You can also force SQL Server to generate a new plan when executing the procedure, by running the command
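The command is an ordinary execution with the RECOMPILE option appended (the parameter value is again illustrative):

```sql
EXEC getorders 'HANAR' WITH RECOMPILE
```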
Any plan generated when you execute the procedure WITH RECOMPILE is used for that execution only and isn't saved in cache for reuse. Note that if you execute the procedure WITH RECOMPILE, then run the code in Web Listing 2 again to inspect the cache, you'll see no additional plans and the usecounts values of the existing rows won't change.
You can also recreate the procedure and include the WITH RECOMPILE option to force SQL Server to generate a new plan every time it executes the procedure. In this case, SQL Server knows that it will never reuse a plan, so it never saves any of them. The act of altering the procedure invalidates all plans currently in cache, and running the code in Web Listing 2 shows that no plans exist for getorders. You can also invalidate all plans in cache by running sp_recompile on the getorders procedure, which instructs SQL Server to recompile the procedure the next time it runs.
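The procedure-level option looks like the following sketch; the parameter name and body are assumptions based on the article's description of getorders (a parameterized SELECT against the copied orders table), so adjust them to match Web Listing 1:

```sql
-- Recreate the procedure so SQL Server recompiles it on every
-- execution and never caches a plan for it
DROP PROCEDURE getorders
GO
CREATE PROCEDURE getorders @custid nchar(5)
WITH RECOMPILE
AS
SELECT * FROM orders
WHERE customerid = @custid
GO
```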
Good or Bad?
The pseudo table syscacheobjects is my favorite tool for determining when SQL Server is reusing a plan, but it doesn't tell me how much effort recompiling will require and it doesn't tell me whether recompiling is a good or bad idea. In my next column, I'll explore some ways to answer these questions, and I'll show you some tools that SQL Server provides to give you a bigger picture of how often recompilation is occurring and how much cost frequent recompiles incur.