A few months ago, one of my customers began experiencing performance problems with their new reporting environment. After rolling out an ad hoc reporting tool, some of the business users began building reports that were very complex and, in some cases, resulted in tens of thousands of rows being returned. My customer wanted to prevent these types of reports from being executed. With this requirement in mind, I began investigating whether SQL Server execution plans could provide a solution.
Let’s take a look at how execution plans can be used to estimate both the complexity of queries and the size of query result sets. I’ll first show you how to use execution plans within SQL Server Management Studio (SSMS), then show you how execution plans can be used from a client application.
Execution Plan Overview
When you run a query against the relational database engine, SQL Server carries out two major steps. The first step is to compile the query into a plan. The second step involves executing the plan. SQL Server 2005 can return a query execution plan in three formats: text, XML, and graphical. The easiest way to look at an execution plan is to view its graphical representation. With SQL Server 2005, you use SSMS to do so. For example, after opening a new query window in SSMS, select Query, Include Actual Execution Plan and run the following query against the AdventureWorks sample database:
ST INNER JOIN
Sales.SalesPerson SP ON
SP.TerritoryID = ST.TerritoryID
ORDER BY ST.Name;
This query is in the Showplan_Sample1.sql file ("Downloads" file at top of page).
You can see the result set and execution plan for this query in Figure 1 and Figure 2, respectively. Although an in-depth discussion of understanding execution plans is out of this article’s scope, you can find more information about execution plans by reading “Optimizing a Suboptimal Query Plan,” February 2007. However, let me point out a few basics. First, an execution plan is returned for each query submitted in a query batch. (In our example, we submitted only one query.) An execution plan is read from top to bottom, then right to left. For example, the first step in this execution plan is to perform an Index Scan of the AK_SalesTerritory_Name index to return the Territory name and ID columns. The results of this step are combined with the next step (a Cluster Index Scan against the SalesPerson clustered index). Then, working right to left, a nested loop is used to join the two tables together. Finally, the results of this nested loop are fed into the final SELECT operation.
Each step lists a cost value, which represents the cost of a step relative to the entire query. In our example, the AK_SalesTerritory_Name Index Scan represents 44 percent of the total query cost. Similarly, if you run a batch of queries, you’ll see the query cost relative to the entire query batch listed inside the top of each execution plan. (I’ll discuss more about what this cost value actually measures later.) Connecting each step is an arrow, whose thickness denotes the relative number of rows coming from each step. As you can imagine, information about each step in a plan can be very useful when trying to troubleshoot and optimize a query.
Note that you can view additional details for any step in an execution plan by selecting or hovering over the step with your cursor. In Figure 2, you can see the step details for the AK_SalesTerritory_Name Index Scan. Several columns of information are displayed here. Note that if you want to see an explanation of each column, click a step, then make sure the Properties window is visible within SSMS. To make this window visible, select View, Properties Window in SSMS. Note that many estimated values (e.g., Estimated CPU Cost) are displayed, as well as several actual values. Estimated values show what the query engine expects to happen, which leads us into a discussion of estimated execution plans.
Estimated Execution Plans
To view an estimated execution plan, select Query, then choose Display Estimated Execution Plan in SSMS. Unlike an actual plan, the query isn’t run; instead, only an estimated plan is returned. For large, complex queries, you’ll notice the estimated plan is returned very quickly (relative to running the actual query and viewing the actual plan). Also, when you look at the step details of an estimated plan, you’ll see only estimated values (i.e., none of the actual values associated with actual plans are returned).
As you can imagine, estimated values might not always match the actual values because several conditions, such as the cost of applying these filters and the number of rows returned, won’t be known until after the query is executed. The query engine calculates these estimated values based on statistics (SQL Server maintains statistical information about indexes and column data stored in the database), so the accuracy of the estimates can be affected if the statistics are out of date. For more information about statistics, see “Making the Most of Automatic Statistics Updating,” October 2007.
To help meet my customer’s requirement of identifying complex queries and/or queries that return large numbers of rows, we’re going to focus on two estimated values: Estimated Number of Rows and Estimated Subtree Cost. The Estimated Number of Rows value is self-explanatory. Estimated Subtree Cost is defined as the estimated cumulative cost of an operation and all child operations. In Figure 3, I’ve highlighted the SELECT operation, which is the final step of our query. The estimated number of rows for this query is 14. The estimated subtree cost is 0.0075236. But how is this cost measured?
According to SQL Server 2005 Books Online (BOL), cost is measured as the “estimated elapsed time, in seconds, required to complete a query on a specific hardware configuration.” This estimate is based on a test machine that the SQL Server developers used in their lab. Therefore, this cost value is a loose estimate. It’s important to note that the actual execution plan cost can vary greatly depending on your environment at the time the query is running.
I encourage you to run several different queries on your machine and examine the estimated values versus the actual values, as well as the actual time it takes to run the query. You’ll then have a better idea of what type of cost threshold is realistic for your environment.
Using Estimated Execution Plans in an Application
You should now have an understanding of how execution plans work and how to view the graphical form of an execution plan within SSMS. Now I’ll walk you through how to use an execution plan within an application. I mentioned that execution plans can be returned in text, XML, and graphical format. There are two types of text plans: one that’s generated using the SET SHOWPLAN_ TEXT command and one that’s generated using the SET SHOWPLAN_ALL command. SET SHOWPLAN_ TEXT returns a limited amount of information, which doesn’t include the estimated values we need. Using SET SHOWPLAN_ALL returns estimated values in a tabular format. It’s a possible candidate for consumption in an application. However, the XML version of a plan provides the richest level of detail (graphical plans are actually built from the XML version of a plan) and produces a single XML document for the query batch. There are existing examples of extracting data from an XML plan, which we can use as a starting point. In particular, I have developed samples using some of the code found in the Microsoft article “Processing XML Showplans Using SQLCLR in SQL Server 2005” (msdn2.microsoft.com/en-us/library/ms345130.aspx).
In my sample project, EstimatedExecutionPlan_ WinApp.vbproj, I have built a .NET Windows application that passes a query batch to a SQL Server 2005 database, which you can see in Figure 4. Let’s briefly walk through the application logic. In the top part of the main window, there’s a text box where you can enter a connection string for a SQL Server 2005 database. The large text box in the middle is used to specify a query or a query batch. There are also two text boxes near the bottom left side where you can enter values to represent the maximum desired cost and estimated rows for a query batch. When you click Run, the code shown in Web Listing 1 is executed. This code establishes a connection with the database, then executes the SET SHOWPLAN_XML ON command. Next, the code passes the query batch to SQL Server. SQL Server returns an XML version of the estimated execution plan. The plan is returned in a SQLDataReader object with one column of data, so the code iterates through the SQLDataReader object to concatenate the plan back into a single string. After executing the SET SHOWPLAN_XML OFF command, the code uses XML Path Language (XPath) to parse the string and calculate the total cost and total estimated rows for the query batch. (Note that a user must be granted the SHOWPLAN permission, which is a database-level permission, to generate an XML showplan.) These values are displayed in the two text boxes near the bottom right side of the form and are compared to the maximum desired values. The status text box at the very bottom of the form indicates when one of the estimated values exceeds the corresponding maximum desired value. Although all of the code resides within the client application, you could alternatively use a SQL Server CLR stored procedure to house this logic in an instance of SQL Server.
The sample application simply lets you examine the estimated cost and row count of a query batch, and compare those values to a maximum desired set of values. However, in your application, you could use this logic to examine and potentially prevent complex queries from running.
A Custom Data Processing Extension for SSRS
The customer I mentioned at the beginning of this article was using SQL Server 2005 Reporting Services (SSRS) and rolled out Report Builder for ad hoc reporting. (For more information about Report Builder, see my article “No-Nonsense Reporting Tool,” June 2006.) To leverage estimated plans in SSRS, a custom data processing extension must be built. In an upcoming article, I discuss how I built this extension.
Prevent Problematic Queries from Running
Execution plans are a great tool for developers and DBAs alike. Typically, they’re used to better understand, troubleshoot, and tune queries. However, I also think execution plans have the potential to help prevent problematic queries from ever running. This kind of “proactive avoidance” can be especially useful in ad hoc reporting environments.