T-SQL is a declarative querying language that lets you phrase your request in a logical English-like manner. The SQL Server database engine then processes the query. The relational engine (the query processor) uses the query optimizer to translate the logical request into a physical plan known as the query execution plan. The plan holds the instructions regarding which physical structures to access, how to access them, in what order, which join algorithms to use, whether to sort data, whether to spool data, and so on.
SQL Server gives you the means to analyze an execution plan using graphical as well as XML forms. Understanding what you see in a plan can help you determine the reasons for poor query performance and therefore is a key element in query tuning. The topic deserves much more than a single article, and fortunately there are entire books dedicated to the subject. For this article, I assume that you have some basic knowledge about query plans and are looking for the next step -- tips on how to analyze query plans efficiently.
For sample data, I used a database called Performance. The source code is available for download.
Estimated vs. actual plans
As you probably know, SQL Server Management Studio (SSMS) gives you the option to obtain both an estimated and an actual query plan. For the former, highlight the query in question and click the Display Estimated Execution Plan (Ctrl+L) button in the SQL Editor Toolbar. For the latter, click the Include Actual Execution Plan (Ctrl+M) button and execute the query. Either way, SQL Server has to create the execution plan before executing it. But with the estimated plan, SQL Server won't run the query, and for some measures (e.g., number of rows returned, number of executions of the plan iterator), SQL Server shows only estimates. With the actual plan, SQL Server will create a plan, make some runtime decisions (e.g., actual degree of parallelism -- DOP, memory grants), collect runtime information (e.g., actual numbers of rows, actual numbers of executions), and present all those along with the estimates in the actual execution plan. When you analyze query plans, you'll typically find that actual plans are much more informative and interesting. In fact, often what can help you investigate suboptimal plans is comparing estimated measures with their actual counterparts.
Let's take the following query (call it Query 1) as an example to demonstrate an actual execution plan (which Figure 1 shows):
ORDER BY orderid;
There are two ways to follow the logic in the plan. One is based on data flow order, which is right-to-left, top-to-bottom. The other is the order of the API calls, which actually start with the root node (left-most, top-most). The data flow order is typically the one you want to follow. So in our case, the plan uses an Index Scan iterator to scan the first 25 rows in the nonclustered index PK_Orders (defined on orderid as the key). For each of the 25 rows, the Nested Loops iterator invokes a Key Lookup iterator to look up the related data row to obtain the rest of the attributes the query needs to return. The Top iterator is the one that causes the scan to stop after 25 rows.
Sometimes though, like with our plan, if you follow data flow order, some things can be confusing. For example, how does the Index Scan iterator know to stop after 25 rows if the Top iterator that tells it to stop appears later in the plan? The answer is that the internal API calls start with the root node (SELECT iterator in our case). This root node invokes the Top iterator. The Top iterator invokes a method 25 times that asks for a row from the Nested Loops iterator. In turn, the Nested Loops iterator invokes a method 25 times that asks for a row from the Index Scan iterator. Hence, the Index Scan iterator doesn't proceed beyond the 25 first rows that it scans. In short, although it's usually more intuitive to follow data flow order to interpret the plan, understanding the API call order can help you better understand what's really going on.
Notice in the plan in Figure 1 that the arrows represent the data flow. The yellow tooltip box that you get when you place your mouse pointer on an arrow shows you the estimated versus actual number of rows. Similarly, in the properties of an iterator, such as the Key Lookup iterator, you get information about estimated versus actual number of executions of the iterator, among other things. Note that there's an important difference between the estimated and actual number of rows for an iterator that appears in the inner (bottom) branch of the Nested Loops iterator, such as the Key Lookup iterator in our case. In such a case, the estimated number of rows is for a single execution, whereas the actual number of rows is for all executions. Observe in Figure 1 that the number of rows estimated to be returned from the Key Lookup iterator is 1 and the actual number is 25. This isn't a bad estimate; in fact, it's a pretty accurate one.
For each iterator, the yellow tooltip box provides a subset of the properties. Right-clicking an iterator and selecting Properties exposes the Properties window (if not already visible), showing all iterator properties. For example, observe in Figure 1 that the tooltip box for the root node SELECT shows some properties, such as the query cost, actual DOP, and plan size. The Properties window shows more properties, such as the optimization level (e.g., FULL, TRIVIAL), compile CPU and memory, whether the plan was retrieved from cache, reason for early termination of optimization (e.g., good enough plan was found), and so on.
What to look for in the plan
If you're analyzing a plan for a poorly performing query, what should you look for? As I mentioned, one of the important things to look for when you suspect that the optimizer made suboptimal choices is bad estimates. You can identify bad estimates when there's a big discrepancy between estimated numbers and actual numbers (e.g., number of rows, number of executions). Consider the following query as an example (call it Query 2) and its plan (which Figure 2 shows):
SELECT orderid, custid, empid, shipperid, orderdate, filler
WHERE orderid <= @i;
Notice how different the estimated number of rows is from the actual number of rows. The optimizer made a choice to do a full scan of the clustered index and not use a nonclustered index with a few lookups because of the estimated low selectivity of the filter (large percent of qualifying rows).
Now that you know the cause of the suboptimal choice, try to figure out the cause of the bad estimate. In this particular example, it has to do with the fact that the query uses a variable, and a variable value can't be sniffed when the optimizer does batch level optimization, which is what it does initially by default. With batch level optimization, the optimizer doesn't first assign the value to the variable and then optimize; instead, it optimizes before the assignment is actually executed. So the optimizer relies on a hard-coded estimate of 30 percent selectivity, which translates to 300,000 rows in our case.
After you identify the problem, you can evaluate possible solutions. For example, you can force a statement level recompile by using the RECOMPILE statement hint, forcing SQL Server to optimize the query after assigning the value to the variable. The following query (call it Query 3) implements this option, generating the plan that Figure 3 shows:
SELECT orderid, custid, empid, shipperid, orderdate, filler
WHERE orderid <= @i
Now the estimate is much better -- hence an optimal choice of using a nonclustered index with a few key lookups.
Another important thing to notice is that each iterator in the plan has a percent associated with it reflecting the relative cost of the iterator out of the entire query cost. You should be aware, though, that the cost is always based on estimates and not actuals. So if there's a big discrepancy between the two, you need to remember that the percent isn't a good representation of reality. Often the plans are very large and detailed, and it's difficult to see the forest for the trees. Focusing your attention on the iterators with the high percent lets you be more efficient with your tuning efforts. If you manage to reduce or eliminate those costs, the effect is likely to be greater than elsewhere.
It's also important to notice the thickness of the arrows. The more rows that flow from one iterator to the next, the thicker the arrow. What's interesting here is that in an actual plan, the thickness is based on actual number of rows rather than estimated number of rows (contrary to the percentages, for example). Therefore, you should pay special attention to areas in the plan where thick arrows appear before expensive iterators such as Sort, Spool, and others. Consider the following query as an example (call it Query 4) and its plan (which Figure 4 shows):
FROM dbo.Orders AS O
INNER JOIN dbo.Customers AS C
ON O.custid = C.custid
ORDER BY O.empid;
The two most expensive parts of the plan are the Clustered Index Scan and the Sort. Also, there's a thick arrow with 1,000,000 rows going into the expensive sort. With your attention focused on these two iterators, you realize that you don't have an index on empid that could support the query, and therefore the optimizer had to fully scan the table and apply a sort. What's even worse is that the Sort spilled to disk (couldn't fit in memory). If you create an index on empid, you'll get a much more efficient plan, but what led to this realization was understanding how to analyze the plan.
Other interesting things to look for
There are several other interesting things to observe when analyzing query plans, such as the difference between rebinds and rewinds. For some iterators, such as Spool and Sort, when they appear in the inner part of a Nested Loops join, the ActualRebinds and ActualRewinds have meaningful values. The rebinds measure represents a case in which at least one of the correlated parameters (e.g., the join column value) changed and requires the inner iterator to be reevaluated. The rewind measure means that correlated parameters didn't change -- so instead of reevaluating the iterator, the plan can reuse the previous result. Rebinds represent actual work, whereas rewinds reuse a previous result. Consider the following query as an example (call it Query 5) and its plan (which Figure 5 shows):
(SELECT COUNT(*) AS cnt
FROM dbo.Orders AS O2
WHERE O2.shipperid = O1.shipperid) AS shippercnt
FROM dbo.Orders AS O1 TABLESAMPLE(1000 ROWS);
This query uses the TABLESAMPLE option to sample about 1,000 rows (1,120 in my execution) from Orders (aliased as O1) and uses a subquery against a second instance of Orders (aliased as O2) that scans and counts how many orders the current shipper handled. The inner branch of the Nested Loops join is responsible for scanning and counting the orders in O2 that are associated with the current shipper ID from O1. There are only five distinct shipper IDs in the table, so it would be a waste to repeat the work for the different occurrences of the same shipper ID. So the optimizer sorts the approximately 1,000 rows from O1 by shipper ID to optimize the use of rewinds. The optimizer needs to rebind only for each new shipper ID (only five times) and rewind in all other cases (1,115 times). This is important to know so that you can understand how many times the bottom branch did actual work, which in this case is only five.
Another interesting aspect of query plans is understanding the distinction between a Seek Predicate and a Predicate in index access methods, such as an Index Seek and Index Scan. It's important to note that when the index tree and its key ordering are relied on to actually filter the rows, the predicate will show up in the iterator's properties as a Seek Predicate (which is a good thing). However, it could be that out of the rows that do get physically scanned, the iterator will need to evaluate other predicates to determine whether to return the row or discard it. Such a predicate appears as a Predicate and not a Seek Predicate. In such a case, the index tree and its key order aren't relied on to go straight to qualifying rows; instead, the rows do get physically accessed. Consider the following query as an example (call it Query 6) and its plan (which Figure 6 shows):
WHERE orderdate >= '20080101'
AND shipperid = 'A';
There's a clustered index on the table with orderdate as the key. The plan performs a seek in that index based on the seek predicate orderdate >= '20080101'. This means that only rows that satisfy this predicate are physically accessed. However, the index tree and its key ordering don't address the filtering of the shipper ID, hence the remaining rows hold orders for all shippers in the qualifying range of dates. So all the remaining rows are physically accessed, and only those that satisfy the predicate shipperid = 'A' are returned, and the rest are discarded. There are five distinct shipper IDs in the Orders table, with a fairly even distribution. This means that the plan physically scans five times the data that it actually needs to return. So here you can identify potential performance improvement by creating a covering index with (shipperid, orderdate) as the key list.
The last thing I want to mention relates to a very common optimization problem known as parameter sniffing. I assume that you have a basic understanding of plan caching, reuse, and what parameter sniffing means -- so I'll focus on identifying the problem in the plan. Briefly, SQL Server has the ability to reuse previously cached plans by default. However, it creates a plan based on the parameter values that were sniffed when the plan was created. If subsequent executions provide similar types of inputs, plan reuse is beneficial, saving the time and resources associated with reoptimizing every time you run the code. But when the inputs are atypical, reuse can lead to very inefficient plans. So, how can you identify this when analyzing a plan?
As an example, consider the following procedure:
@oid AS INT
SELECT orderid, custid, empid, shipperid, orderdate, filler
WHERE orderid <= @oid;
For a selective input,
It makes sense to use the nonclustered index that exists on orderid along with some lookups. For a nonselective input, it makes sense to do a full clustered index scan. However, the type of plan that will be cached and subsequently reused depends on the sniffed input parameter value when the plan needed to be created (e.g., there was no plan in cache, recompile happened).
For example, consider the following code (call it Query 7), which executes the procedure for the first time with a selective input and generates the plan that Figure 7 shows:
The optimizer chose to use the nonclustered index and some lookups because of the high selectivity estimate. Notice that the estimated and actual numbers of rows are the same, telling you that the plan is probably suitable for this case. However, this is the plan that's now cached. Try executing the procedure again with a nonselective input (call it Query 8), and examine the plan in Figure 8:
It's the same plan because the cached plan got reused. But for the current input, it's horribly inefficient. Notice that the estimated number of rows is 25, whereas the actual number of rows is 1,000,000. As you might have guessed, the estimated number of rows is based on the compiled value -- not the runtime value. The actual number is based on the runtime value. Remember also that the thickness of the arrows reflects the actual number.
Examine the XML form of the plan (right-click an empty area in the plan and select Show Execution Plan XML). Look near the end of the XML for ParameterList. You'll find the following data:
<ColumnReference Column="@oid" ParameterCompiledValue="(25)" ParameterRuntimeValue="(1000000)" />
This is a great way to identify parameter sniffing problems. Now you can evaluate your options, such as using the recompile procedure or query options.
Use the information at hand
Understanding query plans is a skill that's crucial for tuning queries efficiently. SQL Server provides rich information about query plans in the graphical as well as XML forms of plans. It's up to you to know how to grab that information.