Use waits and queues to improve SQL Server performance
SQL Serveris a large, complex topic. The key to successful performance tuning is breaking the job into smaller, simpler tasks. In Microsoft's SQL Server customer lab, the SQL Server Customer Advisory Team uses SQL Server waits and queues information as part of a repeatable methodology—a performance-tuning "toolbox"—to systematically track down and resolve customers' SQL Server performance problems.
We periodically bring customer applications into our SQL Server customer lab to evaluate application and database designs, measure performance, and make scalability recommendations. In only a week's time, we install the customer's application and databases, configure them for performance testing, and use the waits-and-queues methodology to pinpoint application bottlenecks, often revealing architecture and design problems. Although this methodology isn't a replacement for good physical design, proper indexing, and well-written SQL, it provides valuable insight into possible performance problems related to cache management, query-plan reuse, recompilation, transaction management, and resource utilization.
Together, waits and queues paint a detailed picture of an application's performance. SQL Server tracks wait statistics that include more than 50 reasons, called wait types, that a user connection might wait for a specific resource. You can see the wait information for user connections in the waittype and waittime columns in the master database's sysprocesses system table. SQL Server also aggregates waits across all user connections, providing a performance profile for a given workload. Thus, SQL Server wait types identify and categorize user (or thread) waits from an application-workload or user perspective.
Performance Monitor counters and other information sources provide the queues part of the performance methodology. When SQL Server executes an SQL statement for a user connection, it uses system resources such as memory, I/O, and CPU. If a request for system resources can't be immediately satisfied, the resource subsystem puts the request in a queue until the resources become available. Performance Monitor provides system counters that report the performance of these system resources.
To get useful information from the waits and queues performance data, you have to correlate wait types, Performance Monitor counters, and specific ratios of these counters to draw conclusions about performance and application bottlenecks. Here's how to evaluate the information you get from wait types.
Each user has an associated row in the master..sysprocesses system table. If a SQL Server user connection or thread isn't running for any reason, SQL Server records a wait type reflecting the reason and associated wait time in master..sysprocesses' waittype and waittime columns, respectively. Another sysprocesses column, lastwaittype, contains a character description of the last wait type for a given thread; SQL Server doesn't reset the lastwaittype description until another wait state occurs. Thus, a non-blank lastwaittype means the thread had at least one wait state. The waittype column is a varbinary column containing the current wait type. Together, a wait type of 0x0000 and a wait time of 0 means the thread isn't waiting—it's running or runnable. For any blocked user connection, sysprocesses also contains a wait type showing that the connection's execution is blocked pending the release of locks. (For tips about finding blocking processes, see Kalen Delaney's article "Track Down Troublemakers," August 2003, InstantDoc ID 39453.)
By itself, the sysprocesses table isn't a great source of information for evaluating overall system performance for several reasons. First, user connections are transient—they come, they go, they change. Second, during its life, a user connection can wait many times for different reasons. And finally, because the waittime column in sysprocesses isn't cumulative, it reflects only the current (or last) wait. Fortunately, an undocumented DBCC command provides a source of wait information that can be very useful for our waits-and-queues methodology.
The DBCC SQLPERF(waitstats) command contains aggregate waits for all user connections. You can use this command to identify Performance Monitor counters that affect performance. This DBCC option reads the part of SQL Server memory that contains cumulative wait information without adversely affecting system performance. However, using DBCC SQLPERF(waitstats) for waits and queues analysis has two disadvantages. First, you can't see the relative importance or ranking of waits by percentage (I explain the usefulness of breaking down waits by percentage later). Second, DBCC SQLPERF(waitstats) returns wait types such as synchronization objects over which you have no control. Although SQL Server uses synchronization objects (such as the WAITFOR object in the track_ waitstats stored procedure that I discuss in a moment) internally to guarantee the integrity of the SQL Server process and user data, you should exclude them from waits analysis because they can skew your results. To overcome these disadvantages, I wrote a stored procedure called track_waitstats that ranks wait types by percentage and eliminates wait types such as synchronization objects that we have no control over.
Track_waitstats Stored Procedure
The track_waitstats stored procedure in Listing 1 captures wait statistics from DBCC SQLPERF(waitstats) and ranks the waits in descending order according to how often they occur. The procedure displays each value as a percentage of the total waits. Solving the top problems provides the greatest improvement, and this ranking helps us identify the greatest opportunities for performance improvements. After track_waitstats gathers samples from DBCC SQLPERF, the stored procedure get_waitstats (which Listing 2 shows) provides a report that shows the wait type, wait time, and percentage of total waits for a workload. For example, the sample report in Figure 1 shows that network I/O waits cause the lion's share of wait time—48 percent. So improving network I/O is our best chance to improve this application's performance. You can also run get_waitstats during the execution of track_waitstats or even after track_waitstats completes.
Figure 1 shows that other performance-improvement opportunities for this application include reducing the number of exclusive locks (LCK_M_X) and the number of waits that slow transaction-log entries cause (WRITELOG). Exclusive-lock waits account for almost 13 percent of total wait time. Looking at transaction management might tell us whether we can make improvements in this area (e.g., we might be able to shorten lock durations). The WRITELOG wait type shows that threads are waiting for physical writes to complete to the transaction log. Because WRITELOG accounts for 11 percent of waits, we need to analyze Performance Monitor disk queues. Steady and high disk queues for the transaction log would indicate that the transaction-log drives might not have the I/O capacity to keep up with write requests.
Table 1 is an excerpt from the wait types document that our team uses to analyze waits. The document contains wait descriptions, probable causes, and other related considerations. (You can download the full document, "Wait Types, Perfmon Counters, and Correlations.") Table 1 shows wait-type descriptions and correlations to other performance information such as physical disk counters and buffer cache counters. The table isn't exhaustive, but it lets you see that if the information that track_waitstats returns shows a high percentage of a given wait type, you should look for correlated information to determine what might be causing those waits and to figure out how to solve the problem. "WaitTypes, Perfmon Counters, and Correlations" also contains the document we use to correlate Performance Monitor counter information with information from the track_waitstats stored procedure. The Performance Monitor counter document contains descriptions, additional correlations, and possible actions that our team can use to diagnose and correct performance problems.
Interesting Performance Monitor Ratios and Comparisons
Because Performance Monitor uses hundreds of counters, we must ask which counters are most relevant for each situation. Waitstats helps us identify performance-related counters. But waits information by itself isn't enough to accurately diagnose performance problems. The queues component of our methodology comes from Performance Monitor counters, which provide a view of system performance from a resource standpoint. Although the "Wait Types, Perfmon Counters, and Correlations" document is a useful resource, in complex cases, you have to include other counter information in your analysis. Table 2, which describes some of SQL Server's effects on CPU resources, shows how waits relate to CPU utilization and how utilization requirements affect database design choices and coding practices. Let's look at examples of problems that require extra information.
High ratio of compilations to batch requests. When you're working with online transaction processing (OLTP) applications, you usually want SQL Server to reuse query plans as much as possible to reduce the length of queues. When SQL Server reuses query plans, it doesn't need to compile the query before execution, thus reducing CPU utilization (which the System:Processor Queue Length counter shows) and shortening query-processing times. To get a ratio of compilations to batch requests, use the Performance Monitor counters SQLServer:SQL Statistics:Batch Requests/sec and SQLServer:SQL Statistics:SQL Compilations/sec.
When you find that SQL Server isn't reusing query plans efficiently, poor memory utilization (i.e., memory pressure) or poor coding practices might be to blame. To diagnose the cause, you can look at the SQL Server Profiler events SP:CacheMiss, SP:CacheInsert, SP:CacheRemove, SP:Recompilation, and SP:CacheHit. These events report plan activity in the cache. SP:CacheHit events show plan reuse. If a compile occurs, Profiler records the SP:CacheMiss and SP:CacheInsert events. SQL Server sometimes discards query plans to free memory for other activities. In such cases, the SP:CacheRemove event occurs. SP:Recompilation indicates that a recompile has occurred during the execution of a stored procedure. And if you notice a low value in the SQL Buffer Mgr:Page Life Expectancy Performance Monitor counter, your SQL Server system might be suffering from memory pressure.
To find out if poor coding might be the cause of a high compilation ratio, look at the master..syscacheobjects table to see all the query plans in the cache. High numbers can indicate plan reuse problems. You can use sp_executeSQL to parameterize ad hoc queries. Parameterization essentially means that you use variables instead of literal values as parameters in your queries. If ad hoc SQL statements are parameterized, SQL Server can reuse the query plans.
Ratio of compilations, recompilations, and batch requests. You can use the Performance Monitor counters SQLServer:SQL Statistics:SQL Compilations/sec and SQLServer: SQL Statistics:SQL Re-compilations/sec to diagnose compilation and recompilation problems. SQL Compilations/sec includes both initial compiles and recompiles for all stored procedures. Because SQL Re-compilations/sec includes only recompiles, subtracting SQL Re-compilations/sec from SQL Compilations/sec yields the number of initial compiles, as the following formula shows:
Compilations/sec - SQL Re-
If the number of initial compiles is low compared to recompilations, you might have a recompilation problem. In contrast, a low recompilation number means that most compilations are initial compilations and that SQL Server isn't effectively reusing query plans.
If initial query-plan creation is high (when compared to the number of batch requests), SQL Server might not be reusing query plans effectively. You can use the following formula to aid your analysis:
Requests/sec - Initial
Compiles/sec) / Batch
If Batch Requests/sec are higher than Initial Compiles/sec, you have some plan reuse. If Batch Requests/sec equals Initial Compilations/sec, you have no plan reuse. You can see which query plans SQL Server reuses by looking in the usecounts column in master..syscacheobjects. A usecounts value greater than 1 indicates plan reuse for executable plans.
Disk Queue Length vs. Avg. Disk Seconds/transfer. The Performance Monitor counters PhysicalDisk:Avg Disk Queue Length and PhysicalDisk:Avg Disk Seconds/transfer provide measures of I/O subsystem performance. A queue results when SQL Server can't immediately satisfy requests for I/O. As disk queue length increases, so does the Avg Disk Seconds/transfer value. Although many people rely on queue-length information to determine whether an I/O subsystem bottleneck exists, the best source of bottleneck information is the Avg Disk Seconds/transfer counter because it factors in the effects of queues. From an end-user point of view, the average time it takes to complete a read or write is a true reflection of I/O performance. For example, if you bulk-load data into SQL Server and both the bulk-load file and SQL Server database file are on the same drive, you could get a sustained disk queue length of 1 and poor Disk Reads/sec and Disk Writes/sec performance. The queue length of 1 might not seem to be a problem; but the Disk Reads/sec value might be as high as 25ms, indicating an I/O subsystem bottleneck. Table 3 shows other effects SQL Server might have on I/O resources.
Low page-life expectancy with high checkpoint pages and lazywrites. You can use the Performance Monitor counters Page Life Expectancy, Checkpoint Pages/sec, and Lazywrites/sec to find out how long data pages stay in cache and how much cache activity (checkpoints and lazywrites) is required to keep memory available for current activities. Low page-life expectancy combined with high checkpoint pages and lazywrites causes memory pressure. To improve performance, implement one or more of the following changes:
- Add more memory to the Windows server.
- Increase SQL Server memory.
- Avoid table and index scans by using proper indexing.
- Eliminate bad query plans.
These interesting ratios can help you find ways to diagnose more complex performance problems. If a performance problem persists, you can keep extending your search by looking at other factors that might affect your system, such as network limitations. The sidebar "Diagnosing Network Problems" describes a method for determining how big a part network limitations play in performance.
Application and Database Design Problems
Application and database design affect performance in a variety of ways depending on the nature of your business and how you use your database. When you use the waits-and-queues methodology, you have to consider the design constraints of the application or database you're using. Table 4 describes some application-design implications.
For example, in an OLTP environment, plan reuse is typically desirable when you're performing highly predictable inventory lookups or order entry. In these cases, result sets are uniform and plan reuse is important for scalability. In other environments, such as those that extensively use ad hoc queries and reporting, plan reuse might not be desirable. Plan reuse isn't good when result-set sizes can vary greatly depending on parameter values or when intermediate work tables vary in size. For example, finding out how many widgets your company sold yesterday in California is different from finding the number of widgets sold last year, and you don't want to reuse the same query plan to find these two values.
Database design problems can range from improper indexing to overly normalized database designs in an OLTP environment. The kind of correlations you discover in your waits and queues analysis will depend on how you use your database.
The key to using the waits-and-queues methodology is to corroborate wait statistics with queues information to draw meaningful conclusions about performance problems. For example, a high percentage of a certain SQL Server wait type signals the need for further Performance Monitor investigation of the usage of underlying resources such as the processor, I/O subsystem, and network.
In some cases, an experienced performance expert must look beyond the symptom to find the root problem, and the "Wait Types, Perfmon Counters, and Correlations" document can help you pinpoint causes of performance problems. The document's correlated performance information, possible conclusions and actions, and interesting ratios and comparisons sections shed light on root problems for given symptoms. Correlating wait types information with performance counters and related counter ratios provides a broad view of application performance.