Last month, in "Getting to Know Virtual File Stats," I showed you how to use a particular dynamic management view (DMV) to monitor the usage of physical I/O with respect to SQL Server 2005. This month, I take a similar approach with another DMV that lets you monitor wait stats on your SQL Server instance.
Wait Stat Basics
Whenever a request is made within SQL Server that—for one of many reasons—can't be immediately satisfied, the system puts it into a wait state. The SQL Server engine internally tracks the time spent waiting, aggregates it at the instance level, and retains it in memory. Using a new DMV in SQL Server 2005 called sys.dm_os_wait_stats, you can read these metrics (or statistics) at any time.
Sys.dm_os_wait_stats materializes the in-memory values of the wait stats as a typical result set that represents the total waits for all processes that have finished since the counters were last reset. Be aware that SQL Server resets these counters to 0 at each restart; alternatively, you can manually reset them by executing the command
Keep in mind that each time you query this view, you'll get a snapshot from that point in time, which includes the total waits on the instance. Ideally, you want to see a differential or delta of two distinct snapshots so that you can better gauge the amount of waits during a particular time period. Table 1 shows the output of the DMV.
At the time of this writing, there are just over 200 different wait types available for SQL Server to
monitor. Microsoft will add more over time, with each new version and service pack. You'll probably never have to deal with most of these wait types, so you don't have to worry too much about knowing what they all mean. If you want details about each wait type, check out the Microsoft article "SQL Server 2005 Waits and Queues." The article includes some scripts for capturing and reporting on the waits in your system. In the SQL Server Magazine Performance Tuning and Optimization subforum, I'll add my own rendition of how to capture and report on the wait stats in both SQL Server 2005 and SQL Server 2000.
Digging into It
Table 1's waiting_tasks_count column gives you an indication of how many times the processes have begun to wait, for that particular wait type. Generally, the most important column is the wait_time_ms column, which tells you how many total milliseconds passed for each wait type. Obviously, the longer the wait time, the less efficient the process becomes. But another aspect of waits that you need to be keenly aware of is signal_wait_time.
Figure 1 helps illustrate the importance of the signal_wait_time column. When a thread can't continue because a resource has become unavailable, the system places the thread into a wait state of one type or another. This scenario can occur for a number of reasons, but for the purpose of this article, suppose you need to read 100 pages of data, but only the first few pages are in the data buffer cache. Also suppose that the request to fetch the other pages from the disk and place them into the cache where you can read them will take 20 milliseconds to complete, and you have lots of users requesting time on the processors. Once the thread goes into the wait state, the scheduler will pull that thread off the processor and place it in the worker queue so that another thread that's ready and able to process data can run.
The point labeled T0 in Figure 1 illustrates this moment, which is the start of the wait time. At this point, SQL Server starts keeping track of how long the thread has been waiting, and for which wait type. In this case, it will most likely be a PageIOLatch wait - typically associated with disk-to-memory transfers. After approximately 10 milliseconds, the I/O request has been satisfied and the page is in cache and ready to be read by your thread, as reflected by the point labeled T1. However, to read this data and continue the processing, the thread must be reloaded back onto the processor from the queue. If other threads are currently running, it might be a while before the processor becomes available again—reflected as the point labeled T2. The time between T1 and T2 is called the signal wait time and really has nothing to do with the original problem that caused the wait but gets reflected in the overall wait time, which appears in the DMV's wait_time_ms column. Lots of signal waits indicate excessive CPU pressure, whereas fewer signal waits place the bottleneck on the specific wait type resource. As you can see, this is an important distinction that—if not accounted for—can lead you down the wrong path.
If you want to know where your SQL Server instance spends most of its time waiting, simply use the built-in DMV to occasionally peek at these statistics. If you order the report in descending order of most waits to least waits, you can quickly see where your largest bottlenecks are and where you might want to investigate problems and determine how to minimize those waits. Keep in mind that there will always be waits in an active system. The key is to know when those waits are excessive and to be proactive in addressing them.