Last month, in "Getting to Know Virtual File
Stats" (InstantDoc ID 96513), I showed you
how to use a particular dynamic management view
(DMV) to monitor the usage of physical I/O with
respect to SQL Server 2005. This month, I take a
similar approach with another DMV that lets you
monitor wait stats on your SQL Server instance.
Wait Stat Basics
Whenever a request is made within SQL Server
that - for one of many reasons - can't be immediately
satisfied, the system puts it into a wait state. The SQL
Server engine internally tracks the time spent waiting,
aggregates it at the instance level, and retains it in
memory. Using a new DMV in SQL Server 2005
called sys.dm_os_wait_stats, you can read these metrics
(or statistics) at any time.
Sys.dm_os_wait_stats materializes the in-memory
values of the wait stats as a typical result set that
represents the total waits for all processes that have
finished since the counters were last reset. Be aware
that SQL Server resets these counters to 0 at each
restart; alternatively, you can manually reset them
by executing the command
DBCC SQLPERF ('sys.dm_os_wait_stats',
CLEAR);
Keep in mind that each time you query this view,
you'll get a snapshot from that point in time, which
includes the total waits on the instance. Ideally, you
want to see a differential or delta of two distinct
snapshots so that you can better gauge the amount of waits during a particular time period. Table 1 shows the output of the DMV.
At the time of this writing, there are just over
200 different wait types available for SQL Server to
monitor. Microsoft will add more over time, with
each new version and service pack. You'll probably
never have to deal with most of these wait
types, so you don't have to worry too much about
knowing what they all mean. If you want details
about each wait type, check out the Microsoft article
"SQL Server 2005 Waits and Queues" (http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/performance_tuning_waits_queues.mspx). The article
includes some scripts for capturing and reporting on
the waits in your system. In the SQL Server Magazine
Performance Tuning and Optimization subforum
(http://sqlforums.windowsitpro.com/web/forum/categories.aspx?catid=256), I'll add my own rendition
of how to capture and report on the wait stats in both
SQL Server 2005 and SQL Server 2000.
Digging into It
Table 1's waiting_tasks_count column gives you an
indication of how many times the processes have
begun to wait, for that particular wait type. Generally,
the most important column is the wait_time_ms
column, which tells you how many total milliseconds
passed for each wait type. Obviously, the longer the
wait time, the less efficient the process becomes. But
another aspect of waits that you need to be keenly
aware of is signal_wait_time.
Figure 1, page 12, helps illustrate the importance
of the signal_wait_time column. When a thread can't
continue because a resource has become unavailable,
the system places the thread into a wait state of one
type or another. This scenario can occur for a number
of reasons, but for the purpose of this article, suppose you need to read 100 pages of data, but only the first
few pages are in the data buffer cache. Also suppose
that the request to fetch the other pages from the disk
and place them into the cache where you can read
them will take 20 milliseconds to complete, and you
have lots of users requesting time on the processors.
Once the thread goes into the wait state, the scheduler will pull that thread off the processor and place it in
the worker queue so that another thread that's ready
and able to process data can run.
The point labeled T0 in Figure 1 illustrates this
moment, which is the start of the wait time. At this
point, SQL Server starts keeping track of how long the
thread has been waiting, and for which wait type. In
this case, it will most likely be a PageIOLatch wait -
typically associated with disk-to-memory transfers.
After approximately 10 milliseconds, the I/O request
has been satisfied and the page is in cache and ready
to be read by your thread, as reflected by the point
labeled T1. However, to read this data and continue
the processing, the thread must be reloaded back
onto the processor from the queue. If other threads
are currently running, it might be a while before the
processor becomes available again - reflected as the
point labeled T2. The time between T1 and T2 is
called the signal wait time and really has nothing to
do with the original problem that caused the wait but
gets reflected in the overall wait time, which appears
in the DMV's wait_time_ms column. Lots of signal
waits indicate excessive CPU pressure, whereas fewer
signal waits place the bottleneck on the specific wait
type resource. As you can see, this is an important
distinction that - if not accounted for - can lead you
down the wrong path.
Find Bottlenecks
If you want to know where your SQL Server instance
spends most of its time waiting, simply use the
built-in DMV to occasionally peek at these statistics.
If you order the report in descending order of most
waits to least waits, you can quickly see where your
largest bottlenecks are and where you might want to
investigate problems and determine how to minimize
those waits. Keep in mind that there will always be
waits in an active system. The key is to know when
those waits are excessive and to be proactive in
addressing them.
End of Article