Tracking Live SQL Server File Stalls

Given that SQL Server workloads are commonly I/O bound, a great way to get a sense for how a given server is keeping up with its current workload is to evaluate how long the I/O subsystem is taking to respond to requests by SQL Server to read or write data.

Related: I/O I/O It's Why My Server's Slow

File Stalls: A Quick and Dirty Overview

A common mechanism to retrieve this information is simply to ask SQL Server for this info (which it keeps tabs on—thanks to the fact SQL Server's own, internal, scheduler keeps tabs on how long I/O requests take to be satisfied by the OS)—which you can do with a query like the following:

SELECT  
        DB_NAME(vfs.database_id) [db_name],
    io_stall_read_ms / NULLIF(num_of_reads, 0) avg_read_latency,
    io_stall_write_ms / NULLIF(num_of_writes, 0) avg_write_latency,
    physical_name [file_name],
    io_stall / NULLIF(num_of_reads + num_of_writes, 0) avg_total_latency
FROM    
        sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
                ON vfs.database_id = mf.database_id AND vfs.FILE_ID = mf.FILE_ID
ORDER BY
        avg_total_latency DESC;

From such a query (which will run instantaneously on your server), you'll be able to evaluate what are commonly referred to as 'file stalls'—or the amount of elapsed time (expressed in milliseconds) that SQL Server spends waiting on (stalls against) the I/O subsystem before individual read/write operations are complete.

NOTE: The query above is quite simplifiedas there are a large amount of other details that COULD be (and commonly ARE) evaluated to get a better idea of what's going on with IO requests. For more info see Books Online.

In terms of evaluating file stall metrics, an ideally suited server (or I/O subsystem) will show latencies of 0-8ms for reads and writes against data files, and 0-4ms for writes and reads against log files. And while 0ms might seem to be an impossible feat, the reality is that an I/O subsystem with plenty of spindles AND cache can very regularly hold its own at 0-2ms even over long periods of time.

Many production systems, however, don't quite find themselves in the range of the ideal 0-8ms and 0-4ms specified above. And, to that end, servers running in/around 10-20ms can commonly 'handle themselves' just fine in MANY (not all) cases. Otherwise, as numbers start climbing up and above the, say, 20ms range on a repeated basis it's either time to start tuning indexes and code as needed or throwing more hardware at your workload in many cases.

Caveat: Aggregate Numbers Can be Misleading

All that said, I've got some clients with what I'd call finely tuned servers whose file stall metrics commonly climb above 100ms, 500ms, and even—occasionally, 2000ms (or two full seconds) across multiple databases on the same server—yet I'm NOT worried about disk performance on their systems at all.

On the surface it sounds crazy for me to say that average stalls of a tenth-of-a-second, a half-of-a-second, or even two full seconds against multiple databases would be 'fine.' But, when you consider that these are aggregate numbers—or metrics collected over long periods of time, this means that IO saturation at certain points or times during the day can dramatically inflate stall numbers. (Note that I'm NOT saying that the numbers are incorrect or invalid—LONG stalls are, indeed, accurately inflating these metrics. Instead, I'm stating that when carefully analyzed, these numbers MIGHT not be a problem in SOME environments.)

So, for example, with one of these clients in question, a large number of maintenance routines (both in terms of backups, index defrags/rebuilds, stats maintenance, and then a HOST of aggressive 'business jobs' running late at night (on a heavily-used multi-tenant system)) cause some decent, and ugly, bottlenecks against the underlying I/O subsystem. But, in this case, the bottlenecks are all against the data files (as there's a lot of row-by-row processing in larger tables as well as a gob of deletions from larger and older auditing tables) and so the sluggishness really isn't a problem. At all.

Of course, if the jobs took too long to complete or weren't completing within their specified 'maintenance window' then this would require some additional effort or hardware. But until then, if the IO subsystem is getting 'worked' fairly hard every night but stays fast and responsive during the day, that happens to work fine for this particular workload.

Checking on Live File Stall Metrics

To that end, the point of this post is just to call out that looking at aggregate metrics can be misleading—or might not tell the whole story. As such, whenever I find above-ideal metrics for file stalls, I always like to take a peek at 'real time' stalls—to get a better feeling for what the numbers are like when the system is under load or running during 'peak' periods of the day (where 'peak' is defined as when the highest number of end-users are using and depending upon the system and when it needs to be as responsive as possible). And to do that there are two tools that can be used. First and foremost is Performance Monitor, or perfmon, which can be used to look at Physical Disk Avg sec/Read and Avg sec/Write—across all drives. And since this counter tracks milliseconds it's pretty easy to get a quick feel if you are, indeed, seeing 'peak' stalls > 0-4ms or 0-8ms across your drives/workloads.

Then, if you are running into any IO problems (or if you'd prefer to not drop into perfmon), the following query can be very valuable as a means of tracking file stalls over a short amount of time:

DECLARE @Reset bit = 0;
       
IF NOT EXISTS (SELECT NULL FROM tempdb.sys.objects
WHERE name LIKE '%#fileStats%')  
        SET @Reset = 1;  -- force a reset

IF @Reset = 1 BEGIN
        IF EXISTS (SELECT NULL FROM tempdb.sys.objects
        WHERE name LIKE '%#fileStats%')  
                DROP TABLE #fileStats;

        SELECT
                database_id,
                file_id,
                num_of_reads,
                num_of_bytes_read,
                io_stall_read_ms,
                num_of_writes,
                num_of_bytes_written,
                io_stall_write_ms, io_stall
        INTO #fileStats
        FROM sys.dm_io_virtual_file_stats(null, null);
END

SELECT  
        DB_NAME(vfs.database_id) AS database_name,
        --vfs.database_id ,
        vfs.FILE_ID ,
        (vfs.io_stall_read_ms - history.io_stall_read_ms)
         / NULLIF((vfs.num_of_reads - history.num_of_reads), 0) avg_read_latency,
        (vfs.io_stall_write_ms - history.io_stall_write_ms)
         / NULLIF((vfs.num_of_writes - history.num_of_writes), 0) AS avg_write_latency ,
        mf.physical_name
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
                JOIN sys.master_files AS mf
                        ON vfs.database_id = mf.database_id AND vfs.FILE_ID = mf.FILE_ID
                RIGHT OUTER JOIN #fileStats history
                        ON history.database_id = vfs.database_id AND history.file_id = vfs.file_id
ORDER BY avg_write_latency DESC;

Because, as you can see, this query just uses a simple 'trick' to record file stall metrics as they are now (or over time), and then 'deltas' subsequent requests against that number so that you can more or less watch file stalls in real time. Accordingly, I like to use this query to watch what file stalls look like for a few minutes, or even 10-of-minutes during peak load, and then even reset the deltaing process by setting @Reset = 1, to determine if there are any usage patterns that would clearly define which databases might be causing excessive I/O requests and so on.

Related: Troubleshooting Slow Servers

Please or Register to post comments.

What's Practical SQL Server?

Practical advice, insight, and help for core SQL Server considerations.

Contributors

Michael K. Campbell

Michael K. Campbell is a contributing editor for SQL Server Pro and Dev Pro and is an ASPInsider. Michael is the president of OverAchiever Productions, a consultancy dedicated to technical evangelism...
Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×