One of the things that’s great about my job is the fact that I get to meet people who are a lot smarter than me.  One such person I count among my friends is Jimmy May, a Microsoftee from the Indianapolis area.  Jimmy has been using SQLIOSIM for a while, but has discovered that its reporting leaves a bit to be desired.  Jimmy and I were discussing the reporting characteristics of SQLIOSIM and he thought it’d be useful to share his findings with the public.  Thanks for sharing, Jimmy!<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />

 

~~~~~

 

One thing which threw me off at first is the all-important “Target I/O” parameter.  As I understand it—be clear that I’m paraphrasing here—SQLIOSim tries to hammer the system trying to get to that threshold.  In fact, a well-tuned system with lots of capacity will resist high values.  Note my comment below loosely comparing it to disk latency.

 

 

SQLIOSim Sources

KB: http://support.microsoft.com/kb/231619/en-us

SQL Server Storage Engine blog: http://blogs.msdn.com/sqlserverstorageengine/archive/2006/10/06/SQLIOSim-available-for-download.aspx

PSS SQL Server Engineers:  http://blogs.msdn.com/psssql/archive/2007/04/10/sqliosim-outstanding-i-o-warnings.aspx

 

SQLIOSim Amalgamation of Undocumented Info

Characterization of Output:

Amalgamated from Bob Dorr & George Reynya

 

Target IO Duration (ms)

A threshold--SQLIOSim tries to hammer the system yet tries to stay under that threshold

If the average I/O duration exceeds the target I/O duration, the SQLIOSim utility throttles the number of outstanding I/O operations to decrease the load and to improve I/O completion time.

Pasted from <http://support.microsoft.com/kb/231619>

 

Running Average IO Duration (ms)

Low values are good

\[Bob Dorr\] Characterized as "Response time for the stress test"

\[Bob Dorr\] Target IO is 100ms; since ideal I/O is 4-8ms, anything higher is not a preferred outcome

\[JWM\] Though strictly speaking this isn't "disk latency" (Avg. Disk sec/\[Read|Write|Transfer\], acceptable  values are similar:

Data files:  0-5ms or so are excellent

Log files:  0-2ms are acceptable

Warnings for I/Os exceeding 15s (à la SP4 diags)

Algorithm is designed to mitigate big swings, esp. because of one very large I/O (similar to SQL Server Checkpoint logic)

Used to control the number of outstanding I/O requests

 

IO request blocks

High numbers are good

The more requests a system can handle, the more I/Os are issued by the tool

Characterized as "Concurrent IOs"

Capped by MaxIO; not uncommon to have 100 or more, esp. for Enterprise Edition for which read-aheads & such are attempted

 

Accumulators:  Reads, Scatter Reads, Writes, Gather Writes

High numbers are good

Simple Accumulators

Each time an operation is posted, the category for the specific request type is incremented

 

Total IO Time (ms)

Desirable numbers are relative

Useful for comparing drives, especially on the same system. 

If the values for the Accumulators are similar for two drives, but one drive has significantly less Total IO Time,

then the latter is more performant because it was able to service the same I/O in less time.

Characterized as "Accumulated total duration time for the I/O in ms"

The same duration is used to track the "Running Average IO Duration (ms)"

Sum of accumulators * running average = ~Running Average IO Duration

 

Number of times IO throttled

Low numbers are better

Number of times a request was removed because the duration was exceeded

 

Additional Documentation

Additional information will be based on demand & will be added to KB article & Wiki

 

Architecture

SQLIOSim is built with lightweight UMS/SQLOS-like layer to better simulate SQL Server, incl. logical CPUs

Output

Output Example #1:

Data

********** Final Summary for file S:\sqliosim.mdx **********

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 17, Number of times IO throttled = 9006, IO request blocks = 33

Reads = 75414, Scatter Reads = 88286, Writes = 3295, Gather Writes = 54845, Total IO Time (ms) = 44936446

********** Final Summary for file T:\sqliosim.mdx **********

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 4135, IO request blocks = 106

Reads = 67091, Scatter Reads = 81249, Writes = 2501, Gather Writes = 53652, Total IO Time (ms) = 13033268

Interpretation

\[George Reynya\] The running average of 17ms for S: does not look all that bad as a target as it means that over the life of the test you maintained 17ms response time for the stress test.  Your test shows that the number of concurrent IOs has dropped to 33 which is not so good.  Especially when you compare the same basic work load against the T: drive--0ms w/ 106 I/O requests.   I would look at the S drive IO path a bit to determine the differences.

\[JWM\] Note:  The trial lasted 300s.  The interpretation is consistent with physical layout:  Disks were SAS DAS 73GB 15K RAID 10; S: was comprised of 6 disks, T: of 8 disks.

\[JWM\] Note:  George's comment regarding 17ms vs. 33ms was relative; in fact, a neither value would not be acceptable to me—I’m shooting for <5ms for my data drives.

Output Example #2:

Data

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 121, Number of times IO throttled = 1167, IO request blocks = 13

Interpretation

\[George Reynya\]  Customer disk system exceeded Target IO Duration by 21ms for this file and had to throttle 1167 times in an attempt to keep the IO within the target.    With the number of IOs we do for a file this is a bit concerning.  We tried to throttle several times and they could not keep the average below the 100ms target.

Output Example #3:

Data

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim.mdx ********** 

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 15, Number of times IO throttled = 382, IO request blocks = 108

04/27/07 14:24:55 Reads = 187508, Scatter Reads = 259268, Writes = 15286, Gather Writes = 130498, Total IO Time (ms) = 5460993

04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616951, Write time = 6606954, Idle time = 170616, Bytes read = 74679347200, Bytes written = 74403336192, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim2.mdx **********

04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 16, Number of times IO throttled = 762, IO request blocks = 44

04/27/07 14:24:55 Reads = 183061, Scatter Reads = 256624, Writes = 15038, Gather Writes = 132071, Total IO Time (ms) = 5890665

04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886717, Read time = 3333859, Write count = 616967, Write time = 6606962, Idle time = 170630, Bytes read = 74679609344, Bytes written = 74403405824, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim.ldx **********

04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 0, IO request blocks = 8

04/27/07 14:24:55 Reads = 0, Scatter Reads = 0, Writes = 316141, Gather Writes = 0, Total IO Time (ms) = 1346919

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616962, Write time = 6606959, Idle time = 170623, Bytes read = 74679347200, Bytes written = 74403381248, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

Stress Test Attempt Complete

Interpretation

\[JWM\]  This test was done against what was thought by the customer to be a well-architected disk I/O subsystem (EMC Clariion CX700, lots of 146GB 15K disks, RAID 5 (RAID5--when will  they learn?)).  The system is intended to support a system from which very high performance is required.

The results of this test belie this assumption.  I would want Average IO Duration to be less than 5ms, no more than 10ms; however both data files have values exceeding this:  15ms & 16ms, respectively.

The IO Request Blocks aren't nearly as high as I'd expect, especially compared to the 8 disk RAID 10 drive T: in Output Example #1.

The far less respectable values for M:\sqliosim2.mdx compared to the first file will prompt further investigation.

Note also that I/O was throttled 382 & 762 times, respectively.

The Average IO Duration for the log file is excellent.  I don't have enough experience interpreting output for log files, so I don't know whether to be concerned about the log number of IO Request Blocks.