SQL Server 2005 introduces dynamic management objects (views and functions) that give you server-state information in a convenient, relational format. You can use this information to tune performance, diagnose problems, and monitor the health of your system. SQL Server 2005 also includes new T-SQL querying elements that let you simplify and optimize your code. I'll explain how to use some of the new T-SQL querying elements to analyze performance information obtained from dynamic management objects, and more specifically how to simplify the process of aggregating performance data and isolating the top resource consumers.
In my examples, I'll use a dynamic management function named sys.dm_io_virtual_file_stats, which provides I/O statistics about database files. If you aren't familiar with this function, you can read the "sys.dm_io_virtual_file_stats" topic in SQL Server 2005 Books Online at http://msdn2.microsoft.com/en-us/library/ms190326.aspx. I'll also use the new T-SQL querying elements common table expressions (CTEs), the ROW_NUMBER function, and the OVER clause with aggregate functions. I covered all of the aforementioned querying elements in past articles: "Get in the Loop with CTEs" May 2004, InstantDoc ID 42072; "Calculating Row Numbers in SQL Server 2005," April 2006, InstantDoc ID 42302; and "OVER Clause Simplifies Aggregate Window Calculations," November 2005, InstantDoc ID 48297.
At this point, I want to note that although I’ll be using the sys.dm_io_virtual_file_stats dynamic management function in my examples, you can use the following techniques with other dynamic management objects as well. The choice of which dynamic management object to query will depend on which aspect of performance you wish to analyze; for example, you can examine wait statistics at the server level, I/O statistics at the database file level, or resource consumption at the session level.
Getting I/O Statistics
Let's begin by using the sys.dm_io_virtual_file_stats dynamic management function to get I/O statistics about database files. To request I/O statistics about a specific file, you need to include a database ID and a file ID, as the following code shows:
DB_ID('AdventureWorks'), -- db
1) AS IO_Stats; -- file id
You can be in any database when you query the dynamic management function, so I didn't include a USE statement in the code. If you want information about all files in a particular database, specify NULL as the file ID, as the following code shows:
NULL) AS IO_Stats; -- all files
If you want information about all files in all databases, specify NULL as the database ID and the file ID, as the following code shows:
NULL, -- all databases
NULL) AS IO_Stats; -- all files
If you mistype a database name, it’s as if you specified NULL for the database ID; the system will provide information about all databases.
Some of the attributes the sys.dm_io_virtual_file_stats dynamic management function returns are database_id, file_id, num_of_bytes_read, and num_of_bytes_written. The last two attributes represent the number of bytes SQL Server has physically read from or written to the file since SQL Server was last started.
The Top Resource Consumers
Suppose you have five databases—A produces 2 percent of the I/O activity, B produces 50 percent, C produces 25 percent, D produces 20 percent, and E produces 3 percent—and you want to identify the top databases that collectively produce at least 90 percent of the total I/O activity (reads and writes) on the system. You'll want to isolate databases B, C, and D because they're the top consumers and together produce 95 percent of the I/O activity in the system. After the top resource consumers are identified, you can focus your I/O tuning efforts on those databases. Of course you could write a simple TOP query to isolate TOP n databases; the tricky part is to isolate databases based on a running percentage filter. Imagine that you have dozens, hundreds, or even thousands of databases on your system. Identifying the top consumers based on a running percentage filter makes more sense than filtering a constant number of top resource consumers.
The querying techniques that I'm about to explain show you how to isolate the top resource consumers so that you can focus your performance tuning efforts. However, rather than analyze I/O statistics based on the number of bytes read and written, you can analyze I/O stats based on I/O stalls (attribute io_stall), examine read and write activity separately, or look at completely different aspects of the system.
The 4-Step Solution
My solution for isolating the top resource consumers can be divided into four steps. Each step involves developing a query that relies on the result set of the query from the previous step. Using CTEs was a convenient way to develop my solution in a modular approach.
Step 1: Calculate the total I/O for each database. I wrote a query against the sys.dm_io_virtual_file_stats dynamic management function that groups the data by database, then calculates the total I/O in megabytes (io_in_mb) for each database (database_name), as Listing 1 shows. When I ran the query on my laptop (which I recently restarted), I got the output that Table 1 shows.
Step 2: Calculate the percentage of I/O for each database. The query in Listing 2 shows you how to rank the I/O activity for each database in descending order and calculate the percentage that each database uses of the total system I/O.
The code in Listing 2 creates a CTE (called Agg_IO_Stats) from the query in Step 1. The outer query calculates the results in the row_num and pct columns. The outer query calculates row_num by using the ROW_NUMBER function based on io_in_mb DESC ordering. Then the query calculates pct by dividing the current database’s io_in_mb value by the total system I/O: io_in_mb / SUM(io_in_mb) OVER(). When you specify the OVER clause with empty parentheses, you expose a window with all the rows that the SELECT phase received as input to the aggregate function. (Each logical querying phase produces a virtual table that the next query uses as input.)
The results from running this query are listed in Table 2. It's easy to identify the databases that have the highest I/O activity because the query results are expressed as I/O in megabytes and as a percentage of the total system I/O activity and are ranked from highest to lowest.
Step 3: Calculate a running percentage (run_pct) for each database. Next, you need to run the code in Listing 3 to calculate the cumulative percentages for all databases that have a row_num value that's smaller than or equal to the current database’s row_num value.
The code in Listing 3 creates a CTE (called Rank_IO_Stats) out of the outer query from Step 2. The outer query in Listing 3 joins two instances of Rank_IO_Stats—named R1 and R2. The join condition matches to each row from R1 all rows from R2 that have a smaller or equal row_num value (R2.row_num
The results from running this query on my laptop are shown in Table 3. You can see the cumulative percentage of I/O activity in the run_pct column. For example, you can observe that the databases Performance, Northwind, AdventureWorks, and Generic have the highest I/O activity and percentage values, and that they account for 91.52 percent of the total system I/O used. So if your goal is to focus your tuning efforts on the databases that account for 90 percent of the I/O activity on the system, you now have this information as well as information for the less active databases.
Step 4: Calculating a running percentage for the filtered databases. The final step is to query only those databases that collectively reach a certain level of I/O activity (e.g., 90 percent) and filter out all the other databases. This step is a bit tricky. If you were thinking about specifying the expression SUM(R2.pct)
Listing 4 shows this query. You'll notice that the code in Listings 3 and 4 is similar except that Listing 4 includes the aforementioned expressions in the HAVING clause. As you can observe in the output in Table 4, four rows are returned. The threshold requirement (90 percent), in this case, returns more than the minimum number of required rows (three). If you specify that five rows should be the minimum number returned, you'll get the output that Table 5 shows.
Try This at Home
In essence, the techniques that I've just described let you calculate each database's percentage of the total, find the running percentage of the total, and define a filter based on a given percentage threshold. These techniques can be very handy when analyzing performance information obtained from the dynamic management objects provided by SQL Server 2005. I urge you to try these techniques against other dynamic management objects as well. Do a search on sys.dm_ in SQL Server 2005 BOL to see an index of dynamic management objects, and determine which ones might help you with your analysis.