Downloads
43898.zip

In the Q&A "SQL Server Tools Return Different Results," above (InstantDoc ID 43897), I explain that different SQL Server tools report performance information in different ways based on how they calculate the metrics they measure. In most cases, minor discrepancies in the data that different SQL Server tools report don't matter. However, wrong answers lead to poor query-tuning decisions. And you can get a wrong answer if you're using Query Analyzer to analyze I/O for a query that includes a user-defined function (UDF).

Consider the UDF that Listing 2, page 13, creates to calculate order amounts. The following query returns all rows and all columns from the Orders table:

— Query 1
SELECT *
FROM Orders

And the next query returns all rows and all columns from the Orders table and uses the UDF from Listing 2 to calculate the total order amount for each order:

— Query 2
SELECT * ,dbo.GetTotalAmount
(OrderId) TotalAmountForThisOrder
FROM Orders

Note that I'm using simple queries so that we can focus on the use of STATISTICS IO with UDFs. Also note that the UDF uses an (index = 0) hint, which forces the code to perform a table scan and increases the I/O the function uses.

If you run Query 1 and Query 2 with STATISTICS IO and STATISTICS TIME enabled, you'll see that Query Analyzer reports that each query performs 23 logical I/Os—which is the cost of the table scan. On my laptop, Query Analyzer reports that Query 1 takes approximately 20ms to run and that Query 2 takes about 5ms. You wouldn't expect such a range of response times to produce the same result set, especially when Query Analyzer claims that SQL Server performed 23 reads each time. So, how do you account for the 15ms difference in response time? Remember that the UDF performs a table scan on the Order Details table. To investigate further, instead of running the UDF to find the total amount of an order, run the following query, which the UDF contains:

SELECT SUM(UnitPrice * Quantity)
FROM   \[Order Details\] (index =0)
WHERE  OrderId = 10248

To demonstrate my point, I'm again forcing a table scan to raise the I/O levels. Now, you can see that the query requires 11 I/Os. So, it's impossible that Query 2 requires only 23 I/Os, because the UDF that Query 2 uses costs 11 I/Os each time it's called. If each of the 830 rows in the result set requires 11 I/Os, Query 2 should cost 9130 I/Os. Also note that according to Query Analyzer, Query 2 (the one that uses the UDF) takes less time to run than Query 1. Your eyes and a stopwatch will tell a different story. Query 2 takes more than 3 seconds to run on my laptop, not the 5ms that STATISTICS TIME reports. And according to SHOWPLAN, the UDF doesn't even show up as part of Query 2's plan.

Are the few additional seconds and the few thousand I/Os that the UDF adds to this query's performance a big deal? Of course. But worse than the UDF overhead is the unreliable statistics information that a developer trying to tune such a query might receive. Several weeks ago, I was working with a customer who had a series of SELECT statements that used UDFs. In some cases, the UDFs added millions of unnecessary I/Os to the total query cost. The problem wasn't immediately evident because the server had a lot of RAM and a fast EMC SAN, but adding a million extra I/Os to a query isn't a good thing. The customer was shocked to learn of the problem because all the developers use Query Analyzer to see how expensive a query is. But as you just saw, you can't always believe the I/O information that SQL Server tools give you when UDFs are involved