Downloads
24284.zip

Some of my developers always choose multistatement table-valued user-defined functions (UDFs) over inline UDFs, even though an inline UDF might meet the need. What performance differences exist between inline and multistatement table-valued UDFs?

AUDFs in SQL Server 2000 are a powerful addition to your T-SQL programming toolkit. Table-valued UDFs let you encapsulate rich T-SQL logic that returns a result set, then use that result set anywhere that a real table would be allowed. Table-valued UDFs come in two formats: multistatement and inline. Multistatement UDFs let you include multiple statements in the UDF. The UDF returns a result set that's populated inside the function, as I discuss later. Inline table-valued UDFs can include only one SELECT statement. Multistatement table-valued UDFs can add flexibility to many T-SQL solutions, but inline UDFs are almost always more efficient than comparable multistatement UDF solutions. Unfortunately, many programmers use inefficient multistatement UDFs when an inline UDF would perform better. Let's look at an example that demonstrates why inline table-valued UDFs can be more efficient.

First, use the code that Web Listing 1 shows to create a table called BigOrders. (You can download this Q&A's Web listings from the SQL Server Magazine Web site at http://www.sqlmag.com. Enter 24284 in the InstantDoc ID text box, and click Download the code.) This table is identical to the Northwind Orders table except that it includes a new column called NewId that becomes the primary key.

Next, run the batch that Web Listing 2 shows, which inserts 16,600 rows into BigOrders by duplicating the contents of the Northwind Orders table 20 times. Now you're ready to create two table-valued UDFs to use for testing.

Run the script that Listing 1 shows. The two UDFs return the same result set. Each UDF simply returns all the rows and columns from the BigOrders table. Next, run the following batch to see how SELECT statements against the UDFs perform:

SELECT * FROM dbo.GetBigOrdersMultiStatement() WHERE NewId = 10000
SELECT * FROM dbo.GetBigOrdersInLine() WHERE NewId = 10000
SELECT * FROM BigOrders WHERE NewId = 10000

For my test, I ran the three SELECT statements above and captured some I/O, CPU, and duration data about them by using SQL Server Profiler. Figure 2 shows the Profiler output, which reveals that the SELECT against the base table and the SELECT against the inline UDF were both efficient. However, the SELECT from the multistatement UDF performed a large number of reads, consumed more CPU processing time, and ran for a noticeably longer duration than the other two SELECT statements. Each of the three queries returned the same single row, so why was the performance of the two UDFs so different? The answer is in the code for GetBigOrdersMultiStatement. The multistatement UDF needs to populate a local table variable with data from BigOrders. So although the query is returning only one row, it has to read all the data from BigOrders, then insert all 16,600 rows into the table variable before the data will return. This unnecessary I/O negates the index's performance benefit because the query performs a full table scan against the BigOrders table.

Note in Figure 2 that Profiler shows 36,389 logical reads, which is incorrect because a table scan accesses only 407 pages. The SET STATISTICS IO option, which you can find on Query Analyzer's Tools, Options, Connection Properties menu, reports that the SELECT statement against GetBigOrdersMultiStatement performs only 417 reads. I can't explain this discrepancy, so I've requested an answer from Microsoft, which I'll share in this column when I receive it. Regardless of that discrepancy, inline table-valued UDFs typically perform better than multistatement UDFs because they use indexes more effectively.

Here's another way to comprehend the difference between multistatement and inline UDFs. The query plan for the two UDFs that Listing 1 creates shows that the optimizer treats an inline UDF the same way it treats a view: It performs an index seek against the clustered index of the base table. The query plan for the table-valued UDF shows that the optimizer performs an index seek against the index of the UDF's table, which the optimizer had to materialize first by scanning the whole base table. The optimizer processes an inline function the same way it processes a view against the base table, whereas it processes the multistatement function as if you created a temporary table with a redundant copy of the data from the base table. In such a case, inline processing is more efficient. This simple example should encourage you to use inline UDFs instead of multistatement UDFs whenever possible.