Keeping with the theme I began in “Don't Overlook the New SQL Server 2005 NEWSEQUENTIALID( ) Function” (June 2006, InstantDoc ID 50164)—new and possibly overlooked SQL Server 2005 functions—this month I describe a new user-defined function (UDF) option, RETURNS NULL ON NULL INPUT, that helps speed up the processing of NULL input and the new list of non-deterministic system functions allowed in a UDF.
When you use the RETURNS NULL ON NULL INPUT UDF option, SQL Server doesn't invoke a function that has NULL input arguments and simply returns a NULL. You can use this option whenever you know that your function is supposed to return NULL when any of the input arguments is NULL. There are two main benefits to using this new option. For NULL inputs, you eliminate both the high overhead that occurs when you invoke scalar UDFs and the need to perform NULL input validation in the function’s definition. Let's take a look at how this option works.
NULL Processing Examples
First, I'll show you an example of a function that processes NULL input. You need to create a UDF—fn_EndOfMonth. Run the code in Listing 1 to create a function that accepts a DATETIME value (@dt) and returns the corresponding end-of-the-month date. For example, when you enter the DATETIME 20060212, the fn_EndOfMonth function returns 20060228.
If you don’t specify the RETURNS NULL ON NULL INPUT option when you create the UDF, SQL Server defaults to the CALLED ON NULL INPUT option and SQL Server calls the function regardless of whether you enter NULL input. To demonstrate this idea, run the code in Listing 2 to create the T1 table and populate it with 10 rows—five with known DATETIME values and five with NULLs. Then use SQL Server Profiler to trace the SP:Starting event and the data columns EventClass, SPID, ObjectName, and TextData. Start the trace, then run the following query:
The results should be similar to what Table 1 shows, and the trace should produce results similar to the ones that Figure 1 shows. Notice that the Profiler trace shows that the fn_EndOfMonth function was invoked 10 times, once for each row in the T1 table regardless of whether the input was NULL.
To understand how the RETURNS NULL ON NULL INPUT option works, I'll alter the fn_EndOfMonth function's definition to include the RETURNS NULL ON NULL INPUT option. Run the code in Listing 3. To see the results, click the Clear Trace Window button in Profiler to clear the trace data, then rerun the previous query.
You'll notice that the fn_EndOfMonth function still produces the same output that Table 1 shows. The trace data, as Figure 2 shows, confirms that SQL Server invoked the fn_EndOfMonth function only five times when the input wasn't NULL. In summary, you'll see that the first trace (Figure 1) that ran on the UDF that didn't use the RETURNS NULL ON NULL INPUT option captures ten events while the UDF that did use this option (Figure 2) captures only five events.
Reducing High Overhead
As I mentioned earlier, invoking scalar UDFs within queries can cause high overhead. In general, to maintain good performance, best practice is to include your calculations as inline expressions in the query. But some calculations can't be expressed inline. For example, calculations that contain iterative logic must be encapsulated within UDFs.
Here’s an example of the performance variance you might see when you use the RETURNS NULL ON NULL INPUT option. I populated the T1 table with 1,000,000 rows—half with known datetime values and half with NULL input. I ran the query twice—the first on the function with the CALLED ON NULL INPUT option and the second on the function with the RETURNS NULL ON NULL INPUT option. I also clicked Tools, Options, Query Results, SQL Server, Results to Grid and then enabled the Discard results after execution option in SQL Server Management Studio (SSMS) because I didn't want the performance test to include the time it took to generate the output. The query on the function with the RETURNS NULL ON NULL INPUT option took three seconds, two seconds less than the other query.
You can also specify the RETURNS NULL ON NULL INPUT option when you register Common Language Runtime (CLR) scalar UDFs. To demonstrate this, I'll use the fn_RegexReplace function that I described in the article “CLR or Not CLR: Is That the Question?” (April 2006, InstantDoc ID 49429). For your convenience, Listing 4 provides the function’s definition using C#. As a reminder, this function provides you with string-replacement capabilities based on regular expressions patterns. Currently, the function’s code performs NULL validation and returns a NULL if there is NULL input.
To build an assembly file that includes the .NET code that Listing 4 shows, create a C# Class Library project called RegexReplace in Visual Studio 2005. Specify C:\RegexReplace as the target folder. Add a class item called RegexReplace.cs and copy the code from Listing 4 into the body of the new class item. Then, select Build, Build RegexReplace to create the assembly file C:\RegexReplace\RegexReplace\bin\Debug\RegexReplace.dll. Next, connect to the tempdb database and run the following command to load the intermediate language (IL) code from the assembly file to the tempdb database.
Finally, to register the function in the tempdb database, run the code in Listing 5. Because the code that registers the function doesn't explicitly specify a NULL treatment option, SQL Server assumes the default CALLED ON NULL INPUT option, which means that SQL Server will invoke the function when given NULL input. To test this concept, run the trace that I described earlier to confirm that SQL Server invokes this function when there's NULL input. You can use the following query, which formats the fax numbers from the Customers table in the Northwind database.
'$1-') AS FormattedFax
WHERE CustomerID LIKE 'F%';
This query returns the output that Table 2 shows. Notice that the regular expression-based replacement performed in this query removes the parentheses from the original fax numbers and substitutes all non-digit characters (e.g., dot and space) with hyphens. Run the trace in the background while the query is running—you'll get eight events. This result confirms the notion that SQL Server will invoke the function for each row, even when it contains NULL fax numbers.
If you know that you want to return NULL on NULL input, run the code in Listing 6. This code alters the function's registration to include the RETURNS NULL ON NULL INPUT option. When you use this option, you don't have to include NULL validation in the function’s definition. Listing 7 shows the revised function definition without the NULL validation.
Run the following query while the trace is running in the background.
'$1-') AS FormattedFax
WHERE CustomerID LIKE 'F%';
The resulting trace data will show six function calls instead of eight because there are six non-NULL fax numbers in the input set.
Non-Determinism and the Side Effects
SQL Server 2000 won't invoke non-deterministic system functions within UDFs. A non-deterministic function is one that doesn't always return the same output even when the input remains the same. Here’s the list of non-deterministic system functions that aren't allowed in SQL Server 2000: @@CONNECTIONS, @@TOTAL_ERRORS, @@CPU_BUSY @@TOTAL_READ, @@IDLE, @@TOTAL_WRITE, @@IO_BUSY GETDATE, @@MAX_CONNECTIONS, GETUTCDATE, @@PACK_RECEIVED, NEWID, @@PACK_SENT RAND, @@PACKET_ERRORS TEXTPTR, @@TIMETICKS. Many programmers find this list of disallowed functions too restrictive. For example, in SQL Server 2000, you might want to invoke the GETDATE function within a UDF, but you can't.
Fortunately, SQL Server 2005 made the list of disallowed functions much shorter. In comparison, SQL Server 2005 allows the following non-deterministic functions in UDFs: CURRENT_TIMESTAMP, @@MAX_CONNECTIONS, GET_TRANSMISSION_STATUS, @@PACK_RECEIVED, GETDATE, @@PACK_SENT, GETUTCDATE, @@PACKET_ERRORS, @@CONNECTIONS, @@TIMETICKS, @@CPU_BUSY, @@TOTAL_ERRORS, @@DBTS, @@TOTAL_READ, @@IDLE, @@TOTAL_WRITE, @@IO_BUSY. However, it doesn't allow these non-deterministic system functions: NEWID, RAND, NEWSEQUENTIALID, TEXTPTR.
SQL Server 2005 Books Online (BOL) doesn't provide a reason for this change or the distinction between the functions that are and aren't allowed, but you'll notice that the functions you can't use have side effects. That is, the SQL Server 2005 non-deterministic functions that aren't allowed make internal changes in the database or the system and leave a mark (such as a change to an interval value) that they have been invoked. And UDFs aren't allowed to have side effects. For example, when SQL Server invokes the RAND function it changes an internal value that's used to calculate the seed for the next invocation of the function. This means that had SQL Server allowed you to invoke RAND within a UDF, the invocation of the UDF would have changed the result of subsequent invocations of RAND outside the UDF as well. For this reason, you can't use RAND in a UDF, even in SQL Server 2005.
As an example, let's look at how you can invoke GETDATE from a UDF. First, run the code in Listing 8 to create the fn_EndOfThisMonth UDF. This UDF returns the date for the last day in the current month, based on the return value of the GETDATE function. If you include this function in a SQL Server 2000 UDF, you'll get the following error when you attempt to create the UDF:
Invalid use of 'getdate' within a function.
However, if you include the fn_EndOfThisMonth function in SQL Server 2005 the UDF is created successfully. When you run the code
to test the function, you'll get the date of the last day for the current month.
Discover SQL Server 2005's New Features
SQL Server 2005 includes so many new features that it's easy to overlook a few of them. As you probably realize after reading my past three T-SQL 2005 articles, these overlooked features are quite useful. You might want to spend some time browsing SQL Server 2005’s BOL—you might find some surprisingly handy features that you can use to simplify your T-SQL code and improve its performance.