I first started working with SQL Server with version 6.0. Although I had first worked with SQL in a VMS environment in 1989, I had done most of my database work in a flat-file environment. I was surprised that I didn't miss the cursor-like aspects of row operations as much as the ability to easily apply functions. With SQL Server 2000, I can finally create user-defined functions (UDFs) to use not only in scripts, but also inline with Data Manipulation Language (DML) statements. SQL Server 2000 has two types of UDFs: those that return a scalar value, and those that return tables. In this article, I discuss examples of each type and briefly describe how you might use UDFs in SQL Server 2000 applications.
UDFs That Return a Scalar Value
UDFs that return a scalar value are probably what you most expect. Like GETDATE(), USER(), and many of SQL Server's own built-in functions, this type of UDF lets you define code that returns a scalar value to the calling script, procedure, or line of code.
In many ways, UDFs look like stored procedures. Like stored procedures, UDFs accept parameters and return a value. Unlike stored procedures, however, they can't contain output parameters. More importantly, a UDF return value isn't limited to an integer. Instead, the value can be any valid SQL Server data type, except for BLOBs, cursors, and timestamps.
Let's look at an example that shows how easily you can build a function to deal with a problem that SQL Server developers commonly face—how to see whether a datetime field entry occurred on a specific day. Your datetime field usually has specific time-of-day information that makes it difficult to compare just entry dates. More specifically, two fields with the same date but different times won't return a match.
A typical solution is to convert the date to a string, then back to a date to truncate the time information. If you perform this conversion on both of the datetime fields you want to compare, the comparison based on just the date (without the time) will succeed. The solution might look like the code in Listing 1. Following are the results:
(1 row(s) affected) KeyCol TestDate ------- ----------- (0 row(s) affected) KeyCol TestDate ------- ----------- 12000-02-10 21:29:43.140 (1 row(s) affected)
The second query became a little messy—the more functions you embed in your query, the more convoluted the code becomes. Sometimes, when I've had to strip down many dates to just the day, not the time, the resulting code has become difficult to read.
Now I do the same thing with a simple UDF. First, I use the new CREATE FUNCTION command to create the function; I format this command much like a stored procedure. I could code the query this way:
CREATE FUNCTION DayOnly(@Date datetime) RETURNS varchar(12) AS BEGIN RETURN CONVERT(varchar(12), @Date, 101) END
Now I reformat the query slightly to take advantage of the new function:
SELECT * FROM TestTable WHERE dbo.DayOnly(TestDate) = dbo.DayOnly(GETDATE())
Even for this simple query, the new code is much more readable. The call works much as it would from most languages that support functions, with one hitch: The owner (schema) is required to execute our function. For some reason, SQL Server doesn't resolve functions as it does other objects.
UDFs have more advantages than just readability. You can embed queries in UDFs and use them to encapsulate subqueries. In fact, you can encapsulate almost any query that returns a discrete value. To demonstrate, let's encapsulate the stock subquery example from SQL Server Books Online (BOL). The subquery version looks like Listing 2. An encapsulated version of the average subqueries and the difference subqueries appears in the functions in Listing 3. Notice that it's legal to embed one UDF in another one.
Listing 4 contains the BOL query as I revised it to use the new functions instead of the old subquery model. Running this query yields the same results as running the subquery did. Besides being more readable, the UDF is reusable, which saves time as your functions become more complex.
UDFs That Return a Table
As an alternative to scalar values, SQL Server's new UDFs can return something far more interesting: tables. You can perform a JOIN against such a table and even apply WHERE conditions to it.
Using a table as a return value is easy. To a UDF, a table is just like any other SQL Server data type, except that you need to both name the table and define the columns (much as you would a temporary table). To illustrate, let's build a simple function that returns a table containing a list of the names (last, first) of all of the authors in the Pubs sample database plus their address information. Listing 5 contains an example of one solution along with a sample query to show the contents of the table the function produces. Table 1 shows a portion of those results (I've clipped out the middle for brevity).
You could have done everything up to this point more easily with a view. But suppose you want to parameterize a view, showing only authors who sold a minimum number of books. Using a view, you could join our query with another table or two, but that would get a bit wordy, complex, and hard to read. Also, you would have to include in the query a column (the sales quantity) that you don't necessarily want in the output, then use a WHERE clause. A view and the query using it might look something like the code in Listing 6. To simplify things, you can encapsulate everything in a function, as Listing 7 shows.
Now the query to return the desired value is simple:
SELECT * FROM dbo.TableFunction(25)
You can reuse this function without having to copy and paste. Also note that although a stored procedure could produce similar results, you couldn't join the results of that stored procedure to another table.
To illustrate the advantage a UDF has over a stored procedure, let's take the example a step further. Suppose a manager wants a report that lists the author and publisher for every book that sold more than 25 copies. If you used a stored procedure, you couldn't join to the results without a cumbersome, multistep process. With our function, joining other tables to the results is simple. As you can see in Listing 8, joining to the function is as easy as if the function were a table or a view. The only real difference is that I can now use one or more parameters instead of a WHERE clause.
Determinism is the idea that for a given set of valid inputs, a procedure will return the same result every time. Before SQL Server 2000, determinism wasn't very important, but with the addition of indexing to views and computed columns, the idea is now crucial. To build an index over something, SQL Server needs to define deterministically (define with certainty) which item is indexed.
UDFs can be either deterministic or nondeterministic. What the function does, rather than any parameter, defines its determinism. SUM() is a built-in, deterministic function: The sum of 3, 5, and 10 is always 18 whenever you call the function with those input values. On the other hand, GETDATE() is nondeterministic: The value that the function returns changes almost every time you call it.
To be deterministic, a function must meet three criteria. First, the function must be schema-bound. Any objects the function depends on have a dependency recorded and no changes to those objects are allowed (for example, by an ALTER statement) unless you first drop the dependent function. Second, all other functions that your function refers to, whether user- or system-defined, must also be deterministic. Finally, the function can't reference any objects that are nondeterministic (which rules out everything except other functions) outside the scope of the function.
If you build a view or a computed column that refers to a nondeterministic function, you won't be able to build an index on that view or computed column. This limitation isn't the end of the world, but you'll need to consider whether a function is deterministic before you create indexes against views or columns that use that function.
Creating System Functions
System stored procedures and functions are handy, but one particular benefit is that you can use them in any database without having to fully qualify them. Although they exist in the Master database, you can call system stored procedures and functions from any database, and they function as if they existed within that database.
So how does this benefit apply to you? Many SQL programmers enjoy being able to create their own system stored procedures, so you might also want to create your own system functions—that is, functions that you can call from any database and that you don't need to qualify with the source database or even the owner name. (Remember that regular UDFs must be owner-qualified in order to resolve.)
Creating a system UDF has a couple of steps. First, you need to create the function in the Master database. Use the prefix fn_ to begin the function name. And after you create the function, use the system stored procedure sp_changeobjectowner to change the function's owner to system_function_schema. If you follow these steps, you can call your function from any database on your SQL Server machine. To illustrate this process, let's review the first sample function in this article, which looked like this:
CREATE FUNCTION DayOnly(@Date datetime) RETURNS varchar(12) AS BEGIN RETURN CONVERT(varchar(12), @Date, 101) END
Now, use a typical DROP command to drop that function:
DROP FUNCTION dbo.DayOnly
Next, rerun the CREATE FUNCTION script, changing the current database to Master, the name to include the fn_ prefix, and the ownership to system_function_schema. Listing 9 shows the correct code. This function will yield the current date, even if you are in a database that is different from the one you created the function in and you didn't owner-qualify that database.
Deleting System Functions That You Created
A problem occurs when you later want to delete the system functions you created. If you try to drop your function after changing the owner name, SQL Server will give you a doesn't exist error and won't delete your function because SQL Server doesn't generally allow the deleting of system objects. To drop a system function you created, you must use sp_configure to enable the Allow updates option for your server. To drop the newly created system function, execute the code in Listing 10.
I have two warnings about using the Allow updates option. First, turning on this option is extremely dangerous: You could accidentally delete rows from system tables that might disable your server, or you might drop vital functions or procedures. If such a deletion happens, the only way to safely reinstall the deleted object or data is to reload SQL Server from the CD-ROM. So, be sure to turn the Allow updates option off after you're finished. Second, don't add system procedures and functions indiscriminately. As with any system table or function update, you need to think and plan carefully before modifying such an important area of your system.
Only the Beginning
UDFs can give you great power and flexibility. They can enhance readability, serve as parameterized views, and even provide the performance benefits of stored procedures, while letting you access the functionality within a query (something you can't do with stored procedures). And you can use UDFs to encapsulate subqueries and derived tables—even if you want to JOIN to the results.
But these examples are just the beginning. In practice, the power and flexibility of UDFs will push SQL Server into previously uncharted territory. I'm eager to hear about new ways to use UDFs. Feel free to email me at firstname.lastname@example.org with your ideas.