In my five-day SQL Server Internals and Tuning class, I have a module called simply "Query Tuning." When we start this module, I tell the students that if the SQL Server optimizer were perfect, we wouldn’t need this module. If the optimizer were perfect, it could take any legal T-SQL query you submit and figure out the best way to execute it to return the results you’re asking for. Unfortunately, the optimizer isn’t perfect (yet). However, there are still some things you can do in your T-SQL code to help the optimizer come up with a great execution plan.
One of the topics discussed in this module is query hints, which I talked about in my commentary "Take a Hint About Query Hints." There are several other topics discussed that deal with constructs in your code that can have a negative impact on performance. There are simple changes you can made to remove these troublesome constructs that can give immediate performance improvements.
One of these constructs is scalar-valued user-defined functions (UDFs). UDFs were added to SQL Server 2000 and were a wonderful addition to the product. Microsoft had been telling us since SQL Server 6 that UDFs were coming "really soon," and they finally appeared in SQL Server 2000. UDFs come in two varieties: scalar functions, which return a single value, and table valued functions, which return a set of result rows and can be referenced in a FROM clause. Even after waiting all these years, it turned out that scalar UDFs didn’t live up to their promise.
I discovered some bad behavior shortly after SQL Server 2000 came out when I was contracted to do some performance tuning and troubleshooting for an insurance company in Atlanta. One of the developers had written a scalar UDF to mask social security numbers in a report they needed to run frequently. The function took a social security number as a character string input value, replaced some of the digits with special characters, and returned the masked value as output.
When we ran the report with only about 10,000 customer rows, it took more than 20 minutes to run. Something wasn’t right. I started a trace to see what statements in the report were taking the most time and saw 10,000 separate calls to the masking function. For each row, the trace showed SQL Server calling the function, replacing the string values, and returning the result, just as if we had written code with a cursor to process the data one row at a time. If I replaced the function call with the actual code used to mask the characters, the trace showed a single call to the function and the report finished in under a minute.
For example, here’s a scalar UDF that takes a social security number (nine digits and two hyphens) as input and replaces the first five digits with X’s.
SELECT @SSN = 'xxx-xx-' + right(@ssn,4)
To test it, you could use any table that has a social security number column, including the authors table from the old sample database pubs.
To improve the performance, you would need to forget about the UDF, and replace the SELECT with the function definition:
Of course, with 23 rows in the authors table, you wouldn’t see the performance advantage, but if you ran a trace you would see a difference in the amount of work that SQL Server was doing behind the scenes. Although the developer wasn’t happy to have his function ignored, the DBA, who was ultimately responsible for system performance, was delighted.
The poor performance of scalar UDFs occurs only when the function is referenced in the SELECT list. In those cases, you can replace the function reference with the code that defines the function and potentially make an enormous difference in performance. Microsoft is well aware of this problem, but it has no immediate plans to change the way scalar UDFs work when used in a SELECT list. My tests have shown the same poor performance in my installation of SQL Server 2012 RC0.
Query tuning doesn’t always require detailed knowledge of everything SQL Server is doing and the way it works internally. Sometimes it requires only that you have tools that let you see where problems are occurring, and then use a little creative energy to come up with a different way to write the problematic queries to get the same results. In this case, replacing the call to the function with the underlying function code was a very minor change that had dramatic results.