Small Code Changes, Big SQL Server Performance Improvements

In my five-day SQL Server Internals and Tuning class, I have a module called simply "Query Tuning." When we start this module, I tell the students that if the SQL Server optimizer were perfect, we wouldn’t need this module. If the optimizer were perfect, it could take any legal T-SQL query you submit and figure out the best way to execute it to return the results you’re asking for. Unfortunately, the optimizer isn’t perfect (yet). However, there are still some things you can do in your T-SQL code to help the optimizer come up with a great execution plan.

One of the topics discussed in this module is query hints, which I talked about in my commentary "Take a Hint About Query Hints." There are several other topics discussed that deal with constructs in your code that can have a negative impact on performance. There are simple changes you can made to remove these troublesome constructs that can give immediate performance improvements.

One of these constructs is scalar-valued user-defined functions (UDFs). UDFs were added to SQL Server 2000 and were a wonderful addition to the product. Microsoft had been telling us since SQL Server 6 that UDFs were coming "really soon," and they finally appeared in SQL Server 2000. UDFs come in two varieties: scalar functions, which return a single value, and table valued functions, which return a set of result rows and can be referenced in a FROM clause. Even after waiting all these years, it turned out that scalar UDFs didn’t live up to their promise.

I discovered some bad behavior shortly after SQL Server 2000 came out when I was contracted to do some performance tuning and troubleshooting for an insurance company in Atlanta. One of the developers had written a scalar UDF to mask social security numbers in a report they needed to run frequently. The function took a social security number as a character string input value, replaced some of the digits with special characters, and returned the masked value as output.

When we ran the report with only about 10,000 customer rows, it took more than 20 minutes to run. Something wasn’t right. I started a trace to see what statements in the report were taking the most time and saw 10,000 separate calls to the masking function. For each row, the trace showed SQL Server calling the function, replacing the string values, and returning the result, just as if we had written code with a cursor to process the data one row at a time. If I replaced the function call with the actual code used to mask the characters, the trace showed a single call to the function and the report finished in under a minute.

For example, here’s a scalar UDF that takes a social security number (nine digits and two hyphens) as input and replaces the first five digits with X’s.

CREATE FUNCTION dbo.fnMaskSSN (@ssn char(11))
RETURNS char(11)
AS BEGIN
SELECT @SSN = 'xxx-xx-' + right(@ssn,4)
RETURN @SSN
END;

To test it, you could use any table that has a social security number column, including the authors table from the old sample database pubs.

SELECT dbo.fnMaskSSN(au_id), au_lname, au_fname
FROM authors;

To improve the performance, you would need to forget about the UDF, and replace the SELECT with the function definition:

SELECT 'xxx-xx-' + right(au_id,4), au_lname, au_fname
FROM authors;

Of course, with 23 rows in the authors table, you wouldn’t see the performance advantage, but if you ran a trace you would see a difference in the amount of work that SQL Server was doing behind the scenes. Although the developer wasn’t happy to have his function ignored, the DBA, who was ultimately responsible for system performance, was delighted.

The poor performance of scalar UDFs occurs only when the function is referenced in the SELECT list. In those cases, you can replace the function reference with the code that defines the function and potentially make an enormous difference in performance. Microsoft is well aware of this problem, but it has no immediate plans to change the way scalar UDFs work when used in a SELECT list. My tests have shown the same poor performance in my installation of SQL Server 2012 RC0.

Query tuning doesn’t always require detailed knowledge of everything SQL Server is doing and the way it works internally. Sometimes it requires only that you have tools that let you see where problems are occurring, and then use a little creative energy to come up with a different way to write the problematic queries to get the same results. In this case, replacing the call to the function with the underlying function code was a very minor change that had dramatic results.

Discuss this Article 7

kberghall
on Jan 20, 2012
It makes very little difference in this case, but here is how I would simplify the example UDF. CREATE FUNCTION dbo.fnMaskSSN (@ssn char(11)) RETURNS char(11) AS BEGIN RETURN 'xxx-xx-' + right(@ssn,4) END;
kberghall
on Jan 20, 2012
We have used UDFs extensively since they became available. There are definitely some performance traps if you use them the wrong way, but generally we have been very pleased with the performance and it is an extremely convenient way to encapsulate and re-use code. I tried you little SSN example and ran it on a table with 800,000 records and it ran thru all in 5 seconds. This is on a SQL Server 2008 R2 on a relatively slow server. So I think the query with 10,000 records must have had some other type of performance issue if it took 20 min. Also we have noticed that UDFs generally perform better if simplified down to one RETURN statement instead of declaring variables, etc. especially when accessing tables within the UDF.
KDelaney
on Jan 24, 2012
Maurice... thanks for your input. Yes, I tried to stress that the problem was only with scalar UDFs. I don't have enough words in these short commentaries to cover all the possible solutions, but I definitely stress in my class that one solution is to rewrite using a table-valued function. KBerghall... thanks for your comments. I first ran into this problem long ago, so the actual numbers are just what I remember. The report definitely did more than just the masking of the social security numbers, but I do remember that all we did was remove the call to the function and replace it with the function code and we made a drastic improvement. I would NOT expect any test on the current version with today's hardware to show the exact behavior we saw back then, but as mirzarashidbai attests, the problem does still exist. My tracing also shows far working going on behind the scenes if you have a call to a sclar UDF in your select list. If you are not having problems with scalar UDFs in the SELECT list, then that is great, but I offer this as one area where some people can have a major impact on performance. I am not saying UDFs are always bad, either, only that if you are trying to tune your code, you can try removing scalar functions in the SELECT list. Thanks again for the all the feedback!! ~Kalen
mirzarashidbaig
on Jan 19, 2012
This is really interesting information as we faced the performance issue with the UDF. This article will really help. Thanks for posting.
KDelaney
on Jan 25, 2012
Typos: far working = far more work sclar = scalar
Dydek
on Feb 17, 2012
I totally agree with your findings.. We have've recently eliminated a few UDFs created by an ORACLE-to-SQL migration tool, a datetime-to-date truncation function, and time to execute our queries went down by a whooping 99%. The optimization effect is greater when UDFs is used in a WHERE clause, because it automatically forces cursor processing. There are few articles floating on the web, confirming that most of the time UDFs are bad, but not all of the time the difference is that significant. Here are my 3 favorite ones: http://allinthehead.com/retro/247/user-defined-functions-considered-harmful http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/04/21/not-all-udfs-are-bad-for-performance.aspx http://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/
pelsql
on Jan 20, 2012
There is an easy way to work around this problem. Dump scalar udf use and replace them by cross apply + inline UDF, the column returned by the udf being the result of the expression originally written in the scalar udf. The optimizer expands what is in the inline UDF as if I would have put the expression directly into the select list. The overhead of function call is just flushed away by the optimizer as the access plan shows. By example this scalar function: Create function TSqlTools.TestScalarSln(@n as nvarchar(8)) returns nvarchar(8) as Begin declare @b nvarchar(8) Set @b = REPLACE(@n, '0', '_') return @b End can easily be rewritten by : Create function TSqlTools.TestInlineSln(@n as bigInt) returns TABLE as Return (Select REPLACE(@n, '0', '_') as newNb ) The SQL query must be rewritten also: select min(tsqltools.TestScalarSln (c)) From BigTable -- this is th scalar call -- by this equivalent select min(I.newNb) From BigTable B cross apply TSqlTools.TestInlineSln (c) as I What is great about this approach is that even if the inline function defines many transformations, if you just use one, this gem that is the SQL optimizer just keep the one used. Inline functions can then be made great domain tool sets. This function is an example of it: create function TSqlTools.RelatedDateExpr (@d datetime) returns table as return ( With DayNumExpr (dayNum) as (select (DATEPART(dw, @d) + @@DATEFIRST) % 7) Select dayNum , rtrim(SUBSTRING('Saturday Sunday Monday Tuesday WednesdayThursday Friday ', DayNum*9+1, 9)) as Eng , rtrim(SUBSTRING('Samedi Dimanche Lundi Mardi Mercredi Jeudi Vendredi ', DayNum*9+1, 9)) as Fr , dateadd(yy, datediff(yy, '19000101', @d), '19000101') as StartOfYear , dateadd(yy, 1+datediff(yy, '19000101', @d), '19000101') as StartOfNextYear ) You pay just for what you use: Select Expr.Eng from [ATableWithDates] as T cross apply TSqlTools.RelatedDateExpr(T.aDate) as Expr

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.