Treat Yourself to Fn_Split()

Downloads
21071.zip

This UDF lets you emulate VB's Split() function and parameter arrays

Hardly a week goes by in the newsgroups that a SQL Server developer doesn't ask how to pass a variable-length parameter array to a stored procedure--and learns that the technique isn't possible. You can use variable-length parameter arrays in Visual Basic (VB), but what if you prefer to program in T-SQL? Last year, I took advantage of SQL Server 2000's new support for user-defined functions (UDFs) to develop a handy UDF that emulates VB's Split() function. Later, I realized I could use the UDF to create a workaround that would let developers incorporate variable-length parameter arrays into their T-SQL programming. Let's look at why I needed the UDF, how I programmed it to solve my business problem, and how I applied it to emulate variable-length parameter arrays.

In "User-Defined Functions," November 2000, author Robert Vieira invited readers to email him new ways to apply UDFs. Shortly after reading Vieira's article, I began a project that involved parsing thousands of lines of ASCII text for a telemarketing firm. The rows of text represented INSERT queries, which were written in a proprietary language instead of standard SQL. My job was to transform these queries into SQL and run them to complete a complex data transformation. Some queries were as long as 1400 characters. Thinking aloud about how to attack my query problem, I said to a coworker, "I wish SQL Server had a function like the Visual Basic Split() function to help me parse these queries." My coworker replied, "Why don't you make your own SQL Server split function?" So, I created the fn_Split() multistatement table-valued function. (A multistatement table-valued function is essentially a stored procedure that returns a table that you can embed in a query.)

Before I describe fn_Split(), however, let's examine the Split() function. In VB, the Split() function fills a string array with values that it parses from a delimited string. Split() accepts the delimited string as its first parameter and the delimiter as its second parameter. In Split(), the default delimiter is a space. The array index is zero-based, so the first print statement in the following code prints the word brown to the Immediate window:

Dim astr() as String
astr = Split("How now brown cow")
Debug.print astr(2)  'Prints "brown" using space delimiter
astr = Split("How now brown cow", "")
Debug.print astr(2)   'Prints "w" using zero-length string delimiter

Note that the code uses a space as the default delimiter. If the delimiter is a zero-length (empty) string, the function returns an array of single characters and spaces. For example, the second Split() function statement in the above code uses an empty string as its delimiter, so the second print statement prints the character w. The array for the second split has 17 elements, one for each character or space in the string.

Building the Fn_Split() Function


Listing 1 shows the fn_Split() function. Because the Split() function has been so useful in my VB development work, I decided to make fn_Split() resemble it as much as possible. However, SQL Server can't pass an array of values between procedures. So instead of returning an array of values, as Split() does, fn_Split() returns a two-column SQL Server table variable.

Each element in Split()'s array is associated with a zero-based numeric index. For example, in astr(2) from the previous code snippet, 2 is the index of the third element in the array. To retain Split()'s index concept, I include a column, idx, that corresponds to the Split() array index and represents the row number starting with zero. The other column, value, corresponds to the values in the individual Split() array elements. The default delimiter for fn_Split() is a space, but for convenience and self-documentation, fn_Split() also lets you specify a space as the delimiter by using the string 'space' as the second parameter. Alternatively, you can use two single quotes with a space between them as the second parameter. You can specify an empty string as the delimiter by using the string 'empty' or two single quotes with no space between them.

Note that you must call UDFs under their owner designation, and you must use the database name when a function resides outside the current database. A sample database called SplitExamples, which contains the fn_Split() function, is available for download from http://www.sqlmag.com (see "More on the Web," page 58, for download instructions). After attaching the sample database to your server, you can invoke fn_Split() from a Query Analyzer window as follows:

SELECT idx, value FROM SplitExamples.dbo.fn_Split
('How now brown cow', 'space')
GO

This statement returns the information in Table 1 to the Query Analyzer grid. The statement

SELECT idx, value FROM SplitExamples.dbo.fn_Split
('How now','empty')
GO

returns the information in Table 2.

How Fn_Split() Works


Fn_Split() uses the CHARINDEX() function to identify the position of the first delimiter in the text and uses the SUBSTRING() function to set the value column for the first row, as the following statement shows:

SET @value = SUBSTRING(@sText,1,CHARINDEX
(@sDelim,@sText)-1)

Then, fn_Split() inserts the value and an index into the return table variable @retArray:

INSERT @retArray (idx,value)
VALUES (@idx,@value)

Fn_Split() trims the first value and its delimiter from the front of the text, increments the index, then loops to repeat the process until all the delimiters are gone. When fn_Split() can't find any more delimiters, it retrieves the remaining text for the last row and exits the loop.

In my project, a typical query string that runs against the telemarketing survey data looks like this:

q105=8 and q106>3 and q106<9

which means "get all the IDs of people who replied to question 105 with answer 8 and who replied to question 106 with answers greater than 3 and less than 9." If I use the string 'and' as my delimiter, fn_Split() returns the information in Table 3.

Each row in this table represents a query that's smaller and simpler than the original. For this project, I created a second table-valued function called fn_GetIDs() to transform and run these smaller queries. Fn_GetIDs() accepts one of these smaller strings and returns a one-column table of interview IDs. For the first row, fn_GetIDs() generates a column of IDs that satisfies the following SELECT statement:

SELECT IDs FROM SourceTable WHERE Question = 105 AND Answer = 8

Now, I can use a SQL cursor to process my fn_Split() table and create a dynamic SQL string that gives me the final query that I need. Listing 2, page 58, shows the procedure that uses a cursor to build the following dynamic SQL string from Table 2:

INSERT TargetTable(IDs)
SELECT a.IDs
FROM dbo.fn_GetIDs('q105=8') a
JOIN dbo.fn_GetIDs('q106>3') T1
ON a.IDs = T1.IDs
JOIN dbo.fn_GetIDs('q106<9') T2
ON a.IDs = T2.Ids

You can also count substrings with fn_Split(). When you pass this sentence as the delimited string and the word the as the delimiter into the sample stored procedure that Listing 3 shows, the query returns the number 9 for nine occurrences of the word the.

How to Emulate Variable-Length Parameter Arrays


In VB, you can pass variable-length parameter arrays to VB Function and Sub procedures by using the Paramarray keyword. SQL Server has no corollary to Paramarray and, therefore, no native way to accept variable-length arrays as parameters for stored procedures. Sometimes, however, variable-length parameter arrays are exactly what you need. For example, suppose a user wants to view the titles of books written by selected authors in the Pubs database. Because you don't know ahead of time how many authors the user has selected, you can't create a parameter for each author name in your stored procedure. In this situation, an object like Paramarray would be useful in a SQL Server stored procedure to capture any number of author names.

As a workaround to the SQL Server limitation, you can pass a delimited string of author names as one parameter to a single varchar variable in your stored procedure. In the stored procedure, you can use the fn_Split() function to transform the varchar variable that contains the list of author names into a table of author names. Then, you can retrieve the list of titles you want by using a standard SQL query that joins the fn_Split() table of author names with a table or view from the Pubs database. The accompanying downloadable code includes a sample Microsoft Access project file named split.adp, which includes a form for selecting author last names from a multiselect list. When you click the command button on the form, a procedure concatenates the selections and passes them as one comma-delimited string to a stored procedure through an ADO connection. The stored procedure in Listing 4 uses fn_Split() to retrieve a list of titles, author names, and prices for the grid on the form.

Increase Your Value


I don't know how I would have tackled my telemarketing data problem without table-valued functions. I know the project would have taken a lot longer and would have been a lot messier without the fn_Split() UDF. The more UDFs I create, the more applications I find for them in my work. With UDFs in my development arsenal, I work faster and better. On my list of best new features in SQL Server 2000, user-defined functions command the top spot.

Discuss this Article 4

dennis (not verified)
on May 3, 2004
SQL Server really forgot A LOT of functions to make our lives easier. More articles like this REALLY saves time - Burton Roberts should work for MS! LOL
gpwin59
on Mar 15, 2007
very useful - thank you
dan182 (not verified)
on Aug 31, 2004
it is pretty good and works nicely until the length of your delimeted string is bigger than 8000 characters. other than that works like a charm. daniel
jshardiman
on Oct 11, 2012
Great article, useful function. Just to let you know, your table has a zero-based index. That means that your "LISTING 3: Sample Code for Counting Substrings" will be wrong, always. use COUNT() instead of MAX() to get the correct number of rows in your table. MAX will give you one less than the actual number of rows in the table.

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Mike Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.