Use a T-SQL split function
|Executive Summary: Learn how to use a T-SQL split function to handle arrays as inputs for Microsoft SQL Server stored procedures.|
Although SQL Server stored procedures don’t support using arrays as inputs, DBAs often need to run procedures that can accept an unknown number of input parameters. For example, suppose you need to develop a stored procedure in the Northwind database that accepts a list of order IDs as input and returns all orders from the Orders table whose keys appear in the input list. A common solution is to use dynamic SQL; in “Handling Arrays” (InstantDoc ID 100453), I explain why this solution isn’t preferable.
Another option is to create a table function that accepts a string with a separated list of elements as input (@arr), and possibly a second argument with the separator character (@sep), and returns a table result with a row for each element. You can then use such a split function in your stored procedure and join its output table with the Orders table to identify the qualifying orders.
In this article I discuss how to use a T-SQL implementation of the split function solution. Next month I’ll discuss a Common Language Runtime (CLR) implementation of the split function approach.
To use the T-SQL split function approach, first run the code in Listing 1 to create and populate an auxiliary table of numbers. Next, run the code in Listing 2 to create the T-SQL implementation of the fn_split function. The split logic implemented by the function’s query can be divided into three parts: generating copies, extracting elements, and calculating position.
Generating copies. The first part of the query produces a copy of the @arr value for each element in the array. This action requires the Nums table that the script in Listing 1 created; as you can see, the query in Listing 2 is against the Nums table. The predicate in the WHERE clause is in charge of producing a match for each element in the array. In simple terms, a match is found for each number n from the Nums table that represents the character position of @sep (the list’s separator) in @arr. For x number of elements, there are x - 1 separators. In order to obtain x matches for x elements, you must add a separator in front of the first element. That is, you need to look for @sep in @sep + @arr instead of in @arr. In addition, because the Nums table can have many more numbers than the number of characters in @arr, you filter only the numbers that are smaller than or equal to the length of @arr plus one for the separator you added in front of the first element. Here’s the complete predicate in the query’s WHERE clause:
AND SUBSTRING(@sep + @arr, n, 1) = @sep
Extracting elements. The second part of the script in Listing 2 extracts elements from each copy of @arr. The column n in the Nums table represents the character position where the element of interest starts. Thus, a call to the SUBSTRING function extracts the element:
n) - n)
The SUBSTRING function operates on @arr as the input string and extracts a substring starting at character position n. The length to extract is the position of the next occurrence of @sep in @arr (calculated by the CHARINDEX function) minus n. Two things to note about the use of CHARINDEX to calculate the position of the next occurrence of @sep in @arr are that (1) in order to ensure that the separator is also found after the last element, the function looks for @sep in @arr + @sep rather than just in @arr and (2) the last argument to the function tells the function where to start looking for @sep. In my example, you would tell CHARINDEX to look for the next occurrence of @sep starting at the beginning of the current element, which is represented by n; hence n is provided as the last argument to the function.
Calculating position. The final portion of the script in Listing 2 calculates the position of the element within the array. In SQL Server 2000 the solution is quite tricky. The position of the element starting at character n can be calculated by counting the number of occurrences of @sep prior to the nth character (in the left n - 1 characters) plus 1. To count the number of occurrences of a character c in a string s, use the REPLACE function to remove all occurrences of c from s (call the result s2) by substituting them with an empty string, then simply subtract the length of s2 from s. The length of the original string minus the length of the string after removal of all occurrences of the separator character gives you the number of occurrences of the separator character. The original length of our string of interest is by definition (n - 1), therefore the complete expression that calculates the number of occurrences of @sep in the left n - 1 characters in @arr is:
(n - 1) - LEN(REPLACE(LEFT(@arr, n-1), @sep, N'))By adding 1 to the expression above, you get the position of the element within the array. Note that if you’re working with SQL Server 2005 or later, you can calculate the position of the element with a simple expression based on the ROW_NUMBER function:
ROW_NUMBER() OVER(ORDER BY n)
Also, remember that as of SQL Server 2005 you can work with VARCHAR(MAX) or NVARCHAR(MAX) as the input parameters to the function and stored procedure, and thus support very long input strings (up to 2GB).
Now that you understand the logic behind this solution, try running the following code to test the function:
You’ll get the output in Table 1, which includes a row for each element and indicates its position in the array.
Continue on Page 2
Run the code in Listing 3 to create the usp_getorders procedure, which is based on a call to the function fn_split, as well as a join between the function and the Orders table to return only the qualifying orders. Note the use of a new GUID to inspect the plan caching and reuse behavior of the new solution.
Next, run the following code to test the stored procedure with two different inputs:
EXEC dbo.usp_getorders <br> N'10260,10270,10265,10290';
Because the query strings for both invocations are the same, the plan produced for the first invocation of the query can be reused for subsequent invocations. You can confirm this by inspecting the plans associated with the query in cache:
FROM sys.syscacheobjects -- in 2000 query master
WHERE SQL LIKE '%89CE9E3A-943A-42F2-AEB6-
AND sql NOT LIKE '%sys%';
Table 2 shows the output of this query.
Note that there’s only one plan in cache, and the usecounts attribute indicates that it was used twice. This implementation has great performance benefits over the dynamic SQL approach that I discussed in “Handling Arrays” (InstantDoc ID 100453). This method optimizes the query only once, then reuses the cached plan. Also, because there’s only one plan in cache, the procedure creates no memory overhead. In addition, because this implementation of the procedure uses a static query, there’s no exposure to SQL injection, and the solution is therefore secure. Finally, because the fn_split function returns the position of the element in the array (pos attribute), you can sort the rows in the output by that position by adding ORDER BY pos to the query.
Using a T-SQL split function to handle arrays as inputs is a good solution—the approach is secure and doesn’t negatively affect performance. However, the implementation is unnecessarily complex. Because you should try to develop solutions that are as simple and straightforward as possible, you should consider additional alternatives. Next month I’ll present an approach that uses a CLR version of the split function, as well as an implementation based on SQL Server 2008’s table valued parameters.