SQL Server provides a basic set of aggregation functions, such as MIN, MAX, and SUM. However, when you need other aggregations, you need to develop your own custom solutions. For example, you might need a custom aggregation solution for string concatenation to generate, say, an itinerary of flights all in one string. Or, you might need other aggregation solutions such as aggregate product (to calculate compound interest rates, for example), aggregate median (for statistical analysis), or aggregate bitwise operations (to calculate effective user permissions, for instance).
In SQL Server 2000, you can choose from three custom aggregation techniques: cursor-based, pivoting, and specialized. SQL Server 2005 adds another technique for calculating custom aggregates: using a Common Language Runtime (CLR)-based solution. Each technique has its pros and cons.This month, I begin a series of columns that explores each type of custom aggregation technique, starting with cursor-based solutions. I use string concatenation as the sample custom aggregation task in all the articles. String concatenation involves some technical considerations that other custom aggregation tasks might not include, such as the frequent requirement to perform the concatenation in a specified order and in a limited-length target string.
To make sure you choose the appropriate custom aggregation solution for a given task, you need to first develop criteria for evaluating the various techniques. I usually ask the following evaluation questions to determine whether a custom aggregation solution meets my needs:
- Is the solution generic? A generic custom aggregation solution is like a template—you can quickly apply a generic solution to other custom aggregation requests.
- Does the solution support an unlimited number of elements in a group? Does it have any special limitations?
- Can you apply the aggregation to the elements in a desired order?
- How well does the solution perform?
- Is the solution set-based?
After describing each solution, I'll evaluate it according to these factors.
Let's look at two concatenation tasks to see how to implement them using the cursor-based aggregation technique. Using the Northwind database, your first task is to return each customer and, for each, a concatenated list of order IDs expressed as a comma-separated list of values and sorted by OrderID. Table 1 shows an abbreviated example of the desired result.
The ideal solution would be generic, unlimited in the number of elements or string size it can handle, ordered (if required), fast, and set-based. Imagine that SQL Server provided such a string-concatenation function that would even let you specify the order in which to concatenate elements.The solution would look something like the following pseudo code:
CONCAT(',' + CAST(OrderID
ORDER BY OrderID),
1, 1, ') AS ArrOrders
GROUP BY CustomerID;
However, because SQL Server doesn't provide a CONCAT function, you have to build your own.
Your second task, also using Northwind, is to aggregate distinct values, returning an array of distinct employee IDs for each customer. Because an employee might handle multiple orders for each customer, the employee ID can appear more than once per customer in the Orders table. However, you need each employee ID to appear in the result array only once. Table 2 shows an abbreviated example of the desired result.
The cursor-based technique for implementing custom aggregation is straightforward. Listing 1 shows the solution for your first task (string concatenation). Listing 2 shows the solution for your second task ( distinct string concatenation). Both solutions use similar code.The only difference is the query on which you apply the cursor.
The first solution in Listing 1 declares a cursor for the following query:
ORDER BY CustomerID, OrderID;
The query simply returns all CustomerID, OrderID combinations from all rows. The query first sorts the rows by CustomerID (the grouping column) to organize the data by the group, then sorts by OrderID to concatenate the OrderIDs in an ordered fashion.
The second solution in Listing 2 declares a cursor for the following query:
ORDER BY CustomerID, EmployeeID;
Using the DISTINCT keyword, the query returns all distinct combinations of CustomerID, EmployeeID, meeting the requirement to concatenate distinct EmployeeIDs for each customer. Other than that, the solutions are identical.
To get a sorted array, the cursor retrieves data sorted by the grouping column ( Customer ID) and, within the group, by the element that it's concatenating (either OrderID or EmployeeID).The code fetches a row at a time and continues concatenating elements, adding a leading comma to each element. When the group changes, the code stores in a table variable a row for the previously handled group. Within that row, the code stores the group ID (CustomerID) and the concatenated string, minus the first comma. Row by row, the cursor technique builds the result in the table variable.
Let's evaluate the cursor-based solution the factors I listed previously. The cursor technique is generic.You can treat a cursor solution as a template, using similar code when you need to implement other custom aggregation solutions. This technique also supports an unlimited number of elements in a group. The resulting concatenated strings aren't limited in size beyond the data type limitation. In addition, you can apply the cursor-based aggregation to the elements in a desired order.
However, the row-by-row manipulation that cursors perform incurs a lot of overhead. Compared to set-based code that scans the same amount of data, cursors are usually much slower. In addition, because cursor solutions are procedural and not set-based, cursors typically involve more code than set-based solutions.
Join me next month when I explore the pivoting technique. Until then, check out this month's logical puzzle and test your troubleshooting skills in "Catch That Bug!"