Produce pivoted, denormalized output from normalized data
Editor's Note: Congratulations to Darren Brinksneader from A Technological Advantage, who submitted the T-SQL Black Belt solution that contributing editor Itzik Ben-Gan based this article on. Darren will receive $100 and a T-SQL Black Belt shirt. Send your experts-only T-SQL tips to Itzik Ben-Gan at firstname.lastname@example.org.
Crosstab queries let you store your data in a normalized manner but also let you produce pivoted, denormalized output from that data. In other words, crosstab queries let you rotate rows to columns to see different summaries of the source data. Suppose you occasionally want to get a pivoted view of your orders in which each row represents a different month, each column represents a different year, and each cell holds the number of orders for the appropriate month and year intersection. Storing your data in such a way in a table would break the first normal form, which disallows repeating groups. Crosstab queries provide a solution to that problem, but they are limited in that you usually need to know the number of destination columns (in this example, the number of order years) to write your query. But if the number of destination columns is unknown, you can use the crosstab query solution that reader Darren Brinksneader submitted and that I expand on in this article. Darren's solution uses dynamic execution to work around the crosstab query's limitation.
Learn more from "Generating a Crosstab Report" and "
Suppose you want to provide a pivoted view of the number of orders by month and year based on the Orders table in the Northwind database. If you know beforehand the possible year values, you can issue the query that Listing 1 shows.
The CASE expression, wrapped with the SUM aggregate function, adds 1 to the appropriate column only for rows that fall in the year that the column represents. The rows are grouped by month, so each row in the result will hold a different month, and each column will store the count of orders for the year that the column represents. This method is a bit tricky because the SUM function's purpose here is to perform a count, but keep in mind that you add 1s and not an actual value, which is stored in the table. You can use crosstab queries that perform other aggregate computations, such as summing values, instead of counting the number of qualifying occurrences. For example, the query in Listing 2 calculates the total quantity pivoted by store ID and order year for the Sales table in the Pubs database. Note that you sum the qualifying quantity values here instead of summing 1s. This method gives you the total quantity for each combination of store ID and order year instead of the number of qualifying rows.
Both of the previous examples rely on your knowing beforehand the values for the destination columns. But what if you don't know those values? Furthermore, as your data changes, you might need to add new columns to the result. So, you'll constantly need to be aware of the table's content and adjust your queries accordingly—unless you use dynamic execution.
Darren Brinksneader from A Technological Advantage submitted a solution that generates a crosstab view of the data in the MagicSquares table, which I presented in "Using T-SQL with Magic Squares," August 2000. The cool thing about Darren's solution is that it's dynamic—in other words, it works for any number of destination columns without requiring prior knowledge of the table's data. I have extended Darren's solution and provided a stored procedure that generates a crosstab view for any table.
Let's walk through the code for the sp_CrossTab stored procedure step by step. First, take a look at the stored procedure's header, which Listing 3 shows. Note that all the elements involved are dynamic, including the table name (@table), the column that holds the values to group by (@onrows), and the column that stores the values that will appear as the destination columns (@oncols). The @onrows and @oncols parameters can accept either a column name or an expression—for example, CustomerID or MONTH (OrderDate), respectively. If @onrows is an expression, you can supply an alias for the destination column in the @onrowsalias parameter so that the grouping column won't appear with no name in the result. The @sumcol parameter lets you specify a name for the column whose values will be summed for each intersection of @onrows and @oncols (which is how the query in Listing 2 calculated the qty column). If you don't supply a value for @sumcol, the stored procedure will perform a count of qualifying rows.
Let's proceed to the stored procedure's first step—generating the beginning of the dynamic SQL string, as Listing 4 shows. Step 1 starts the SQL string that will eventually produce the crosstab output. Beginning the string is a straightforward process: You store the SELECT clause, followed by the grouping column and its alias, if one was supplied.
Step 2, which Listing 5 shows, stores all the distinct key values that the stored procedure will use as the destination columns in a temporary table. Note that in this case, you don't know the data type of the grouping column beforehand, but you need to provide a data type for the column in the temporary table. I chose the nvarchar data type because I can safely assume that the stored procedure will use a numeric or character-based grouping column. You can convert both numeric and character-based values to the nvarchar data type.
Step 3 contains the stored procedure's core logic. The code in Step 3, which Listing 6 shows, iterates through all the key values in the temporary table in a loop and uses those values to generate the destination columns. The code appends those columns to the SQL string that the stored procedure gradually builds. The values that the CASE expression evaluates are also converted to the nvarchar data type to compare them with the key values from the temporary table, which are nvarchar data type values. You could skip casting these values to nvarchar because SQL Server performs an implicit casting from both numerical and non-Unicode values to Unicode values. But I don't think anyone should rely on implicit casting, so I perform an explicit casting to be safe.
Step 4, which Listing 7 shows, simply wraps up the dynamically built SQL string by adding to it the FROM, GROUP BY, and ORDER BY clauses and executing it by using the EXEC command. Now you can use the stored procedure. Listing 8 shows a few sample invocations.
Good News, Bad News
Darren's solution, which I've expanded on in this article, brings good news and bad news. The bad news is that dynamic execution is far from elegant and isn't very readable. The good news is that the stored procedure encapsulates the SQL string, so after you write the string, the string isn't visible. Also, when you create the sp_CrossTab stored procedure in the Master database and prefix the procedure with sp_, you can invoke the procedure from any database. Listing 8 shows only a few examples of how to invoke this stored procedure, but you can use the procedure for any combination of table, rows, and columns that you want.