ADO.NET tops dynamic SQL for generating variable-width crosstab reports
A tabular report lists information in rows, like the rows of a database table. The column headers in a tabular report correlate to table column names. A cross-tabulation report (crosstab report, for short) is a two-dimensional matrix or spreadsheet that has look-up criteria listed across the top in the column headers and down the left side in row headers. The data you're looking up—which is usually summarized by an aggregate function such as SUM(), AVERAGE(), or COUNT()—occupies the inner cells of the matrix.
For example, suppose you work for the world-famous Pubs bookstore chain, and the CEO wants to know how well each of the company's stores are selling the books of the company's authors. You could create a 3-column tabular report by using a view like the one that Listing 1, page 22, creates. This view, vwSales, lists total book sales grouped by store and by author, with each author-store-sales tuple listed in one row. Figure 1, page 22, shows the tabular report that Listing 1's view generates. However, your report would be easier to read and would convey more information if you presented it as a crosstab report, such as the one that Figure 2, page 22, shows. The crosstab report lists the store names horizontally in column headers instead of alongside the author names in the rows. So, to find sales for author Stearns MacFeather at Bookbeat, you'd look at the cross section of the Stearns MacFeather row and the Bookbeat column.
Crosstab reports fall into two categories: fixed-width and variable-width. For fixed-width reports, you know at design time the number and names of the columns in the report. Using a T-SQL query to produce a fixed-width crosstab report is straightforward because you can hard-code a CASE expression embedded in an aggregate function to evaluate each column of output. Listing 2 shows an example of using CASE expressions in a fixed-width crosstab query for our Pubs bookstore application. The first CASE expression in Listing 2 says: If the value in the Store column equals "Barnum's," then return the value in the Sales column; otherwise, return 0. The crosstab query uses SUM() functions in its column expressions, so there's no need to pre-aggregate the data in the view it uses for its data source. So, in its FROM clause, the crosstab query in Listing 2 uses the vwSales2 view that Listing 3 creates, instead of vwSales. If you're using COUNT() or AVG() as your aggregate function, leave "Else 0" off the CASE expression; otherwise, the answers will average or count transactions that have sales values of 0, which don't really exist. When you drop "Else 0" from the CASE expression, you get NULL values instead of zeros for author-store combinations for which no records exist.
But let's say that Pubs has a hundred bookstores, with new ones opening and others closing each month. For this situation, you need a variable-width crosstab report that dynamically reads the store names from the data and produces a column for each one, no matter how many exist that month. Let's explore two very different methods of producing variable-width crosstab reports. The first example uses dynamic SQL in a stored procedure to create a crosstab query string that contains CASE expressions. The EXEC command executes the query string to return the report. Instead of CASE expressions or dynamic SQL, the second example uses the new relational features in ADO.NET—the data-access component in Visual Studio .NET—to cross-tabulate the data. Along the way, I point out advantages and disadvantages of each method.
Using Dynamic SQL
The stored procedure procXTabDSQL, which Listing 4 shows, uses the vwSales2 view and a local cursor that loops through a list of store names to generate a crosstab query string. The query string starts with the row header as the first column. The procedure then opens the cursor on the list of store names. The cursor fetches the individual store names into the variable @StoreName. Each iteration generates a string containing a SUM() function wrapped around a CASE expression, then concatenates that string to the varchar variable @strSQL. Each SUM() function generates a column of output in the final report.
The CASE expression in the first SUM() function is for the Barnum's bookstore column of the report. SQL Server interprets the embedded apostrophe in "Barnum's" as the delimiter for the comparison string and tries to compare the data stored in stor_name to Barnum instead of Barnum's. To fix this problem, the code must replace the single quote in the name with two single quotes. The first quote in a double-singlequote sequence is an escape character for the second quote; it tells the SQL Server string parser to interpret the second quote as a character rather than a string delimiter. The apostrophe in the column label \[Barnum's\] is OK because it's a column label embedded in square column brackets. Listing 2 contains the correct CASE expression for the Barnum's column of the report.
The difficulty grows when you try to generate this SQL query from a string that is itself delimited with single quotes. You must provide an escape sequence for the comparison string's leading and trailing single quotes. The comparison string is contained in a variable, so you must programmatically replace its single quote with two single quotes by using a REPLACE() function. Unfortunately, the REPLACE() function also uses single quotes as string delimiters for its string-literal arguments. So to replace one single quote with two single quotes in the @StoreName variable, you have to use the awkward syntax
The ASCII number for the single quote is 39. To improve the readability of the CASE expressions in Listing 4's code, I selectively use CHAR(39) in place of some of the single quotes.
After the code has finished looping through the store names, it concatenates the rest of the SQL string to the varchar variable. When the SQL string for the crosstab query is complete, the code calls the EXECUTE procedure to generate the crosstab result set.
Even though you've overcome the single-quotes problem, you must still contend with the 8000-character limit of varchar variables. If the dynamic SQL string exceeds 8000 characters, the query will crash. You might think that 8000 characters is enough for any query you would generate, but the final SQL string can become quite long, depending on the number of columns and the character length of each CASE expression. Our example's query will crash at about 100 columns. You can programmatically limit the number of columns so that the query doesn't crash, but if you need more than the limit, you're out of luck.
One way to overcome this limit is by leveraging ADO.NET's relational features. The following example generates a variable-width crosstab report with as many columns as you need without using SQL cursors, dynamic SQL, or CASE expressions.
Listing 5, page 24, shows this example's Visual Basic .NET code, which describes a class called XTabDotNetDS. Notice the Inherits DataSet statement in callout A at the top of the class. If you aren't familiar with ADO.NET, a DataSet is like a miniature in-memory database. The DataSet object has a Tables collection. An ADO.NET table is called a DataTable. Each DataTable has a Rows collection, which contains DataRow objects, and a Columns collection, which contains DataColumn objects. The DataSet object also has a Relations collection that contains DataRelation objects, which describe the referential relationships between the DataTables. The Inherits DataSet statement ensures that the custom DataSet class starts with all the collections and capabilities of the ADO.NET DataSet base class. When the XTabDotNetDS object is created, the constructor for the class calls three procedures: Get3Tables(), AddXTabTable(), and FillXTabTable().
Get3Tables(). The Get3Tables() procedure that callout B in Listing 5 shows uses a SqlDataAdapter object to populate the DataSet with records returned from the stored procedure that Listing 6, page 24, creates. The SqlDataAdapter object in .NET serves as a bridge between a SQL Server data source and a DataSet object. The stored procedure from Listing 6 returns the three source tables. When SqlDataAdapter calls multiple tables from SQL Server into a newly created DataSet, ADO.NET assigns the generic names Table, Table1, Table2, and so on to the new DataTables. Renaming the tables to something meaningful at this point is a good idea so that you don't have to reference the indexes or the generic names when using the tables. I chose Stores, Authors, and Sales for table names. Note that you don't need to write code that explicitly opens and closes the SQL connection object when you're using SqlDataAdapter; SqlDataAdapter handles it for you behind the scenes. After adding the tables, the code creates a DataRelation between the Authors table and the Sales table on the au_name columns. Eventually, the code iterates through the Authors table to create the row headers of the crosstab report. For each Author row header, the code uses this DataRelation object to find the sales data to populate the rest of the columns in that Author's DataRow in the crosstab table.
AddXTabTable(). The AddXTabTable() procedure at callout C adds a fourth DataTable, XTab, to the Tables collection. This table will hold the final crosstab report. After adding the DataTable, the code creates the schema for it by adding columns to its Columns collection. The first column in XTab, Column(0), is for row headers, which will contain Author names. The other column names will be the store names from the Stores DataTable. To get the store names, you can create a DataView.
The DataView constructor takes the Stores DataTable as its parameter. The DataView is a wrapper around the DataTable object. The code can still access all the DataTable's properties through the DataView, but the DataView adds important methods you don't otherwise have. In this case, all you want is for the DataView.Sort method to put the stores in alphabetical order. After the code sorts the stores in the DataView, the code loops through the rows and appends a column for each store to our XTab DataTable. Store(0), Barnum's, becomes the column name of Column(1). Note that the code sets the data types of the new columns to System.Single to hold the currency data and makes 0 the default value.
FillXTabTable(). All that's left is to fill the XTab DataTable with data by using the FillXTabTable() procedure that the code at callout D shows. The procedure begins by dimensioning the object variables, then loops through the rows in the Authors table. For each row in the Authors table, the code creates a new DataRow for the XTab DataTable and puts the author's name in the first column—Column(0). The rest of the columns in the new row contain zeros by default.
Next, the code invokes the Author DataRow's GetChildRows method on the DataRelation object to retrieve from the Sales table an array of DataRows that are linked to the author. The array (arowSales) contains one DataRow for each store that had sales for that author. For example, GetChildRows returns a one-element array for author Sheryl Hunter, who had sales only at Barnum's bookstore.
The code then loops through the author's sales DataRows and places each sales total under the XTab column identified in the stor_name field of that sales DataRow. Stores for which the author has no sales retain the default value, 0. When the XTab row is finished, the code adds it to the DataTable's Rows collection and returns to the top of the loop to process the next author. When all the authors are processed, the XTab DataTable is finished, as Figure 3 shows.
To view the results in a .NET Windows Forms DataGrid, you can assign the XTab DataTable to the DataGrid's DataSource. If you want the DataGrid to be sorted when it first appears, you can use the code that Listing 7 shows to wrap the table in a DataView, sort the DataView, and assign the DataView to the DataGrid's DataSource.
Two Roads Diverge
These two approaches to creating dynamic crosstab reports produce similar—but not identical—results. The ADO.NET version includes a row for every author in the Authors table, whereas the dynamic SQL version excludes authors who have zero sales across all stores. This difference could be important depending on the purpose of the report. The ADO.NET version gives you the flexibility, through code, to exclude authors with no child records in the Sales table. One way is to simply check the count of rows that the Author DataRow's GetChildRows method returns. If no sales records exist, the code doesn't add the new row to the crosstab DataTable. The converse, getting the SQL crosstab query to return rows of authors with no sales, is more difficult, but possible. To include all authors in the dynamic SQL version, you might decide to start by adding into the source data at least one dummy sales record that has a sales value of 0 for each author record that otherwise has no sales. Unfortunately, this approach won't work if you're using COUNT() as the summarizing function because those dummy sales will count, too, and you don't want that. The way around this problem is to pre-aggregate the source data by using the summarizing function you want, then add the dummy records, as long as every author gets at least one. Then, in the crosstab query, always use the SUM() function. The code that Listing 8 shows creates a function, dbo.fn_Sales(), that UNIONs the pre-aggregated records from vwSales with a set of dummy records—one for each author-store pair—which you generate by using a cross join. If you replace dbo.vwSales2 in Listing 4's crosstab query with dbo.fn_Sales(), the final result is the same as the one you achieved with ADO.NET: All authors will appear in the report.
An advantage of the dynamic SQL approach is that it doesn't require the Microsoft .NET Framework. If you're using Visual Studio 6.0, for example, you can easily create an ADO recordset from the dynamic SQL crosstab table and bind it to a flex-grid control. On the other hand, the ADO.NET approach isn't necessarily tied to SQL Server. You can easily adapt it for use with databases that don't support CASE expressions, such as Microsoft Access.
Cursors in SQL Server are slow and expensive in terms of resources; they're a last resort when no other alternative is practical. Dynamic SQL is inefficient because SQL Server doesn't reuse the query plans it generates from directly executed SQL strings. You also need to consider security issues when executing SQL strings. I avoid cursors and dynamic SQL wherever possible, and the ADO.NET solution lets me do just that. When I use ADO.NET, I don't worry about the 8000-character limit of varchar variables, and I have better control over my output than with T-SQL, as I mentioned earlier. So as long as I'm working in the .NET Framework, I'll use the ADO.NET solution.