A common complaint I hear from my development team is, “Why can’t T-SQL do that?” As a member of a small development shop for a sizable healthcare company, the line is often blurred between my official role as DBA and my ad-hoc role as database developer. I bridge the gap between our web/report developers’ skill set and the latest offerings from SQL Server’s ever-increasing tool set, demonstrating how a new feature lets you approach a problem differently and more efficiently. One such SQL Server 2005 feature is the SQL Common Language Runtime (CLR) in which SQL objects can be developed in the .NET Framework, utilizing robust languages such as C# and VB.NET. I've gravitated to C#, which I believe is the more intuitive language.
The C# language augments T-SQL’s set-based strengths by providing procedural efficiencies and expanded data structures such as multi-dimensional arrays. How many times have you found yourself creating temp tables with cursors in T-SQL to mimic the utility of an array? This article demonstrates how leveraging the CLR and C# arrays can easily provide a solution to a common report requirement which is otherwise cumbersome to develop in T-SQL: query result-set transposition.
The Need to Transpose Data
The company I work for provides healthcare to the elderly. The financial/clinical data and subsequent enterprise reports are not only critical to the well being of the company but are necessary to ensure the high-quality care our patients deserve. Often, this data is date-driven, meaning the query displays the dates in a column; however, the report layout requires that the dates be displayed in the first row as the column header, as Figure 1 shows. It's quickly apparent that the report simply needs the data from the source query to be flipped so that the rows become the columns and the columns become the rows. In other words, we need to transpose the data.
In Microsoft Excel, this process is simple: Either use the Transpose function or paste-transpose the data. However, in T-SQL, there isn't an easy method using built-in objects to accomplish a transposition. Like many database shops, we eagerly tried out SQL Server 2005's new T-SQL PIVOT and UNPIVOT features, and although we found great utility with their use, the PIVOT/UNPIVOT process doesn't solve the transpose question: We aren't pivoting data into aggregation buckets. Herein lies our developers’ complaint: “Why can’t we simply transpose data in T-SQL, just switching rows and columns, as we can do in Excel?”
The process seems simple enough, but I can understand why Microsoft doesn't have a transpose function in T-SQL. Because row data will mostly not be of the same data type, the transposed column data type can't be determined without some assumptions of the output. So, rather than wait for Microsoft to provide a solution, why not use the tools we have to create our own transpose solution? If the data could be stored in an array, where manipulation is granular, the transposition would be straightforward. The CLR and C# offers flexibility not found in T-SQL.
Using the SQL CLR and C# to Transpose Data
Our goal is to create a general solution that takes any query the caller has permission context to execute and transpose the result set. Our solution will be a CLR stored procedure written in C# which accepts a single string parameter of the caller’s query and returns a transposed result set. The overall strategy for solving this problem can be used as a template to attack general CLR data problems:
- Obtain the data set by executing a T-SQL query within the CLR environment
- Manipulate your data
- Generate a result set, scalar value, and report messages back to the caller
- Handle errors
After you've mastered this basic use of the CLR, you have the basis and subsequent confidence to tackle a whole new world of complex data questions. When faced with a problem, your solution environment is no longer confined to T-SQL but is expanded to the CLR toolsets.
Examining a Simple Transpose
Figure 2 shows a simple query and a transposition of its result set. When considering the result set of a query, we typically don't consider the column header as a row. However, in this example, we observe that the column names are also involved in the transpose and are therefore, for our purposes, considered a row. As a consequence, our 3-by-4 result set is transposed to a 4-by-3 result set.
The data from the query is stored in the two-dimensional array, queryData\[i, j\], where the i index references the row location and the j index references the column location. Note that array-elements are zero-based referenced: The first position is referenced by 0, the second position by 1, the third by 2, and so on. Therefore, the expression queryData\[1,5\] refers to the data value located at the intersection of the second row (1) and the sixth column (5).
With the data stored in the queryData array, we can now follow the transpose of a single element using the array notation, as you see in Figure 3. The data element 10.78 is contained in dataArray\[1,2\]. When the element is transposed, it moves to the zero-based location of \[2,1\]. If the transposed data is stored in a new array transposedData\[i, j\], we have the relationship that Figure 4 shows. Notice that the indices are simply switched. To generalize this, tranposedData\[i, j\] = queryData\[j, i\], which is the basis of our solution
Assumptions Made for the Solution
Often, assumptions are made in the problem to narrow the scope of a solution. In this case, two assumptions are made:
- Limit number of queried rows to 2048—When an array is constructed, the size must be declared; the size of the queryData\[i, j\] array is the number of rows and columns in our result set. We do know the number of columns of the result set but not the number of rows at the time the array is constructed. Hence, this value is arbitrary and configurable in the variable declaration section. In practice, I've never had a business need to transpose more than 180 rows; in esting, I've transposed 5,000 rows by 784 columns of clinical data—which is over 3.9 million data elements.
- The transposed result set columns’ data type will be character-string—The 100-character limit is configurable in the declaration section and can be as large as 8,000. This limit has met our reporting requirements. But this limitation begs the question, Is this character-string assumption reasonable? Consider this transpose that Figure 5 shows. What is the data type of the transposed column? DATETIME, VARCHAR, NUMERIC, or INT? To address this dilemma, we choose the safe route of constructing the transposed result set columns with the data type of VARCHAR(100) and casting the data appropriately. Moreover, the motivation of the transpose functionality is to fulfill a data-reporting requirement in which the presentation is text-based; any required aggregation can be performed in the input query itself.
The CLR/C# Transpose Stored Procedure
Using the general strategy that I've described, the diagram in Figure 6 outlines the program workflow. The code is divided in to five sections:
- Declare variables
- Execute caller's query and store data
- Transpose the data
- Output the data back to caller
- Handle errors
Listing 1 (below) provides the C# code in its entirety. We bypass the variable declarations and focus on the remaining sections.
Section 2: Executing Caller's Query and Storing the Data
In this section, the caller’s query is executed and the result set is stored in an array. The first order of business is to open a connection to the SQL service: First create a SqlConnection object, then create a SqlCommand object using the caller’s query and the connection object, as you see in Callout A. The connection then opens. Take note of the connection string context connection=true. With this connection, the query is executed in the same context of the caller, in both location and security.
Next, the query is executed and is assigned to a SqlDataReader object. When the ExecuteReader method is called, the query execution begins, but data isn't read. Before we begin reading the data rows, column metadata such as count, names, and data type can be obtained; the FieldCount method is used to determine the column count, and the GetName method is used for the column names, as you see in Callout B.
Using the Read method of the reader object, the queryData array is filled with the result set, as you see in Callout C. To maintain the total number of rows, the rowCount variable increments each time a row is read.
Once the outer loop is finished reading the rows, the queryData array is now populated with the query’s result set. To recap what is known, the first row of the queryData array contains the column names, the remaining array rows contain the queried data, rowCount holds row count (plus 1 to account for the column-header row), and columnCount holds the column count of the result set.
Section 3: Transposing the Data
The next order of business is to transpose the data into the new array transposedData. To help with readability, new row and column count variables are created and used to size the new transposedData array, as you see in Callout D. The new row count is the old column count, and the new column count is the old row count.
Loop through the queryData array, transposing the data into tranposedData. The outer loop transverses the rows and the inner loop transverses the columns, as you see in Callout E.
The transpose is achieved with the tranposedQueryData\[i, j\] = queryData\[j, i\] statement that was gleaned from the “Examining a Simple Transpose” section. The transposedData array now contains the transposed data.
Section 4: Outputting the Transposed Result Set Back to the Caller
We're now ready to output, or pipe, the result set back to the caller. The basic strategy for constructing a result set back to the caller is as follows:
- Construct a SqlMetaData array, filling each column element with a column name, data type, and data size.
- Construct a SqlDataRecord object, using the SqlMetaData array.
- Begin the result set output by calling the SqlContext.Pipe.SendResultsStart method, passing in the SqlDataRecord.
- Output the rows of the result set, one row at a time. Fill each element of the SqlDataRecord one column at a time and pipe the record to the caller with the SqlContext.Pipe.SendResultsRow method.
- Finish the result set ,calling the SqlContext.Pipe.SendResultsEnd method.
Utilizing this strategy, construct the SqlMetaData array and define each column, looping through the column names, as you see in Callout F. This is where the VARCHAR(100) assumption is used in the SqlDbType.VarChar, maxDataSize reference . Remember, the first row of the transposedData\[0,j\] array contains the transposed column names.
Now, construct the SqlDataRecord and start the output, as you see in Callout G. Loop through the row and columns of the transposedData array, and send the row to the caller, as you see in Callout H. Finally, end the result set and send the caller the message Transpose complete. Messages are sent to the caller using the SqlContext.Pipe.Send method, as you see in Callout I.
Section 5: Handle Errors
With CLR programming, it's important to handle errors gracefully and meaningfully. Errors from this procedure typically stem from a malformed input query. In the example that Figure 7 shows, the s is missing from the sys.objects table reference.
The Try/Catch construction handles any errors that might occur during the entire transpose process, as you see in Callout J. The CLR Transpose procedure is now complete, and control is returned to the Caller.
Compile and Deploy the Transpose CLR Stored Procedure
Deploying the Transpose CLR stored procedure to your database can be summarized by three steps: Compile the C# code to a DLL library file, create an assembly in the database using the DLL, and create the CLR stored procedure referencing the assembly. The Visual Studio 2005/2008 Professional edition makes life easy for you by performing all three steps within the Visual Studio environment. However, if you don't own the Professional edition, you can use the free Microsoft C# Visual Studio Express edition to compile your code; the remaining two steps are T-SQL commands executed on your database. The download files associated with this article will guide you through the exercise.
Using the Transpose Stored Procedure
The syntax for the Transpose stored procedure is
EXEC Transpose @query = \[query to be transformed\]
If the query contains single quotes to denote strings, substitute the single quotes with two single quotes, as Figure 8 shows.
Experiment with the various types of queries: simple, relational, and distributed. Also, intentionally create mistakes in the query syntax to view how the Try/Catch handles errors.
Extend Your T-SQL Environment
This Transpose procedure provides a simple exercise in using the SQL CLR to extend your T-SQL environment to solve a common problem that's otherwise cumbersome to solve in traditional T-SQL. Once you've mastered executing queries, processing data, and generating result sets within the CLR/C# environment, you'll gain the confidence to tackle more complex problems.
Listing 1: The CLR/C# Transpose Stored Procedure
- USING System;
- USING System.DATA;
- USING System.DATA.SqlClient;
- USING System.DATA.SqlTypes;
- USING Microsoft.SqlServer.Server;
- public partial class MyStoredProcedureClass
- public static void Transpose(SqlString queryParameter)
- // SECTION 1: Variable declarations
- INT maxNumberofRows = 2048;
- INT rowCount = 1;
- INT columnCount = 0;
- INT transposedRowCount = 0;
- INT transposedColumnCount = 0;
- INT maxDataSize =100;
- string callersQuery = queryParameter.ToString();
- string\[,\] queryData;
- string\[,\] transposedData;
- // .NET SQL objects. These objects will GET instantiated later IN the code.
- SqlConnection conn;
- SqlCommand comm;
- SqlDataReader dataReader;
- SqlMetaData\[\] transposedColumns;
- SqlDataRecord rowRecord;
- // SECTION 2 : EXECUTE Caller's query and store data
- BEGIN CALLOUT A
- conn = new SqlConnection("context connection=true;");
- comm = new SqlCommand(callersQuery, conn);
- END CALLOUT A
- BEGIN CALLOUT B
- dataReader = comm.ExecuteReader();
- columnCount = dataReader.FieldCount;
- queryData = new string\[maxNumberofRows, columnCount\];
- for (int j = 0; j < columnCount; j++)
- queryData\[0,j\] = dataReader.GetName(j);
- END CALLOUT B
- BEGIN CALLOUT C
- while (dataReader.Read())
- for (int j= 0;j < columnCount; j++)
- queryData\[rowCount,j\] = dataReader\[j\].ToString();
- END CALLOUT C
- // SECTION 3: Transpose the data
- BEGIN CALLOUT D
- transposedRowCount = columnCount;
- transposedColumnCount = rowCount;
- transposedData = new string\[transposedRowCount, transposedColumnCount\];
- END CALLOUT D
- BEGIN CALLOUT E
- for (int i = 0; i < transposedRowCount; i++)
- for (int j = 0; j < transposedColumnCount; j++)
- transposedData\[i, j\] = queryData\[j,i\];
- END CALLOUT E
- // SECTION 4: Ouput the data back to Caller
- BEGIN CALLOUT F
- transposedColumns = new SqlMetaData\[transposedColumnCount\];
- for (int j = 0; j < transposedColumnCount; j++)
- = new SqlMetaData(transposedData\[0, j\], SqlDbType.VarChar, maxDataSize);
- END CALLOUT F
- BEGIN CALLOUT G
- rowRecord = new SqlDataRecord(transposedColumns);
- END CALLOUT G
- BEGIN CALLOUT H
- for (int i = 1; i < transposedRowCount; i++)
- for (int j = 0; j < transposedColumnCount; j++)
- rowRecord.SetSqlString(j, transposedData\[i, j\]);
- END CALLOUT H
- BEGIN CALLOUT I
- SqlContext.Pipe.Send("Transpose complete.");
- END CALLOUT I
- // SECTION 5: Handle errors
- BEGIN CALLOUT J
- catch (Exception e)
- SqlContext.Pipe.Send("There was a problem. \n\nException Report: ");
- END CALLOUT J