Author’s Note: I’d like to thank Adam Machanic, Dejan Sarka, and Eladio Rincón for providing preliminary code samples and advice regarding the CLR-based solutions.

A couple of months ago I started a series of articles covering solutions to running aggregates. The series focuses on the performance characteristics of the various solutions, in particular their algorithmic complexity. In other words, it explores how the solutions scale when variables such as the number of aggregates, the number of partitions, or the number of rows per partition change. In "Subqueries and Joins for Running Aggregates" I presented the problem and discussed set-based solutions using subqueries and joins, and in "Set-Based vs. Cursor-Based Solutions for Running Aggregates" I covered a solution based on T-SQL cursors. This month I present a Common Language Runtime (CLR)-based solution and explain the circumstances in which it performs better than the other solutions.

Getting Started

For your convenience, Web Listing 1, Web Listing 2, and Web Listing 3 contain the code necessary to create and populate the Sales table that is used in the article’s examples. Run the code in Web Listing 1 to create the table. Run the code in Web Listing 2 to create the GetNums function that is used to populate the Sales table with sample data. Use the code in Web Listing 3 to populate the Sales table with sample data, adjusting the number of partitions (employees) and number of rows per partition (days per employee) based on your needs.

Web Listing 1: DDL Statement to Create Sales Table
SET NOCOUNT ON;
USE tempdb;

IF OBJECT_ID('dbo.Sales', 'U') IS NOT NULL DROP TABLE dbo.Sales;

CREATE TABLE dbo.Sales
(
  empid INT      NOT NULL,                -- partitioning column
  dt    DATETIME NOT NULL,                -- ordering column
  qty   INT      NOT NULL DEFAULT (1),    -- measure 1
  val   MONEY    NOT NULL DEFAULT (1.00), -- measure 2
  CONSTRAINT PK_Sales PRIMARY KEY(empid, dt)
);
GO
Web Listing 2: DDL Statement to Create GetNums Function
IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
  DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)
  SELECT n FROM Nums WHERE n <= @n;
GO
Web Listing 3: Code to Populate Sales Table with Sample Data
DECLARE
  @num_partitions     AS INT,
  @rows_per_partition AS INT,
  @start_dt           AS DATETIME;

SET @num_partitions     = 10000;
SET @rows_per_partition = 10;
SET @start_dt = '20090101';

TRUNCATE TABLE dbo.Sales;

INSERT INTO dbo.Sales WITH (TABLOCK) (empid, dt)
  SELECT NP.n AS empid, DATEADD(day, RPP.n - 1, @start_dt) AS dt
  FROM dbo.GetNums(@num_partitions) AS NP
    CROSS JOIN dbo.GetNums(@rows_per_partition) AS RPP;
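
As a quick sanity check after populating the table, you can confirm that the row count equals the number of partitions times the rows per partition:

SELECT COUNT(*) AS cnt FROM dbo.Sales; -- expect 10,000 x 10 = 100,000 rows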

As a quick reminder, the Sales table contains a row for each employee and date, with the employee ID, sales date, quantity, and value. The running aggregate that is used as the basis for the performance analysis is a running sum of quantity (or value) per employee and date; that is, for each employee and date, calculate the total quantity from the beginning of the employee’s activity until the current date.
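
To make the desired output concrete, here's a minimal sketch along the lines of the subquery-based solution from the first article (the query there may differ slightly in its details); for each row, a correlated subquery totals the quantities from the beginning of the employee's activity through the current date:

SELECT S1.empid, S1.dt, S1.qty,
  (SELECT SUM(S2.qty)
   FROM dbo.Sales AS S2
   WHERE S2.empid = S1.empid
     AND S2.dt <= S1.dt) AS sumqty
FROM dbo.Sales AS S1
ORDER BY S1.empid, S1.dt;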

CLR-Based Solution

SQL Server 2005 introduced CLR integration, which lets you develop routines such as functions and stored procedures in .NET code. CLR integration complements T-SQL in areas where T-SQL is weak, such as string manipulation and iterative logic. For the most part, T-SQL is the preferable option when the task at hand involves data manipulation, both in terms of performance and in terms of code complexity. However, there are a few uncommon data-manipulation tasks in which CLR-based solutions can outperform T-SQL solutions, and running aggregates is one of them. The reasons have to do with the way the optimizer currently handles the set-based solutions and with the lack of support for ANSI SQL set-based language elements that lend themselves to better optimization. I'll present the CLR-based solution and compare its performance with that of the solutions I provided in the previous columns. In a future article I'll describe the missing language elements and explain why they have great potential to perform better than any other solution.

Listing 1 shows a CLR-based solution for our problem using C# code.

Listing 1: CLR-Based Solution Using C#
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
  [Microsoft.SqlServer.Server.SqlProcedure]
  public static void SalesRunningSum()
  {
    using (SqlConnection conn = new SqlConnection("context connection=true;"))
    {
      SqlCommand comm = new SqlCommand();
      comm.Connection = conn;
      comm.CommandText =
          "SELECT empid, dt, qty " +
          "FROM dbo.Sales " +
          "ORDER BY empid, dt;";

      // Definitions of the four columns in the procedure's result set.
      SqlMetaData[] columns = new SqlMetaData[4];
      columns[0] = new SqlMetaData("empid", SqlDbType.Int);
      columns[1] = new SqlMetaData("dt", SqlDbType.DateTime);
      columns[2] = new SqlMetaData("qty", SqlDbType.Int);
      columns[3] = new SqlMetaData("sumqty", SqlDbType.BigInt);

      SqlDataRecord record = new SqlDataRecord(columns);

      // Mark the start of the result set; rows are streamed one at a time.
      SqlContext.Pipe.SendResultsStart(record);

      conn.Open();

      SqlDataReader reader = comm.ExecuteReader();

      SqlInt32 prvempid = 0;
      SqlInt64 sumqty = 0;

      while (reader.Read())
      {
        SqlInt32 empid = reader.GetSqlInt32(0);
        SqlInt32 qty = reader.GetSqlInt32(2);

        // Accumulate within the current partition; reset when a new employee starts.
        if (empid == prvempid)
        {
          sumqty += qty;
        }
        else
        {
          sumqty = qty;
        }

        prvempid = empid;

        record.SetSqlInt32(0, empid);
        record.SetSqlDateTime(1, reader.GetSqlDateTime(1));
        record.SetSqlInt32(2, qty);
        record.SetSqlInt64(3, sumqty);

        SqlContext.Pipe.SendResultsRow(record);
      }

      SqlContext.Pipe.SendResultsEnd();
    }
  }
}

The solution defines a stored procedure called SalesRunningSum that calculates a running sum of quantity for each employee and date. Listing 2 has the Visual Basic version of the solution in case that's your language of preference.

Listing 2: CLR-Based Solution Using Visual Basic
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Partial Public Class StoredProcedures
  <Microsoft.SqlServer.Server.SqlProcedure()> _
  Public Shared Sub SalesRunningSum()

    Using conn As New SqlConnection("context connection=true")
      Dim comm As New SqlCommand
      comm.Connection = conn
      comm.CommandText = "" & _
          "SELECT empid, dt, qty " & _
          "FROM dbo.Sales " & _
          "ORDER BY empid, dt;"

      Dim columns() As SqlMetaData = New SqlMetaData(3) {}
      columns(0) = New SqlMetaData("empid", SqlDbType.Int)
      columns(1) = New SqlMetaData("dt", SqlDbType.DateTime)
      columns(2) = New SqlMetaData("qty", SqlDbType.Int)
      columns(3) = New SqlMetaData("sumqty", SqlDbType.BigInt)

      Dim record As New SqlDataRecord(columns)

      SqlContext.Pipe.SendResultsStart(record)

      conn.Open()

      Dim reader As SqlDataReader = comm.ExecuteReader

      Dim prvempid As SqlInt32 = 0
      Dim sumqty As SqlInt64 = 0

      While (reader.Read())
        Dim empid As SqlInt32 = reader.GetSqlInt32(0)
        Dim qty As SqlInt32 = reader.GetSqlInt32(2)

        If (empid = prvempid) Then
          sumqty = sumqty + qty
        Else
          sumqty = qty
        End If

        prvempid = empid

        record.SetSqlInt32(0, reader.GetSqlInt32(0))
        record.SetSqlDateTime(1, reader.GetSqlDateTime(1))
        record.SetSqlInt32(2, qty)
        record.SetSqlInt64(3, sumqty)

        SqlContext.Pipe.SendResultsRow(record)
      End While

      SqlContext.Pipe.SendResultsEnd()
    End Using
 
  End Sub
End Class

The procedure’s code defines a SqlConnection object called conn that uses the current user’s connection to SQL Server by specifying the option context connection=true. The code then defines a SqlCommand object called comm that uses conn as its connection. The code sets the CommandText property of comm to a query against the Sales table that retrieves the sales rows (employee ID, date, and quantity) ordered by employee ID and date. The code then defines an array called columns containing four SqlMetaData objects with the definitions of the four result columns that the procedure will return in its result set: empid, dt, qty, and sumqty. The code continues by defining a SqlDataRecord object called record based on columns. The code then marks the beginning of the result set that will be streamed to the client by invoking the SqlContext.Pipe.SendResultsStart method based on the record parameter, and opens the connection.

From this point on the code uses a data reader (SqlDataReader object called reader) to iterate through the records returned by the query defined earlier in comm’s CommandText property. In each iteration the code accumulates the quantity in a variable called sumqty as long as the employee ID didn’t change; otherwise it overwrites the value of sumqty with the current row’s quantity. The code sets the four attributes of the result record with the current employee ID, date, quantity, and running sum of quantity, and sends the result row to the client by invoking the SqlContext.Pipe.SendResultsRow method based on the record parameter. Finally, the code marks the end of the result set and returns the pipe to its initial state by invoking the SqlContext.Pipe.SendResultsEnd method.

You might be curious why I implemented the solution as a stored procedure rather than as a table-valued function (TVF); after all, because the output is a result set, it would seem more natural to use a TVF and consume the result from an outer query. The reason is that the stored procedure is more efficient with large result sets or under load, because it relies on a "streaming" implementation that holds only one row in memory at a time. With a CLR TVF, the context connection can be open only in the root TVF method, not in the fill-row method; therefore the TVF would have to materialize the entire intermediate result set in a collection before returning it, which for large result sets requires allocating a lot of memory.

Deploy the SalesRunningSum stored procedure in the tempdb database. For details about deploying user-defined assemblies in SQL Server, see "5 Steps for Developing and Deploying CLR Code in SQL Server."
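
If you want to deploy manually rather than with Visual Studio's deployment feature, the following sketch shows the general pattern; the assembly name RunningSums and the DLL path are placeholders for your own build output:

EXEC sp_configure 'clr enabled', 1; -- CLR integration is disabled by default
RECONFIGURE;
GO
USE tempdb;
GO
CREATE ASSEMBLY RunningSums          -- placeholder name; point FROM at your DLL
  FROM 'C:\CLR\RunningSums.dll'
  WITH PERMISSION_SET = SAFE;
GO
CREATE PROCEDURE dbo.SalesRunningSum
AS EXTERNAL NAME RunningSums.StoredProcedures.SalesRunningSum;
GO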

Assuming you already created and populated the Sales table, use the following code while connected to tempdb to test the procedure:

EXEC dbo.SalesRunningSum;
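
Because the sample data assigns a quantity of 1 to every row, the running sum for each employee should simply count the days of activity; the first few rows for employee 1 should look like this (datetime values abbreviated):

empid       dt          qty         sumqty
----------- ----------- ----------- -----------
1           2009-01-01  1           1
1           2009-01-02  1           2
1           2009-01-03  1           3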

Performance

If you think about it, the CLR-based solution is just another version of a cursor, only a much faster one than the T-SQL cursor. The reason the CLR data reader is faster than the T-SQL cursor is that the per-row overhead of the former is lower than that of the latter. Recall that I expressed the cost of the previous solutions in terms of the number of partitions (represented by the letter p), the average number of rows per partition (r), and the number of aggregates to calculate (a), and that I used the letter o to express the overhead associated with processing each record with a T-SQL cursor. Similarly, you can express the per-row overhead of a CLR data reader as c, where c is smaller than o. The performance tests described shortly give a sense of how much smaller c is than o.

The cost of processing p partitions with average partition size of r rows using a T-SQL cursor can be expressed as pro. Similarly, you can express the cost of the CLR-based solution as prc.

Last month I mentioned that the performance of the T-SQL cursor-based solution isn’t significantly affected when you increase the number of aggregations (a). As expected, my performance tests show the same applies to the CLR-based solution. To give you a sense, I got only about 15 percent performance degradation when calculating four aggregates instead of one.

As for increasing the number of partitions (p), as expected, the performance degradation was linear—which makes sense, because increasing the number of partitions by a factor of f should give you the cost pfrc. Figure 1 shows the results of three performance tests comparing the set-based solution using subqueries, the T-SQL cursor-based solution, and our new CLR-based solution. I started with 10,000 partitions with a constant partition size of 10, and kept increasing the number of partitions by 10,000 every time until I reached 100,000 partitions.

Figure 1: Graph showing effect of number of partitions

The graph clearly shows that all three solutions have linear complexity with respect to an increase in the number of partitions. You can make two other interesting observations by inspecting the graph. First, the CLR-based solution is more than four times faster than the T-SQL cursor-based solution. Second, with our very small partition size of 10 rows, the set-based solution using subqueries is faster than the CLR-based solution.

The cost of the set-based solution is expressed as p(r + r²)/2. With respect to an increase in the partition size (r) by a factor of f, its complexity is close to quadratic: the cost becomes p(rf + (rf)²)/2 = p(rf + r²f²)/2, in which the dominant term grows with f². With the cursor solution the complexity is linear, because pro simply becomes prfo. The performance tests that I covered last month showed that up to a partition size of about 500 rows the set-based solution was more efficient than the cursor-based solution, whereas the reverse was true with partition sizes of more than 500 rows. Figure 2 shows the effect that changing the partition size has on all three solutions. I used a constant number of partitions (1,000) and partition sizes varying between 10 and 1,000. Interestingly, the point where the CLR-based solution becomes more efficient than the set-based one is at about 15 rows per partition.

Figure 2: Graph showing effect of change in partition size

When to Use It

The CLR-based solution to running aggregates has close to constant complexity with respect to an increase in the number of aggregates to be calculated, and linear complexity with respect to an increase in the number of partitions or in the partition size. In my performance tests, the CLR-based solution was more than four times faster than the T-SQL cursor-based solution. Up to a partition size of 15 rows, the set-based solution was faster than the CLR solution, and the reverse was true beyond 15 rows. If performance is your main goal, you should use the set-based solution only with very small partition sizes; otherwise, you should use the CLR-based solution. Next month I'll cover another solution to running aggregates that involves nested iterations, and I'll explain which language elements that address running aggregates exist in standard SQL but aren't yet implemented in SQL Server 2008.