Much has been said already about assignment SELECT statements that perform multi-row variable assignment with an ORDER BY clause. See Paul Nielsen’s blog on the subject as an example. The basic concept is to perform an aggregate calculation in desired order. A very common use of this technique is to perform string concatenation. Consider the following code as an example:

DECLARE @s AS VARCHAR(MAX);
SET @s = '';

SELECT @s = @s + col2 + ';'
FROM dbo.T1
ORDER BY col1;

SELECT @s;

Some expect this code to return a string with the concatenated col2 values in col1 ordering. I’m not aware of any official documentation that describes how such code should behave. The documentation only refers to single row assignments. Some (including me) are reluctant to rely on this technique, and prefer to use alternatives that have guaranteed, documented behavior (e.g., using FOR XML PATH). Reasoning for distrusting the undocumented techniques is similar to what I describe here regarding the similar multi-row UPDATE with variables technique.

A few days ago I got a beautiful example from Aviv Zucker from Intel demonstrating that this technique cannot be trusted. We simplified the repro and narrowed down the optimization reasoning in the cases where it doesn’t generate the “expected” result. In “expected,” of course I’m referring to the expectation some have for this technique to return all concatenated values based on specified ordering. Personally, I don’t have any expectations from this technique.

I got the same behavior on SQL Server 2008+SP1+CU6 and SQL Server 2005+SP3.

First, here’s code to create a helper function called GetNums that returns a sequence of integers, as well as a table called T1 populated with 100 rows of sample data:

SET NOCOUNT ON;
USE tempdb;
GO

-- Helper function GetNums to generate a sequence of numbers
IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
  SELECT TOP (@n) n FROM Nums ORDER BY n;
GO

-- Create a table called T1 and populate it with sample data
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

CREATE TABLE dbo.T1
(
  col1   INT IDENTITY NOT NULL,
  col2   VARCHAR(100) NOT NULL,
  filler CHAR(2000) NULL,
  CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED(col1)
);

INSERT INTO dbo.T1(col2)
  SELECT 'String ' + CAST(n AS VARCHAR(10))
  FROM dbo.GetNums(100) AS Nums;
GO

Currently the table is organized as a B-tree with the clustered index key being col1. There are no clustered indexes defined on the table.

As the first test I ran the following code:

-- Test 1, with ORDER BY
DECLARE @s AS VARCHAR(MAX);
SET @s = '';

SELECT @s = @s + col2 + ';'
FROM dbo.T1
ORDER BY col1;

SELECT @s;
GO

I got the following output that I referred to earlier as the “expected” output by some:

String 1;String 2;String 3;String 4;String 5;String 6;String 7;String
8;String 9;String 10;String 11;String 12;String 13;String 14;String
15;String 16;String 17;String 18;String 19;String 20;String 21;String
22;String 23;String 24;String 25;String 26;String 27;String 28;String
29;String 30;String 31;String 32;String 33;String 34;String 35;String
36;String 37;String 38;String 39;String 40;String 41;String 42;String
43;String 44;String 45;String 46;String 47;String 48;String 49;String
50;String 51;String 52;String 53;String 54;String 55;String 56;String
57;String 58;String 59;String 60;String 61;String 62;String 63;String
64;String 65;String 66;String 67;String 68;String 69;String 70;String
71;String 72;String 73;String 74;String 75;String 76;String 77;String
78;String 79;String 80;String 81;String 82;String 83;String 84;String
85;String 86;String 87;String 88;String 89;String 90;String 91;String
92;String 93;String 94;String 95;String 96;String 97;String 98;String
99;String 100;

Here’s the graphical execution plan I got for this code:

And here’s the textual plan:

|--Compute Scalar(DEFINE:(\[Expr1003\]=(\[@s\]+\[tempdb\].\[dbo\].\[T1\].\[col2\])+';'))
 |--Clustered Index Scan(OBJECT:(\[tempdb\].\[dbo\].\[T1\].\[PK_T1\]), ORDERED FORWARD)

Observe that the clustered index on col1 was scanned in an ordered fashion, and then the variable assignment took place. It was a matter of optimization that things were performed in this specific order. But if we assume for a moment that this technique is not supposed to guarantee any specific order of assignment, then if it made more sense in terms of performance in certain cases to first deal with the scalar computation and then with the ordering, the result might not reflect the specified ordering. Not only this, you might end up getting an intermediate state of the concatenation. That’s exactly what the next test demonstrates. First, create a covering index on (col2, col1) and then rerun the multi-row assignment with ORDER BY code:

-- Test 2, with ORDER BY, after adding covering nc index
CREATE NONCLUSTERED INDEX idx_nc_col2 ON dbo.T1(col2, col1);
GO

DECLARE @s AS VARCHAR(MAX);
SET @s = '';

SELECT @s = @s + col2 + ';'
FROM dbo.T1
ORDER BY col1;

SELECT @s;
GO

This time I got the following output:

String 100;

Clearly, the result does not contain the concatenated values. So what happened? The answer lies in the code’s execution plan:

Here’s the textual form of the plan:

|--Sort(ORDER BY:(\[tempdb\].\[dbo\].\[T1\].\[col1\] ASC))
 |--Compute Scalar(DEFINE:(\[Expr1003\]=(\[@s\]+\[tempdb\].\[dbo\].\[T1\].\[col2\])+';'))
  |--Index Scan(OBJECT:(\[tempdb\].\[dbo\].\[T1\].\[idx_nc_col2\]))

The optimizer figured that with such a small set of rows it’s cheaper to scan the data from the covering nonclustered index in no particular order, then apply the computation, and then sort the data by col1. The key is in the fact that the sorting happens after the computation and not before it. It seems that the output reflects an intermediate state of the concatenation. To me it’s not that the output is incorrect here since I’ve never seen any official documentation that states what is considered correct result for such code. But yeah, if based on your intuition and observations you expected to get the concatenated col2 values in col1 ordering, here’s evidence why this technique should not be trusted. Better stick to the guaranteed and supported techniques like the one using the FOR XML PATH option.

Lastly, consider the following test without an ORDER BY clause:

-- Test 3, without ORDER BY
DECLARE @s AS VARCHAR(MAX);
SET @s = '';

SELECT @s = @s + col2 + ';'
FROM dbo.T1;

SELECT @s;
GO

I get the following output:

String 1;String 10;String 100;String 11;String 12;String 13;String
14;String 15;String 16;String 17;String 18;String 19;String 2;String
20;String 21;String 22;String 23;String 24;String 25;String 26;String
27;String 28;String 29;String 3;String 30;String 31;String 32;String
33;String 34;String 35;String 36;String 37;String 38;String 39;String
4;String 40;String 41;String 42;String 43;String 44;String 45;String
46;String 47;String 48;String 49;String 5;String 50;String 51;String
52;String 53;String 54;String 55;String 56;String 57;String 58;String
59;String 6;String 60;String 61;String 62;String 63;String 64;String
65;String 66;String 67;String 68;String 69;String 7;String 70;String
71;String 72;String 73;String 74;String 75;String 76;String 77;String
78;String 79;String 8;String 80;String 81;String 82;String 83;String
84;String 85;String 86;String 87;String 88;String 89;String 9;String
90;String 91;String 92;String 93;String 94;String 95;String 96;String
97;String 98;String 99;

Obviously there’s no guarantee for ordering here. Whether you want to rely on this technique or not to return all values concatenated when order doesn’t matter to you is up to you. I haven’t stumbled into a case where I didn’t get all concatenated values, but who’s to say there isn’t one.

Again, I’d like to thank Aviv Zucker for sending me his initial example!

Cheers,
BG