A couple of weeks ago I posted the T-SQL challenge

“Grouping
consecutive rows with a common element”

(http://www.sqlmag.com/article/articleid/93462/sql_server_blog_93462.html).

I got many interesting solutions via e-mail. We were supposed to give away

two prizes: one winner was supposed to be chosen based on the fastest

correct solution, and another winner was supposed to be chosen randomly

out of those who provided correct solutions.

The fastest solutions ran for about 4-5 seconds against the big table, but since

I got several solutions that ran that fast (Maciej Pilecki, Dieter Noeth,
Michael

Yang and Sania Chan), I had to randomly choose a winner out of those. So

the winner under this category is Dieter Noeth; you will get one of my

T-SQL books, courtesy of MSPress.

As for the category of correct solutions regardless of performance, I got

correct solutions from: Steve Kass, Maciej Pilecki, Chris Gin, Dieter Noeth,

Tung H. Nguyen, Steve Dassin, Michael Yang, Craig Bennett,

Alejandro Mesa, Rajeev Lahoty and Sania Chan (apologies if I missed

anyone). The winner chosen randomly in this category is Craig Bennett; you

will get a SQL Server Magazine hat. I will send your e-mail addresses to

SQL Server Magazine’s editors, and they will get in touch with you to send

you the prizes.

Here I’ll present two of the approaches used in the fast solutions.

One approach was to calculate for each row from the Attendance table a

grouping element (call it grp) that uniquely identifies a consecutive group of

rows. The expression that calculates grp is:

ORDER BY dt, slot) -

ROW_NUMBER() OVER(PARTITION BY student, attend

ORDER BY dt, slot) AS grp

status gets a unique value that you can later use for grouping. As for the fact

that precedence among rows is based on the combination of columns dt, slot,

the trick is to concatenate the two into one value. This can be achieved with

different techniques. One technique is to use binary concatenation:

max in each group, and then break the min and max apart to the two elements.

Here’s the complete solution using binary concatenation:

AS

(

SELECT student, attend,

CAST(dt AS BINARY(8)) + CAST(slot AS BINARY(4)) AS dt_slot,

ROW_NUMBER() OVER(PARTITION BY student

ORDER BY dt, slot) -

ROW_NUMBER() OVER(PARTITION BY student, attend

ORDER BY dt, slot) AS grp

FROM dbo.Attendance

),

CTE_RAW

AS

(

SELECT

student,

MIN(dt_slot) AS min_dt_slot,

MAX(dt_slot) AS max_dt_slot,

attend,

COUNT(*) AS cnt

FROM CTE_Grp

GROUP BY student, attend, grp

)

SELECT

student,

CAST(SUBSTRING(min_dt_slot, 1, 8) AS DATETIME) AS from_dt,

CAST(SUBSTRING(min_dt_slot, 9, 4) AS INT) AS from_slot,

CAST(SUBSTRING(max_dt_slot, 1, 8) AS DATETIME) AS to_dt,

CAST(SUBSTRING(max_dt_slot, 9, 4) AS INT) AS to_slot,

attend,

cnt

FROM CTE_RAW;

AS

(

SELECT student, attend,

DATEADD(second, slot, DT) AS dt_slot,

ROW_NUMBER() OVER(PARTITION BY student

ORDER BY dt, slot) -

ROW_NUMBER() OVER(PARTITION BY student, attend

ORDER BY dt, slot) AS grp

FROM dbo.Attendance

),

CTE_RAW

AS

(

SELECT

student,

MIN(dt_slot) AS min_dt_slot,

MAX(dt_slot) AS max_dt_slot,

attend,

COUNT(*) AS cnt

FROM CTE_Grp

GROUP BY student, attend, grp

)

SELECT

student,

DATEADD(day, DATEDIFF(day, 0, min_dt_slot), 0) AS from_dt,

DATEPART(second, min_dt_slot) AS from_slot,

DATEADD(day, DATEDIFF(day, 0, max_dt_slot), 0) AS to_dt,

DATEPART(second, max_dt_slot) AS to_slot,

attend,

cnt

FROM CTE_RAW;

- Assign row numbers to the rows in the Attendance table based on dt, slot

ordering (partitioned by student)

- Join two instances of the previous result-set to match for each student’s

row the next row (row with the same student and a row number greater by

one)

- Filter rows where the attendance status changes

Here you need to be careful not to lose the first/last row for each student. To

get around this problem some solutions used an outer join. An interesting

technique used by Michael Yang was to add sentinel rows; one before the

first student’s row and another after the last student’s row.

- Assign row numbers to the remaining rows based on dt, slot ordering

(partitioned by student)

- Join two instances of the previous result-set based on a match in student

and an offset of one between the row numbers

- Collect the from and to elements from each row, and calculate the count as

the offset between the originally calculated row numbers

Here’s the complete solution using the sentinel rows approach:

(

SELECT student, dt, 1 AS slot, -1 AS attend,

CASE dt WHEN '19000101' THEN 0 ELSE cnt + 1 END AS pseudodate

FROM (SELECT student, COUNT(*) AS cnt

FROM dbo.Attendance

GROUP BY student) AS D1,

(SELECT CAST('19000101' AS DATETIME) AS dt

UNION ALL

SELECT CAST('99991231' AS DATETIME)) AS D2

),

Pseudodates AS -- assign row numbers

(

SELECT student, dt, slot, attend,

ROW_NUMBER() OVER(PARTITION BY student

ORDER BY dt, slot) AS pseudodate

FROM dbo.Attendance

UNION ALL

SELECT student, dt, slot, attend, pseudodate FROM AddIn

),

-- match current/next rows

-- keep only pairs where status changes

-- assign new row numbers

CurNxt AS

(

SELECT

Cur.student AS student1,

Cur.dt AS dt1,

Cur.slot AS slot1,

Cur.attend AS attend,

Cur.pseudodate AS pseudo1,

Nxt.dt AS dt2,

Nxt.slot AS slot2,

Nxt.pseudodate AS pseudo2,

ROW_NUMBER() OVER(PARTITION BY Cur.student

ORDER BY Cur.pseudodate) AS seq

FROM Pseudodates AS Cur

JOIN Pseudodates AS Nxt

ON Cur.student = Nxt.student

AND Cur.pseudodate = Nxt.pseudodate - 1

AND Cur.attend != Nxt.attend

)

-- match rows with offset of one between row numbers

-- collect from/to elements

-- calculate count

SELECT

F.student1 AS student,

F.dt2 AS from_dt,

F.slot2 AS from_slot,

T.dt1 AS to_dt,

T.slot1 AS to_slot,

T.attend AS attend,

T.pseudo1 - F.pseudo1 AS cnt

FROM CurNxt AS F

JOIN CurNxt T

ON F.student1 = T.student1

AND T.seq - 1 = F.seq;

--

BG