A couple of weeks ago I posted the T-SQL challenge

“Grouping
consecutive rows with a common element”

(http://www.sqlmag.com/article/articleid/93462/sql_server_blog_93462.html).

I got many interesting solutions via e-mail. We were supposed to give away

two prizes: one winner was supposed to be chosen based on the fastest

correct solution, and another winner was supposed to be chosen randomly

out of those who provided correct solutions.

The fastest solutions ran for about 4-5 seconds against the big table, but since

I got several solutions that ran that fast (Maciej Pilecki, Dieter Noeth,
Michael

Yang and Sania Chan), I had to randomly choose a winner out of those. So

the winner under this category is Dieter Noeth; you will get one of my

T-SQL books, courtesy of MSPress.

As for the category of correct solutions regardless of performance, I got

correct solutions from: Steve Kass, Maciej Pilecki, Chris Gin, Dieter Noeth,

Tung H. Nguyen, Steve Dassin, Michael Yang, Craig Bennett,

Alejandro Mesa, Rajeev Lahoty and Sania Chan (apologies if I missed

anyone). The winner chosen randomly in this category is Craig Bennett; you

will get a SQL Server Magazine hat. I will send your e-mail addresses to

SQL Server Magazine’s editors, and they will get in touch with you to send

you the prizes.

Here I’ll present two of the approaches used in the fast solutions.

One approach was to calculate for each row from the Attendance table a

grouping element (call it grp) that uniquely identifies a consecutive group of

rows. The expression that calculates grp is:

ROW_NUMBER() OVER(PARTITION BY student ORDER BY dt, slot) - ROW_NUMBER() OVER(PARTITION BY student, attend ORDER BY dt, slot) AS grpThe trick here is that each set of consecutive rows with the same attendance

status gets a unique value that you can later use for grouping. As for the fact

that precedence among rows is based on the combination of columns dt, slot,

the trick is to concatenate the two into one value. This can be achieved with

different techniques. One technique is to use binary concatenation:

CAST(dt AS BINARY(8)) + CAST(slot AS BINARY(4)) AS dt_slotAnother technique is to add the slot value to dt as a time unit (e.g., second):

DATEADD(second, slot, DT) AS dt_slotOnce dt and slot are concatenated, of course you can calculate the min and

max in each group, and then break the min and max apart to the two elements.

Here’s the complete solution using binary concatenation:

WITH CTE_Grp AS ( SELECT student, attend, CAST(dt AS BINARY(8)) + CAST(slot AS BINARY(4)) AS dt_slot, ROW_NUMBER() OVER(PARTITION BY student ORDER BY dt, slot) - ROW_NUMBER() OVER(PARTITION BY student, attend ORDER BY dt, slot) AS grp FROM dbo.Attendance ), CTE_RAW AS ( SELECT student, MIN(dt_slot) AS min_dt_slot, MAX(dt_slot) AS max_dt_slot, attend, COUNT(*) AS cnt FROM CTE_Grp GROUP BY student, attend, grp ) SELECT student, CAST(SUBSTRING(min_dt_slot, 1, 8) AS DATETIME) AS from_dt, CAST(SUBSTRING(min_dt_slot, 9, 4) AS INT) AS from_slot, CAST(SUBSTRING(max_dt_slot, 1, 8) AS DATETIME) AS to_dt, CAST(SUBSTRING(max_dt_slot, 9, 4) AS INT) AS to_slot, attend, cnt FROM CTE_RAW;And here’s the solution that uses datetime based concatenation:

WITH CTE_Grp AS ( SELECT student, attend, DATEADD(second, slot, DT) AS dt_slot, ROW_NUMBER() OVER(PARTITION BY student ORDER BY dt, slot) - ROW_NUMBER() OVER(PARTITION BY student, attend ORDER BY dt, slot) AS grp FROM dbo.Attendance ), CTE_RAW AS ( SELECT student, MIN(dt_slot) AS min_dt_slot, MAX(dt_slot) AS max_dt_slot, attend, COUNT(*) AS cnt FROM CTE_Grp GROUP BY student, attend, grp ) SELECT student, DATEADD(day, DATEDIFF(day, 0, min_dt_slot), 0) AS from_dt, DATEPART(second, min_dt_slot) AS from_slot, DATEADD(day, DATEDIFF(day, 0, max_dt_slot), 0) AS to_dt, DATEPART(second, max_dt_slot) AS to_slot, attend, cnt FROM CTE_RAW;Another approach was to:

- Assign row numbers to the rows in the Attendance table based on dt, slot

ordering (partitioned by student)

- Join two instances of the previous result-set to match for each student’s

row the next row (row with the same student and a row number greater by

one)

- Filter rows where the attendance status changes

Here you need to be careful not to lose the first/last row for each student. To

get around this problem some solutions used an outer join. An interesting

technique used by Michael Yang was to add sentinel rows; one before the

first student’s row and another after the last student’s row.

- Assign row numbers to the remaining rows based on dt, slot ordering

(partitioned by student)

- Join two instances of the previous result-set based on a match in student

and an offset of one between the row numbers

- Collect the from and to elements from each row, and calculate the count as

the offset between the originally calculated row numbers

Here’s the complete solution using the sentinel rows approach:

WITH AddIn as -- sentinel rows ( SELECT student, dt, 1 AS slot, -1 AS attend, CASE dt WHEN '19000101' THEN 0 ELSE cnt + 1 END AS pseudodate FROM (SELECT student, COUNT(*) AS cnt FROM dbo.Attendance GROUP BY student) AS D1, (SELECT CAST('19000101' AS DATETIME) AS dt UNION ALL SELECT CAST('99991231' AS DATETIME)) AS D2 ), Pseudodates AS -- assign row numbers ( SELECT student, dt, slot, attend, ROW_NUMBER() OVER(PARTITION BY student ORDER BY dt, slot) AS pseudodate FROM dbo.Attendance UNION ALL SELECT student, dt, slot, attend, pseudodate FROM AddIn ), -- match current/next rows -- keep only pairs where status changes -- assign new row numbers CurNxt AS ( SELECT Cur.student AS student1, Cur.dt AS dt1, Cur.slot AS slot1, Cur.attend AS attend, Cur.pseudodate AS pseudo1, Nxt.dt AS dt2, Nxt.slot AS slot2, Nxt.pseudodate AS pseudo2, ROW_NUMBER() OVER(PARTITION BY Cur.student ORDER BY Cur.pseudodate) AS seq FROM Pseudodates AS Cur JOIN Pseudodates AS Nxt ON Cur.student = Nxt.student AND Cur.pseudodate = Nxt.pseudodate - 1 AND Cur.attend != Nxt.attend ) -- match rows with offset of one between row numbers -- collect from/to elements -- calculate count SELECT F.student1 AS student, F.dt2 AS from_dt, F.slot2 AS from_slot, T.dt1 AS to_dt, T.slot1 AS to_slot, T.attend AS attend, T.pseudo1 - F.pseudo1 AS cnt FROM CurNxt AS F JOIN CurNxt T ON F.student1 = T.student1 AND T.seq - 1 = F.seq;Cheers,

--

BG