A couple of weeks ago I posted the T-SQL challenge
Grouping consecutive rows with a common element”
(http://www.sqlmag.com/article/articleid/93462/sql_server_blog_93462.html).

I got many interesting solutions via e-mail. We were supposed to give away
two prizes: one winner was supposed to be chosen based on the fastest
correct solution, and another winner was supposed to be chosen randomly
out of those who provided correct solutions.

The fastest solutions ran for about 4-5 seconds against the big table, but since
I got several solutions that ran that fast (Maciej Pilecki, Dieter Noeth, Michael
Yang and Sania Chan), I had to randomly choose a winner out of those. So
the winner under this category is Dieter Noeth; you will get one of my
T-SQL books, courtesy of MSPress.
As for the category of correct solutions regardless of performance, I got
correct solutions from: Steve Kass, Maciej Pilecki, Chris Gin, Dieter Noeth,
Tung H. Nguyen, Steve Dassin, Michael Yang, Craig Bennett,
Alejandro Mesa, Rajeev Lahoty and Sania Chan (apologies if I missed
anyone). The winner chosen randomly in this category is Craig Bennett; you
will get a SQL Server Magazine hat. I will send your e-mail addresses to
SQL Server Magazine’s editors, and they will get in touch with you to send
you the prizes.

Here I’ll present two of the approaches used in the fast solutions.

One approach was to calculate for each row from the Attendance table a
grouping element (call it grp) that uniquely identifies a consecutive group of
rows. The expression that calculates grp is:

ROW_NUMBER() OVER(PARTITION BY student                                                    ORDER BY dt, slot) -                                  ROW_NUMBER() OVER(PARTITION BY student, attend                                                    ORDER BY dt, slot) AS grp
The trick here is that each set of consecutive rows with the same attendance
status gets a unique value that you can later use for grouping. As for the fact
that precedence among rows is based on the combination of columns dt, slot,
the trick is to concatenate the two into one value. This can be achieved with
different techniques. One technique is to use binary concatenation:
CAST(dt AS BINARY(8)) + CAST(slot AS BINARY(4)) AS dt_slot
Another technique is to add the slot value to dt as a time unit (e.g., second):
DATEADD(second, slot, DT) AS dt_slot
Once dt and slot are concatenated, of course you can calculate the min and
max in each group, and then break the min and max apart to the two elements.
Here’s the complete solution using binary concatenation:
WITH CTE_Grp                              AS                              (                                SELECT student, attend,                                  CAST(dt AS BINARY(8)) + CAST(slot AS BINARY(4)) AS dt_slot,                                  ROW_NUMBER() OVER(PARTITION BY student                                                    ORDER BY dt, slot) -                                  ROW_NUMBER() OVER(PARTITION BY student, attend                                                    ORDER BY dt, slot) AS grp                                FROM dbo.Attendance                              ),                              CTE_RAW                              AS                              (                                SELECT                                  student,                                  MIN(dt_slot) AS min_dt_slot,                                  MAX(dt_slot) AS max_dt_slot,                                  attend,                                  COUNT(*) AS cnt                                FROM CTE_Grp                                GROUP BY student, attend, grp                              )                              SELECT                                student,                                CAST(SUBSTRING(min_dt_slot, 1, 8) AS DATETIME) AS from_dt,                                CAST(SUBSTRING(min_dt_slot, 9, 4) AS INT) AS from_slot,                                CAST(SUBSTRING(max_dt_slot, 1, 8) AS DATETIME) AS to_dt,                                CAST(SUBSTRING(max_dt_slot, 9, 4) AS INT) AS to_slot,                                attend,                                cnt                              FROM CTE_RAW;
And here’s the solution that uses datetime based concatenation:
WITH CTE_Grp                              AS                              (                                SELECT student, attend,                                  DATEADD(second, slot, DT) AS dt_slot,                                  ROW_NUMBER() OVER(PARTITION BY student                                                    ORDER BY dt, slot) -                                  ROW_NUMBER() OVER(PARTITION BY student, attend                                                    ORDER BY dt, slot) AS grp                                FROM dbo.Attendance                              ),                              CTE_RAW                              AS                              (                                SELECT                                  student,                                  MIN(dt_slot) AS min_dt_slot,                                  MAX(dt_slot) AS max_dt_slot,                                  attend,                                  COUNT(*) AS cnt                                FROM CTE_Grp                                GROUP BY student, attend, grp                              )                              SELECT                                student,                                DATEADD(day, DATEDIFF(day, 0, min_dt_slot), 0) AS from_dt,                                DATEPART(second, min_dt_slot)                  AS from_slot,                                DATEADD(day, DATEDIFF(day, 0, max_dt_slot), 0) AS to_dt,                                DATEPART(second, max_dt_slot)                  AS to_slot,                                attend,                                cnt                              FROM CTE_RAW;
Another approach was to:
- Assign row numbers to the rows in the Attendance table based on dt, slot
ordering (partitioned by student)
- Join two instances of the previous result-set to match for each student’s
row the next row (row with the same student and a row number greater by
one)
- Filter rows where the attendance status changes
Here you need to be careful not to lose the first/last row for each student. To
get around this problem some solutions used an outer join. An interesting
technique used by Michael Yang was to add sentinel rows; one before the
first student’s row and another after the last student’s row.
- Assign row numbers to the remaining rows based on dt, slot ordering
(partitioned by student)
- Join two instances of the previous result-set based on a match in student
and an offset of one between the row numbers
- Collect the from and to elements from each row, and calculate the count as
the offset between the originally calculated row numbers

Here’s the complete solution using the sentinel rows approach:
WITH AddIn as -- sentinel rows                              (                                SELECT student, dt, 1 AS slot, -1 AS attend,                                  CASE dt WHEN '19000101' THEN 0 ELSE cnt + 1 END AS pseudodate                                FROM (SELECT student, COUNT(*) AS cnt                                      FROM dbo.Attendance                                      GROUP BY student) AS D1,                                  (SELECT CAST('19000101' AS DATETIME) AS dt                                   UNION ALL                                   SELECT CAST('99991231' AS DATETIME)) AS D2                              ),                              Pseudodates AS -- assign row numbers                              (                                SELECT student, dt, slot, attend,                                  ROW_NUMBER() OVER(PARTITION BY student                                                    ORDER BY dt, slot) AS pseudodate                                FROM dbo.Attendance                                                               UNION ALL                                                                SELECT student, dt, slot, attend, pseudodate FROM AddIn                              ),                              -- match current/next rows                              -- keep only pairs where status changes                              -- assign new row numbers                              CurNxt AS                              (                                SELECT                                  Cur.student     AS student1,                                  Cur.dt         AS dt1,                                  Cur.slot       AS slot1,                                  Cur.attend     AS attend,                                  Cur.pseudodate AS pseudo1,                                  Nxt.dt         AS dt2,                                  Nxt.slot       AS slot2,                                  Nxt.pseudodate AS pseudo2,                                  ROW_NUMBER() OVER(PARTITION BY Cur.student                                                    ORDER BY Cur.pseudodate) AS seq                                FROM Pseudodates AS Cur                                  JOIN Pseudodates AS Nxt                                    ON Cur.student = Nxt.student                                    AND Cur.pseudodate = Nxt.pseudodate - 1                                    AND Cur.attend != Nxt.attend                              )                              -- match rows with offset of one between row numbers                              -- collect from/to elements                              -- calculate count                              SELECT                                F.student1 AS student,                                F.dt2 AS from_dt,                                F.slot2 AS from_slot,                                T.dt1 AS to_dt,                                T.slot1 AS to_slot,                                T.attend AS attend,                                T.pseudo1 - F.pseudo1 AS cnt                              FROM CurNxt AS F                                JOIN CurNxt T                                  ON F.student1 = T.student1                                  AND T.seq - 1 = F.seq;
Cheers,
--
BG