Solutions to Packing Date and Time Intervals Puzzle

A couple of months ago I posted a puzzle involving packing date and time intervals. You can find the details of the puzzle here, and further clarifications here. I promised to post the solutions to the puzzle by the first day of spring, so here goes…

I provided both a small set of sample data to test the validity of the solution, and a large set to test its performance. Here’s the small set again:

SET NOCOUNT ON;
USE tempdb;

IF OBJECT_ID('dbo.Sessions') IS NOT NULL DROP TABLE dbo.Sessions;

CREATE TABLE dbo.Sessions
(
  id        INT          NOT NULL IDENTITY(1, 1),
  username  VARCHAR(14)  NOT NULL,
  starttime DATETIME2(3) NOT NULL,
  endtime   DATETIME2(3) NOT NULL,
  CONSTRAINT PK_Sessions PRIMARY KEY(id),
  CONSTRAINT CHK_endtime_gteq_starttime
    CHECK (endtime >= starttime)
);

-- indexes
CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);
CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);

-- sample data (small)
INSERT INTO dbo.Sessions VALUES
  ('User1', '20111201 08:00:00.000', '20111201 08:30:00.000'),
  ('User1', '20111201 08:30:00.000', '20111201 09:00:00.000'),
  ('User1', '20111201 09:00:00.000', '20111201 09:30:00.000'),
  ('User1', '20111201 10:00:00.000', '20111201 11:00:00.000'),
  ('User1', '20111201 10:30:00.000', '20111201 12:00:00.000'),
  ('User1', '20111201 11:30:00.000', '20111201 12:30:00.000'),
  ('User2', '20111201 08:00:00.000', '20111201 10:30:00.000'),
  ('User2', '20111201 08:30:00.000', '20111201 10:00:00.000'),
  ('User2', '20111201 09:00:00.000', '20111201 09:30:00.000'),
  ('User2', '20111201 11:00:00.000', '20111201 11:30:00.000'),
  ('User2', '20111201 11:32:00.000', '20111201 12:00:00.000'),
  ('User2', '20111201 12:04:00.000', '20111201 12:30:00.000'),
  ('User3', '20111201 08:00:00.000', '20111201 09:00:00.000'),
  ('User3', '20111201 08:00:00.000', '20111201 08:30:00.000'),
  ('User3', '20111201 08:30:00.000', '20111201 09:00:00.000'),
  ('User3', '20111201 09:30:00.000', '20111201 09:30:00.000');

You can use the following code to test the portability of your solution on Oracle:

CREATE TABLE dbo.Sessions
(
  id        INT          NOT NULL,
  username  VARCHAR2(14) NOT NULL,
  starttime TIMESTAMP    NOT NULL,
  endtime   TIMESTAMP    NOT NULL,
  CONSTRAINT PK_Sessions PRIMARY KEY(id),
  CONSTRAINT CHK_endtime_gteq_starttime
    CHECK (endtime >= starttime)
);

-- indexes
CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);
CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);

-- sample data
INSERT INTO dbo.Sessions VALUES(1, 'User1',
  TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(2, 'User1',
  TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(3, 'User1',
  TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(4, 'User1',
  TO_DATE('20111201 10:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 11:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(5, 'User1',
  TO_DATE('20111201 10:30:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 12:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(6, 'User1',
  TO_DATE('20111201 11:30:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 12:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(7, 'User2',
  TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 10:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(8, 'User2',
  TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 10:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(9, 'User2',
  TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(10, 'User2',
  TO_DATE('20111201 11:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 11:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(11, 'User2',
  TO_DATE('20111201 11:32:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 12:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(12, 'User2',
  TO_DATE('20111201 12:04:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 12:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(13, 'User3',
  TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(14, 'User3',
  TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(15, 'User3',
  TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'));
INSERT INTO dbo.Sessions VALUES(16, 'User3',
  TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'),
  TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'));

And here’s the large set:

-- helper function GetNums
IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
  DROP FUNCTION dbo.GetNums;
GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
  SELECT TOP (@n) n FROM Nums ORDER BY n;
GO

-- code to create and populate the table Users
IF OBJECT_ID('dbo.Users') IS NOT NULL DROP TABLE dbo.Users;

CREATE TABLE dbo.Users
(
  username  VARCHAR(14)  NOT NULL,
  CONSTRAINT PK_Users PRIMARY KEY(username)
);

DECLARE @num_users AS INT = 2000;

INSERT INTO dbo.Users(username)
  SELECT 'User' + RIGHT('000000000' + CAST(U.n AS VARCHAR(10)), 10) AS username
  FROM dbo.GetNums(@num_users) AS U;
GO

-- code to create and populate the table Sessions with 5,000,000 rows
DECLARE
  @num_users          AS INT          = 1000,
  @intervals_per_user AS INT          = 5000,
  @start_period       AS DATETIME2(3) = '20110101',
  @end_period         AS DATETIME2(3) = '20110107',
  @max_duration_in_ms AS INT  = 3600000; -- 60 nimutes
 
TRUNCATE TABLE dbo.Sessions;

WITH C AS
(
  SELECT 'User' + RIGHT('000000000' + CAST(U.n AS VARCHAR(10)), 10) AS username,
      DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,
        DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS starttime
  FROM dbo.GetNums(@num_users) AS U
    CROSS JOIN dbo.GetNums(@intervals_per_user) AS I
)
INSERT INTO dbo.Sessions WITH (TABLOCK) (username, starttime, endtime)
  SELECT username, starttime,
    DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), starttime) AS endtime
  FROM C;

To remind you, your task is to pack each set of intervals that overlap or abut into one contiguous interval. The desired result for the small set of sample data is:

username  starttime               endtime
--------- ----------------------- -----------------------
User1     2011-12-01 08:00:00.000 2011-12-01 09:30:00.000
User1     2011-12-01 10:00:00.000 2011-12-01 12:30:00.000
User2     2011-12-01 08:00:00.000 2011-12-01 10:30:00.000
User2     2011-12-01 11:00:00.000 2011-12-01 11:30:00.000
User2     2011-12-01 11:32:00.000 2011-12-01 12:00:00.000
User2     2011-12-01 12:04:00.000 2011-12-01 12:30:00.000
User3     2011-12-01 08:00:00.000 2011-12-01 09:00:00.000
User3     2011-12-01 09:30:00.000 2011-12-01 09:30:00.000

I’d like to thanks all those who sent solutions. Special thanks go to Alejandro Mesa for helping me verify the validity of my solutions and for their safekeeping. Here I’m going to cover only correct solutions (as far as I could tell). I will include also solutions that didn’t meet all of the puzzle’s requirements, like being standard, set-based solutions. Following are the solutions and their run times as measured on my Alienware M15x laptop with a Core i7 processor (8 logical CPUs), against hot cache.

Itzik Classic: 5,621 Seconds

The first one I want to present is the classic set-based solution that was available for some time:

-- indexes
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);
*/

-- solution
WITH StartTimes AS
(
  SELECT DISTINCT username, starttime
  FROM dbo.Sessions AS S1
  WHERE NOT EXISTS
    (SELECT * FROM dbo.Sessions AS S2
     WHERE S2.username = S1.username
       AND S2.starttime = S1.starttime)
),
EndTimes AS
(
  SELECT DISTINCT username, endtime
  FROM dbo.Sessions AS S1
  WHERE NOT EXISTS
    (SELECT * FROM dbo.Sessions AS S2
     WHERE S2.username = S1.username
       AND S2.endtime > S1.endtime
       AND S2.starttime = starttime) AS endtime
FROM StartTimes AS S;

The CTE StartTimes isolates packed interval start times using a query that returns all interval start times for which you cannot find any interval by the same user that started before the current interval start and ended on or after the current interval start. The EndTimes CTE isolates packed interval end times using a query that returns all interval end times for which you cannot find any interval by the same user that ended after the current interval end and started on or before the current interval end. The outer query than matches to each packed interval start the nearest packed interval end going forward by returning the minimum end that is greater than or equal to the current start. As you can see, the solution is very slow.

Geri Reshef: 1,690 Seconds

-- indexing
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);
*/

-- solution
With T1 As
(
  Select UserName, StartTime Time
  From Sessions

  Union All

  Select UserName, EndTime
  From Sessions
),
T2 As
(
  Select Row_Number() Over(Partition By UserName Order By Time) Nm,
                UserName, Time
  From  T1
),
T3 As
(
  Select        A.Nm-Row_Number() Over(Partition By A.UserName Order By A.Time,B.Time) Nm1,
                A.UserName,
                A.Time StartTime,
                B.Time EndTime
  From  T2 A
    Inner join T2 B
                 On A.UserName = B.UserName
                And A.Nm=B.Nm - 1
  Where Exists
    ( Select    *
                  From  Sessions S
                  Where S.UserName = A.UserName
                        And (S.StartTime  A.Time) )
                Or A.Time=B.Time
)
Select  UserName,
                Min(StartTime) StartTime,
                Max(EndTime) EndTime
From    T3
Group By UserName, Nm1
Order By UserName, StartTime;

Ami Levin: 225 Seconds

-- Indexing
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
*/

-- Solution
SELECT UserName, StartTime, EndTime,
  ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY StartTime, EndTime) AS Serial
INTO #MySessions
FROM dbo.Sessions;

CREATE INDEX idx_serial_user_start_end ON #MySessions(serial, username, starttime, endtime);

WITH Sessions_Ordered
AS
(
  SELECT UserName, StartTime, EndTime, Serial
  FROM #MySessions
),
Sessions_Groups
AS
(
  SELECT *, 0 AS Grouper
  FROM Sessions_Ordered
  WHERE Serial = 1
 
  UNION ALL

  SELECT B.UserName, B.StartTime,
    CASE WHEN B.EndTime > A.EndTime THEN B.EndTime ELSE A.EndTime END,
    B.Serial,
    A.Grouper + CASE
                  WHEN A.EndTime
       

Alejandro Mesa: 130 Seconds

-- indexing
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
*/

-- solution
SELECT
    username,
    starttime,
    endtime,
    ROW_NUMBER() OVER(PARTITION BY username ORDER BY starttime, endtime, id) AS rn
INTO
    #T1
FROM
        dbo.[Sessions]

CREATE UNIQUE CLUSTERED INDEX idx_#T1_id
ON #T1(username ASC, rn ASC);

WITH rs AS (
SELECT
    username,
    starttime,
    endtime,
    rn
FROM
        #T1
WHERE
    rn = 1
   
UNION ALL

SELECT
    PRV.username,
    CASE
    WHEN R.overlaps = 1 THEN PRV.starttime
    ELSE NXT.starttime
    END,
    CASE
    WHEN R.overlaps = 1 AND PRV.endtime >= NXT.endtime THEN PRV.endtime
    ELSE NXT.endtime
    END,
    NXT.rn
FROM
        rs AS PRV
        INNER JOIN
        #T1 AS NXT
        ON NXT.username = PRV.username
        AND NXT.rn = PRV.rn + 1
        CROSS APPLY
        (
        SELECT
            CASE
            WHEN PRV.starttime = NXT.starttime THEN 1
            ELSE 0
            END AS overlaps
        ) AS R
)
SELECT
    username,
    starttime AS new_starttime,
    MAX(endtime) AS new_endtime
FROM
        rs
GROUP BY
    username,
    starttime
--ORDER BY
--    username,
--    starttime
OPTION (MAXRECURSION 0);

DROP TABLE #T1;

Alejandro’s description of the solution:

The solution assigns a consecutive number using ROW_NUMBER partitioned by (username) and ordered by (starttime, endtime, id), and dumps it into a table that will be used by a recursive cte. The anchor part select all rows where “rn = 1”, first row for each username, and the recursive part join the anchor with the next row in line “prv.username = nxt.username and nxt.rn = prv.rn + 1”. I also used the apply operator to return a column to identify when previous and next rows overlap, by using (prv.starttime = nxt.starttime). Then in the column list of the recursive part, I push down prv.username, prv.starttime if both rows overlap, and prv.endtime if the rows overlap and prv.endtime is greater than or equal to next.endtime, otherwise nxt.endtime.

The end result will have groups by (username, starttime), so the next step is to group by those columns and pull the maximum endtime.

Stefan: 40 Seconds

-- indexing
/*
CREATE UNIQUE INDEX idx_user_start ON dbo.Sessions(username, starttime, id);
*/

-- solution
if object_id('tempdb..#intervals') is not null drop table #intervals

;with cte1 as
(
  select *, datediff(hour, 0, starttime)+n-1 as hash
  from sessions
    cross apply dbo.GetNums(datediff(hour, 0, endtime)-datediff(hour, 0, starttime)+1) t
),
cte2 as
(
  select distinct username, endtime
  from sessions a
  where not exists(
    select *
    from cte1 b
    where a.username=b.username
      and b.starttimea.endtime
      and b.hash=datediff(hour, 0, a.endtime) )
)
select *, row_number() over (partition by username order by endtime) as rn
into #intervals
from cte2

-- add information about the start of each interval by locating the first session after each interval
;with cte1 as
(
  select a.username, a.endtime, isnull(b.endtime,'19000101') as pe
  from #intervals a
    left join #intervals b
      on a.username=b.username and a.rn=b.rn+1
)
select username,
  (select min(b.starttime)
   from sessions b
   where a.username=b.username and b.starttime>a.pe) as starttime,
  endtime
from cte1 a

Stefan’s description of the solution:

Create a table with one row per interval. endtime is the end of each interval. The end of an interval is identified by finding the end of a session that is not overlapped by any other session. To speed up the search for overlapping sessions we only search the sessions that overlap the hour we are searching for.

Peter Larsson (Peso) 1: 32 Seconds

-- indexes
/*
CREATE UNIQUE INDEX idx_user_start ON dbo.Sessions(username, starttime, id);
CREATE UNIQUE INDEX idx_user_end ON dbo.Sessions(username, endtime, id);
*/

-- Create staging table
CREATE TABLE    #Stage
                (
                        UserName VARCHAR(14) NOT NULL,
                        StartTime DATETIME2(3) NOT NULL,
                        EndTime DATETIME2(3) NOT NULL
                )

-- Populate staging table with Gap information
INSERT          #Stage
                (
                        UserName,
                        StartTime,
                        EndTime
                )
SELECT          s.UserName,
                s.StartTime,
                e.EndTime
FROM            (
                        SELECT  UserName,
                                StartTime,
                                DENSE_RANK() OVER (PARTITION BY UserName ORDER BY StartTime) - 1 AS rnStart
                        FROM    dbo.[Sessions]
                ) AS s
INNER JOIN      (
                        SELECT  UserName,
                                EndTime,
                                DENSE_RANK() OVER (PARTITION BY UserName ORDER BY EndTime) AS rnEnd
                        FROM    dbo.[Sessions]
                ) AS e ON e.UserName = s.UserName
                        AND e.rnEnd = s.rnStart
WHERE           e.EndTime
       

Peter Larsson (Peso) 2: 23 Seconds

-- indexing
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
*/

-- solution
-- Make sure printed output in each iteration doesn't affect timing
SET NOCOUNT ON

-- Create a control table for holding the Islands found for each UserName
CREATE TABLE    #Control
                (
                        UserName VARCHAR(14) NOT NULL,                  -- Current UserName
                        IslandID INT NOT NULL,                          -- Current Island number
                        IslandStartTime DATETIME2(3) NOT NULL,          -- Starting time for current Island
                        ExtensionStartTime DATETIME2(3) NOT NULL,       -- Latest extension StartTime of current Island
                        ExtensionEndTime DATETIME2(3) NOT NULL          -- Latest extension EndTime of current Island
                )

-- Create a helper index to speed up island detection
CREATE UNIQUE NONCLUSTERED INDEX UX_Control ON #Control (UserName, IslandID) INCLUDE (IslandStartTime, ExtensionStartTime, ExtensionEndTime)

-- Insert initial starting Island for each UserName
INSERT          #Control
                (
                        UserName,
                        IslandID,
                        IslandStartTime,
                        ExtensionStartTime,
                        ExtensionEndTime
                )
SELECT          usr.UserName,
                1 AS IslandID,
                curr.StartTime AS IslandStartTime,
                curr.StartTime AS ExtensionStartTime,
                curr.EndTime AS ExtensionEndTime
FROM            dbo.[Users] AS usr
CROSS APPLY     (
                        SELECT TOP(1)   w.StartTime,
                                        w.EndTime
                        FROM            dbo.[Sessions] AS w
                        WHERE           w.UserName = usr.UserName
                        ORDER BY        w.StartTime
                ) AS curr(StartTime, EndTime)

-- Repeat loop when an Island is found or extended, quit when no more Island is found.
WHILE @@ROWCOUNT > 0
        MERGE   #Control AS tgt
        USING   (
                        SELECT          ctrl.UserName,
                                        CASE
                                                WHEN curr.EndTime IS NULL THEN ctrl.IslandID + 1
                                                ELSE ctrl.IslandID
                                        END AS IslandID,
                                        ctrl.ExtensionEndTime AS ExtensionStartTime,
                                        curr.EndTime AS ExtensionEndTime,
                                        nxt.StartTime AS IslandStartTime,
                                        nxt.EndTime AS IslandEndTime
                        FROM            #Control AS ctrl
                        -- For every interval starting in current Island extension, get the interval that ends last (beyond extension).
                        OUTER APPLY     (
                                                SELECT  MAX(w.EndTime) AS EndTime
                                                FROM    dbo.[Sessions] AS w WITH (index = idx_user_start_end)
                                                WHERE   w.UserName = ctrl.UserName
                                                        AND w.StartTime BETWEEN ctrl.ExtensionStartTime AND ctrl.ExtensionEndTime
                                                        AND w.EndTime > ctrl.ExtensionEndTime
                                        ) AS curr(EndTime)
                        -- Check to see if an interval exists that is not an extension to current Island
                        OUTER APPLY     (
                                                SELECT TOP(1)   w.StartTime,
                                                                w.EndTime
                                                FROM            dbo.[Sessions] AS w
                                                WHERE           w.UserName = ctrl.UserName
                                                                AND w.StartTime > ctrl.ExtensionEndTime
                                                ORDER BY        w.StartTime
                                        ) AS nxt(StartTime, EndTime)
                        -- Only work with the latest Island for each UserName
                        CROSS APPLY     (
                                                SELECT  MAX(c.IslandID) AS IslandID
                                                FROM    #Control AS c
                                                WHERE   c.UserName = ctrl.UserName
                                        ) AS isl(IslandID)
                        WHERE           ctrl.IslandID = isl.IslandID
                                        AND (curr.EndTime IS NOT NULL OR nxt.EndTime IS NOT NULL)
                ) AS src ON src.UserName = tgt.UserName
                        AND src.IslandID = tgt.IslandID
        WHEN    MATCHED                 -- An Island is extended
                        THEN    UPDATE
                                SET     tgt.ExtensionStartTime = src.ExtensionStartTime,
                                        tgt.ExtensionEndTime =  src.ExtensionEndTime
        WHEN    NOT MATCHED BY TARGET   -- An new Island is found
                        THEN    INSERT  (
                                                UserName,
                                                IslandID,
                                                IslandStartTime,
                                                ExtensionStartTime,
                                                ExtensionEndTime
                                        )
                                VALUES  (
                                                src.UserName,
                                                src.IslandID,
                                                src.IslandStartTime,
                                                src.IslandStartTime,
                                                src.IslandEndTime
                                        );

-- Present the result in correct order
SELECT          UserName,
                IslandStartTime AS StartTime,
                ExtensionEndTime AS EndTime
FROM            #Control
ORDER BY        UserName,
                IslandID

-- Clean up
DROP TABLE      #Control

Itzik New 1: 17 Seconds

-- indexes
/*
CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);
CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);
*/

WITH C1 AS
-- let e = end ordinals, let s = start ordinals
(
  SELECT id, username, starttime AS ts, +1 AS type, NULL AS e,
    ROW_NUMBER() OVER(PARTITION BY username ORDER BY starttime, id) AS s
  FROM dbo.Sessions

  UNION ALL

  SELECT id, username, endtime AS ts, -1 AS type,
    ROW_NUMBER() OVER(PARTITION BY username ORDER BY endtime, id) AS e,
    NULL AS s
  FROM dbo.Sessions
),
C2 AS
-- let se = start or end ordinal, namely, how many events (start or end) happened so far
(
  SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY username ORDER BY ts, type DESC, id) AS se
  FROM C1
),
C3 AS
-- For start events, the expression s - (se - s) - 1 represents how many sessions were active
-- just before the current (hence - 1)
--
-- For end events, the expression (se - e) - e represents how many sessions are active
-- right after this one
--
-- The above two expressions are 0 exactly when a group of packed intervals
-- either starts or ends, respectively
--
-- After filtering only events when a group of packed intervals either starts or ends,
-- group each pair of adjacent start/end events
(
  SELECT username, ts,
    FLOOR((ROW_NUMBER() OVER(PARTITION BY username ORDER BY ts) - 1) / 2 + 1) AS grpnum
  FROM C2
  WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT username, MIN(ts) AS starttime, max(ts) AS endtime
FROM C3
GROUP BY username, grpnum;

The technique I used in this solution to calculate the number of active sessions was inspired by a similar technique used by Ben Flanaghan to address a problem called Maximum Concurrent Sessions that I covered in the past in the magazine.

Muhammad Al Pasha: 14 Seconds

-- indexes
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);
*/

IF OBJECT_ID(N'tempdb..#EndTimes') IS NOT NULL
   DROP TABLE #EndTimes;

/*
   Calculate end time of intervals, and generate row numbers per user,
   and save the result to temp table to enhance performance.
*/

CREATE TABLE #EndTimes
(
   rownum INT NOT NULL,
   username VARCHAR(14) NOT NULL,
   endtime DATETIME2(3) NOT NULL
);

INSERT INTO #EndTimes(rownum, username, endtime)
   SELECT ROW_NUMBER() OVER(PARTITION BY S1.username
                                ORDER BY S1.endtime) AS rownum,
          S1.username, S1.endtime
     FROM dbo.Sessions AS S1
          OUTER APPLY
          (SELECT 1 AS is_overlapped
            WHERE EXISTS(SELECT *
                           FROM dbo.Sessions AS S2 WITH (index = idx_user_end_start)
                          WHERE S2.username = S1.username
                            AND S2.starttime  S1.endtime)) AS O2
    WHERE O2.is_overlapped IS NULL
    GROUP BY S1.username, S1.endtime; -- Eliminate duplicates

/*
   Calculate start time of intervals based on the previous interval end time,
   then join the result with end time of intervals based on rownum and username.
*/

WITH StartTimesCTE AS
(
   SELECT ET.rownum, ET.username, ST.starttime
     FROM (-- Use min possible time as the start time of the first interval for each user.
           SELECT 1 AS rownum, username, CAST('00010101 00:00:00.000' AS DATETIME2(3)) AS endtime
             FROM dbo.Users

            UNION ALL

           -- Use the already calculated end time of intervals for the rest of intervals.
           SELECT rownum + 1, username, endtime
             FROM #EndTimes) AS ET
          CROSS APPLY
          (--start time of interval is the min start time greater than the previous interval end time
           SELECT MIN(ST.starttime) AS starttime
             FROM dbo.Sessions AS ST
            WHERE ST.username = ET.username
              AND ST.starttime > ET.endtime) AS ST
)
-- The end result is just matching starttime with endtime of each interval based on row numbers (so it is based on time order)
SELECT ST.username, ST.starttime, ET.endtime
  FROM StartTimesCTE AS ST
       INNER JOIN
       #EndTimes AS ET
       ON ET.rownum = ST.rownum
          AND ET.username = ST.username;

       

Peter Larsson (Peso) 3: 4 Seconds

<b>-- indexes
/*
CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);
CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);
*/

;WITH cteSource(UserName, theTime, theGrp)
AS (
        SELECT          u.UserName,
                        u.theTime,
                        (DENSE_RANK() OVER (PARTITION BY u.UserName ORDER BY u.theTime) - 1) / 2 AS theGrp      -- Create artificial groups depending
                                                                                                                -- on the times. Omit duplicates.
        FROM            (
                                -- Get all usernames and first start time and last endtime
                                SELECT          UserName,
                                                MIN(StartTime) AS minStartTime,
                                                MAX(EndTime) AS maxEndTIme
                                FROM            dbo.Sessions
                                GROUP BY        UserName
                        ) AS usr
                        -- This only get the intermediate gaps
        OUTER APPLY     (
                                SELECT          s.StartTime,
                                                e.EndTime
                                FROM            (
                                                        -- Get all starttimes sorted
                                                        SELECT  s.StartTime,
                                                                ROW_NUMBER() OVER (ORDER BY s.StartTime) AS SeqID
                                                        FROM    dbo.Sessions AS s
                                                        WHERE   s.UserName = usr.UserName
                                                ) AS s
                                INNER JOIN      (
                                                        -- Get all endtimes sorted
                                                        SELECT  s.EndTime,
                                                                ROW_NUMBER() OVER (ORDER BY s.EndTime) + 1 AS SeqID
                                                        FROM    dbo.Sessions AS s
                                                        WHERE   s.UserName = usr.UserName
                                                ) AS e ON e.SeqID = s.SeqID     -- Match previous end time time against this starttime
                                WHERE           e.EndTime
        </b>

Itzik New 2: 3 Seconds

-- indexes
/*
CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);
CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);
*/

-- Inline Table Function Encapsulating Logic from Solution Itzik New 1 for Single User
IF OBJECT_ID('dbo.UserIntervals', 'IF') IS NOT NULL DROP FUNCTION dbo.UserIntervals;
GO

CREATE FUNCTION dbo.UserIntervals(@user AS VARCHAR(14)) RETURNS TABLE
AS
RETURN
  WITH C1 AS
  (
    SELECT id, starttime AS ts, +1 AS type, NULL AS e,
      ROW_NUMBER() OVER(ORDER BY starttime, id) AS s
    FROM dbo.Sessions
    WHERE username = @user

    UNION ALL

    SELECT id, endtime AS ts, -1 AS type,
      ROW_NUMBER() OVER(ORDER BY endtime, id) AS e,
      NULL AS s
    FROM dbo.Sessions
    WHERE username = @user
  ),
  C2 AS
  (
    SELECT C1.*, ROW_NUMBER() OVER(ORDER BY ts, type DESC, id) AS se
    FROM C1
  ),
  C3 AS
  (
    SELECT ts,
      FLOOR((ROW_NUMBER() OVER(ORDER BY ts) - 1) / 2 + 1) AS grpnum
    FROM C2
    WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
  )
  SELECT MIN(ts) AS starttime, max(ts) AS endtime
  FROM C3
  GROUP BY grpnum;
GO

-- Solution
SELECT U.username, A.starttime, A.endtime
FROM dbo.Users AS U
  CROSS APPLY dbo.UserIntervals(U.username) AS A;

What makes this solution so fast is the efficient use of parallelism. Note though that when testing it in different environments and with different arguments for number of users and intervals, I didn’t always get a parallel plan. If you’re not getting a parallel plan, it could be because the machine you’re using has fewer logical CPUs than 8. Just for test purposes, you can use the SQL Server service startup option -P8, which will cause SQL Server to use 8 schedulers like in an environment with 8 logical CPUs. The -P startup parameter is not an officially documented one, so use it just for test purposes to mimic a machine with a desired number of CPUs, not for production purposes. Also, I noticed that in some machines where I tested this code and didn’t get parallel plans, when changing the sample data to 2,000 users each with 2,500 intervals, instead of 1,000 by 5,000, I got parallel plans in more cases. Either way, this solution is still very fast even when using a serial plan.

For more details, I’m going to cover my new solutions extensively in the March edition of SQJ.

Cheers,

BG

Discuss this Blog Entry 4

Alejandro Mesa (not verified)
on Mar 19, 2011
Itzik,

Thanks for sharing the solutions with us. I had lot of fun trying to solve it and have learned good stuff from the other solutions.


Cheers,
AMB





on Mar 29, 2011
Posted.
on Mar 25, 2011
Where is my #3? ;-)

on Mar 22, 2011
Itzik, thanks again for this very interesting challenge. Your solution is absolutely brilliant.

I am sure I will use this method of calculating the number of active sessions in the future.

Please or Register to post comments.

What's Puzzled By T-SQL Blog?

T-SQL tips and logical puzzles from Itzik Ben-Gan.

Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×