TSQL Challenge – Reoccurring Visits

This is a nice little puzzle that I got some time ago from a friend and colleague, Eladio Rincón. It’s not a very difficult challenge, but I find it to be a fun one to work on, so I hope you will enjoy it too.

Suppose that you record data about people’s daily visits to a website in a table called DailyVisits. Here’s code to create the DailyVisits table and populate it with some sample data:

-- DDL and sample data for DailyVisits table

SET NOCOUNT ON;

USE tempdb;

IF OBJECT_ID('dbo.DailyVisits', 'U') IS NOT NULL

  DROP TABLE dbo.DailyVisits;

GO

CREATE TABLE dbo.DailyVisits

(

  dt      DATE        NOT NULL,

  visitor VARCHAR(10) NOT NULL,

  CONSTRAINT PK_DailyVisits PRIMARY KEY(dt, visitor)

);

INSERT INTO dbo.DailyVisits(dt, visitor) VALUES

     ('20110601', 'A'),

     ('20110601', 'B'),

     ('20110601', 'C'),

--

     ('20110602', 'A'),

     ('20110602', 'C'),

--

     ('20110603', 'A'),

     ('20110603', 'D'),

--

--

     ('20110607', 'A'),

     ('20110607', 'D'),

--

     ('20110608', 'D'),

     ('20110608', 'E'),

     ('20110608', 'F');

SELECT *

FROM dbo.DailyVisits;

dt         visitor

---------- ----------

2011-06-01 A

2011-06-01 B

2011-06-01 C

2011-06-02 A

2011-06-02 C

2011-06-03 A

2011-06-03 D

2011-06-07 A

2011-06-07 D

2011-06-08 D

2011-06-08 E

2011-06-08 F

 

Observe that there’s a key defined on dt and visitor, meaning that you store only one occurrence at most per visitor and day. Also note that there is a possibility that during some days there would be no visitors.

The challenge is to write a query that works with two input arguments @from and @to holding the start and end dates of a date range, and calculates, for each day, various statistics in respect to the previous day. Specifically, how many visitors visited the site that day, how many new visitors were added compared to the previous day, how many visitors were removed compared to the previous day, and how many remained. For the given sample data, the desired result should look like this:

dt         numvisits   added       removed     remained

---------- ----------- ----------- ----------- -----------

2011-06-01 3           3           0           0

2011-06-02 2           0           1           2

2011-06-03 2           1           1           1

2011-06-04 0           0           2           0

2011-06-05 0           0           0           0

2011-06-06 0           0           0           0

2011-06-07 2           2           0           0

2011-06-08 3           2           1           1

 

Feel free to post your solution as a comment here. I’ll post an entry next week with my solution.

Good luck!

BG

Discuss this Blog Entry 8

on Jul 2, 2011
Declare @Start date Declare @End date Set @Start = '2011-06-01' Set @End = '2011-06-09'; Declare @DailyVisits table ( dt date, visitor char(1) ) insert into @DailyVisits Select * from DailyVisits while @Start < @End Begin Select @Start, Visitor = (select count(visitor) from @DailyVisits where dt = @Start), Added = ( Select COUNT(*) from ( select visitor from @DailyVisits where dt = @Start and visitor not in ( select visitor from @DailyVisits where dt = DATEADD(dd, -1, @Start) ) ) a ), Removed = ( Select COUNT(*) from ( select visitor from @DailyVisits where dt = DATEADD(dd, -1, @Start) except select visitor from @DailyVisits where dt = @Start ) a ), Remained = ( Select COUNT(*) from ( select visitor from @DailyVisits where dt = @Start intersect select visitor from @DailyVisits where dt = DATEADD(dd, -1, @Start) ) a ) Set @Start = DATEADD(dd, 1, @Start) End
on Jul 6, 2011
Here is my stats with 100,000+ row sample data using same date interval as original (8 row(s) affected) Table 'Worktable'. Scan count 0, logical reads 0. Table 'DailyVisits'. Scan count 2, logical reads 10.
on Jul 6, 2011
I totally spaced out Pesomannen teaser which definitely is challenging. Studiomoderna's first solution's Statistics IO show's Table 'DailyVisits'. Scan count 3, logical reads 6. We can bring it down to Table 'DailyVisits'. Scan count 2, logical reads 4 by changing the logic in the CTE to not use DailyVisits table. All other alternative solutions that I have tried have Scan count 2, logical reads 4 as I am selecting from DailyVisits atleast twice. Having said that, I am very curious to see your solution Pesomannen. It would be some good learning. Thanks
on Jul 5, 2011
two solutions: -- SOLUTION 1 Tomaz Kastrun (tomaz.kastrun@studio-moderna.com) ;with dt_cte as -- insert empty/missing dates (select min(dt) as dt_ ,max(dt) as dt_m from dailyvisits union all select dt_+1 as dt_ ,dt_m from dt_cte where dt_+1 <= dt_m) -- calculating added, removed and remained users select date_.dt_ as date ,count(distinct dv.visitor) as numvisits ,count(distinct dv.visitor)-(sum( case when (dv1.visitor = dv.visitor and dv.visitor is not null and dv1.visitor is not null) then 1 else 0 end)) as added ,count(distinct dv1.visitor) - sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as removed ,sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as remained from dt_cte as date_ left hash join dailyvisits as dv on date_.dt_ = dv.dt left hash join dailyvisits as dv1 on date_.dt_ = dateadd(day,1,dv1.dt) group by date_.dt_ -- SOLUTION 2 Tomaz Kastrun -- calculating added, removed and remained users select date_.dt_ as date ,count(distinct dv.visitor) as numvisits ,count(distinct dv.visitor)-(sum( case when (dv1.visitor = dv.visitor and dv.visitor is not null and dv1.visitor is not null) then 1 else 0 end)) as added ,count(distinct dv1.visitor)-sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as removed ,sum(case when dv.visitor = dv1.visitor then 1 else 0 end) as remained from -- insert empty/missing dates ( select (select min(dt) from dailyvisits)+number as dt_ from master..spt_values as b with (nolock readuncommitted) where [type] = 'P' and number between 1 and (select datediff(day,(select min(dt) from dailyvisits), (select max(dt) from dailyvisits))) ) as date_ left hash join dailyvisits as dv on date_.dt_ = dv.dt left hash join dailyvisits as dv1 on date_.dt_ = dateadd(day,1,dv1.dt) group by date_.dt_ but I still can not beat Pesomannen teaser :)
on Jul 3, 2011
another version with recursive CTEs, MAXRECURSION can be modified if the date range spans across more than 100 days use SimTest go Declare @Start date Declare @End date Set @Start = '2011-06-01' Set @End = '2011-06-08'; Declare @DailyVisits table ( dt date, visitor char(1) ) insert into @DailyVisits Select * from DailyVisits ;with DailyVisist_CTW(dt) as ( select @Start as dt union all select dateadd(DD, 1, dt) as dt from DailyVisist_CTW where dt < @End ) Select dt, Visitor = (select count(visitor) from @DailyVisits where dt = d1.dt), Added = ( Select COUNT(*) from ( select visitor from @DailyVisits where dt = d1.dt and visitor not in ( select visitor from @DailyVisits where dt = DATEADD(dd, -1, d1.dt) ) ) a ), Removed = ( Select COUNT(*) from ( select visitor from @DailyVisits where dt = DATEADD(dd, -1, d1.dt) EXCEPT select visitor from @DailyVisits where dt = d1.dt ) a ), Remained = ( Select COUNT(*) from ( select visitor from @DailyVisits where dt = d1.dt INTERSECT select visitor from @DailyVisits where dt = DATEADD(dd, -1, d1.dt) ) a ) from DailyVisist_CTW d1
on Jun 30, 2011
Here is a teaser for my solution. -- Peso Table 'DailyVisits'. Scan count 1, logical reads 2
on Jul 6, 2011
Studiomoderna, give it a new try. Think about it, is there a way to minimize the number of joins? Is there another way to get rid of the distinct aggregate?
on Jun 30, 2011
Here is a teaser for my solution. Table 'DailyVisits'. Scan count 1, logical reads 2

Please or Register to post comments.

What's Puzzled By T-SQL Blog?

T-SQL tips and logical puzzles from Itzik Ben-Gan.

Blog Archive

Sponsored Introduction Continue on to (or wait seconds) ×