Suppose you own a piece of Web-monitoring software that has produced a table like the one at callout A in Listing A. The table shows Web site hits by site URL and caller IP address. Now suppose you want to group this data into two sessions, one starting at 7:30 a.m., the other at 2:01 p.m. In this case, let's define a session as a set of hits on the same URL from the same IP address, where each hit falls within 30 minutes of another hit in the set. For example, Listing A contains two sessions because of a gap of more than 30 minutes between 8:02 a.m. and 2:01 p.m. The time between 7:33 a.m. and 8:02 a.m. is only 29 minutes, so the 8:02 a.m. hit is part of the first session.
At first glance, this problem doesn't resemble the magazine-subscription problem: it has only one column of dates, not two, and it appears to be about grouping points in time rather than merging overlapping intervals. However, the two problems are almost identical. The key is to construct, for each hit, the date and time at which the session would time out if no more hits occurred. Each hit then becomes an interval that runs from the hit time to the timeout time, and two hits belong to the same session when their intervals overlap. Listing B shows the web_hits table with this hypothetical timeout date added.
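The interval idea can be sketched outside the database as well. The following Python is only an illustration of the logic, not the article's Listing B or Listing C. The function name `group_sessions`, the constant `TIMEOUT`, the sample hit times (rounded to the minute), and the documentation IP address 192.0.2.1 are all assumptions made for the example, and a hit that lands exactly on the timeout is assumed to belong to the session.

```python
from datetime import datetime, timedelta

# Assumed session timeout, matching the 30-minute rule in the text.
TIMEOUT = timedelta(minutes=30)


def group_sessions(hits, timeout=TIMEOUT):
    """Group (url, ip, hit_time) tuples into sessions.

    Each hit is treated as the interval [hit_time, hit_time + timeout].
    Hits for the same URL and IP whose intervals overlap share a session.
    Returns a list of (url, ip, session_start, last_hit) tuples.
    """
    sessions = []
    current = None  # [url, ip, session_start, last_hit, timeout_at]
    for url, ip, hit_time in sorted(hits):
        same_caller = current is not None and (url, ip) == (current[0], current[1])
        if same_caller and hit_time <= current[4]:
            # The hit falls inside the open interval: extend the session.
            current[3] = hit_time
            current[4] = hit_time + timeout
        else:
            # Different caller or the session timed out: start a new one.
            if current is not None:
                sessions.append(tuple(current[:4]))
            current = [url, ip, hit_time, hit_time, hit_time + timeout]
    if current is not None:
        sessions.append(tuple(current[:4]))
    return sessions


# Hypothetical sample data; the real rows live in Listing A.
day = datetime(2001, 7, 4)
hits = [
    ("we-make-flags.com", "192.0.2.1", day.replace(hour=7, minute=30)),
    ("we-make-flags.com", "192.0.2.1", day.replace(hour=7, minute=33)),
    ("we-make-flags.com", "192.0.2.1", day.replace(hour=8, minute=2)),
    ("we-make-flags.com", "192.0.2.1", day.replace(hour=14, minute=1)),
    ("we-make-flags.com", "192.0.2.1", day.replace(hour=14, minute=25)),
]

sessions = group_sessions(hits)
for url, ip, start, end in sessions:
    print(url, ip, start, end)
```

With this sample data, the 8:02 a.m. hit extends the first session because it arrives before the 8:03 a.m. timeout set by the 7:33 a.m. hit, whereas the 2:01 p.m. hit arrives long after the 8:32 a.m. timeout and starts a second session.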
With little effort, you can modify Listing 5 from the main article to use the new table and show the start and end time of each session, as Listing C shows. Running Listing C produces the following results:
|we-make-flags.com||18.104.22.168||2001-07-04 07:30:05.323||2001-07-04 08:02:14.330|
|we-make-flags.com||22.214.171.124||2001-07-04 14:01:09.220||2001-07-04 14:25:21.787|
This result set lets you calculate session length, number of hits per session, average session length, and so on. These measurements can be valuable in gauging the effectiveness of your Web site.
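As a rough illustration of those measurements, the sketch below computes each session's length and the average length from the two start and end times in the result set above. It uses Python rather than the article's SQL, and the variable names (`rows`, `lengths`, `average`) are assumptions made for the example.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Session start and end times copied from the result set above.
rows = [
    ("2001-07-04 07:30:05.323", "2001-07-04 08:02:14.330"),
    ("2001-07-04 14:01:09.220", "2001-07-04 14:25:21.787"),
]

# Length of each session: end time minus start time.
lengths = [
    datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    for start, end in rows
]

# Average session length across all sessions.
average = sum(lengths, timedelta()) / len(lengths)

for length in lengths:
    print(length)
print(average)  # 0:28:10.787000
```

Counting hits per session would additionally require joining each session back to the individual hits that fall between its start and end times.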