Readers’ solutions use auxiliary tables to track time
Many of you sent me solutions to puzzles I presented in past columns. I presented two of the best ones for the relational-division puzzle last month in "Try, Try Again" (InstantDoc ID 39604). This time, let's look at reader solutions to two puzzles that involved tracking dates and times. First, I discuss a solution to the February 2003 "Counting the Costs" problem (InstantDoc ID 27592) sent in by Scott Smith, a principal architect at M Systems Group, Toronto. Then, I show you a solution to the January 2003 Working Days problem (from "No Table? No Problem" InstantDoc ID 27051) sent in by Sergiy Korovin of Q-Sign Pty, Greensborough, Australia. Smith's and Korovin's solutions have something in common: They use auxiliary tables, which allow simple, efficient solutions.
Counting the Costs (February 2003)
The problem I presented in February involved calculating the costs of users' dial-up calls to an ISP. Each user's call information—including user name, call start time, and duration in minutes—is recorded in a Calls table. The ISP charges different rates per minute depending on when the calls occur. A table called Rates tracks the start and end times for the rate periods and the rates per minute for each period. Run the code in Listing 1 to create and populate the Calls and Rates tables. The task: Calculate the total duration and cost of all calls for each user. Figure 1 shows the desired results.
Smith slightly reformatted the Calls table by storing call start and end times instead of start time and call duration. Running the code in Listing 2 recreates the Calls table with the new format. This minor change allows simpler and more readable queries later. The key element in this solution is an auxiliary table called RatesExpand.
The Rates table holds the ISP's daily rate periods, but it doesn't hold rate periods for specific days. The Rates-Expand table expands the Rates table by holding the rate periods for the specific days you want to calculate the costs for. For example, suppose you want to calculate the call costs for February 2003. In that case, the Rates-Expand table would hold the rate periods for each of the 28 days in February. With three rate periods each day, you should have 28 x 3—that is, 84—rows in the table. Before creating the RatesExpand auxiliary table, run the code that Listing 3 shows to create an auxiliary table called Nums and populate it with 1000 consecutive integers. You use this table in the code that populates the RatesExpand table. Now, run the code that Listing 4 shows to create the RatesExpand table and populate it with the specific rate periods during February 2003.
The code in Listing 4 first creates the RatesExpand table, then populates it by cross-joining the Rates table with the Nums table. The cross-join multiplies each row in Rates by the number of rows in Nums. The filter in the WHERE clause removes duplicates when n is greater than the number of days in February 2003, so each row in Rates is multiplied only 28 times. The expressions in the SELECT list calculate each rate period's start and end times. For example, the following expression calculates a rate period's target start time:
DATEADD(minute, DATEDIFF(minute, '19000101', fromtime), DATEADD(day, n-1, '20030201')) AS fromdatetime
The n column in the Nums table represents each day in February. The expression adds n-1 days to February 1 to calculate the target date. It then adds to the expression's intermediate result the time portion of the rate period in minutes. Because SQL Server doesn't have a separate data type for time only, the Rates table stores the rate periods' start and end times in terms of the base date January 1, 1900. To add the right time portion to the target date, you calculate the difference in minutes between the rate period's fromtime and midnight, both in terms of the base date. You calculate the target end time of a rate period similarly, but specifying the Rates table's totime column instead of the fromtime column.
Listing 4's code is just an example of populating the RatesExpand auxiliary table. One benefit of using an auxiliary table is that you can populate it with different rate periods and rates for different days of the month; in the auxiliary table, you store rates for specific dates rather than for periods of a day in general. Some ISPs, for example, have different rate periods and rates for working days versus weekends and holidays. Smith's solution lets you accommodate such scenarios by populating the RatesExpand table with the appropriate rate periods and rates.
Note that you could reduce the number of rate periods you store in the RatesExpand table and get a better-performing solution. Because the last period of a day (22:00 to 23:59) has the same rate as the first period of the following day (00:00, 07:59), you could merge the two periods as 22:00 of one day to 07:59 of the following day. To achieve this merge, you need to change the population logic in Listing 4; I'll leave that exercise to you.
The next step in the solution is to match each call from the Calls table with all the call periods in RatesExpand that it overlaps. You achieve this match by running the following query:
SELECT * FROM Calls AS C JOIN RatesExpand AS R ON C.fromdatetime <= R.todatetime AND C.todatetime >= R.fromdatetime
A call overlaps a call period if the call started before or at the same time the period ended and finished after or at the same time the period started.
Now that you've matched each call with all the periods it overlaps, you need to calculate, for each period, the rate periods' start and end times. For example, user1 started a call February 12 at 21:30 and finished it on the same day at 22:29. The matching periods for the call are February 12, 2003, 08:00 to February 12, 2003, 21:59 and February 12, 2003, 22:00 to February 12, 2003, 23:59. Within these periods, the call was active during February 12, 2003, 21:30 to February 12, 2003, 21:59 and February 12, 2003, 22:00 to February 12, 2003, 22:29. Running the code in Listing 5, page 16, creates the VCallPeriods view, which calculates the active call periods. Figure 2, page 16, shows the result of a SELECT * query against the VCallPeriods view.
The final step of the solution is straightforward: Just group the active call periods in VCallPeriods by username and calculate the total duration and cost as follows:
SELECT username, SUM(DATEDIFF(minute, f, t) + 1) AS duration, SUM((DATEDIFF(minute, f, t) + 1) * rate) AS cost FROM VCallPeriods GROUP BY username
F and t are the view's result columns that hold the from time and to time, respectively, of each active call period. The difference between them is the active call period's duration. Multiply that duration by the rate, and you get the cost.
Smith's solution is three times as fast as the solution I presented in February 2003. For example, with 300,000 rows in the Calls table, Smith's solution ran for 47 seconds on my laptop; mine ran for 155 seconds. And as I mentioned earlier, Smith's solution is more flexible than mine in that it can easily accommodate different rates for different days of the month.
No Table? No Problem (January 2003)
The problem I presented in January involved calculating the number of working days between two given dates. I defined working days as non-weekend days—that is, Monday through Friday. This definition of working days wasn't very realistic because it didn't account for holidays or other nonworking days. However, it was sufficient for the scenario I wanted to present, which was how to perform calculations without accessing tables.
Korovin's solution to that problem involves accessing an auxiliary table but has minimal I/O costs. This efficient solution accommodates more-realistic scenarios that include holidays, for example.
Korovin's idea is to use an auxiliary Calendar table that contains a row for each date within a period of interest. A period of interest spans all from and to dates included in your applications' requests for calculating working days. You need the Calendar table to include all from and to dates that appear in your requests because the working days are calculated as the difference between the value of the total number of working days that is stored in the to date and the value stored in the from date. Each row includes the date and another column called wdc, which contains the total number of working days between a base date and the row's date. Run the code that Listing 6 shows to create the Calendar auxiliary table and populate it with dates in the range January 1, 1900, through December 31, 2999, the example period of interest.
A variable called @wdc keeps track of the number of working days that have passed between the base date and the loop's date counter. The loop code increments the counter only if the date counter represents a working day. The code in Listing 6 assumes a day is a working day only if the day of the week is Monday through Friday. However, you could add logic to identify and treat holidays or other days as nonworking days and not increment @wdc in such cases.
Finally, you create the dbo.fn_working-days() UDF to calculate the number of work-ing days between two given dates as follows:
CREATE FUNCTION dbo.fn_workingdays (@d1 AS datetime, @d2 AS datetime) RETURNS int AS BEGIN RETURN((SELECT wdc FROM Calendar WHERE cday = @d2) - (SELECT wdc FROM Calendar WHERE cday = @d1 - 1)) END
The function simply subtracts the wdc value of the day before the specified from date from the wdc value of the specified to date. To test the function, run the following query:
SELECT dbo.fn_workingdays('20000101', '29991231')
Notice how quickly you get the answer, 260887—that is, the number of working days between January 1, 2000, and December 31, 2999.
This function is so efficient because no matter which two dates you provide as arguments, the function invokes two queries that perform an index seek to fetch a value for each date, then calculates the difference—requiring very little I/O. On my laptop, the function runs for 8ms when the data isn't in cache and runs for less than 1ms when the data is in cache. In January, I presented a couple of solutions that don't involve any physical I/O. Those also run for less than 1ms, but they don't have the flexibility of accommodating holidays and other nonworking days.
Using auxiliary tables is a powerful way to simplify your solutions. The best solutions to T-SQL problems are often the simple ones. Simple solutions are often efficient as well as easy to understand and easy to maintain.