Many T-SQL developers run into situations in which a simple one-column rowset containing sequential numbers (e.g., from 1 to 1000) or a set of sequential dates in a given range would be useful in performing a tricky SELECT operation or populating a table with test data. The most common solution in such cases is to create a temporary table and a simple loop to generate the required values. However, you can use Cartesian product operations to generate number, date, and time sequences without loops, then you can write user-defined functions (UDFs) that use such sequences to create sophisticated queries.
The Cartesian product of two rowsets is a rowset containing all possible combinations of rows from the first two rowsets. These row combinations are called tuples. Figure 1, shows how a Cartesian product operation works. In Figure 1, joining rowset A and rowset B produces the Cartesian product that rowset C shows.
In T-SQL, you can use the CROSS JOIN logical operator to obtain the Cartesian product of two or more rowsets. The following code example shows how to obtain the Cartesian product of two rowsets:
The next example shows how to obtain the Cartesian product of three rowsets:
Generating Number and Date Sequences in T-SQL
To generate a sequence of numbers, you can use an operation that creates a Cartesian product, as Listing 1 shows. The first block of SELECT statements, combined by UNION ALL operators, results in a one-column rowset that contains the 10 numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. The second block of SELECT statements results in a rowset containing the numbers 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90. The CROSS JOIN operator after the first block of SELECT statements produces a two-column rowset, which contains the following pairs: (0, 0), (0, 10), ... (0, 90); (1, 0), (1, 10), ... (1, 90); ... (9, 0), (9, 10), ... (9, 90). The outer SELECT statement sums the members of each pair to obtain a sequence of numbers from 0 to 99.
You can use this technique to generate a wider range of sequences. For example, to obtain a sequence of numbers from 0 to 9999, add two SELECT blocks—one for hundreds and one for thousands—as Listing 2 shows. Again, the outer SELECT statement sums members of each tuple to obtain the number sequence. Each tuple now contains four members because four source rowsets are used in the operation that created the Cartesian product.
To generate a date sequence, you can use the number sequence technique and the T-SQL function DATEADD(). Suppose you need to generate a sequence of dates from 1/1/2001 to 3/1/2001. First, you generate the sequence of numbers from 0 to 59. Then, use DATEADD() to add each number to the starting date 1/1/2001 as a day count. Listing 3 shows the query that produces this date sequence. The inner SELECT statements are the sequence generators that produce numbers from 0 to 59. The outer SELECT statement uses DATEADD() to generate dates from 1/1/2001 to 3/1/2001 (a total of 59 days). By changing the value of DATEADD()'s first parameter to mm, ww, or hh, you can easily obtain month, week, or hour sequences.
UDFs for Sequence Generation
You've probably noticed that the longer the sequence you want to generate, the longer and more cumbersome the T-SQL code becomes. In fact, when you use T-SQL SELECT statements to generate a sequence, the T-SQL code segment might be significantly larger than the code segment that uses the sequence values.
Fortunately, you can manage code length by using one of the most powerful features in SQL Server 2000—table-valued UDFs. As its name suggests, this type of function returns a table; therefore, you can use table-valued UDFs anywhere that you can use a table or view in a T-SQL query. The optimal way to manage code length is to implement a sequence generator as a table-valued UDF. Then, you can simply reference the sequence generator's UDF in a FROM clause instead of writing a long, bulky code segment every time you need to generate a sequence.
Listing 4 shows two UDFs that implement a sequence generator. The fn_p() function generates one-column rowsets such as (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) or (0, 10, 20, 30, 40, 50, 60, 70, 80, 90). These rowsets serve as "dimensions" for Cartesian product operations. First, the fn_p() function uses the value of the @exp parameter and the T-SQL function POWER() to calculate the power of 10. Then, fn_p() uses a block of SELECT statements linked by UNION ALL operators to generate the resulting rowset. The outer INSERT statement copies generated rows into the return variable @result. So, with a value of 0, the fn_p() function returns the rowset (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), fn_p(1) returns the rowset (0, 10, 20, 30, 40, 50, 60, 70, 80, 90), and so on.
The fn_sequence() function performs the main job. The @start parameter defines the first element in the resulting sequence, and the @end parameter defines the last element in the sequence. Based on the value of the @end parameter, fn_sequence() builds a Cartesian product of one, two, three, or four rowsets that the fn_p() function returns. To generate numbers from 0 to 9, you need only one rowset—(0, 1, ... 9)—and you don't need any cross joins. To generate numbers from 0 to 99, you need two cross-joined rowsets: (0, 1, ... 9) and (0, 10, ... 90). To get numbers from 0 to 999, you need three rowsets: (0, 1, ... 9), (0, 10, ... 90), and (0, 100, ... 900). Therefore, to figure out how many dimension rowsets must be cross-joined, the fn_sequence() function uses an IF...ELSE IF construction to determine what range the highest requested sequence number (@end value) falls into.
For example, examine the body of the third IF statement (IF @end < 1000), which callout A in Listing 4 shows. The inner SELECT statement—SELECT (a2.id + a1.id + a0.id)—is a sequence generator you're already familiar with. Because the value of @end falls into the 0 to 999 range (as the IF condition specifies), you use the three dimension rowsets that the fn_p(2), fn_p(1), and fn_p(0) calls create. The outer SELECT statement filters the generated sequence, ensuring that the function returns values only in the requested range. So if you want to obtain sequence values from 15 to 855, the outer SELECT statement will use the BETWEEN operator to filter out numbers from 0 to 14 and from 856 to 999. In the last step, the outer INSERT statement copies the sequence rows to the @result table-type variable.
Now, you can use the fn_sequence() function to rewrite the examples that Listing 1 and Listing 3 show. You can rewrite Listing 1 as a simple SELECT statement:
And you can rewrite Listing 3 as a slightly longer SELECT statement that uses the DATEADD() function:
DATEADD(dd, seq.id, '1/1/2001')
fn_sequence(0, 59) seq
ORDER BY 1
To generate a sequence with a negative starting element, you can run the following query:
SELECT id FROM fn_sequence(1, 100)
ORDER BY 1
This query returns the integer numbers from -100 to 100. Similarly, you can use the negative sequence number to generate past dates. Using the date sequence example above, you can obtain past dates from 1/1/2001 back to 11/3/2000:
DATEADD(dd, -seq.id, '1/1/2001')
fn_sequence(0, 59) seq
ORDER BY 1 DESC
A Real-World Example: Tracking Web Site Visitor Activity
Now, let's consider an example of how you can use sequence generators to help create complex queries that ordinarily use temporary tables to store intermediate results. Suppose you're a Web site administrator who needs to know how many visitors signed in on your Web site at every hour during the day. The Web site maintains sessions for visitors, so every time a visitor signs in, the site records session start date and time to a special session_data table in the SQL Server database. When the visitor signs off, the site records session end date and time to the same session_data table.
To find out how many visitors you had each hour, you could write a stored procedure that calculates visitor activity by using three parameters: start date and time, end date and time, and the duration of the time slice (e.g., in minutes). By using this stored procedure, you could obtain daily visitor activity on a 1-hour or 30-minute basis, monthly visitor activity by day, and so on.
Let's assume that the session_data table has two datetime columns to record session start and end times. Listing 5 shows simplified table schema and sample data for the session_data table. Listing 6 shows the sp_get_activity stored procedure, which calculates the visitor activity in the specified period.
First, the procedure calculates the total number of time slices in the specified period. The procedure uses the T-SQL function DATEDIFF() to calculate the duration of the whole period in minutes and divides the result by the duration of the time slice, which the parameter @time_slice specifies. For example, in a 2-day period (@start = 1/1/2001, @end = 1/2/2001) with a specified time-slice duration of 60 minutes (@time_slice = 60), you'll have 48 time slices. If the duration of the time slice is greater than the duration of the time period, the procedure raises an error.
Next, the stored procedure uses two nested SELECT statements to calculate the resulting visitor activity. The inner SELECT statement uses the fn_sequence() function and the number of time slices calculated in the previous step to produce a sequence of date and time values. Each value in a sequence is the beginning of a particular time slice in a given period. If the duration of the time slice equals the duration of the time period (@slice_count = 0), then the inner SELECT statement returns only one row, corresponding to the whole period.
The outer SELECT statement joins the session_data table to the sequence that the inner SELECT returns. To calculate the number of sessions for each time slice, the code groups the resulting rowset by time-slice date and time. The JOIN operation determines whether the given time slice and the given session overlap. The first part of the JOIN clause—DATEDIFF(mi, dates.date, session_data.started) < @time_slice—determines that the session started before the given time slice finished (i.e., the difference in minutes between the session and the time slice starting times is less than the duration of the time slice). If the session started before the time slice began, DATEDIFF() returns a negative number, and the first part of the condition evaluates to TRUE.
The second part of the JOIN clause—DATEDIFF(mi, dates.date, COALESCE(session_data.finished, @end)) > 0—determines whether the session finished after the time slice started. The T-SQL function COALESCE() handles situations in which the session being analyzed is still active because the visitor hasn't signed off yet. The active session has the end date and time (i.e., the session_data.finished field) set to NULL. So if the session is inactive—the visitor has already signed off—the COALESCE() function returns the value of the session_data.finished field. If the session is active, COALESCE() returns the value of the @end parameter. Because all time slices, by definition, finish before the end of the time period, we assume that an active session finishes after any time slice in a period.
The outer SELECT statement uses the LEFT JOIN logical operator to ensure the return of all time slices. The statement returns a zero count for time slices that don't overlap with a session. The following example shows a use of the sp_get_activity stored procedure:
In this example, the hourly activity (time slice duration = 60) is calculated for a 24-hour period. Figure 2 shows the results that the stored procedure returns.
Sequences provide a simple and powerful alternative to loops and temporary tables traditionally used for data generation, and in some cases might significantly simplify your T-SQL code. And when you implement a sequence generator as a table-valued UDF in SQL Server 2000, the generator becomes a flexible and convenient tool for many SQL Server tasks.