The time is right to design for better BI queries
Business intelligence (BI), which includes OLAP, data warehouses and data marts, and data mining, is an area of significant ongoing growth. A common thread in BI technologies is that they fundamentally deal with time history and involve logic based on time. For example, a typical data warehouse stores sales information that a company reports according to time periods; the basic sales data may have a history of revisions. BiTemporal database designs enhance BI technologies by providing a general framework for temporal data.
BiTemporal database design is a vastly underutilized technology with widespread applicability. Most databases have, or could benefit from, significant temporal content and logic. Applications that could benefit from BiTemporal design might include information about retailing, reservations, prices, discount rates, interest rates, production scheduling, inventories, clinical medicine, and a host of others. BiTemporal technology provides a formal framework that lets you add time to your databases. In this article, I explore some concepts and examples. Implementation details and other enhancements and subtleties will follow in future articles.
The major idea of BiTemporal design is keeping track of two "flavors" of time: ValidTime and TransactionTime. Some key definitions are
As you'll see in the examples that follow, an essential concept of BiTemporal database design is that you never physically delete rows that contain your history. You logically delete rows by ending their TransactionTime period-of-belief. You add rows and logically delete rows, but never physically delete rows. Growth of table size can be a problem, but you can efficiently manage storage by partitioning tables based on history and by archiving historical data.
Adding BiTemporal Functionality to a Non-temporal Table
Let's look at the example price table that Figure 1 shows. This type of table is sometimes called a snapshot table because there is no time data or logic involved. It represents the price data at the current instant of time with nothing included about history or changes. When you execute an INSERT or an UPDATE, it takes immediate effect, and the old data is lost after an update.
Now consider the equivalent BiTemporal price table, which Figure 2 shows. To the original price column, we logically associate four additional columns of the data type datetime. ValidTime is the period from VT_Begin until VT_End, and TransactionTime is the period from TT_Start until TT_Stop.
So how do you read this table? The first row was input into the database on November 15 (TT_Start), and we intend to believe this row until "FOREVER." SQL Server 2005 and 2000 represent FOREVER as the maximum allowable datetime value of 23:59:59.997 December 31, 9999. (For background about the length of time measured in SQL Server, see the sidebar, "What Day Is It?") The first row stores the price for SKU 101 that's valid for the ValidTime period from December 1 through December 25.
The second row says that on November 17, we input a price for SKU 102 that's valid from December 1 until and including December 25. And we intend to believe this row until FOREVER as our best information about prices valid for December 1 through 25 for this SKU.
November fades away and December rolls around. Our database is now using the prices that had ValidTime periods beginning on December 1. But on December 4, we see we're getting lots of orders, so we decide to increase the price for SKU 102 to $21.95. In fact, we're going to make the price increase retroactive back to December 1. Now the BiTemporal price table looks like Figure 3.
What did we do there? We added the third row that stores the new price for SKU 102 that is valid for the period from Decem-ber 1 through De-cember 25. We input the third row on December 4 and intend to believe it until "FOREVER." But notice what happened in row 2. The TransactionTime period ends at a TT_Stop of December 4. We read row 2 to say that we "believed" the December 1 to 25 price in row 2 for SKU 102 only for the TransactionTime period of November 17 through December 4.
Note that we didn't delete the old version of the SKU 102 price for December 1 to 25 in row 2. Instead, we logically deleted it by setting TT_Stop to December 4. After December 4 on the wall calendar, we "believe" only the rows that have TT_Stop values of FOREVER.
Our First BiTemporal Query
Say that, after December 4 on the wall calendar, we're generating some invoices. Specifically, on December 5, let's generate an invoice for SKU 102, using the price that was valid on December 2. We find all rows for SKU 102 where the ValidTime period spans December 2 (rows 2 and 3 in Figure 3). However, the price table has only one row such that we believe it on December 5: row 3, which has the TT_STOP value FOREVER. Row 3 contains the valid price for SKU 102; row 2 now represents only history.
Changes for a Longer ValidTime Period
On December 6, say we make a decision to set a new price for SKU 102 that our application will use for all orders throughout the entire month of December. In a similar fashion to what we did before, we logically delete row 3 by setting its TT_Stop value to December 6. We input the new $18.99 price into rows 4 and 5, as Figure 4 shows. Row 4 is a BiTemporal UPDATE of the pre-existing December 1 to 25 ValidTime period in row 3. Row 5 is a BiTemporal INSERT of the new ValidTime period, December 26 to 31.
Notice that these last two rows could be combined into just one row with a ValidTime period of December 1 through Decem-ber 31. The convention in BiTemporal design is to update only the pre-existing De-cember 1 to 25 ValidTime period and to separately insert the December 26 to 31 extension for the longer ValidTime period. This way, we can track our thought processes and definitively account for every price change that's ever been made in the table.
BiTemporal INSERTs, DELETEs, and UPDATEs require up to several dozen lines of T-SQL to implement the needed logic; typically, they're stored procedures. We refer to BiTemporal INSERTs, DELETEs, and UPDATEs as BiTemporal modifications.
Keys in BiTemporal
Usually, the SKU would be a good candidate for primary key. However, in a BiTemporal table such as the one in Figure 4, SKU is no longer a unique identifier. The table contains multiple rows that have the same SKU, 102. To get a unique key, we need to combine some of the BiTemporal columns (say, all except TT_Stop) with the SKU to form a composite key. Alternatively, you might prefer to use a surrogate key for performance and simplicity. For examples of when you might use a surrogate key, see Michelle Poolet's February 2002 Solutions by Design column, "Surrogate Key vs. Natural Key," InstantDoc ID 23449.
Changes for Only Part of a ValidTime Period
Now suppose that on December 8, we decide to lower the price for SKU 101 for the ValidTime period of December 1 through 20 because we're overstocked with this item. Our sales terms are such that we'll retroactively give customers the lower price when we generate invoices at the end of the month. We do a BiTemporal UPDATE for SKU 101 for the ValidTime period of December 1 through 20. The table now looks like Figure 5.
As we perform this BiTemporal modification on December 8, we logically delete the first row by setting TT_Stop to December 8. We add the last two rows, 6 and 7, as Figure 5 shows. Row 6 inserts the new price for the newly specified ValidTime period of December 1 to 20. Row 7 preserves the old price for the December 21-to-25 part of the original row 1 ValidTime period that wasn't changed. In our table now, the first three rows represent only history because TT_STOP is no longer set to FOREVER. These three rows could be partitioned to separate storage. The last four rows represent what we now "believe" about prices for SKUs 101 and 102 in December.
You've worked hard to follow the logic to this point. This is a good place to stop and make certain you're comfortable with the BiTemporal concepts in these examples. For a review, try to read the meaning of the rows in the latest version of the example table in Figure 5.
Till the End of Time
BiTemporal design can have broad application, as you can imagine from the simple examples I've shown here. To transform a particular column, as in this price example, into its BiTemporal equivalent requires adding four more columns and retaining rows for all the changes you make. BiTemporal design is worth undertaking because it can provide a solid framework to handle temporal logic. Now that you've seen how a BiTemporal table evolves and how BiTemporal modification queries work, you can start considering this technique for your own databases. If you're interested in more information about BiTemporal design, see the sidebar "BiTemporal Through Time."