Write queries that always return the most recent information
The power of OLAP lies in its ability to store summarized business information. Although some OLAP solutions store forecasted information, almost all OLAP solutions store historical information—and the most recent information is usually the most relevant to supporting business decisions. So how do you write MDX queries that always return the most recent information? You need an MDX formula that moves forward in time as you add new information to the OLAP database. I call this type of query time dynamic. Unfortunately, when Microsoft developers created SQL Server 2000 Analysis Services, they didn't build in a standard way of creating time-dynamic MDX, so you have to figure out an appropriate method for solving this business problem in your environment. Let's examine four different approaches to solving this problem and discuss some benefits and limitations of each.
The key to creating time-dynamic MDX is determining and using in your OLAP cube the most recent time period that contains complete and valid data. The technique you use to select the most recent time period depends on your OLAP cube's structure and how frequently you update the information in the cube. When you're creating time-dynamic MDX, two details of a cube update are important: when you load the star schema (set of relational tables) with new data and which transactions you load. For example, say you want to compare your company's performance this January with last January's performance. You don't want to proceed with such an analysis if this January's data doesn't include the big transaction that happened on the last day of the month. So, you need to know when you've finished loading the January transactions and whether you have all of January's transactions or more are still to come.
Because Analysis Services doesn't have a standard convention for determining which time periods are complete and which aren't, you must pick your own convention and design your MDX queries to support that convention. Following are four possible conventions for determining the most recent time period and examples of the MDX script in some common time-dynamic formulas for each convention.
Convention 1: Use Only Complete Data
The first technique is simple but not always practical. With this technique, you fill the star schema with transactions only from completed months. Because no transactions in the star schema are from an incomplete month, the cube contains only members of the time dimension that have complete data. This convention makes writing MDX queries that return the most recent complete data easy. Here are some examples of MDX expressions that are set to return data from different time periods.
The following expression sets the MDX query to return data from the most recent month:
The next expression sets the MDX query to return data from the last 3 months:
CREATE SET Last3Months AS 'LastPeriods(3, RecentMonth.item(0).item(0))'
And the following expression sets the MDX query to return data from the year-to-date months:
You can use the following expression to set the MDX query to return data from the most recent month and the same month in the previous year:
This convention isn't desirable in many situations because analysts might want to access incomplete data—for example, to see how the current month is progressing. Also, if the cube contains forecasted as well as historical data, the MDX can't assume that the last month in the cube is complete—months that contain forecasted data won't yet contain historical data. But even with these limitations, this convention can be useful. An advantage of this method is that it's easy to understand for anyone who's browsing the information in the cube.
Convention 2: Load Data into Closed Months
The second convention is only slightly different from the first. With this technique, you create all the possible future dates in the Time dimension table—even though fact table records for those dates don't exist yet—but you load data into only the months that are already closed. The Sales cube in the FoodMart 2000 sample database shows an example of how this technique works. The Sales cube has data loaded for 1997 but not 1998.
You can use this convention when you mix forecasted data with historical data. In such a case, you can set up two cubes: one to contain the historical data and one to contain the forecasted data. You can configure the forecasting cube as write-enabled but leave the historical cube read-only. Then, you can set up a virtual cube that draws from both the historical and forecasted cubes to make one cube containing both historical and future information. Here's an example of how to use this convention to return data from the most recent month:
Time.Month.Members, NOT IsEmpty(
NOT IsEmpty( Time.CurrentMember ))) - 1)'
This MDX script looks complicated, but it isn't. The formula uses the Filter() function to isolate the months in the Time dimension that have data for the ActualSales measure. Then, the formula uses the .item property to return the last non-empty month in the set.
Convention 3: Create the Closed Member Property
The third convention is the most flexible of the four. To use this technique, you create a member property called Closed on the month level of the Time dimension. This member property in the cube corresponds to a column in the Time-dimension table of the star schema. When a month is closed, this column in the star schema is set to 1; if the month isn't closed, the column is set to 0. MDX formulas can test for a 1 in this member property to determine whether the month is closed.
Unlike the first two conventions, this convention lets you load data beyond the most recently closed month. I prefer this method for most situations because you can load data as it becomes available without affecting MDX formulas that return only closed months. Thus, you can analyze incomplete months.
The following MDX script returns the most recent month's data:
("Closed") = "1") 1)'
This script is similar to the MDX in the previous convention. Instead of isolating months based on whether they contain data, this formula isolates months that contain a Closed member property that has a value of 1. Both scripts use the following template to create an MDX expression that returns the last element in a set:
<set>.item( Count( <set> ) 1 )
A potential drawback of using this formula is that Analysis Services might evaluate the set twice. If Analysis Services is optimized, it might recognize that the set is repeated and only evaluate the set once. But if MDX performance is a concern in your application, you should assume that Analysis Services doesn't have this capability. To make sure that Analysis Services evaluates the set only once, you can separate the set definition from the formula that returns the last element, as the following example shows:
("Closed") = "1")'
CREATE SET RecentMonth AS
(Count(ClosedMonths) - 1)'
Convention 4: A Server-Based Solution
The fourth convention for determining the most recent month is a server-based solution. To implement this solution, you use Analysis Manager to create the RecentMonth set on the OLAP server. This set contains one month member, and you use Analysis Manager to update the set manually whenever a new month closes. Although it contains a manual step, this technique isn't hard to use. You can automate this solution by changing the RecentMonth set definition at the end of the scheduled cube-load task. You can change server-based set definitions by using the Decision Support Objects (DSO) object model from Visual Basic (VB). Check the SQL Server Books Online (BOL) topic "DSO" for more information about how to create and update server-defined sets.
All the examples I've shown in this article use months as the time period for business analysis, but time-dynamic MDX works just as well with quarters, weeks, or partial weeks. If your business analysis uses weeks (as retail sales operations do), then the fact that partial weeks are available for analysis might not matter. In such a case, the first two conventions might be the most appropriate approaches because of their simplicity. Conversely, if you get complete accounting information only quarterly, the third and fourth conventions might be more appropriate because they let you make intermediate information available. Your analysts will most likely want to see how a quarter is progressing before the quarter is complete.
Time-dynamic MDX is useful because OLAP is all about looking at history, and recent history is usually the most relevant for analysis. By writing flexible, time-dynamic formulas, you can avoid frequently changing the MDX behind your OLAP reports. And less work is good!