What you need to know to when integrating time-based calculations in cubes
Let me build a case for integrating time-based calculations into a cube. Consider the following questions:
- What are the total sales this quarter, and how does that compare to the same quarter last year?
- How much did inventory levels increase or decrease last month?
- Is the number of male student applicants starting to decline?
- When did we achieve the highest donations?
- What is our likely tax obligation in 2010?
All of these questions have something in common—time. That's not a surprise as time applies to virtually every organization. You see this manifested in SQL Server Analysis Services (SSAS); almost every cube you design or query will contain at least one dimension based on time. While certain time-oriented questions (e.g., what are the total sales this quarter) are answered by simply browsing a cube, other questions (e.g., how do sales this quarter compare to the sales in the same quarter last year) need to be calculated. This is where time-based calculations come into play. By integrating common time-based calculations into a cube, end users become empowered and more effective in their jobs.
Overview of MDX Functions Used for Time-Based Calculations
If you haven't already done so, download the accompanying samples for this article by clicking the 103356.zip link near the top of the page. The underlying data, which compares economic trends across different countries, comes from the Organisation for Economic Co-operation and Development.
Start by restoring therelational database from the DebtAnalysis.bak backup file. Next, open up the OLAP_GovtData.dwproj project in Development Studio (BIDS). Edit the project's target server property to match the location of your SSAS instance. Finally, after editing the DebtAnalysis.ds data source to reflect the location of the DebtAnalysis relational database, process the entire database using the Database menu's Process option.
For this article, I'll limit the focus to two common sets of time-series calculations: period-to-period and same-period-last-year. Period-to-period calculations (also called prior period calculations) return or use data from a prior period. To illustrate, open the file MDXQueryExamples_1.mdx (which is located in the MDX_Queries folder in the OLAP_GovtData project) in SQL Server Management Studio (SSMS) and establish a connection to the GovtDebtAnalysis cube in the OLAP_GovtData database. Scroll down and execute Query 2b, which Listing 1 shows.
Listing 1: Query 2b
The results for this query can been see in Figure 1. Note that if you'd rather see these results in Excel 2007, you can download PTPower. This free Excel 2007 add-on lets you create calculated members for pivot tables based on SSAS 2008 and SSAS 2005 cubes.
Figure 1: Query 2b's results
Query 2b creates a calculated member named \[GDP_Change\], which represents the difference in Gross Domestic Product (GDP) from one period to the next in the time hierarchy. (The time hierarchy in the cube is named \[DimDate\].) GDP_Change relies on the PrevMember function, which simply returns the previous member in a given level of the Date hierarchy. For example, in Figure 1, you can see the GDP_Change result for the year 1992 is $339.90 (i.e., the difference between the 1992 value of $6,286.80 and the 1991 value of $5,946.90). Note that the PrevMember function can alternatively be defined against a dimension attribute rather than a user-defined hierarchy, as shown in Query 2c in Listing 2.
Listing 2: Query 2c
You might have noticed a few problems with the result set. First, the GDP_Change value for the decade 1990-9 (and the year 1991) is the same as its GDP value. Second, 1990 is the first year of available data, but my date dimension spans back to 1790 (I plan to eventually load additional historical data). The calculated member isn't defined to handle this situation. Finally, the date dimension spans out to 2015 but there's only data available through 2008, which causes erroneous GDP_Change results to be returned for the year 2009 and the decade 2010-1.
I'm highlighting these problems to raise an important point: Time-based calculations that account for missing data can be difficult to design and, if poorly implemented, can cause performance problems.
Now let's switch to a same-period-last-year calculation, which is also called a parallel period calculation. Find and execute Query 3a, which Listing 3 shows.
Listing 3: Query 3a
Figure 2 shows an excerpt from the results created with the PTPower add-on.
Figure 2: Excerpt from Query 3a's results
The calculated member \[UnemploymentAvg_PP_Change\] uses the ParallelPeriod function to calculate the difference in unemployment rates by retrieving the unemployment rate from 1 year ago for a given member in the \[H-Year\] hierarchy. The ParallelPeriod function takes three arguments: a level, a number representing the number of periods to lag, and a member upon which to operate. Due to the level argument, ParallelPeriod operates only on user-defined hierarchies. To calculate same-period-last-year results on dimension attributes, you can use the Lag function instead.
Other than the erroneous results for the years of 1990 and 2010, this is a pretty nifty calculation. No matter where you navigate in the H-Year hierarchy (whether year, quarter, or month), the change from the same period 1 year ago is returned. For example, the UnemploymentAvg_PP_Change result for 1991-Feb is 1.27%, which represents the difference between the current value of 6.56% and the value 1 year ago (5.29%) in 1990-Feb.
The Need for a Systematic Approach to Time-Based Calculations
The built-in MDX functions (e.g., PrevMember, ParallelPeriod) might seem in of themselves sufficient for time-based calculations. Add a few calculated members to a cube (or even let users do it themselves with PTPower) and problem solved, right? Not exactly. There are serious shortcomings and complexities that can and should be addressed in order to truly empower end users. The shortcomings include:
- Erroneous or misleading results. When there are missing values along part of a time dimension (or the end user is looking at the first set of members in the dimension), calculated members that use functions such as PrevMember or ParallelPeriod will return misleading results. The calculated member definition needs to account for these situations. Typically, cube designers resort to fairly complicated conditional logic (e.g., IIF, CASE) in their MDXs, which can introduce performance problems along with code that's tricky to develop and maintain.
- Accounting for both attribute dimensions and dimension hierarchies. SSAS 2005 introduced the notion of dimension attributes which, from a time perspective, means end users can look at values by attribute (e.g., flat list of years) or by hierarchy (e.g., Year-Month-Day). For example, look at the DimDate dimension in our example database. End users can view measures by several attributes, including Century, Calendar Year, and Calendar Quarter. Or, they may choose to leverage a hierarchy (e.g., H-Year). The calculated member examples we've reviewed are specific to either a hierarchy or to a level. Ideally, period-to-period and same-period-last-year calculations should work regardless of what attribute or level the end user selects. (Note that a Microsoft design "best practice" recommends hiding attributes that participate in a hierarchy, but this is more of a general guideline. It doesn't address situations in which there is one or more time attributes that don't participate in a hierarchy, which is the situation in our case.)
- Calculated measure explosion. There are seven visible measures in the GovtDebtAnalysis cube. If you want to provide a period-to-period and same-period-last-year calculation for each visible measure, you now have a minimum of 21 visible measures in the cube (7 measures + 14 calculated members). However, you also have three time hierarchies. This means you'll actually need 42 (3 x 14) calculated members, which brings a new total of 49 measures (7 measures + 42 calculated members). But this total still doesn't account for visible dimension attributes. There are five in the cube, so you need an additional 70 calculated members. Finally, end users might want to see period-to-period and same-period-last-year calculations in three different forms (e.g., prior period value, change in prior period value, and change represented as a percent). You now have a cube that's bloated and difficult to understand and query.
To address these shortcomings, Microsoft and SSAS practitioners have developed various approaches to time-based calculations. My article "Optimizing Time-Based Calculations in SSAS" discusses the various approaches, including a new one that provides end users with calculations that are flexible, simple to use, and offer good performance.