Downloads
23432.zip

Sidestep cursors to a better way of stepping through object collections

If you're a database programmer, you know the value of a cursor. When you need to perform a complex operation with a collection of records, a cursor applies that operation to each record. A cursor is a type of iterator. In the world of object-oriented programming, iterator is the generic term for any object that facilitates stepping through a collection of other objects.

MDX uses the Generate() function and named sets to support iteration. The power of these methods is mostly invisible to end users, but using them lets you apply operations to each dimension member in a set, just as a cursor lets you apply operations to each record in a table. Let's look at how these iterators work and how you can use them to create useful solutions to your organization's data-analysis problems.

If you spend a lot of time with MDX, the query language of SQL Server 2000 Analysis Services and SQL Server 7.0 OLAP Services, you know that you can solve certain types of problems by applying an MDX formula to each member in a set. For example, if you want to find out which geographic regions contain your top 10 customers, you can make a set of your top 10 customers, then apply an MDX formula that returns each customer's region.

SQL Server Books Online (BOL) describes how the Generate() function steps through items in a set. For each item, Generate() evaluates an MDX formula, the results of which combine to form a new set. In the following example, Generate() steps through each item in the set called Top 10 Customers:

Generate( \[Top 10 Customers\], \{ Customers.CurrentMember.Parent \} )

For each of these 10 customers, Generate() evaluates the formula Customers.CurrentMember.Parent and replaces Customers.CurrentMember with the current item in the set so that the formula returns the parent of the current customer. The result of the Generate() function is a set containing the parents of the top 10 customers.

The following example shows another way to express the Generate() function:

Generate(  \[Top 10 Customers\] , \{  \[Top 10 Customers\].Current.item(0).Parent \} )

This example demonstrates how to use the Current property of a set. This property is similar to the CurrentMember property of a dimension hierarchy except that when Current operates on a set, the property returns a tuple instead of a member. Current returns a tuple because sets aren't limited to members from one dimension but instead each item in a set can include a member from each dimension. The example above uses the item property of a tuple to return the first member in the tuple. In most situations, a set contains members from only one dimension, so item(0) is a good way to convert a tuple to a member.

If you compare the two examples above, the first example, which uses CurrentMember, is easier to read. Both expressions are functionally equivalent—so why would you bother to use the Current property? The CurrentMember property is limited to maintaining one position within a dimension, so you can't reference two positions from different MDX functions independently. By using the Current property, you can step through more than one set on the same dimension. In other words, the Current property lets you nest MDX functions that step through the same dimension and still reference the current item in each function.

The example that Listing 1 shows illustrates my point. This nested MDX formula performs an intersection between the Top 10 BY Profit set and the Top 10 BY Sales set. You can't use the CurrentMember property to perform this intersection because both the Generate() and Filter() functions are stepping through a set of members from the same dimension.

Now let's further complicate the problem: What happens if you don't have a name for the set that you want to step through? For example, let's change the code in Listing 1 so that it finds the top 10 customers by sales who live in Washington, as the code in Listing 2 shows. I could have defined WaCustomers as a set in the query's WITH clause or by using a CREATE statement before running the query, but putting the definition directly in the query was more convenient. I used the AS keyword to give the Descendants operation the name WaCustomers so that I could refer to it later. I haven't found any documentation about the AS keyword in BOL, but since I first encountered AS in MDX code, I've used it frequently.

The examples in Listing 1 and Listing 2 that show a Filter() function nested in a Generate() function help explain the concept of the Current property, but they're not real-world examples. You can phrase the nested expression more clearly and concisely if you use the Intersect() function, as the following example shows:

Intersect(Descendants
  (\[Customers\].\[WA\], \[Customers\]
      .\[Name\]) , \[Top 10 BY Sales\])

So let's look at how you might use the Current property to solve a real-world problem. In "MDX by Example," December 2001, InstantDoc ID 22994, I presented 15 MDX examples that address the problem of sales analysis. Many of these queries require searching the Brand Name level of the FoodMart Sales cube's Product dimension. Because brands are organized into product categories, the same brand name might appear multiple times in the Product dimension. For example, Papa Joe's might make butter and lemonade, but butter and lemonade are in different product categories, so the Papa Joe's brand is duplicated in the Brand Name level. For most sales analysis, this duplication isn't a problem because you usually want to treat brands independently. You don't want the success or failure of Papa Joe's butter to affect whether you carry Papa Joe's lemonade.

But in certain cases, you want to treat all Papa Joe's brands as one item. What if you want to know the total amount of business you do with Papa Joe's? The MDX query in Listing 3 shows how to use a general approach to solve this tricky problem. The query determines the total unit sales for each brand (instead of just one brand), then it displays the brands that make up the top 80 percent of the total unit sales.

The query works by first creating a set called UniqueBrands that contains one brand member for each unique brand name. Then, the query creates a new measure called TotalBrand that sums the unit sales of all brand members that have the same name as the current selected brand.

The formula that determines unique brands uses a Generate() function to step through all the nonunique brand members. For each brand member, a Filter() function finds all the other brands that have the same name. The code then reduces this group of duplicate brand members to the first item, the Head() function. The result is that every time the Generate() function encounters a member that has a name it has encountered before, it converts that member to the first brand member with that name. So, Generate() is converting members that have the same name into duplicate members. Because Generate() automatically eliminates duplicate members, the result of the Generate() function is a set of members that have unique names.

This calculation of unique brands can be slow because some nested loops scan through the full list of brands. The query in Listing 3 is reasonably fast because the FoodMart Sales cube doesn't contain many brands. But if you're developing an application that needs to perform this type of calculation on a large set of items, consider defining the set before running your queries. You can predefine the set by using the CREATE SET command or by using Analysis Manager. You can use Analysis Manager to create a set by right-clicking a cube and selecting Edit to start the cube editor. Then, from the Insert menu, choose Named Set. In the window that appears, enter the name of the set and its formula. All client applications that have access to this cube will also have access to this set. Both approaches prevent the calculation from being evaluated every time you run a query that uses it.

The calculation that determines TotalBrand is similar to the calculation that finds UniqueBrands—with one wrinkle. What you need in this example is the set of all brands that have the same name as the currently selected one so that you can sum the brands that have duplicate names. To get this list of brands, you might write a statement such as the following:

Filter( \[Product\].\[Brand Name\].Members AS Temp2Iterator,
    \[Product\].CurrentMember.Name =
Temp2Iterator.Current.item(0).Name )

But this statement doesn't work because as the Filter() function steps through all the brand members, it changes the CurrentMember property of the Product dimension that goes with it. Thus, you lose track of where you were in the list before the Filter() function started. To record the CurrentMember property before starting the Filter() function, you can use the Generate() function to step through a set of one item, which you identify by using the CurrentMember property. Then, you can give CurrentMember an alternate name (I used Temp1Iterator) for use inside the Filter() function.

Generate( \{\[Product\].CurrentMember\} AS Temp1Iterator,
    Filter( \[Product\].\[Brand Name\].Members AS
Temp2Iterator,
        Temp1Iterator.Current.item(0).Name =
Temp2Iterator.Current.item(0).Name  ))

The examples in this article show the Generate() function's power. Because of the relative immaturity of analytic applications, little of this power is exposed through end-user analysis features. For now, you need to know MDX to tap this power. I'm sure this requirement will change as application developers learn how to expose complex MDX functionality in business terms for users.

To practice creating a calculated measure that predicts unit sales for each month in a given quarter, see the sidebar "February MDX Puzzle." For the answer to January's puzzle, see the sidebar "January MDX Puzzle Solution Revealed."