Downloads
24020.zip

Hide confidential information by suppressing measure values within a dimension

Editor's Note: Joining Russ Whitney as coauthor this month is Paul Goldy (pgoldy@proclarity.com), senior consultant with ProClarity. Paul plans and implements OLAP solutions with SQL Server 2000 Analysis Services.

Most OLAP implementations exist to provide summarized information for decision makers. When decision makers discover summarized information in the dimension's upper layers that leads to further questions, they can drill down into more detailed information. The added detail can help the decision makers determine what business factors contributed to the summarized values. However, in some OLAP implementations you must limit the detailed information that a business analyst can see either for confidentiality or for legal reasons. For example, in a corporate financial analysis, you might need to make detailed expense information available to the decision-making managers of your company, but you don't want all individual salaries available because of their confidential nature. Or secondary-education administrators might use students' standardized test scores to analyze the effectiveness of certain schools, teachers, and curricula, but because of certain laws governing confidentiality, the administrators can't view individual students' test scores. Analysis of a local criminal-justice system might provide local crime statistics to the media, but you'd need to avoid revealing detailed data about individual judges, prosecutors, and defendants.

This kind of data-hiding problem occurs frequently, so we wanted to show you the process we used to create one real-life solution. We created this analysis solution for a data brokerage that collects and sells point-of-sale retail data. Retail outlets such as Best Buy, CompUSA, and Wal-Mart track scanned data and sell the data to product manufacturers such as Hewlett-Packard (HP), Lexmark International, and Canon. The manufacturers analyze the scanned data and use the analysis to optimize their marketing and sales strategies.

If competing manufacturers will likely view the data, you need to hide certain data to prevent unfair competition between manufacturers. A manufacturer can see trends or summarized information about a competitor but not the exact pricing and volume of specific brands within individual retail chains. To help enforce this concept, data brokers implement data-suppression rules that indicate which detailed information in the OLAP system needs to be suppressed. For example, if a brand-name beer from the FoodMart 2000 Sales cube (e.g., Good, Pearl, Portsmouth) sells in two or fewer Store Names (in the Store dimension), the broker's data-suppression rules specify that the brand's measure values must be suppressed. The suppressed value doesn't disappear but falls into the ALL OTHER BRAND member.

Another requirement of this suppression rule is that individual store names from the Store Name level of the Store dimension can't be visible to the analyst. Yet, the analyst still has to check against the members of that level to count the number of retailers reporting information for the brand. So selecting the exact data to suppress can be a complex task.

Figure 1 and Figure 2 show what the results look like for queries that don't use suppression compared with queries that do use suppression, respectively. Figure 1 shows the Unit Sales of beer brands for some cities in Washington and the store names within those cities. Figure 1 also shows the member ALL OTHER BRAND, which is unpopulated because you're seeing Unit Sales as it typically comes out of the Sales cube. Contrast this level of detail with the level of detail in Figure 2. The query that generates Figure 2's results is nearly the same, but no store names are available for the cities, and using the measure Suppressed Unit Sales enforces the suppression rule that requires that a brand be reported in two or more stores within each city. Thus, in Figure 2, the ALL OTHER BRAND member is populated and the brand name values are empty. The results show only one store in each city (because each city actually has only one store), so all the Unit Sales values fall into the ALL OTHER BRAND member.

We could best illustrate this problem by showing a typical cube from a data broker. However, these cubes aren't publicly available. So, we modified a copy of the FoodMart 2000 Sales cube to show how the suppression rule is implemented. First, we made a copy of the Store dimension, renamed it Hidden Store, and set the Dimension Visible property to false. Second, we made the Store Name level of the Store dimension non-visible by setting the Level Visible property to false. Third, we added an ALL OTHER BRAND record to the product and product_class tables in the FoodMart 2000 Access database, as Figure 3 shows. (Note that the product and product_class tables are related by the product_class_id column.) After we made all our changes, we rebuilt the Product dimension and reprocessed the Sales cube.

The solution to this suppression problem has three parts. First, we hid the Store dimension's Store Name level. Hiding this level is tricky because the level must still be available to the calculated member so that the MDX that enforces the suppression rule can determine which brands to suppress. Second, we created a new Product dimension member called ALL OTHER BRAND that will display the Unit Sales values that are suppressed. We created this dimension member by adding another record in the Product dimension tables. Finally, we used a calculated measure to suppress the Unit Sales values for brands that sell into two or fewer stores. This measure calculates the Unit Sales value for ALL OTHER BRAND.

For the first part of the solution, we thought the most obvious way to hide the Store Name level would be to set the Visible property of the Store Name level to false. Unfortunately, setting the Visible property to false makes the members of that level unavailable for use in the calculation that determines whether values should be suppressed. Instead, we copied the entire Store dimension to a new dimension named Hidden Store, then set the new dimension's Visible property to false. Unlike hidden levels, hidden dimensions are still available to formulas in a calculated member. You can change the dimension's Visible property on the Advanced tab of the dimension's Properties window in Analysis Manager's Cube Editor, as Figure 4 shows. Figure 5 shows a sample of the dimension structure and members of Hidden Store. We still needed to prevent users from drilling down beyond the Store City level of the Store dimension, so we set the Visible property of the Store Name level to false for the Store dimension, as Figure 6, page 56, shows. Listing 1 shows the MDX code for the Suppressed Unit Sales calculated measure we used to accomplish the second and third parts of the solution. You can break the MDX into two sections: The first section determines when to enforce the suppression, and the second section uses the suppressed values to populate the ALL OTHER BRAND member.

To determine whether to suppress a value, the MDX code first checks the current level within the Product dimension, as callout A in Listing 1 shows. This expression uses the level's .Name property to determine whether the Brand Name level contains the selected product member. The MDX suppresses values only at this level of the dimension.

The next part of the MDX code, which callout B shows, determines how many stores have Unit Sales for this product's parent. If more than two stores have Unit Sales, the code returns the measure; otherwise it returns NULL, thus suppressing the Unit Sales value. If you suppress the Unit Sales value for this product, you must add the Unit Sales value to the ALL OTHER BRAND member. The code that callout C shows contains this summation of suppressed values.

Notice how we used the Hidden Store dimension at callout B. To determine which store names from the hidden dimension apply to the selected visible store member, we cross-joined the two store dimensions. The first argument in the cross join is the set of all members from the Store Name level in the Hidden Store dimension. The second argument is the set that the tuple of Store.CurrentMember, Measures.\[Unit Sales\] and Product.CurrentMember.Parent creates. Specifying Store.CurrentMember in the tuple while excluding empties in the count forces the inclusion of only store names that match the selected store from the Hidden Store dimension.

The final part of the calculation determines whether to suppress this value. The Count() function uses the EXCLUDEEMPTY clause to suppress the value:

Count(CROSSJOIN(...), EXCLUDEEMPTY)>2

Remember that the suppression rule specifies that if a brand is reported in two or fewer Store Names, we need to suppress it. This check, which uses the Count() function to find values greater than 2, satisfies that condition. If we knew that the client would always use SQL Server 2000 or later and no calculated members were in the Store Name level, we could create more efficient MDX with the NONEMPTYCROSSJOIN() function:

Count( NONEMPTYCROSSJOIN(...) > 2 )

At this point in the solution, we'd successfully suppressed values by returning NULL under the correct condition. Now we needed to complete the solution by adding the suppressed values into the ALL OTHER BRAND member in the Product dimension. We accomplished this step in the section of MDX code that callout A shows. The pseudocode below shows the steps the MDX will take depending on the result of the IIF() function:

IIF(Product.CurrentMember.Name = "ALL OTHER BRAND",
<true  sum the suppressed values> ,
<false  determine whether we should show the value or suppress it>)

The section of the IIF() function starts with a check to determine whether the name of the current member in the Product dimension is ALL OTHER BRAND. We left the Unit Sales measure visible in the FoodMart 2000 Sales cube and the Hidden Store dimension. Usually, in a deployed cube in which suppression is implemented, we'd hide this measure by using the Measures Visible property (as Figure 7 shows) and the dimension's Visible property in the Cube Editor. However, for this article, we wanted to show the suppressed values in the ALL OTHER BRAND member alongside the unsuppressed values.

Let's return to our first two figures to see the results of our solution. Figure 1 shows the Product dimension down the rows with brands from the Beer member selected. Our example suppression rule states, "If a brand name sells into two or fewer Store Names, it must be suppressed." With one store in each city, each of the Unit Sales values should be suppressed. Figure 2 shows nearly the same query results as Figure 1 does. We can see that the Suppressed Unit Sales column has no values except those belonging to the ALL OTHER BRAND member. The ALL OTHER BRAND member's values are large because they're the sums of all the suppressed values. That total is the sum of all the Unit Sales values because only one Store Name exists in each of the cities in the FoodMart 2000 Sales cube. Keep in mind that the Hidden Store dimension and the Unit Sales measure would be hidden in the cube if it were deployed with the suppression rule implemented.

The ALL OTHER BRAND member is at a high level in the dimension hierarchy. Having all the suppressed values sum into ALL OTHER BRAND might not be practical for analysis in a Product dimension that's used in the Sales cube because the product dimension has many levels. A more practical solution would have an ALL OTHER BRAND member as a sibling to each brand group in the product hierarchy. To try your hand at implementing such a solution, see the sidebar "April MDX Puzzle," page 55.

This example of suppressing brand detail information is only one business example in which suppression implementation is useful. Suppressing detailed data while making summarized information available for analysis is a useful solution to many business problems. In some situations, you can use SQL Server 2000 Analysis Services' cell-level security features to handle your suppression solutions, but frequently, real-world situations such as the one we tackled require a combination of approaches. We hope this example arms you with the knowledge you need to solve this type of problem. Until next time, keep sending your article ideas.