
To figure out which partition designs work best to reduce the number of sequential scans for Listing 2 and Listing 3 in the main article, I created six cubes, structured identically but with different partition designs. The idea is to partition along the Time and Occupant_Severity dimension members that the queries in Listing 2 and Listing 3 ask for (Figure 2 in the main article shows these members) so that the queries can reach those members directly, without a scan. To create a new partition, you can use the Partition Wizard in Analysis Manager, as Figure A shows, and specify the data slice (i.e., the tuple). (For more information about using the Partition Wizard, see SQL Server Books Online, or BOL.) Table 1 in the main article summarizes the partitioning data slices of the six test cubes. I named the cubes by combining the names of their partitioning dimensions with the total number of partitions in the cube. For example, the 31 partitions of the Year_Severity31 cube slice along the Time and Occupant_Severity tuples ([Partition Year].[1]), ([Partition Year].[2].[1997], [Fatal]), ([Partition Year].[2].[1997], [Injury A, Incapacitating]), and so on.
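You can also script partition creation instead of clicking through the wizard. The following is a minimal Decision Support Objects (DSO) sketch, not the method I used for the test cubes; the database name, partition name, and level names are placeholders, and details such as verifying the partition's source table and filter are only noted in comments.

```vb
' Minimal DSO sketch (requires a project reference to the DSO library).
' "CrashDB", "Fatal_1997", and the level names are placeholders.
Dim dsoServer As New DSO.Server
Dim dsoCube As DSO.MDStore
Dim dsoPart As DSO.MDStore

dsoServer.Connect "LocalHost"
Set dsoCube = dsoServer.MDStores("CrashDB").MDStores("Year_Severity31")

' Add a new partition and clone the cube's first (default) partition into it
' so the new partition inherits the fact-table and storage settings.
Set dsoPart = dsoCube.MDStores.AddNew("Fatal_1997")
dsoCube.MDStores(1).Clone dsoPart

' Narrow the new partition to the data slice
' ([Partition Year].[2].[1997], [Fatal]); the level names here are guesses.
dsoPart.Dimensions("Partition Year").Levels("Year").SliceValue = "1997"
dsoPart.Dimensions("Occupant_Severity").Levels("Severity").SliceValue = "Fatal"

' In a real script, you'd also verify the partition's name, source table,
' and filter after the clone, then save the definition and process the data.
dsoPart.Update
dsoPart.Process processFull
```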

To study the effects of the number and speed of CPUs, disk I/O, and physical memory on the partitioned cubes, I executed Listing 2 and Listing 3 on six different servers. Table 2 in the main article gives the specifications of these servers, which range from the highest to the lowest end in hardware resources. All the tests measured the response times for Listing 2 and Listing 3. Listing A shows a code snippet from the Visual Basic (VB) application that I used to drive the tests. Listing A's code uses an ADO recordset to send the query and return the result. The code computes the response time by calling the Win32 API GetLocalTime function immediately before opening and immediately after closing the recordset. Between those two calls, the code retrieves one record at a time until it reaches the end of the recordset and displays each record in an MSFlexGrid control (a graphical grid).
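Listing A itself is in the article download; the following is only a rough sketch of the same approach, assuming a reference to the ADO 2.x library, a hypothetical cube catalog named Crash on a server named MyServer, and an MDX string built elsewhere.

```vb
Private Type SYSTEMTIME          ' Win32 SYSTEMTIME structure
    wYear As Integer
    wMonth As Integer
    wDayOfWeek As Integer
    wDay As Integer
    wHour As Integer
    wMinute As Integer
    wSecond As Integer
    wMilliseconds As Integer
End Type

Private Declare Sub GetLocalTime Lib "kernel32" (lpSystemTime As SYSTEMTIME)

Private Sub RunTimedQuery(ByVal strMDX As String)
    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset
    Dim tStart As SYSTEMTIME, tEnd As SYSTEMTIME

    ' Connect through PivotTable Service (the MSOLAP provider);
    ' the server and catalog names are placeholders.
    cn.Open "Provider=MSOLAP;Data Source=MyServer;Initial Catalog=Crash"

    GetLocalTime tStart                    ' timestamp just before the open
    rs.Open strMDX, cn, adOpenForwardOnly  ' forward-only cursor (see main text)

    Do Until rs.EOF                        ' fetch one record at a time
        ' Listing A displays each record in an MSFlexGrid; omitted here.
        rs.MoveNext
    Loop

    rs.Close
    GetLocalTime tEnd                      ' timestamp just after the close

    Debug.Print "Response time (sec): " & _
        Format$(ElapsedMs(tStart, tEnd) / 1000, "0.0")
    cn.Close
End Sub

' Millisecond difference between two same-day SYSTEMTIME values.
Private Function ElapsedMs(t1 As SYSTEMTIME, t2 As SYSTEMTIME) As Long
    ElapsedMs = ((t2.wHour - t1.wHour) * 3600& + _
                 (t2.wMinute - t1.wMinute) * 60& + _
                 (t2.wSecond - t1.wSecond)) * 1000& + _
                (t2.wMilliseconds - t1.wMilliseconds)
End Function
```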

Because Listing 2 and Listing 3 reference a large dimension level ([Street].[Street_List], which contains 27,691 members) and involve the FILTER() function, they must both always execute on the server. This requirement means that we can observe the performance of both client and server. Listing 2 and Listing 3 have only two main differences. First, Listing 2 contains six calculated members, whereas Listing 3 contains none. Second, Listing 2 references three year members (1998, 1999, and 2000) in each of its six calculated members, whereas Listing 3 references only the [All Time] member.
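To make the distinction concrete, here is the general shape of each query as I might sketch it in the test application. This is purely illustrative, not the actual Listing 2 or Listing 3; the cube name [Crash] and the measure [Crash Count] are invented for the example.

```vb
Dim strListing2Style As String, strListing3Style As String

' Listing 2's style: a calculated member that sums three year members at
' query time, so Analysis Services can't answer it from a stored aggregation.
strListing2Style = _
    "WITH MEMBER [Measures].[Recent Crashes] AS " & _
    "'Sum({[Time].[1998], [Time].[1999], [Time].[2000]}, " & _
    "     [Measures].[Crash Count])' " & _
    "SELECT {[Measures].[Recent Crashes]} ON COLUMNS, " & _
    "Filter([Street].[Street_List].Members, " & _
    "       [Measures].[Recent Crashes] > 0) ON ROWS " & _
    "FROM [Crash]"

' Listing 3's style: no calculated members; it references only [All Time].
strListing3Style = _
    "SELECT {([Time].[All Time], [Measures].[Crash Count])} ON COLUMNS, " & _
    "Filter([Street].[Street_List].Members, " & _
    "       [Measures].[Crash Count] > 0) ON ROWS " & _
    "FROM [Crash]"
```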

To study the effects of the server (Analysis Services) cache and the client (PivotTable Service) cache and to compare those effects with the effect of partitioning, I maintained both cold and warm caches in my testing. According to BOL, if you don't set the PivotTable Service property Default Isolation Mode, the cursor type that the Recordset requests determines whether PivotTable Service is in isolation or non-isolation mode. If PivotTable Service is in isolation mode, it never refreshes its cache. For my test, I set the cursor type to adOpenForwardOnly, which caused PivotTable Service to keep the client cache warm between queries; in other words, PivotTable Service didn't flush the client cache, so I could study the performance effects of a warm client cache. Table A shows the test results I got for cold and warm caches. Here's how I achieved cold and warm caches for my tests:

  • Cold Caches: Neither the server cache nor the client cache contains the queried data. I simulated cold caches by restarting Analysis Services.
  • Warm Server Cache: Only the server cache contains the results of the test query. I simulated warm-server and cold-client caches by restarting the client application so that the client cache was flushed but the server cache remained warm.
  • Warm Server and Client Caches: Both server and client caches contain the queried data. I simulated warm server and client caches by keeping the client connection and immediately resending the same query.

Performance Results


Figure 3 in the main article shows the performance of Listing 2's query on all the servers, and Figure B shows the performance of Listing 3's query. Surprisingly, for both queries, performance is inversely related to server power: the more powerful the server, the slower the query. For example, in Figure 3, you can see that for the Default1 and Severity6 cubes, the high-end servers are twice as slow as the low-end servers. For the other cubes, which all have partitions in the Time dimension, the gap narrows, but the low-end servers are still faster. Figure B shows that for Listing 3, the high-end servers still perform significantly worse than the low-end servers. Partitioning doesn't reduce the performance gap between high- and low-end servers for Listing 3 on any cube. Note that these results are consistent regardless of how I changed memory and thread settings. For example, I changed the settings on High1 from the default 16 threads to 2 threads to mimic the low-end servers. I also tried raising the minimum allocated memory from the default 511MB to 2GB. Neither change made a difference in High1's performance.

From these results, we can draw an important conclusion: Partitioning along queried data slices leads to the best performance. For example, the partitions in the Year_Severity31 cube and the PartitionYear_Severity7 cube produced the best performance for Listing 2. The Severity6 cube produced the best performance for Listing 3's query (which doesn't ask for Time members).

To verify that high-end servers can't guarantee better performance for Listing 2 and Listing 3, I reproduced a similar result by using a larger version of the Foodmart 2000 sample database that comes with the Analysis Services installation. I named this expanded database Foodmart_85_99. It's an exact mirror of Foodmart 2000, except that I expanded the fact table and Time dimension by replicating their original records 14 times and changing the Year member in each copy from 1997 (the original value) to one of the other years from 1985 through 1999. I also created a new query, which Listing B shows; it's similar in structure to the query in Listing 2 but targets the Sales cube. I then created a new 30 percent-aggregated MOLAP cube called Sales_25 (in addition to the default cube Sales) that has partitions along the data slices that Listing B references, such as ([1995], [Bachelors Degree]). The results in Figure C again confirm that partitioning along queried data slices (as in the Sales_25 cube) generally improves performance. However, high-end servers still don't perform as well as low-end servers.

These results defy a couple of common beliefs: first, that more CPUs provide better parallelism for cubes that have multiple partitions, and second, that more memory buffers more data, producing faster computation. The results of my tests appear to show that more memory and more CPUs have no effect on the queries in Listing 2, Listing 3, and Listing B. Instead, one fast CPU outperforms multiple CPUs. Because the low-end servers performed better in all the tests above, I decided to use only the Low3 machine in the rest of my studies.

Table B shows the effect of varying aggregation levels. The response time stays at 52 seconds for Listing 2 and 27 seconds for Listing 3 at every level except 100 percent aggregation. This result supports my hypothesis that for complex queries (such as the one in Listing 2) or queries that return many records or columns (as Listing 3's query does), increasing the aggregation level won't improve performance. Because the aggregation level isn't a factor, I fixed it at a common value of 30 percent for the rest of my tests.

A partition-design question that no one has ever answered is this: When you further divide an existing partition into new partitions, should you keep the original, or should you discard it and replace it with the new divisions? Keeping the original partition would mean an overlap (a duplication of data) among the slices. The same problem can occur when you create new partitions. For example, when you create new partitions along Time dimension members, you might be tempted to specify these data slices: ([Time].[All Time]), ([Time].[1997]), ([Time].[1998]). You might think that the ([Time].[All Time]) partition includes only the [All Time] member as it appears in Figure A. But specifying the [All] member is exactly the same as leaving that dimension blank, which includes the entire dimension in the slice. You can verify this behavior by looking at the Slice column on the Meta Data tab of the partition in Analysis Manager. So in this example, the ([All Time]) partition would overlap with the data in the ([1997]) and ([1998]) partitions. Overlapping partitions not only waste storage space, they also significantly degrade performance, as Table C shows. The added [All Occupant_Severity] partition overlaps with all of its descendants' partitions (the ([Fatal]) partition, for example) and makes Listing 2's query twice as slow.
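If you'd rather not check the Meta Data tab partition by partition, a short DSO loop can dump each partition's slice so that overlaps stand out. This is a rough sketch: the database and cube names are placeholders, and it assumes that a level the partition doesn't restrict reports an empty SliceValue.

```vb
' Rough sketch; "CrashDB" and "Year_Severity31" are placeholder names.
Dim dsoServer As New DSO.Server
Dim dsoCube As DSO.MDStore
Dim dsoPart As DSO.MDStore
Dim i As Integer, j As Integer, k As Integer

dsoServer.Connect "LocalHost"
Set dsoCube = dsoServer.MDStores("CrashDB").MDStores("Year_Severity31")

For i = 1 To dsoCube.MDStores.Count              ' each partition in the cube
    Set dsoPart = dsoCube.MDStores(i)
    Debug.Print "Partition: " & dsoPart.Name
    For j = 1 To dsoPart.Dimensions.Count
        For k = 1 To dsoPart.Dimensions(j).Levels.Count
            ' A nonempty SliceValue means this level restricts the partition;
            ' an empty value has the same effect as slicing on the [All] member.
            If Len(dsoPart.Dimensions(j).Levels(k).SliceValue) > 0 Then
                Debug.Print "  " & dsoPart.Dimensions(j).Name & "." & _
                    dsoPart.Dimensions(j).Levels(k).Name & " = " & _
                    dsoPart.Dimensions(j).Levels(k).SliceValue
            End If
        Next k
    Next j
Next i
```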

I made sure that the six cubes I designed for my tests contained no overlapping partitions. Table A shows the performance of these cubes on server Low3, along with the effect of the caches. If you have enough memory to maintain warm caches for all your queries, partitioning has no effect on performance, regardless of the level of aggregation in your cube. Unfortunately, most of us must work with cold caches, so partitioning is a good choice. For Listing 2, the partitioning reduces the PartitionYear_Severity7 cube's response time to 9 seconds, nearly six times as fast as the default cube (Default1). More important, it's nearly as fast as running against a warm server cache.

When I applied Analysis Services Usage-Based Optimization to add more aggregations for Listing 2's and Listing 3's queries, the performance of Listing 3 improved slightly on every cube, as Table D shows. However, partitioning still appears to have a negative effect on Listing 3. Interestingly, Usage-Based Optimization has no effect on Listing 2's performance on any cube, possibly because aggregating the three year members requires calculated members, so none of that aggregation can be preprocessed. Therefore, Usage-Based Optimization couldn't add any more aggregations for Listing 2.