Try these unorthodox but effective best practices on your next project
It's been 4 years since Microsoft redefined the OLAP market with the introduction of Analysis Services (originally named OLAP Services), the multidimensional analytical database bundled with SQL Server. During this time, I've collected a list of lesser-known tips and best practices that can save Analysis Services developers time and trouble. These nine tips and techniques, ranging from the mundane to the sublime, go against the grain of common practice. However, common practice doesn't necessarily equate to best practice, and one of these tips might prove critical to the success of your next project.
1. Use Views as the Source for Cubes and Dimensions
Always use views as the data source for dimension tables and fact tables. In addition to providing a valuable abstraction layer between table and cube, views let you leverage your staff's expertise with relational database management systems (RDBMSs). When you use a view as a fact table, you can manage incremental updates by altering the WHERE clause within the view instead of assigning the WHERE clause to an OLAP partition. When you use a view to source a dimension, you can define logic inside the view that otherwise would have to be defined in Analysis Services (e.g., formulated member names, formulated member properties).
2. Leave Snowflakes Alone
Analysis Services lets you source dimensions from either a normalized snowflake schema or a flattened star schema. Microsoft recommends flattening snowflake dimensions into stars for performance reasons, a practice that most Analysis Services developers follow. However, unless the relational data mart is consumed by something other than Analysis Services, this practice has few benefits and considerable drawbacks. For these reasons, resist the urge to flatten:
- A snowflake schema provides the benefits of a normalized design. With a star schema, managing attributes for the repeating non-leaf members is awkward at best.
- A snowflake gives you unique keys at each level. This lets you import data into a cube at any level of granularity, a critical ability in financial-planning applications, for example.
Because dimension tables aren't queried at runtime (except for in the notoriously slow relational OLAP—ROLAP—mode), snowflake dimensions have no impact on query performance. The only downside to a snowflake dimension is that it (the dimension, not the cube) is slower to process than a star because of the joins that are necessary. However, the time it takes to process dimensions is a minor factor compared to the time necessary for cube processing. Unless the dimension is huge and the time window in which processing must occur is tight, snowflakes are the way to go.
3. Avoid Crippled Client Software
Can you imagine using a front-end tool for an RDBMS that doesn't let you specify a SQL statement? Of course not. Yet somehow that's what developers are faced with in the OLAP space. Remarkably, many shrink-wrap query and reporting tools that work with Analysis Services are crippled in a fundamental sense—they don't let developers supply an MDX SELECT statement. The problem is this: None of the commercial clients, even the most robust, come close to exposing the full power of MDX. Maybe simple cube browsing is all your users require. Nonetheless, to avoid painting yourself into a corner, choose a front-end tool that lets the developer specify custom MDX SELECT statements.
There's a catch to this advice, however. The client tools that don't expose MDX tend not to be tightly bound to Analysis Services—they provide connectivity to other data sources. However, I don't think it's asking too much for these vendors to expose an MDX SELECT query string as a pass-through.
4. Get Level Names Right from the Get-Go
When you first build a dimension, level names default to the same names as the column names in the dimension table (except that Analysis Manager replaces special characters with spaces). This means that you wind up with level names like Cust Code, or worse. Then, after the cube is processed, you can't change the level names without reprocessing the dimension, which in turn requires that you reprocess the cube. Because it's painful to rename levels after the cube is processed, many cubes go into production with frighteningly cryptic level names. To compound matters, MDX formulas are often written with dependencies on the unfriendly level names, adding another hurdle to the level-rename task. Cubes are supposed to be easily usable right out of the box, so avoid this pitfall by getting the level names right from the beginning. As soon as you build a dimension, change the default level names to user-friendly names before placing the dimension into the cube.
5. Enter Count Estimates
When you first build a dimension, Analysis Services stores the member count for each level as a property of the level. This count is never updated unless you explicitly update it (manually or by using the Tools, Count Dimension Members command). In addition, it's typical for cubes to initially be built against a subset of the data warehouse. In this case, the cube will likely go into production with the count properties understated by an order of magnitude. Here's the gotcha: The Storage Design Wizard uses these counts in its algorithm when you're designing aggregations. When the counts are wrong, the Storage Design Wizard is less effective at creating an optimal set of aggregations. The solution is simple—when you build the dimension, manually enter estimated counts for each level.
6. Create Grouping Levels Manually
No dimension member can have more than 64,000 children, including the All member. This limit isn't as onerous as it sounds; usability is apt to nail you before the hard limit does. A member with even 10,000 children usually presents a usability problem—that's a lot of rows to dump on a user drilling down into the dimension.
Whether you're fighting the limit or simply working to design your dimension so that it provides bite-size drilldowns, the solution is to build deep, meaningful hierarchies. But when there's no raw material from which to build a meaningful hierarchy, you must resort to a grouping level, aka a Rolodex level, such as the first letter of the last name for a customer dimension. Analysis Services has a feature (create member groups in the Dimension Wizard) that can create a grouping level for you automatically. Don't use it! You won't have control over the grouping boundaries. Instead, construct the level manually. This entails adding a new level to the dimension, then modifying the Member Name Column and Member Key Column properties. For instance, you might define the member key column and member name column for the grouping level as follows:
LEFT("CustomerDimTable"." CustomerName", 1)
This expression bases the level on the first letter of the customer name, providing Rolodex-style navigation. Bear in mind, however, that this is a SQL pass-through; the expression is passed to the RDBMS, so the RDBMS dictates the syntax. That is, T-SQL has a LEFT() function, but another RDBMS might not.
7. Use Member Properties Judiciously
When you start up the OLAP server, the server loads every dimension—including member keys, names, and member properties—into server memory. Because Analysis Services is limited to 3GB of RAM, this is one of the primary bottlenecks for enterprise-scale deployments. For this reason, limit member properties to the bare essentials, particularly when the level has lots of members.
8. Understand the Role of MDX
Did you ever try to swim without getting wet? For all but the simplest of databases, that's what it's like when you try to design an OLAP solution without using MDX. Because shrink-wrap client software often negates the need to write MDX SELECT statements, many developers think they can successfully avoid MDX. This is folly. Sure, not every project requires MDX SELECT statements; commercial software is adequate for many situations. But MDX calculations should play an important role in most Analysis Services solutions, even those that aren't calculation-intensive on the surface.
Perhaps the most common example is a virtual cube that's based on two or more source cubes. Calculated members are usually required to "glue" the virtual cube together into a seamless whole. Although the MDX isn't necessarily complex, developers unaware of the role of MDX wind up making costly mistakes. Either they avoid virtual cubes entirely, or they shift logic that's easily implemented in MDX to the extraction-transformation-load (ETL) process, where it's more complicated and rigidly set.
9. Use Caution
Be very careful when dealing with dimensions. Look before you leap into Analysis Services' very large dimension feature, which places large dimensions into a separate memory space. This feature is buggy, so avoid it. Also be careful with ROLAP dimensions, which the server reads into memory as needed at runtime. Because you can place a ROLAP dimension only into a ROLAP cube, performance will suffer mightily. In theory, ROLAP mode supports larger dimensions, but it's non-functional in my experience.
Because Analysis Services is a relatively new product and somewhat lacking in documentation, we're all in the same boat—we have to learn as we go. I hope these tips make a difference on your next project. Have some Analysis Services tips of your own? Send them to firstname.lastname@example.org.