Using Dynamic Properties to Stabilize DTS
To cut down on coding and thereby minimize errors, Microsoft added the Dynamic Properties task to Data Transformation Services (DTS) in SQL Server 2000. With the assistance of this task, you don't have to create bulky ActiveX Script tasks to dynamically set a DTS property, such as a username that you use to establish a connection. This task lets you change the value of any nonidentifying property that’s accessible through the DTS object model (e.g., non-name/ID properties of a step, connection, task, package, or global variable). What once took 3 weeks to stabilize, you can now write and stabilize in less than a day. Using the Dynamic Properties task gives you faster performance than writing the same process with an ActiveX Script task because DTS doesn't resolve the ActiveX Script task until runtime.
Leave Snowflakes Alone
Analysis Services lets you source dimensions from either a normalized snowflake schema or a flattened star schema. Microsoft recommends flattening snowflake dimensions into stars for performance reasons, a practice that most Analysis Services developers follow. However, unless the relational data mart is consumed by something other than Analysis Services, this practice has few benefits and considerable drawbacks. For these reasons, resist the urge to flatten:
- A snowflake schema provides the benefits of a normalized design. With a star schema, managing attributes for the repeating non-leaf members is awkward at best.
- A snowflake gives you unique keys at each level. This lets you import data into a cube at any level of granularity, a critical ability in financial-planning applications, for example.
- Because dimension tables aren't queried at runtime (except for in the notoriously slow relational OLAP—ROLAP—mode), snowflake dimensions have no impact on query performance. The only downside to a snowflake dimension is that it (the dimension, not the cube) is slower to process than a star because of the joins that are necessary. However, the time it takes to process dimensions is a minor factor compared to the time necessary for cube processing. Unless the dimension is huge and the time window in which processing must occur is tight, snowflakes are the way to go.
Create Grouping Levels Manually
No dimension member can have more than 64,000 children, including the All member. This limit isn't as onerous as it sounds; usability is apt to nail you before the hard limit does. A member with even 10,000 children usually presents a usability problem—that's a lot of rows to dump on a user drilling down into the dimension.
Whether you're fighting the limit or simply working to design your dimension so that it provides bite-size drilldowns, the solution is to build deep, meaningful hierarchies. But when there's no raw material from which to build a meaningful hierarchy, you must resort to a grouping level, aka a Rolodex level, such as the first letter of the last name for a customer dimension. Analysis Services has a feature (create member groups in the Dimension Wizard) that can create a grouping level for you automatically. Don't use it! You won't have control over the grouping boundaries. Instead, construct the level manually. This entails adding a new level to the dimension, then modifying the Member Name Column and Member Key Column properties. For instance, you might define the member key column and member name column for the grouping level as follows:
This expression bases the level on the first letter of the customer name, providing Rolodex-style navigation. Bear in mind, however, that this is a SQL pass-through; the expression is passed to the RDBMS, so the RDBMS dictates the syntax. That is, T-SQL has a LEFT() function, but another RDBMS might not.
Understand the Role of MDX
Did you ever try to swim without getting wet? For all but the simplest of databases, that's what it's like when you try to design an OLAP solution without using MDX. Because shrink-wrap client software often negates the need to write MDX SELECT statements, many developers think they can successfully avoid MDX. This is folly. Sure, not every project requires MDX SELECT statements; commercial software is adequate for many situations. But MDX calculations should play an important role in most Analysis Services solutions, even those that aren't calculation-intensive on the surface.
Perhaps the most common example is a virtual cube that's based on two or more source cubes. Calculated members are usually required to "glue" the virtual cube together into a seamless whole. Although the MDX isn't necessarily complex, developers unaware of the role of MDX wind up making costly mistakes. Either they avoid virtual cubes entirely, or they shift logic that's easily implemented in MDX to the extraction-transformation-load (ETL) process, where it's more complicated and rigidly set.
Using NON EMPTY to Include Empty Cells
Many multidimensional cubes have empty cells, which occur because a user didn't load data into the cube for these members. For example, if you inspect the Sales cube in the FoodMart sample, its creators didn't load any data for 1998. You must use the NON EMPTY modifier to write an MDX query that includes 1998, as Listing A shows.
Formatting Financial Reports
When creating a financial report such as an income statement, you need to display the subtotals (which are parent members of a dimension) at the bottom of the statement—after the details (which are children). You can use the MDX Hierarchize() function with the POST option to force parent members to appear after their children. The following example shows the Hierarchize() function on the FoodMart 2000 Sales cube:
SELECT Hierarchize(MySet, POST) ON Columns
WHERE \[Sales Count\]
How can you change this query to sort the set MySet in ascending order while making sure the parents appear after their children?
Thanks to Shahar Prish of Microsoft for providing the clever answer that Listing B shows. First, he sorted the items in descending order while preserving peer groupings (i.e., keeping children of a common parent together). Then, he used a Generate() function to reverse the order of the set. The result maintains the peer groupings, keeps the items in ascending order, and places the parents after the children. Notice that Shahar uses the AS keyword to name the sorted set MySetIterator. He also uses the Count and Current properties on the named set.
Analyzing Store Revenue
Retail businesses sometimes evaluate their stores' performance by analyzing revenue per square foot or cost per square foot. Use the FoodMart 2000 Sales cube to determine what the total store square footage is for each Store State (Store State is a level in the Store dimension). Note that what makes this problem unique is that square footage is stored as a member property in the Store dimension.
Listing C shows a query that solves this problem. This query is interesting because it demonstrates how to aggregate (sum) a numeric member property. The query creates a new measure that returns the value of the Store Sqft member property. If the selected store member is above the Store Name level in the dimension, the query sums all store square footage values below the selected member in the hierarchy to determine the square footage value. Because the MDX treats member properties as strings, the query uses the VBScript function VAL() to convert the member property from a string to a number before summing all the store square footage values.
Use Counts to Analyze Textual Information
You can analyze any database—with or without numeric information—by using counts. Count measures can be easy or complex. In the FoodMart database, setting up the Sales Count measure is simple. You just create a new measure based on the fact table's primary key and set the measure's type to count. But let's look at a more complex example. Say a table called fact contains line items (entries) for invoices. An invoice can contain one or more entries. So you probably want a count measure that counts invoices, not entries. To count invoices, you want to count only the groups of entries that make up an invoice.
One way to solve this problem is to create a distinct count measure based on the fact table's invoice number column. This measure will give you the invoice count values you want, but distinct count measures have two serious limitations. First, each cube can contain only one distinct count measure. Second, Analysis Services doesn't aggregate distinct counts through dimension levels as it does other measures. The distinct count values are predetermined and stored at each cell with other measures, so you can't create new aggregations during an MDX query's execution. In an MDX query or a calculated member definition, using a function such as
Aggregate( \[USA\], \[Mexico\] )
won't work with a distinct count measure selected; determining the result of the function would require rescanning the fact table because the cube doesn't contain enough information to determine the function's result. Analysis Services can't rescan the source table, but even if it could, the process would be prohibitively slow. The effect of this restriction is that distinct count measures don't generally work well with other calculated members or sets.
A second solution is to create an extra column in the source table to store an invoice count. Fill one entry for each invoice with a value of 1; fill all other instances of the invoice count field with values of 0. You can then create an Invoice Count measure that's a sum of this invoice count column. This solution works as long as you select in the cube a cell that encompasses a group of entries that make up complete invoices. If your cell includes only some of the entries in an invoice, the invoice count column might not include the entry that contains the 1 value and thus would produce a sum of 0 instead of 1 for that invoice.
A third solution is to use a combination of the two approaches. Create an invoice distinct count measure, an invoice sum count measure, and an invoice count calculated measure that helps you determine which of the other two measures to use based on the cell that's selected. The invoice distinct count measure will return the correct answer when only some of the entries in an invoice are selected, and the invoice sum count will work in all other situations. The invoice sum count also gives you the benefit of working when custom combinations of members are selected. This invoice count example shows that, in real-world situations, count measures can get complicated because the count might depend on a distinct combination of a group of fact table columns.
A common type of retail sales analysis is consolidation analysis. One example of consolidation analysis is customer consolidation analysis: If fewer customers are buying more products, your customers are consolidating. If more customers are buying fewer products, your customers aren't consolidating. In the FoodMart 2000 Sales cube, you can use the Store Sales measure to determine the top 10 customers. Then, you can write an MDX query to determine whether the top 10 FoodMart customers are consolidating throughout the four quarters of 1997. But first, you need to create an MDX query that includes the four quarters of 1997 on the columns of the query's result. Then, create two rows. The first row should display the total number of store sales that the top 10 customers purchased. The second row should display the percentage of total store sales that the top 10 customers purchased. Listing D shows the code that produces this result.
I made this query a little easier to read by first creating a set with the top 10 customers based on Store Sales, then using this set in the other two calculated measure definitions. The first calculated measure sums the store sales for the top 10 customers to determine the store sales that the top customers are responsible for. Next, the Top 10 Percent measure determines what percentage of the total store sales comes from the top 10 customers. The query then displays both the Top 10 Amount and the Top 10 Percent for each quarter of 1997.
The query's result shows that the top 10 customers are consolidating slightly. During first quarter 1997, the top 10 customers were responsible for 1.41 percent of all store sales; during fourth quarter 1997, that group accounted for 1.77 percent of store sales.
Working with Analysis Services Programmatically
Analysis Services has three programmatic interfaces that you can use from your analysis application. The first two interfaces are client-side interfaces: ADO MD and OLE DB for OLAP. Both of these programming interfaces offer functionality for data exploration, metadata exploration, write-back capabilities, and read-only analysis functions. Only the write-back capabilities affect the contents of the cube that other users in the system share. If you want to make other types of changes to Analysis Services data programmatically, you have to use the administrative programming interface Decision Support Objects (DSO). DSO lets you create and alter cubes, dimensions, and calculated members and use other functions that you can perform interactively through the Analysis Manager application.
Filtering on Member Properties in SQL Server 7.0
Even if your OLAP client application doesn't support Member Properties, you can still filter based on their values by using the virtual dimensions feature of SQL Server 7.0 OLAP Services. Virtual dimensions expose Member Properties as another dimension in which the members of the dimension are the individual values of the Member Property. After you've defined a Member Property in OLAP Manager, you can use that property as the basis for a virtual dimension. For example, the Store Size in the SQFT dimension in the FoodMart sample database is a virtual dimension based on the Store Sqft Member Property in the Store Name level of the Store dimension. By using OLAP Manager, you can tell the difference between a real dimension and a virtual dimension by looking at the icon in the cube editor. Figure A shows the three virtual dimensions based on Member Properties of the Store Name member. Virtual dimension icons have a small calculator as part of the image. Virtual dimensions include all the unique values of the underlying Member Property as dimension members, and these members aggregate to an ALL member. Thus, virtual dimensions have only two hierarchical levels. In the client application, the virtual dimensions show up as just another dimension and don't significantly increase the size of the cube. Unfortunately, in the current release of OLAP Services, virtual dimensions are slow compared to real dimensions. Still, virtual dimensions are worth using because they let you filter OLAP queries on Member Properties even when the client application might not directly support that capability.
Improving Query Performance
As we explain in the article "Introducing Poormon," February 2003, InstantDoc ID 37468, when members of our DBA team were preparing our data for graphing, we executed some preliminary queries to pull data from the System Monitor, generated CounterData and CounterDetails tables, and received some interesting results. First, we found that pulling data from the default table structures was slow. Then, we added a calculated field and index to CounterData and found that queries performed significantly faster when CounterDateTime was an indexed datetime field rather than a non-indexed char(24) field. (We appreciate the assistance the SQL Server Magazine technical editors gave us in figuring this out.) But when we modified the structure of the CounterData table with the appropriate indexes and calculated fields, System Monitor wouldn't log the data at all, although our queries performed somewhat better. It turns out that System Monitor tries to recreate the tables when it finds structural changes in them. We also tried creating an INSTEAD OF trigger to route the data entry into another table. However, when we did so, SQL Server bulk-loaded the data and ignored triggers. We thought about modifying the tables, but you can't expect assistance from Microsoft if you change the system tables, so we recommend that you don't alter them.
In the Microsoft Platform Software Development Kit (SDK) under the Performance Monitor heading (at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/perfmon/base/performance_data.asp), Microsoft describes the fields of the CounterData table as Table A shows.
However, the description of CounterDateTime is incorrect. If you investigate the System Monitor tables CounterData and CounterDetails, you'll find that the counter names are stored in CounterDetails and counter values are stored in CounterData, using one column for every counter and logged one row at a time. For example, if you logged the 12 counters for 2 minutes, CounterDetails would contain 12 records for the names of the counters, whereas CounterData would contain 24 entries for each minute the data was logged. One way to make pulling data from this format more efficient and effective is to transform the data into a pivot-table format in which one column exists for the date and time and additional columns exist for each counter whose data you want to view. Interestingly, this is the same format that a System Monitor CSV file uses.
Using SQL ALIAS for the AS/400
The AS/400 supports a file concept known as multiple-member files, in which one file (or table) can possess several different members. Each member is a part of the same file or table and shares the same schema, but the members are uniquely named and have unique data. ODBC and OLE DB have no built-in mechanism for accessing multiple members. By default, ODBC always accesses the first member in a multimember file. To enable ODBC-based applications such as Data Transformation Services (DTS) to access multiple-member files, you need to use the AS/400's SQL ALIAS statement. The ALIAS statement lets you create an alias for each member you need to access. Then, your ODBC application can access the alias, which in turn connects to the appropriate member. These SQL aliases are persistent, so you need to create them only once. The following statement shows how to create an alias:
CREATE ALIAS MYLIB.FILE1MBR2 FOR MYLIB.MYFILE(MBR2)
This statement creates a new alias named FILE1MBR2 for the multimember file MYFILE. The ODBC or OLE DB application then connects to that specific member, using the alias name FILE1MBR2 to access the second member in the file MYFILE.