OpenSchema can help solve COM interoperability problems

Now that Visual Studio .NET and the Microsoft .NET languages are commercially available, many developers are going beyond experimentation and using Visual Studio .NET to develop production applications. If you're like me and you use SQL Server 2000 Analysis Services to develop applications, you probably hit a snag when you tried to combine it with Visual Studio .NET. The .NET Framework doesn't natively support data access for multidimensional (i.e., OLAP) data sources. So, OLAP developers have to use existing COM-based ADO MD for application development. At first glance, this problem doesn't seem important because .NET languages interoperate with COM. But have you tried making .NET and ADO MD work together? I have—and I found that it's not straightforward. Here are some of the problems I ran into and some ways you can avoid similar complications when you try to integrate .NET with your existing COM applications.

Programming with ADO MD

Programming with ADO MD is much like programming with ADO. Like ADO, ADO MD has two primary ways of retrieving information: You can retrieve metadata from the schema rowsets, or you can execute queries. However, ADO MD applications differ from ADO applications in that ADO MD applications typically rely more heavily on schema rowsets than ADO applications do. Schema rowsets in ADO MD contain dimensions, levels, and members that not only describe the structure of the data but are also a large part of the data. You can get answers to some user queries by using ADO MD's OpenSchema method to retrieve a schema rowset instead of using an MDX query. For example, in the FoodMart 2000 Sales cube, you can use an OpenSchema call to answer the question "What are all the product departments within the Drink product family?" OpenSchema is the method call that retrieves schema rowsets. With ADO—and relational databases in general—the structure of the tables typically doesn't change with the contents of the tables. With ADO, questions such as this one always require an SQL query.

To start programming with ADO MD from Visual Studio .NET, you need to import the ADO MD type library into your .NET project. You can do this the same way you add a reference in Visual Basic (VB) 6.0. In Visual Studio .NET, open the Project menu and select Add Reference. In the resulting dialog box, click the COM tab and scroll down to select Microsoft ActiveX Data Objects (Multi-dimensional) 2.x Library. You also need to select a non-multidimensional version of ADO named Microsoft ActiveX Data Objects 2.x Library. After you've selected both of these type libraries, click Select, then click OK. You've added the type definitions from these libraries to your project. Finally, if you're using C#, you can include the ADO MD type library so that you don't have to prefix all ADO MD type references in your source code with the name of the ADO MD type library. You can include the type library in each C# source file by adding the appropriate USING clause to your source code (e.g., USING ADOMD).

Problems with C#

Because ADO MD relies heavily on schema rowsets, you need to be able to fully utilize the OpenSchema method. Other than MDX, OpenSchema is the only mechanism that ADO MD provides to retrieve the contents of the schema rowset tables. Unfortunately, the OpenSchema method is difficult to call from a .NET language such as C# because of the complexity of the OpenSchema parameters. Two factors create this complexity: The parameters are optional, and the criteria parameter is an array. Each of these factors creates a problem when you try to integrate .NET and COM.

OpenSchema is designed so that you call it with one, two, or three parameters, as the following example calls show:

OpenSchema QueryType
OpenSchema QueryType, Criteria
OpenSchema QueryType, Criteria, SchemaID

However, in C# you can't issue the first two of these calls because you can't omit parameters—even when they're declared optional in the COM type library. This restriction is part of C# syntax. In other functions that take optional parameters, this type of restriction isn't a problem because you can issue the call with all parameters, passing default values for any parameters you would have omitted if the language had supported optional parameters (like VB does). But with OpenSchema, passing default parameters is difficult and risky because the values for the SchemaID parameter aren't available in the ADO MD type library. You'd have to determine what the default value for SchemaID is and declare the equivalent value in C#. Using SchemaID's default value is risky because a future version of ADO MD could change the values of these constants, and your code would no longer work because its declaration would contain an old value. The methods I used to solve these problems don't include redeclaring any ADO MD SchemaID constants in C# that could possibly change in the future. Let's look at each of the two problems you face with OpenSchema.

Solving Integration Problems

First, let's look at how to deal with the problem of OpenSchema's optional parameters. As I just explained, C# doesn't support omitting parameters, and the values for the third parameter, SchemaID, aren't included in the ADO MD type library. Thus, you have to determine an innovative way to call OpenSchema. To omit a parameter, you have to take advantage of late binding. Usually, binding takes place during a program's compilation, but late binding means that a method call and its parameters aren't validated against its definition until runtime. Listing 1 shows an example of how to use late binding with only one parameter to call the Open method on the Connection object. The example works because the InvokeMember method in the .NET runtime library can determine how to issue a method call with some parameters omitted, which the C# compiler can't do. Calling the Open method with just a connection string is convenient because you rarely use other parameters (e.g., UserID, Password, Options) with Analysis Services. The UserID and Password parameters are typically unnecessary because Analysis Services uses Windows authentication to determine the identity of the client application.

The use of InvokeMember() in Listing 1 deserves an explanation. InvokeMember() is a flexible and powerful method that lets you perform late binding to access an object property (e.g., GET or SET) or lets you call a method in which the structure of the call is determined at runtime. Because ADO and ADO MD support retrieval of type information about the available methods, parameters, and properties at runtime, InvokeMember() can determine how to package the parameters you provide it so that it creates the correct call to the Open method on the Connection object.

The InvokeMember method is available on each Type object in .NET. Don't confuse a Type object with an object of a particular type. A Type object in .NET contains information about the type but isn't an instance of the type. (The fourth parameter of InvokeMember, Conn in Listing 1, is the instance of the type on which you issue the method call.)

The first parameter of InvokeMember, Open in Listing 1, is the name of the method or property to be accessed. The second parameter, BindingFlags.InvokeMethod, tells InvokeMember what type of operation you want performed. In the example that Listing 1 shows, you want InvokeMember to invoke a method. The third parameter is the binding object. If you pass NULL as the third parameter, as Listing 1 shows, you get the default binding object. The binding object controls the manner in which the .NET runtime library accesses the object (which is the fourth parameter, Conn). Finally, the fifth parameter, new Object\[\], is the list of parameters that you pass to the method. I could have added multiple items to the array to pass more parameters to the method, but instead I passed an array containing just the connection string.

The second problem you encounter when you issue method calls to OpenSchema is figuring out how to declare and construct the Criteria parameter in C#. When I was trying to integrate my COM applications with .NET, I encountered this problem because I didn't have any examples of how to declare the elements of the criteria array in C#. Usually, you can use the Visual Studio .NET object browser to see a parameter's type. But in this case, the types can change with different calls to OpenSchema (depending on what restrictions you want to use) and the parameters are nested inside an array—the object browser doesn't show types inside an array parameter. C#'s data types aren't exactly the same as those in COM, so to match the parameters in a COM method call correctly, you need to know how the COM interoperability layer in .NET will translate the .NET data types into COM data types. Unfortunately, if you get the data types wrong, you get a generic error message that doesn't give you a clue about how to fix the problem. In COM, the Criteria parameter is a safe array of BSTR variants; you'd never need to know that information if you used ADO MD from VB 6.0. You can pass a regular VB array to OpenSchema, and it just works. Explaining a safe array of BSTR variants is beyond the scope of this article, but I'll demonstrate how to define the equivalent criteria array in .NET.

Listing 2 shows how to use a criteria array to call OpenSchema from C#. You declare a standard C# array of objects and fill it with either NULL or strings. The criteria are values for some of the columns in the schema rowset; criteria define which rows you want returned from the rowset. For example, if you want to return the dimension rowset rows in which the cube name is Sales, your criteria would specify a restriction on the Sales cube's name column.

Each schema rowset has its own possible criteria, which SQL Server Books Online (BOL) lists in the "Schema Rowsets, OLAP" topic. For each rowset, a section of BOL lists the columns in that rowset that you can use as criteria. The column order is important. For example, the dimension schema rowset has the criteria columns CATALOG_NAME, SCHEMA_NAME, CUBE_NAME, DIMENSION_NAME, and DIMENSION_UNIQUE_NAME. If you want to use criteria to retrieve specific information from any schema rowset, your array of criteria must be in the order that BOL specifies. In Listing 2's code, I restricted the dimension schema rowset by cube name, so I had to include space in the array for the preceding array locations CATALOG_NAME and SCHEMA_NAME. However, I didn't want to restrict the resulting rowset to those columns, so I placed a NULL in the first two array locations.

Let's look at another example of how to construct OpenSchema's criteria array. One of the most complicated schema rowsets is the members rowset. To filter the members returned in an OpenSchema call, you can use 11 restriction columns and a tree operator. You use the first 11 restriction columns the same way that Listing 2 shows, but the final members rowset restriction is different. The tree operator restriction lets you retrieve dimension members by placing a related member in the MEMBER_UNIQUE_NAME restriction. For example, as Listing 3 shows, if you wanted to retrieve all the dimension members that are the immediate children of the CA member, you could place CA's unique name—\[Customers\].\[All Customers\].\[USA\].\[CA\]—in the MEMBER_UNIQUE_NAME criterion, then place the MDTREEOP_CHILDREN constant in the tree operator criteria. The criteria array is flexible; each element of the criteria array has a meaning that depends on which schema rowset you're accessing. This particular example differs from those I described previously because the element in the criteria array is an operator rather than a rowset column value.

Notice in Listing 3 that I defined constants for the different tree operators because the operators aren't available in the ADO MD type library after you import it into Visual Studio .NET. I don't know why the constants don't appear, but redefining the constants helps you issue OpenSchema calls on the members rowset.

I recommend using the tree operator with the OpenSchema call instead of running MDX queries to find the children of a dimension member because OpenSchema is significantly faster. Regardless of the technique you use, always be aware that dimensions can contain large numbers of members, and retrieving more dimension members than you need can be costly. For example, avoid retrieving all of a member's descendants or even all the members of a dimension level. You could end up retrieving a million or more members if you're using your application against a large cube.

Now that you know how to issue OpenSchema calls to ADO MD from C#, you can use the code samples in this article as a guide for using Visual Studio .NET to develop your next analysis application. Be aware that making COM method calls from .NET does incur some overhead because marshaling (a data and code transition between process address spaces) is necessary. However, this overhead won't materially affect your application's performance unless you call ADO MD thousands of times in succession.