DataSets provide a relational mapping for XML documents
SQL Server 2000 and XML for SQL Server 2000 Web releases (SQLXML) provide three ways in which you can store XML data. XML Bulk Load and Updategrams, two client-side technologies, use annotated schemas to specify the mapping between the contents of an XML document and the tables in your database. OpenXML is a server-side technology that lets you define a relational view on an XML document. With OpenXML's relational view, you can use T-SQL code to query the data in the XML document and store the results in your SQL Server database.
Each of these three storage technologies is designed for a particular purpose. XML Bulk Load stores data from very large XML documents in SQL Server. Updategrams perform optimistic updates of SQL Server data. (Optimistic updates are updates without locks, in which the system checks to see whether another user has changed the data after it was originally read.) And OpenXML provides familiar relational access for XML data.
Of these three technologies, OpenXML is the most flexible because it provides a programming model (T-SQL) that you can use to write business rules or perform computational logic on the XML data before storing it in your SQL Server database. However, because OpenXML is a server-based technology, if you use it frequently or with large documents, it can degrade SQL Server's performance. But if you've adopted the Microsoft .NET Framework, you can work around these performance and scalability limitations by using ADO.NET's DataSet, which gives you a powerful technology—including a full programming model—for storing XML data in SQL Server.
DataSets, DataTables, and XML Mapping
Last month, in "XML Query Results in .NET" (InstantDoc ID 39160), I showed you an easy way to generate XML query results from SQL Server by using a DataSet. By providing a relational cache that you can use on client and middle-tier machines, the DataSet can load and manipulate data from a variety of sources, including SQL Server, other relational databases, and XML.
When you load a DataSet from an XML document, the DataSet must map the data that's stored in the hierarchical XML representation into the DataSet's relational representation. For example, if you have an XML document that contains a list of Order elements that has nested LineItem elements as children, that document would most commonly be mapped to Orders and LineItems DataTables in the relational representation. The mapping is similar in purpose to the way OpenXML uses XPath queries to construct a relational view on the XML document. But instead of using XPath specifications, DataSets have their own way of mapping data.
DataSets use XML Schema Definition (XSD) schemas to map data from an XML document into the DataSet's relational cache. DataSets give you two ways that you can specify a schema to map the XML data. First, you can reference an XSD schema that defines the elements, attributes, and relationships that are used in the XML document. Alternatively, you can infer the schema directly from the XML document's structure. In other words, the DataSet can build a schema by examining the structure and content of the XML document.
When you reference an XSD schema, the DataSet uses the elements and attributes that are defined in the schema along with the relationships that are defined between the elements to construct the DataTables, DataColumns, and DataRelationships in the relational cache that you use to store the mapped XML data. I refer to the structure, or schema, of the relational cache generically as the shape of the cache. When processing the schema, the DataSet applies a set of rules, similar to default mapping rules that Updategrams and XML Bulk Load use when no annotations are specified in their mapping schema, to create the tables that the DataSet uses to store the mapped XML data. You can summarize the DataSet's mapping rules as follows:
- Complex elements—those that contain other elements or attributes—are mapped to tables.
- Attributes and simple-valued subelements—elements that contain only data, not other elements or attributes—are mapped to columns.
- Data types are mapped from the XSD types to .NET types.
The useful ADO.NET DataSet documentation, "Generating DataSet Relational Structure from XML Schema (XSD)" at (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/_generating_dataset_relational_structure_from_xsd.asp), contains full details of the mapping rules. By referencing the schema of your choice, you can control the shape of the cache that the DataSet creates.
Inference is a quick, easy way to load an XML document into a DataSet. Tables, columns, and relationships are created automatically by introspection—a process whereby the DataSet examines the XML document's structure and content. Although using inference significantly reduces your programming effort, it introduces unpredictability in your implementation because small changes to the XML document can cause the DataSet to create different-shaped tables. These changes in shape can cause your application to break unexpectedly. Therefore, I recommend that you always reference a schema for production applications and limit your use of inference to building prototypes.
Now let's look at an example of how you can easily use a schema to build a client-side DataSet cache that you can use to update your SQL Server database.
Mapping an XML Order
Suppose you're writing an application that accepts orders from your customers in the XML format that the XSD schema in Figure 1 defines. The schema defines three complex types that provide the order's customer data, order data, and line items. A top-level Customer element defines the XML document's root. The containment hierarchy defines relationships between the elements: An Order element contains a LineItem element, and a Customer element contains an Order element. Figure 2 shows an instance of an XML document that matches Figure 1's schema.
The C# code in Listing 1, page 38, uses the ReadXmlSchema method to load the schema from Figure 1 into a DataSet called orderDS. ReadXMLSchema creates three DataTables that correspond to the Customer, Order, and LineItem elements that the schema defines. So that you can verify that the schema created the expected tables in the relational cache, the printDSShape method writes the table name for each table to the console, followed by a list of columns and the data type for each column.
Look closely at the column names in Figure 3, page 38. The Customer_Id and Order_Id columns are present in the DataTables although they aren't specified in the schema. The ReadXmlSchema method automatically adds these columns to the DataSet. The DataSet uses the columns as foreign keys to model the relationships between a Customer element and its Order element and between an Order element and its LineItem element. Because XML typically uses nested relationships instead of foreign keys, the DataSet automatically generates its own primary and foreign keys between the DataTables and stores them in these columns.
Also look carefully at the data types in Figure 3—the DataSet has mapped the data types from XML Schema data types to the corresponding .NET data types. When you load an XML document into the DataSet, the DataSet converts each value from the XML to the corresponding .NET type.
After loading the schema into the DataSet, all you have to do to complete the relational mapping is load the XML data into the DataSet. Listing 1's ReadXml method opens the file named Order.xml, which Figure 2 shows. Then, it reads the data from the file into the three DataTables that the DataSet created when you read the schema in the previous step. Your XML order is now accessible through the DataSet.
To demonstrate how to access the data in the DataSet, Listing 1's printDSData method navigates through the DataTables and, for each table, displays the column names, followed by all rows in the DataTable. Figure 3 shows the automatically generated values for the Customer_Id and Order_Id columns that the ReadXmlSchema method added to the DataSet.
Also notice that three elements that appear in Order.xml—PO, Address, and Description—aren't mapped into the DataTables. This data is omitted because the schema you supplied to the DataSet didn't contain these elements, and the DataSet simply ignores any data not described in the schema when it's creating the shape of the relational cache and loading the XML data. This convenient feature lets your code work properly even if additional data you didn't anticipate is included in the XML order you receive from your customer.
Building Applications That Use the Cache
Now that you've learned how to use the DataSet to build a relational cache for XML data, you can apply this knowledge to implement applications that execute business logic and update SQL Server. Implementing business logic is relatively straightforward when you use the DataSet programming model. ADO.NET gives you several alternatives for updating data in SQL Server, including using DataAdapters, writing your own queries, and executing stored procedures. DataSets make mapping XML data to a relational model easy; the rest is up to you.