Inserting hierarchical XML data into IDENTITY columns
You routinely encounter hierarchies in organization charts, parts lists, invoices that have line items, and the file system on your computer. The advent of XML provided a convenient way to represent these hierarchies by nesting entities (XML elements) inside one another. This approach is convenient when you're designing applications that present data hierarchically. But inserting hierarchically formatted XML data into your SQL Server database can pose a challenge, especially when the tables in your database use IDENTITY columns for primary keys, making it difficult to properly store the auto-generated key values in related tables. Fortunately, OpenXML meta-properties provide a solution for obtaining these key values and correlating them to related entities in the XML.
OpenXML and @mp:id
With SQL Server 2000, Microsoft released the OpenXML keyword, which lets you use T-SQL code to extract data from an XML document. You can think of OpenXML as providing a relational view or rowset (similar to the OPENROWSET T-SQL keyword) of the XML document. Defining a view using OpenXML is similar to creating a standard relational view except that you use a special syntax to select data from the XML document. Using OpenXML, you can query the data in the XML document as if it were stored in a table in your database. Because you can query the data, you can also insert the data into your database or update existing records in your database by using SELECT INTO or INSERT INTO commands. But what if you need to insert hierarchically related data such as an order and its line items into separate tables in your database, where a line item is related to its order through a foreign key that references an IDENTITY column?
Consider the database schema that the T-SQL code in Listing 1 represents. The schema defines two entities: an order stored in the Orders table and a line item stored in the LineItems table. In the Orders table, the schema defines an OrderNumber IDENTITY column, which holds a unique key for each order. The LineItems table also contains an OrderNumber column that's a foreign key to the OrderNumber column in the Orders table. Rows stored in the order and line item correspond closely to the entities in the XML document at callout A of Listing 2, page 42, which nests LineItem elements inside their parent order. It's simple to write an INSERT INTO statement that inserts all orders from the XML document into the Orders table and all line items from the document into the LineItems table. But associating each line item with its parent order presents a challenge because SQL Server dynamically generates the order numbers when each order is inserted. In this case, the @mp:id OpenXML meta-property proves useful for properly establishing the primary key-foreign key relationships in the database.
The @mp:id meta-property provides a unique identifier for each node in the XML document. Nodes correspond to the in-memory representation of the elements and attributes that store data in the textual representation of the XML document. You can think of the in-memory representation of an XML document as a tree of nodes in which the parent-child relationships between the nodes correspond to the nesting of elements in the textual XML representation. OpenXML uses this in-memory representation of an XML document when building a view. Another way to think about @mp:id is as a pointer, or handle, to the node. You can use the unique value that @mp:id returns to insert the orders and line items through two updates: one that inserts all orders and another that inserts all line items. Here's how.
Before you write the INSERT statements that use OpenXML and @mp:id, you must first build the in-memory representation of the XML document. You do this by calling the sp_xml_preparedocument system stored procedure and passing the textual representation of the XML document as a parameter (alternatively, you can pass a variable that contains the XML). Sp_xml_preparedocument parses the XML document and returns a numeric reference to the in-memory representation. The code at callout A in Listing 2 assigns the reference to the T-SQL variable @doc.
The next step is to build a relational view of the XML document that contains the order data. This is easy to do with the OpenXML statement at callout B in Listing 2. The OpenXML statement specifies the reference to the document in the variable @doc and specifies the XPath query //Order as the row selector. The WITH statement specifies the columns for each row that OpenXML generates. The last column definition uses the @mp:id meta-property to select the unique ID for the order. You then include the OpenXML statement as a row source for the INSERT INTO statement that, when executed, inserts all orders from the XML document into your database's Orders table, including the value of @mp:id for each order in the ImportID column. When inserting the orders into your database, SQL Server assigns unique order numbers to each order. In the next step, you obtain those order numbers so that you can associate each line item with the proper order.
To insert the line items, you use OpenXML to build another relational view on the XML document, as callout C in Listing 2 shows. Note the row selector at callout C—it's now //LineItem, which selects all line items in the XML document. Also note that the WITH statement uses the @mp:parentid meta-property instead of the @mp:id property it used when inserting the orders. The @mp:parentid meta-property is similar in function to @mp:id except that it returns the unique ID of the parent element, which for a LineItem element is the Order. To obtain the database-generated order number that relates to the line item, you join the XMLParentID column, which contains the value of @mp:parentid, to the ImportID column in the Orders table, which contains the unique ID for the order stored in the previous step. You can then insert the order number into the LineItems table along with the other data from the XML document.
To complete the process, you release the XML document from memory by calling the sp_xml_removedocument system stored procedure. It's important to call this stored procedure when you finish processing the XML document so that the memory you used for the XML documents doesn't starve your database of memory and, consequently, harm runtime performance.
You might take any of several alternative approaches to inserting hierarchical XML data into SQL Server. You could use a cursor to insert individual orders, then capture the order number by using the special @@IDENTITY variable or one of SQL Server's IDENTITY functions. You could then use the system-generated order number to insert line items for the order. You can download example cursor code by entering InstantDoc ID 27473 at http://www.sqlmag.com. Depending on your database schema, other approaches might also work for you. Just remember to choose a method that meets your performance requirements and works well with your database schema. Regardless of the approach you choose, using @mp:id is a good way to easily identify and correlate related items in an XML document.