Downloads
16310.zip

Storing, indexing, and retrieving XML

Editor's Note: Send your XML questions to xmlquestions@sqlmag.com.

During the past two months, readers' questions have centered on using ADO and XML. This month, let's explore a different area of SQL Server 2000—the ability to store, index, and retrieve XML.

Can I use SQL Server 2000's XML support to store serialized objects in an XML representation that I can then retrieve and deserialize back into objects?

Absolutely! SQL Server 2000's XML support includes OpenXML, a T-SQL keyword that you can use to store an XML document or fragment in a column that you can subsequently query. OpenXML provides a rowset abstraction over an XML document. In other words, OpenXML lets you treat an XML document as if it were a table (or, more precisely, a view) in your database.

Let's first look at a simple example that uses the Pubs database to illustrate how you can use OpenXML to create a tabular view of an XML document. Consider the XML document that Figure 1 shows. The document contains a list of authors' names and cities of residence. Suppose you want to query this document for authors from Utah. Using OpenXML, you could produce on this XML document a logical view that uses the values of the name and state elements for the values of the NAME and STATE columns, as Table 1 shows. You could then use the following SQL statement to select the authors who are from the state of Utah:

SELECT NAME, STATE
FROM logical view
WHERE STATE = 'UT'

OpenXML is a function syntax that takes a set of parameters that specify the shape of the logical view you want to construct. OpenXML requires, as a first parameter, a handle to a previously prepared XML document. You prepare an XML document by using a system-supplied stored procedure. In the following example, the T-SQL variable @document contains the text of the XML document you're preparing:

DECLARE @docHandle int
EXEC sp_xml_preparedocument @docHandle
        OUTPUT, @document

The stored procedure sp_xml_preparedocument parses the XML document and builds an internal, in-memory representation. You refer to this representation through a handle that you pass to the stored procedure as an output parameter. Note that you should always pair a call to sp_xml_preparedocument with a corresponding call to sp_xml_removedocument to remove the in-memory representation after you've finished processing the document.

The second parameter that you pass to OpenXML specifies the row selector, which is an XPath expression. By passing a parameter to specify the row selector, you're selecting rows for the logical view. XPath is a path-oriented query language for XML that provides a syntax for matching a set of elements or attributes (generically called nodes) in an XML document. The XPath expression that you specify for the row selector is applied to the XML document that corresponds to the handle passed in the first parameter when you execute OpenXML. Each node that the XPath expression matches corresponds to a single row in the logical view. For example, you can use the XPath expression //authors as the second parameter to OpenXML to specify that any author element found anywhere within the document corresponds to a row in the logical view that OpenXML provides.

After you've selected the rows for your logical view, you need to define the columns. OpenXML provides several options for selecting the columns. However, in this article, I discuss only the WITH clause syntax because it's the most frequently used. (For details about the other options, see SQL Server Books Online—BOL.)

To define the columns for the logical view, you use a set of tuples. Each tuple contains the name and SQL data type for the column, along with an XPath expression. You use the XPath expression to refer to an element or attribute relative to the node that the row selector specifies. For example, you would specify the XPath expression residence/state to obtain each author's state of residence.

Using the document handle, row selector parameters, and the column definitions, you can rewrite the first query above to use the OpenXML syntax, as Listing 1 shows. With this basic understanding of OpenXML concepts, you can now turn your attention to solving the problem of storing and retrieving an object.

Let's say that the object you're storing is an author object that has a name and a residence sub-object. The residence sub-object has city and state properties. (This example is a bit contrived and oversimplified, but it illustrates how to use OpenXML to store objects.) Let's say you want to store the author object in the database, and you also want to index the author objects by state. First, you need to write the OpenXML syntax to select the state value from each author in the document (as I demonstrated earlier). Next, you need a way to select the XML representation of the author from the document so you can store it in the database. In addition to specifying an XPath expression in the column tuple of the OpenXML function's WITH clause, you can also specify one of several special metaproperties.

The @mp:xmltext metaproperty selects the textual representation of the XML corresponding to each element that the row selector selects. Using @mp:xmltext, you can select the XML representation of the author at the same time that you select the value of the state element. You now have a logical view of the XML document, as Table 2 shows.

Using this logical view, you can use standard SQL syntax to store the data in a table in your database. Listing 2 shows the code you use to select the author's state of residence and store it, along with the XML representation of the author object, in the AuthorObjects table. After you've stored the XML representation of your author objects in the database, you can use standard SQL syntax to retrieve the XML representation.

Although OpenXML doesn't provide a way to automatically store and index arbitrary XML, it does provide a general mechanism to store and index XML documents while leveraging the efficient storage and query mechanisms that SQL Server 2000 provides.