XML is fast becoming the lingua franca of the new technology world—the universal markup language you can use to represent any other language running on any platform. Although SQL Server 2000 is the first SQL Server release to provide integrated XML support, Microsoft's XML technology preview, available at http://msdn.microsoft.com/downloads/betas/default.asp, runs under any SQL Server release. The technology preview, refined through 11 revisions, is robust, and many SQL Server installations are using it to integrate XML and SQL Server. (For a quick overview of the technology preview, see the sidebar "Microsoft's XML Technology Preview," page 38.) However, the XML functionality in the technology preview, which Microsoft says it will continue to support and enhance, is different from the functionality in SQL Server 2000 and gives you some features, such as XML update grams and an automation object model, that the integrated functionality doesn't provide. In turn, SQL Server 2000 provides some functions, such as XPath queries, that aren't available in the preview. Microsoft says it will merge the Internet Server API (ISAPI) application that provides the technology preview's XML functionality with SQL Server 2000's ISAPI application into a Web release after SQL Server 2000 ships. To help you avoid confusion and optimize the performance of each product's XML support, let's look at how SQL Server 2000's integrated XML functionality differs from the technology preview's XML functionality. (For a look at the XML support in an early version of SQL Server 2000, which implemented the technology preview's XML functionality, see Paul Burke's "XML and SQL Server 2000," May 2000.)

Composition and Decomposition


SQL Server 2000 delivers XML support through several features. You can implement some XML functionality through a new T-SQL function, the FOR XML extension to the SELECT statement, and two new system stored procedures. Other integration features are available only through a specific data access API or by using an ISAPI application running on Microsoft IIS.

You use SQL Server's XML features to decompose and compose XML documents. In document decomposition, you map fields in a hierarchical XML document to fields in one or more SQL Server tables. If you receive an XML document from a Web browser, an Active Server Pages (ASP) program, or a business-to-business (B2B) application, you can map multiple repeating sets of fields in the document to multiple SQL Server tables. You can then pass an entire XML hierarchy into a stored procedure as a varchar or text parameter and parse the hierarchy inside SQL Server to add, update, and delete multiple records in multiple tables within one stored procedure call. This functionality not only reduces round-trips to the database, it also lets you accept input in a variety of formats. SQL Server 2000 supports document decomposition through the new T-SQL OpenXML() function. OpenXML lets you decompose XML documents to a rectangular rowset, in which each row contains the same number of columns. Later in the article, I show you how to use OpenXML to decompose an XML document.

You can also use data in SQL Server tables to compose an XML document in a specific format. You can select data from a single table, but because XML data is intrinsically hierarchical (an XML document can contain only one root element, and you must nest all other elements under that root element), you usually combine data from multiple related tables into one XML document. You map multiple tables in a JOIN to specific attributes or elements in an XML document hierarchy. SQL Server 2000 supports document composition through the SELECT statement's new FOR XML clause.

Unlike the alpha version of SQL Server 2000, which offered six system stored procedures for manipulating XML data, the final version of SQL Server 2000 has only two stored procedures for XML: sp_xml_preparedocument and sp_xml_removedocument. I describe how to use these stored procedures later in this article.

SQL Server 2000 also lets you submit an XML-based command to SQL Server directly instead of using SQL text-based commands. Directly submitting data through a Web page in XML format to ASP programs on Web servers is common practice now that most modern browsers can format data into XML for transmission through HTTP. Although SQL Server 2000 can't directly handle XML-based requests for data, you can use the OLE DB provider for SQL Server (SQLOLEDB) to route the requests through the OLE DB API, then convert the XML-based queries to SQL queries that have the FOR XML suffix.

In addition, you can submit XPath (the XML query language) queries and SQL queries wrapped in XML templates. XPath queries are XML Views that use an XML schema to map SQL Server tables, fields, and relationships into what appears to the XML programmer as an XML document. Sending queries embedded in XML is more than a semantic convenience for programmers who don't know SQL. The ASP Request object (HTTP input) and Response object (HTTP output) support the COM IStream interface. And Microsoft has enhanced OLE DB and ADO 2.6 so that they allow direct IStream-based input and output through the ADO Command object. For maximum performance, you can use minimal ASP code to stream XML input, process the input through the SQLOLEDB provider, submit the input to SQL Server as FOR XML queries, and stream the XML results directly back to the Web client. You can also use these streams directly in an ADO program without further programming.

Underlying Differences


Although most of the XML functionality in SQL Server 2000 and the technology preview are the same, the different versions' ISAPI application and FOR XML functionality have different implementations and produce slightly different output. You'll notice the first difference between the two versions of XML support when you invoke

select * from authors for xml auto

from Query Analyzer and study SQL Profiler's output for this statement. Unlike the technology preview, SQL Server 2000 recognizes FOR XML as a valid SQL clause. SQL Server 2000's ISAPI application passes the entire query directly to SQL Server, whereas the technology preview's ISAPI application must translate the query to SQL before passing it to the relational database management system (RDBMS). Instead of returning a result set of rows and columns, SQL Server 2000 returns one column—XML_\{GUID\}—with \{GUID\} representing a globally unique ID (GUID). Because the output column is usually longer than Query Analyzer can display, Query Analyzer splits the output into two lines, but the output is really a single string that SQL Server can return through a stream.

Another subtle difference between the technology preview's and SQL Server 2000's XML support is that the technology preview, by default, wraps the result in a top-level XML element, guaranteeing that the result is valid XML; SQL Server 2000 doesn't. Although having to supply the element or code it into the SQL statement can be a hassle, this implementation lets you combine multiple queries' results into a single XML document.

Let's look more closely at SQL Server 2000's ISAPI application to get a better understanding of how the integrated functionality works. The Microsoft Management Console (MMC) snap-in Configure SQL XML Support in IIS, which lets you configure SQL Server 2000's XML support, looks similar to the technology preview's snap-in. However, Microsoft wrote SQL Server 2000's ISAPI application to process input from a Web browser in a slightly different way. SQL Server 2000's ISAPI application, sqlisapi.dll, passes an XML query to the SQLOLEDB provider, which passes the query as a stream to a component called sqlxmlx.dll. Sqlxmlx.dll preprocesses the query before it goes to SQL Server and postprocesses SQL Server's return set. Sqlxmlx.dll extracts the query string from the XML input document, which is in wrappers, and passes the string to SQL Server.

Almost any XML processing you can do through SQL Server 2000's Web-based ISAPI application you can also do through OLE DB or ADO code with the SQL Server provider. You can submit XML requests to SQL Server as a stream, postprocess results with a style sheet, and submit data as XPath queries or SQL queries wrapped in XML. However, neither SQL Server 2000 nor OLE DB or ADO code supports update grams, which are XML-based update, insert, and delete statements. (In late July, though, Microsoft released its XML Updategram Preview for SQL Server 2000, available at http://msdn.microsoft.com/downloads/default.asp.)

XPath Queries


As I mentioned earlier, besides letting you wrap SQL queries in XML, SQL Server 2000 supports XPath queries, which let you query a virtual XML document. XPath is the World Wide Web Consortium (W3C)-approved method for selecting a set of nodes from an XML document. The following statement shows an example of an XPath query:


   Customer\[@CustomerID="ALFKI"\]/Order\[@OrderID=10643\]

The name XPath comes from its syntax, which is reminiscent of file-system paths. For example, the query /root/customers/orders produces a node set that contains the orders elements (rows), which are children of the customers elements, which are children of the root element. You can use XPath queries to query XML documents you have on your file system. You can also use either XPath or FOR XML to create XML documents, then use XPath to extract data from the documents' subfields.

SQL Server 2000 also lets you use XPath queries to query SQL Server tables. The result is an XML document that contains the XPath query's results. But how do you know which SQL tables correspond to which XML hierarchy level and which relationships among SQL tables map to which XML terms? In SQL Server 2000, you can define the relationships by using an XML View to create mapping schemas.

XML mapping schemas are XML schemas in XML Data Reduced (XDR) format combined with a set of annotations that map elements and attributes of a virtual XML document to SQL Server tables, keys, and so on. The simplest (and default) schema maps a SQL Server table to an XML element and SQL Server columns to XML attributes. Listing 1 shows the mapping schema for the discounts table in the Pubs database. You can use this mapping schema with XPath queries to find all discounts elements (rows) matching a certain search criteria, such as /discounts\[@stor_id > '8000'\]. You can express more complex mapping schemas by using annotations to specify which elements map to which table or column, which data types the elements and tables use, and which relationships exist between tables. Figure 1 shows a diagram for mapping the relationship between the Pubs database's stores and discounts tables. The schema element in Listing 2, which you can add to the schema in Listing 1, uses the sql:relation annotation to map the XML store elements to rows in the stores table. The sql:relationship annotation describes a foreign key relationship between the stores table and the discounts table, where stor_id is the key field that specifies the relationship. You can also use the sql:field annotation to map an XML attribute of type storename to the database field stor_name.

The mapping schema I show here represents all SQL Server columns as attributes in an XML document—a schema called attribute normal form. You can also use a schema in element normal form, which represents each field as a subelement of the named element. Attribute normal form looks like


Element normal form looks like


  val
  val
  val

The XML mapping schema not only represents, for query purposes, which fields in the database tables map to which XML elements and attributes, but it also defines the shape of the resulting XML document. By carefully creating your XPath query and mapping schema, you can use SQL Server to build XML documents suitable for B2B exchange, for example. Using the mapping schema from Listings 1 and 2, for example, the XPath query

/stores\[@stor_id > "7500"\]

produces the XML output


 
 
    stor_id="8042" discount="5.00" />
 

Note that the ISAPI template file adds the tag.

In implementation, SQL Server 2000 databases don't directly support XPath queries. When you submit the above query (either through the ISAPI program on the Web server or directly through ADO), the SQLOLEDB provider intercepts the query and transforms it into a FOR XML EXPLICIT query. The FOR XML EXPLICIT query is a specialized form of UNION query that specifies a particular XML tree shape for the results. Note, however, that the FOR XML EXPLICIT query that the XPath query and mapping schema generate might not give you as much control over the UNION query as a custom-written FOR XML EXPLICIT statement. For example, an XPath-generated FOR XML EXPLICIT query can't use an ORDER BY clause.

Decomposing with OpenXML


As I noted earlier, T-SQL's OpenXML function decomposes XML hierarchies into relational structures. For example, suppose someone emails you an XML document that contains student registration information and you need to use that information to update nine SQL Server tables. You can use OpenXML to map the XML data to relational rows and columns. The mappings that OpenXML uses look like a hybrid of SQL's CREATE TABLE syntax and an XPath mapping schema. The following shows an example mapping of data in an XML document to relational rows and columns:

CustomerID  varchar(10) '../@CustomerID',
ProdID      int         '@ProductID',
Qty         int         '@Quantity'

These tuples represent the name of the database table column, the column's data type, and an XPath expression that maps the column to one or more nodes in the XML document. You use this syntax in the OpenXML call's WITH section. But if the columns map the XML document subset directly to one table, you only need to specify the XPath row pattern and substitute the table name for the mapping syntax.

You can use this mapping with OpenXML to produce an output rowset, or you can use it as input to insert, update, or delete statements. To use the mapping with OpenXML, you must first use SQL Server 2000's sp_xml_preparedocument stored procedure to convert the XML document into an internal XML document object model representation suitable for decomposition. Sp_xml_preparedocument parses the XML document, adds some helpful annotations, and returns a document handle, which you use in the OpenXML statement. When you're finished with the internal document, system stored procedure sp_xml_removedocument frees the memory allocated for it. Listing 3 shows an example of using OpenXML to add a row to a database table from an XML document. In this example, because of the one-to-one relationship between attributes of the authors elements and the columns in the underlying SQL Server table, the WITH schema clause doesn't contain the column names, types, or XPath expressions that identify document locations.

Because you can use one XML document to represent data in one or more tables, you can use XML documents as input to stored procedures to avoid multiple round-trips to the database. You might want to use an XML document in a stored procedure, for example, to add a customer record, 1-to-n order records, and 1-to-n line items per order in a single round-trip to the database. Because the number of orders and line items varies, specifying each order and item column as a separate stored procedure parameter is unwieldy at best. Listing 4 shows a simplified example of how you can use OpenXML to insert into multiple tables from a single XML document. Although the example declares the document inline, you would typically pass the document into the stored procedure.

Coming Together


SQL Server 2000's XML Views and mapping schemas let you easily integrate XML into your relational database environment. T-SQL's new OpenXML feature lets you decompose an XML document into multiple rows in multiple SQL Server tables. And SQL Server 2000's ISAPI application doesn't have to perform the translation that the technology preview's ISAPI application does. Still, SQL Server 2000's XML implementation lacks some important functions found in the technology preview. Understanding the differences between the two XML implementations can help you select which best fits your needs until the merged XML functionality is available.