If you develop database-centric e-commerce applications, you know that passing multiple rows of structured data from a Web application to SQL Server in one network round-trip can be difficult. This process is especially complex when the number of rows you need to pass between the Web application and SQL Server is variable, depending on the user's actions. Consider the business logic of a typical business-to-consumer (B2C) e-commerce application. Let's say that customer Heidi places several items in her shopping cart and proceeds to the checkout process. First, she enters her unique identity (typically her email address), then confirms her identity by entering her password. She enters or confirms demographic information (e.g., name, address, and phone number) and payment information (e.g., credit card number and expiration date). Then, she might be able to choose among several additional options, such as shipping method, gift message, and so on. Finally, the application asks her to review her order information before making her purchases final. In this scenario, the e-commerce application must send data associated with multiple database tables to SQL Server. These tables typically include Customer, OrderHeader, and OrderDetail, at a minimum. And several other tables might be involved in the transaction if Heidi used e-coupons, bought a gift certificate, or selected the gift-wrapping option. Figure 1, page 56, shows an entity relationship (ER) diagram depicting this set of relational data.
If you're using SQL Server 7.0 or earlier, inserting and updating data in these tables from a B2C e-commerce application requires you to make multiple round-trips between the application tier and the data-storage tier. You must make one round-trip for each table or stored procedure for which an insert or update takes place. Because these logically separate application tiers typically reside on physically separate servers, inserting and updating data requires multiple network round-trips, as well as coordination of these multiple database calls within a transaction to ensure that the set of calls either succeeds or fails as a unit. You don't want SQL Server to commit only part of the order if a database or application error occurs.
Wouldn't you like to simply place all this customer, order, and promotional data into one hierarchical "information package" and send it over the wire from the application to SQL Server? The ideal solution would also enable the database to then take the data out of this package and perform all the inserts, updates, and deletes on the appropriate tables in one fell swoop. One of SQL Server 2000's new XML-integration features—the OpenXML rowset provider—lets you send multiple rows of structured data from the application to the database in one database call. Let's look at how OpenXML lets you treat a well-formed XML document as a set of rows with which T-SQL statements and functions can interact. But first, to provide a real-world scenario in which to examine the use of OpenXML, let's look at Microsoft Commerce Server components that ease the burden of passing Web-centric data between a user's browser and a Web server.
Commerce Server's OrderForm Component
A customer places an e-commerce order through a browser, such as Microsoft Internet Explorer (IE) or Netscape Navigator. Most large-scale e-commerce Web sites ensure that their checkout process supports the most popular versions of each browser. The browser then uses the HTTP (or HTTP over Secure Sockets Layer—HTTPS) protocol's Get or Post method to send the customer-entered data as text to the e-commerce application that's running on the Web server. After the Web application receives and parses the text data, it passes the data to the business tier. Most e-commerce developers separate their applications' functions into three logical tiers: a presentation tier, which handles all user interaction; a business tier, which applies business rules to the data coming from and going to the user; and a data tier, which efficiently receives data into and sends data out of the back-end database.
The Active Server Pages (ASP) Request object makes this data-passing process relatively simple. The Request object lets the Web application access data elements that it receives in an HTTP or HTTPS Get or Post operation by exposing them as name/value pairs. The presentation tier can then package these name/value pairs into a data structure and send them to the business tier. However, as you saw earlier, the purchase process typically comprises more than one HTML page. What you need is an efficient way to store this partially gathered data between page requests and a way to package the data after the user completes it so that the data can pass en masse to the business tier.
Microsoft created Commerce Server's OrderForm component so that developers could easily store purchasing-related information, such as customer, order-header, and order-detail data. The OrderForm component is simply a COM object that encapsulates a flexible in-memory data structure. Developers can store and access order data by referring to name/value pairs that the OrderForm component holds. You can think of the OrderForm component as a collection of Commerce Server Dictionaries and SimpleLists that resembles an order-headerorder-detail relationship. The Commerce Server Dictionary and SimpleList components are analogous to multiple- and single-linked lists, respectively. The OrderForm component comes with several fields (dictionary nodes) built in, and developers can add fields to the header and detail portions by simply referring to them by name in the application code. Figure 2 shows a graphical representation of the OrderForm component.
How does the OrderForm component help you collect purchase-related information in a multistep browser-based application, and where does the OrderForm component store the data between Web page calls? The classic problem of a browser-based Web application has always been finding the best place to store session state between HTTP requests. OrderForm uses streaming data to persist and reconstitute itself. You can persist the data in the OrderForm component in either binary or XML format. You would use the binary format to store the order form in a relational database binary large object (BLOB) column, such as SQL Server's Image column. And you would use XML format to store the data anywhere you would store text and to transport the text data through the HTTP protocol.
Passing OrderForm to Business and Data Tiers
Now, let's look at how you can use the Commerce Server framework to pass the OrderForm data to the business tier and, ultimately, to the database for storage in the Customer, OrderHeader, and OrderDetail tables. The Commerce Server framework provides another COM component, the Pipeline component, to help process purchase-related data. The Commerce.MtsTxPipeline and Commerce.MtsPipeline components simplify the developer's job of applying business rules to the data in the OrderForm component. The first Pipeline component uses COM+ services to apply all data updates within a transaction; the second component doesn't offer transaction services.
The Pipeline is a conceptually simple way to sequentially call components that support the IPipeline COM interface (for information about this interface, see the Commerce Server software development kit—SDK). Each component that the Pipeline calls interacts with the OrderForm component, then passes the OrderForm back to the Pipeline, which calls the next component, and so on.
You can use the graphical Pipeline Editor to configure the Pipeline by specifying which components it hosts and the order of their execution. The Pipeline Editor saves these configuration parameters to a file called a Pipeline Configuration File (PCF). You can create any number of PCFs for a Web application. For example, you might want to configure one PCF for the shopping-cart page, one for the order process, one for order confirmation, and so on.
After the business tier has applied the business rules to the OrderForm component, you need to pass this data to the data tier for storage in the relational database management system (RDBMS). As I noted at the beginning of this article, this process traditionally involves multiple round-trips, or database calls, from the data-tier components to the RDBMS.
Consider a PCF called Purchase.pcf, which defines the purchase process. This particular pipeline file includes components for acquiring a unique order number, ensuring that all required OrderForm fields are present, authenticating the customer's credit card information, and so on. When all this prep work is done and the completed order is ready to go to the RDBMS, the business tier calls one or more data-tier components, passing in the OrderForm component as an input parameter. These data-tier components then disassemble the OrderForm, map its data to the tables and stored procedures in the RDBMS, and send this data a piece at a time to the database. Figure 3 shows pseudocode that depicts this flow of data from the data tier to the RDBMS, highlighting the actions that require a network round-trip.
SQL Server 2000's new XML-integration features let developers read data from and write data to SQL Server as XML. XML describes the structure and content of data as simple text. The OpenXML rowset provider lets you send a set of data to the database in the form of an XML document, then expose this data to SQL Server 2000 as a relational rowset. You can then extract the data in this rowset and use it to perform multiple inserts, updates, and deletes. Let's look at how you can use OpenXML to simplify and streamline the process of getting the order data from the data tier and into SQL Server.
Putting OpenXML to Work
You can think of the OpenXML rowset provider as a function because it returns a relational rowset that you can use as you would use a table. This feature lets you use data in XML documents as the source for queries, inserts, updates, and deletes.
To use OpenXML, you must first load an XML document into an in-memory tree—or Document Object Model (DOM)—structure by using the sp_xml_preparedocument system stored procedure. This stored procedure returns an integer handle, which the OpenXML function can then use to access the DOM. The basic OpenXML syntax is
OpenXML(idoc int \[in\], rowpattern nvarchar \[in\], \[flags byte \[in\]\]) \[WITH (SchemaDeclaration | TableName) \]
The only required parameters are idoc (the integer handle to the DOM) and rowpattern (which you specify by using XPath pattern syntax). The XPath rowpattern parameter identifies the XML nodes within the XML document that are to be processed. Some nodes map to database table rows; others map to columns. The optional flags input parameter defines the mapping that OpenXML will use between the XML document nodes and the relational rowset. By default, OpenXML uses attribute-centric mapping, which corresponds to a value of 0. A value of 1 also specifies attribute-centric mapping, and a value of 2 specifies element-centric mapping.
You use elements to mark up sections of an XML document, as this example shows:
content of an element
Elements can contain content, but they also might be empty or contain child elements, as the following example shows:
West All Stars East All Stars
Attributes, in contrast, add descriptive information to an element, such as
In this case, type is the attribute and has a value of "thumbnail".
The format of the SchemaDeclaration parameter is similar to a column declaration. You define a rowset column name along with a data type and, optionally, a column pattern. If you use a column pattern, it overrides the mappings specified in the flags parameter, as I show later in the article. You can use the TableName parameter in place of SchemaDeclaration if a SQL Server table with a desired schema already exists, in which case you don't need column patterns.
The following example shows how to use an XML document as the data source for a call to OpenXML. First, you must prepare the XML by loading it into an in-memory tree representation, then store a handle to this DOM document in a local integer variable:
EXEC Master..sp_xml_preparedocument @idoc OUTPUT, @XML
Let's suppose that you've stored the XML in Listing 1 in the @XML variable. With the XML in Listing 1 loaded into a DOM structure, you can use the @idoc variable to access the DOM. The following SELECT statement shows you how to use OpenXML to turn this XML data into a relational rowset:
SELECT * FROM OPENXML (@idoc, '/ROOT/Customer',1) WITH (CustomerID varchar(10), ContactName varchar(20))
Here is the resulting rowset:
CustomerID ContactName ---------- ----------- VINET Paul Henriot LILAS Carlos Gonzlez
You can use OpenXML in a variety of ways to convert XML documents into relational rowsets. For a full explanation and examples of OpenXML's different uses, see SQL Server 2000 Books Online (BOL).
Combining OpenXML and Commerce Server
Now that you understand how to manage e-commerce orders within the Commerce Server framework and how to use OpenXML to treat XML data as a relational rowset, you're ready to combine these two technologies to transmit and process e-commerce orders in one network round-trip.
First, let's consider some sample purchase data. Let's say that a sample shopping cart contains two unique product items and the customer has selected a quantity of two for the first item. The total price for the order is $87.00. In addition to product, SKU, and price data, the sample order contains customer and credit card data, which was gathered from a Web address information page populated with customer and order-header information.
The OrderForm component, which stores and manages order data for a particular order for a particular customer, can persist itself by streaming its data as an XML document. Web Listing 1 shows the XML produced by a Commerce Server OrderForm component containing the customer and order data gathered from the customer order. (For download instructions, see the More on the Web box.) Note a few things about the XML that the OrderForm component produces. First, OrderForm contains a very flexible data structure, capable of storing data for almost any type of order. Remember that OrderForm consists of several Dictionary and SimpleList components; the number depends on how many individual order items the order contains. Because of this flexibility, the XML that describes and contains the data held in an OrderForm component is verbose and hierarchical.
Note also that the XML includes two primary sections, corresponding to the header and detail portions of the order. The header section also contains the customer data. The XML document starts with a
One last note about the XML that represents the OrderForm component: You can't pass it as is to SQL Server 2000's OpenXML function. The XML namespace value dt is never defined in the XML document. So, if the data tier passes the XML to SQL Server 2000 as is, you'll get the error message XML parsing error: Reference to undeclared namespace prefix: 'dt'. To make the XML document acceptable to SQL Server 2000, I simply declare the dt namespace in an element called
The Uniform Resource Name (URN) value can be any unique value. In this example, I used my personal home page URL as the URN value.
So, before you pass the XML that OrderForm produced to the data-tier component that calls your SQL Server 2000 stored procedure, you prepend to the document a
Processing the XML with OpenXML
Now, you're ready to use OpenXML to process the OrderForm component's XML. First, let's look at the spOpenXML_PlaceOrder stored procedure in Web Listing 3, which contains the OpenXML statements that process the order data in the XML document.
First, spOpenXML_PlaceOrder creates a local in-memory representation of the XML that the data-tier component passes to it, as callout A in Web Listing 3 shows. At callout B, the procedure begins a transaction, then uses OpenXML to insert a row into the OrderHeader table. Note that because the table column names don't match the XML document node names, you have to use the XPath column-pattern parameter, DICTITEM\[@key="order_id"\]/VALUE. The column-pattern parameter gives OpenXML the information it needs to appropriately map column names to XML document nodes. Also note that because the /ROOT/DICTIONARY/DICTITEM path isn't repeated in the local XML document, OpenXML returns only one header row from this SELECT statement.
At callout C, the stored procedure uses another OpenXML statement to populate the OrderDetail table. This statement's structure is nearly identical to that of the statement that populated the OrderHeader table. Note, however, that because /ROOT/DICTIONARY/DICTITEM/VALUE/SIMPLELIST/LISTITEM/VALUE/DICTIONARY repeats in the XML document, this statement returns more than one row of data. You can see the power of combining the OpenXML rowset function with an INSERT, UPDATE, or DELETE statement. With one call to OpenXML, you can retrieve multiple rows of data from an XML document and store that data in a table, a local T-SQL variable of type TABLE, and so on. After extracting the data you need from the local XML document, you no longer need access to the document. You can remove the XML document from memory by using the system stored procedure sp_xml_removedocument, as callout D shows. Figure 4 shows the resulting data in the two permanent SQL Server tables OrderHeader and OrderDetail.
Voilà! You've now seen how to pass a structured, hierarchical set of rows from an application to a SQL Server 2000 database. You've also seen how to populate multiple database tables—all in one call to the database—by sending one simple text input parameter to a stored procedure. To make the sample application relatively simple, I limited the data sent to the database to just a few rows. You can see the power and efficiency of this technique even more when you're working with complex applications.
Packaging hierarchical or variable-size data into an XML document, then sending the XML to SQL Server gives you two main benefits: increased efficiency because of fewer network round- trips and a data tier that's conceptually simple and easy to code and test. The increased efficiency you gain by limiting the number of round-trips between the application and SQL Server might not be apparent where the volume of data exchange is relatively low. Web and client/server architectures typically call for a high-speed connection between the application and SQL Server anyway. But in high-volume situations and environments that use low-speed connectivity between the application and SQL Server, you'll see significant performance advantages of using XML to write data to SQL Server.
However, the second benefit—an easy-to-code data tier—is just as important as the first. By sending XML to SQL Server as a single input parameter to a stored procedure, the data tier no longer needs to be aware of the underlying database structure. That knowledge now lies in the stored procedures that use OpenXML to map XML nodes to tables, rows, and columns. Even in this article's simple example, the data-tier components had to call at least two stored procedures to insert a new order. The components had to know in what order to call these stored procedures and what parameters to pass to each. In addition, the components had to call the stored procedure responsible for inserting the order-detail rows a variable number of times, depending on how many line items the order contained. But if you package the data as XML and use OpenXML to map the data into SQL Server, the data tier needs to know only the one stored procedure that inserts an order and the one input parameter that this procedure uses.