Native XML support and a new data type provide a flexible structure for data
Last month in "Seeing the CLR's Potential" (May 2005, InstantDoc ID 45753), I explained some fundamental changes that the CLR in SQL Server 2005 brings to application architectures. Let's look at another feature set that will blur the traditional boundaries of application and database design: native XML support.
XML has become the standard format for transporting data over the Internet and has also found its way into other application-design areas (e.g., data storage). XML standards simplify sharing data with various systems, regardless of platform or architecture. Another advantage is that XML is self-describing. Traditional binary data-storage formats require that you have an application that understands the format. But with XML, you actually describe and store the data in the XML format. And XML is a human-readable data-storage format; I can open an XML file and understand the data. This readability is an advantage because it's not important to define exactly what data is in an XML file before you share the data. Because the XML file carries the data description with it, you can send data that contains extra pieces of information for certain data items. This type of data, often called semi-structured data, has an implicit structure, but the attributes of each item in the XML can be different.
Another distinguishing feature of XML data is the structure you store it in. Rather than attempting to duplicate a traditional relational data model—in which complex relationships between data items are defined along with the data—XML uses a hierarchical structure in which data is arranged as a set of nodes in a tree. Each data item can have its own associated set of data, described as sub-nodes for the item, and one XML structure can contain many top-level data items.
This storage structure is different from a relational store consisting of tables, columns, and rows. Although XML is unlikely to ever completely replace binary data formats or the traditional relational data store, not having to predefine what data items will contain or restrict data or data types before reading them has significant advantages. When you combine XML data with relational data, you can create more flexible application designs. For example, imagine a table that stores data about magazine articles. Each article is associated with data items such as title, text, author, and editor. In a purely relational model, you'd need to create an Articles table containing fields for each piece of data. But what if some articles have multiple authors or editors, have a subhead in addition to the title, or are actually a series of articles? Suddenly, the relational structure becomes more complex. You now need an Articles table containing an extra field for subheads, an Authors table, an Editors table, and a table for article series. Along with these extra tables, you likely need many linking tables because editors and authors can be associated with many articles. Although such complex relational models are common, XML might provide a better storage mechanism. You can represent all possible article data in one XML file similar to the one that Listing 1 shows.
XML's flexibility is wonderful, but it comes at a price. Searching XML files can be time-consuming because of the data's semi-structured nature, and XML is verbose. Ideally, a storage mechanism would let you combine the flexibility of XML with the power, speed, and efficiency of the relational store.
SQL Server 2005 introduces a new, native data type to help solve these problems. SQL Server 2000 lets you store XML on the server by storing the XML text in a BLOB field, so you can't work with or reference the XML on the server. To work with the XML, you have to extract it to an application layer, then use a standard XML parser or Document Object Model (DOM)—a programming object for handling XML documents—to work with the data. The SQL Server 2005 XML data type removes this limitation because it's implemented as a first-class native data type. The new data type lets the SQL Server engine understand XML data in the same way that it understands integer or string data. The XML data type lets you create tables that store only XML or store both XML and relational data. This flexibility lets you make the best use of the relational model for structured data and enhance that data with XML's semi-structured data. In our articles example, you'd retain the relational model for storing articles, then add an XML column to that data to store information that's not required or could change, such as subtitles.
To help you get the most out of this combination of semi-structured and relational data, the native SQL Server 2005 XML data type supports several built-in methods that let you query and modify the XML data. These methods accept XQuery, an emerging World Wide Web Consortium (W3C) standard language, and include the navigational language XPath 2.0 along with a language for modifying XML data. You can combine query calls to the XML data type methods with standard T-SQL to create queries that return both relational and XML data.
For more structure or validation of XML data, SQL Server lets you associate schema with a particular XML column. If an XML schema is associated with an XML column, the schema validates the XML data as it's inserted into the field. But SQL Server 2005 supports many schemas grouped together in a schema collection, which lets you apply different schemas to an XML column. The server will validate all incoming XML against all the schemas. If the XML is valid for any of the collection's schemas, it can be stored in the XML field.
To improve performance with XML, SQL Server 2005 lets you create indexes on XML data. These indexes work the same way standard SQL Server indexes do and can significantly increase your system's performance when you're working with XML data.
The native XML data type in SQL Server 2005 lets you better model your real-world data structures. In life, nothing is certain; but by combining XML with relational data, you can now account for this inherent uncertainty, which will help your systems better react to changes and give them a longer life. Next month, I'll look at how native Web Service support in SQL Server 2005, combined with XML features, further closes the gap between the database and the applications that use it.
Editor's Note:: If you're a developer working with SQL Server 2005, consider entering the Microsoft Connected Systems Development Competition, which is going on through the end of July. For more information, visit http://msdn.microsoft.com/devcompetition.