Simple API for XML (SAX)

Microsoft has released the latest version of its XML parser, Microsoft XML Engine (MSXML) 3.0, and it's chock-full of capabilities. MSXML's Document Object Model (DOM) implementation is solid, and the release contains Extensible Stylesheet Language Transformations (XSLT) and XPath implementations that conform to the latest World Wide Web Consortium (W3C) standards. You can use MSXML on the client side or the server side, in COM objects or in scripting. Microsoft has included some new features in the package, and one in particular caught my eye: Simple API for XML (SAX). (The SAX specification is currently at version 2.0 and is often referred to as SAX2.)

SAX lets you access the information in an XML document, but it works very differently from DOM. When you use DOM, you instantiate a DOM object, load an XML document, and access elements and attributes as needed from the in-memory data tree. You can transform the document, add to it, and output it in any style you choose. SAX, by contrast, is event driven. The parser doesn't load the full XML document at the start. Instead, it reads the document serially, piece by piece, and fires an event as it encounters each piece. For example, consider the following XML code:

<?xml version="1.0" ?>
<company>
   <name>Interknowlogy</name>
</company>

The parser steps through and produces these events:

startDocument
startElement: company
startElement: name
characters: Interknowlogy
endElement: name
endElement: company
endDocument
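
To watch this event stream yourself without installing MSXML, here is a minimal sketch that uses Python's standard xml.sax module, which also follows the SAX2 model. It is a stand-in, not the MSXML API: the TraceHandler class name is my own, and the handler skips the whitespace-only characters events that a parser also reports between tags.

```python
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records each SAX event as a line of text, in the order it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement: " + name)

    def endElement(self, name):
        self.events.append("endElement: " + name)

    def characters(self, content):
        # Whitespace between tags also arrives here; skip it to keep the trace short.
        if content.strip():
            self.events.append("characters: " + content.strip())


XML = b"""<?xml version="1.0" ?>
<company>
   <name>Interknowlogy</name>
</company>
"""

handler = TraceHandler()
xml.sax.parseString(XML, handler)
print("\n".join(handler.events))
```

Run against the sample document, this should print the same seven events listed above, in the same order.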

The basic idea is that you write content handlers that respond to these events. MSXML provides SAX objects for Visual Basic (VB) and Visual C++ (VC++). SAXXMLReader is the parser object. You create a content handler that implements the events you need, then attach it to SAXXMLReader to receive parsing events. You can also create an error handler to receive error events. MXXMLWriter, the producer object, can create another XML document. Once bound as the reader's contentHandler, it captures selected events from the reader and writes out the new document.
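
The reader, handler, and writer arrangement can be sketched in Python's xml.sax as a rough analogue. This is not the MSXML object model: make_parser() plays the role of the reader, and XMLGenerator plays the role of the writer. The DropElementWriter and CollectingErrorHandler classes, the run helper, and the phone element in the sample input are all hypothetical names I made up for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class DropElementWriter(XMLGenerator):
    """Writer bound as the content handler; skips one element and its content."""

    def __init__(self, out, drop):
        super().__init__(out, encoding="utf-8")
        self._drop = drop
        self._depth = 0  # greater than 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self._depth or name == self._drop:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


class CollectingErrorHandler(xml.sax.ErrorHandler):
    """Error handler that records fatal parse errors before re-raising them."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        self.messages.append(exception.getMessage())
        raise exception


def run(data, drop):
    out = io.StringIO()
    errors = CollectingErrorHandler()
    reader = xml.sax.make_parser()
    reader.setContentHandler(DropElementWriter(out, drop))  # attach the writer
    reader.setErrorHandler(errors)                          # attach the error handler
    try:
        reader.parse(io.BytesIO(data))
    except xml.sax.SAXParseException:
        pass  # already recorded by the error handler
    return out.getvalue(), errors.messages


good = b"<company><name>Interknowlogy</name><phone>555-0100</phone></company>"
output, messages = run(good, drop="phone")
print(output)
print(messages)
```

The writer never sees a tree. It simply re-emits the events it chooses to keep, which is why this pattern stays cheap even on large inputs.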

Why would you use SAX when you can use DOM? Resources. Processing a 10MB XML document with DOM is very resource intensive because the entire file is loaded into memory. SAX, however, hands you the document a piece at a time instead of holding it all in memory, so it handles very large files well. SAX is also faster than DOM because it has less overhead, and you can stop parsing at any time, for example as soon as you've found the data you need. It's well suited to creating a new document from an existing one because you don't have to build a complete tree of the source document just to produce another document. You can also use SAX to extract a content summary.
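
Here is one way stopping early might look, again sketched with Python's xml.sax and not with MSXML. In this sketch the handler abandons the parse by raising an exception. The StopParsing exception, the FirstNameFinder class, and the filler elements in the sample input are hypothetical and exist only for this example.

```python
import io
import xml.sax


class StopParsing(Exception):
    """Hypothetical signal used by this sketch to abandon the parse early."""


class FirstNameFinder(xml.sax.ContentHandler):
    """Captures the text of the first <name> element, then stops."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.parts = []
        self.name = None
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == "name":
            self.in_name = True

    def characters(self, content):
        if self.in_name:
            self.parts.append(content)

    def endElement(self, name):
        if name == "name":
            self.name = "".join(self.parts)
            raise StopParsing  # abandon the parse; no further events reach this handler


def first_name(stream):
    handler = FirstNameFinder()
    try:
        xml.sax.parse(stream, handler)
    except StopParsing:
        pass  # expected: we found what we needed
    return handler.name, handler.elements_seen


# A document with a long tail that the handler never needs to look at.
data = b"<company><name>Interknowlogy</name>" + b"<filler/>" * 1000 + b"</company>"
found, seen = first_name(io.BytesIO(data))
print(found, seen)
```

The elements_seen counter is there only to show that the thousand filler elements are never reported to the handler.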

However, SAX is limited because it works only serially. It doesn't allow random access to the document content, which means that you must save any data you need for further processing as it passes by. If the task is complex, use DOM. Also, SAX runs on the server side only; it doesn't currently include native client support.
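
Because you can't go back and look something up, the handler has to keep its own state. The following sketch, again in Python's xml.sax and not MSXML, tracks a stack of open elements and saves only the fields it will need later. The CompanyCollector class, the companies wrapper element, the id attribute, and "Example Corp" are assumptions invented for the example.

```python
import xml.sax


class CompanyCollector(xml.sax.ContentHandler):
    """Keeps only the fields needed later; everything else is discarded."""

    def __init__(self):
        super().__init__()
        self.path = []       # stack of currently open element names
        self.text = []       # character chunks for the current element
        self.companies = []  # data saved for further processing

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []
        if name == "company":
            self.companies.append({"id": attrs.get("id")})

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        # Only a <name> directly inside a <company> is of interest.
        if self.path == ["companies", "company", "name"]:
            self.companies[-1]["name"] = "".join(self.text).strip()
        self.path.pop()


XML = b"""<companies>
   <company id="1"><name>Interknowlogy</name></company>
   <company id="2"><name>Example Corp</name></company>
</companies>
"""

collector = CompanyCollector()
xml.sax.parseString(XML, collector)
print(collector.companies)
```

Once the bookkeeping grows much beyond a stack and a few fields, that is usually the signal that the task has become complex enough for DOM.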

The MSXML 3.0 software development kit (SDK), which you can download from the Microsoft Web site, includes VB and VC++ examples of all the techniques I've described. For more information about SAX and to download the SAX 2.0 Java distribution, go to the Megginson Technologies Web site.
