Downloads
23357.zip

Getting Around an XML Bulk Load Bug That Truncates Data Without Notification

If the XML data's length exceeds the database table's field size when I'm running XML Bulk Load, the program truncates the data but doesn't return an error message. How can I make XML Bulk Load return an error message in this situation?

Microsoft confirms the existence of an XML Bulk Load bug that truncates data without returning an error message. Even if your mapping schema declares the element or attribute by using the XML Schema Definition (XSD) maxLength facet, XML Bulk Load doesn't detect or return an error. (For an explanation of XSD mapping schemas, see "Converting XDR Schemas to XSD," October 2001, InstantDoc ID 22325.) To avoid the bug, combine the Simple API for XML version 2 (SAX2) validation with XML Bulk Load.

Microsoft XML Core Services (MSXML) 4.0, formerly known as Microsoft XML Parser, has been released to manufacturing (RTM). You can download the RTM version at http://msdn.microsoft.com/downloads/ by clicking Web Development, XML, MSXML 4.0 RTM. MSXML 4.0 updates Microsoft's core XML technology with support for several new World Wide Web Consortium (W3C) XML standards, including XSD (http://www.w3c.org/XML/Schema#dev). XSDs let you specify more constraints on the elements and attributes you use in an XML document than previous schema languages, such as XML Data Reduced (XDR) and Document Type Definitions (DTDs), allowed. For example, XSDs support data-type facets, which let you specify the minimum and maximum length of the value of an element or attribute. MSXML 4.0 also lets you use XSD to validate a document.

Validation verifies that an XML document matches the set of constraints defined within XSD, XDR, or DTD. An XSD validating parser uses XSD's expanded set of constraints to ensure that an XML document not only meets structural requirements (aka "well-formedness") but also complies with constraints defined within XSD. The constraints might include data types, ordering, and cardinality, for example. Constraints work in a way similar to how a relational schema declares a column's data type to ensure that the query processor and storage engine store the appropriate data type. By combining XSD validation with SAX2 interfaces, you can verify XML data before executing XML Bulk Load.

For example, as Listing 1 shows, the XSD mapping schema maps a simple XML document containing author elements that have first name and last name attributes to a database table named OverflowAuthors. Listing 2, page 44, shows the code that creates the database table, then defines the fields as strings of five characters each. To load the XML data into your database, you need to use the XML Bulk Load Object Model (XBLOM—see "Converting XDR Schemas to XSD" for an example of how to use XBLOM). After you use MSXML 4.0's SAX2 interface to validate the XML data, load the input data into the OverflowAuthors database.

Let's use Visual Basic (VB) to invoke MSXML 4.0 and implement a SAX2 error handler. Listing 3, page 44, shows the VB code, which parses the XML document by using XSD to validate the contents, checking that all data strings within the document are of the proper length. Then, XML Bulk Load loads the data into the database table specified in the mapping schema. First, the VB code in Listing 3 creates a SAXXMLReader object, a SAX2 reader (which calls the event handlers), and an XMLSchemaCache object (which contains schemas and their associated namespaces). Next, the code in Listing 3 creates an instance of the custom VB class MyErrorHandler that implements the callback functions that the SAX2 parser invokes when a validation error occurs.

Then, after creating the MyErrorHandler class, which Listing 4 shows, the VB code creates the XML Bulk Load object and configures the connection settings to access the database. The code calls the Add method to include XSD, which Listing 1 shows, in the schema collection. The first parameter is the Uniform Resource Identifier (URI) for the namespace you want to associate the schema with. The Add method reads the schema and builds in-memory data structures that MSXML uses when it validates an XML document.

Next, the VB code configures MSXML so that it uses the schema cache to perform validation. The MyErrorHandler class is set as the error handler for the SAXXMLReader, which enables MSXML to call the class's methods when a validation error occurs. MyErrorHandler implements the IVBSAXErrorHandler interface—a VB version of the ISAXErrorHandler interface from the SAX2 specification—and a read-only Boolean property named validationError. The IVBSAXErrorHandler interface methods set the validationError property when MSXML finds an error while parsing the XML document. MSXML validates the XML document and calls the error handler when it finds a violation of the constraints in XSD. After successfully parsing the document without error, the code calls XML Bulk Load by using the XBLOM Execute method to load the data from the XML document into the OverflowAuthors table in the database. In practice, you might want to enhance the methods of MyErrorHandler to return more detailed error messages.