I'm trying to retrieve data in XML format from SQL Server 2000 by using FOR XML AUTO and FOR XML EXPLICIT queries sent through Java Database Connectivity (JDBC) from a Java program. However, I'm getting split lines and inserted spaces. What am I doing wrong?

The behavior you report is similar to the problem of spurious spaces and data truncation that I described in June 2001. The problem, which occurred when the reader returned XML results using Query Analyzer, stemmed from the difference in the way SQL Server 2000 returns XML results and the way Query Analyzer displays those results. The XML result spanned multiple rows in the resultset, and Query Analyzer incorrectly concatenated the rows to obtain the XML result. In your scenario, the behavior is similar but the problem results from a character-encoding mismatch between SQL Server and Java. The good news is you can fix or avoid the problem altogether by properly setting the encoding in your Java program.

Listing 1 shows the source code for a Java class that you can invoke from your system's command line after you compile the code. The class's queryODBCBridge method invokes a simple FOR XML AUTO query and writes the results to a file in the current directory. First, queryODBCBridge creates an instance of the JDBC-ODBC bridge driver that accompanies Java Development Kit (JDK) 1.3. I used the JDK included with BEA Systems' WebLogic, but other vendors' JDKs should work equally well. Alternatively, you can use other net or native protocol JDBC drivers (types 3 and 4 drivers, for you Java gurus) instead of the JDBC-ODBC bridge. When you use a JDBC-ODBC bridge, you must configure an ODBC Data Source Name (DSN) to access the database. In Windows 2000, you can create a DSN through Start, Administrative Tools, Data Sources (ODBC). For this example, I created the DSN sqlnorthwind to access the Northwind database on my SQL Server 2000 installation.

Using the newly created JDBC driver instance, the code opens a connection to the SQL Server 2000 database, then uses the connection to create an instance of the Statement class. The Statement class instance executes the FOR XML AUTO query, then generates a JDBC resultset.

SQL Server 2000 returns the result of a FOR XML query in one ntext-type column. If the XML result is larger than 2033 Unicode characters, SQL Server splits the result among multiple rows. However, a properly formatted XML result requires concatenated data from each consecutive row, so the split lines and inserted data appear.

After obtaining the JDBC resultset, the code in Listing 1 opens a FileOutputStream wrapped in a BufferedWriter, which writes the XML query result to a file on the file system. The output begins with an XML declaration, followed by a root tag. You must include a root element because the query result doesn't automatically contain a root element, which all well-formed XML documents require. The while loop iterates through the resultset, fetching each row's data by obtaining a binary stream on the column value. You want to retrieve a binary stream to avoid the decoding problems that occur when you use String or CharacterStream JDBC column types. Next, the code wraps an InputStreamReader around the binary stream to properly decode the stream. Specifying the proper encoding on the InputStreamReader is crucial for obtaining a correct result. You add UnicodeLittle to signify that the stream consists of Unicode characters with little-endian byte order. You need to specify this byte order because the byte ordering of the result is different from Java's standard big-endian byte ordering and because the stream doesn't include a byte-order mark to indicate the byte order that the stream uses. SQL Server 2000 doesn't include a byte-order mark with the data because each row's value is a fragment of a larger XML document. The rest of the example copies characters from the InputStreamReader to the BufferedWriter.

To use Listing 1's code, first compile it by using a Java development tool such as Borland Software's JBuilder or Oracle 9i's JDeveloper. After you compile the code, you can invoke the JDBCSQL class by using the Java runtime, for example:

java  jdbcsql.JDBCSQL

The code creates a file called ODBCResults.xml, which contains the query's XML result.

Although the coding technique in Listing 1 solves the problem of split lines and inserted spaces, note that the code depends on one of SQL Server 2000's internal implementation details: returning XML data in a single-column, multirow result. However, Microsoft doesn't guarantee support for this method of retrieving XML results in future SQL Server releases. Before placing this or similar code into production, you should understand and accept that changes in future SQL Server releases might render this retrieval technique unworkable. For now, this example is a good way to use Java and JDBC to leverage SQL Server 2000's XML capabilities.