While using Service Broker to move data asynchronously from one application to another, I've become comfortable with parsing and loading XML data into SQL Server tables. When the number of rows is relatively small, XQuery is fast and effective. In my experience, however, as the number of rows grows, so does the time it takes to parse the XML, and another solution is in order.
To understand the problem, let's look at a table called Products, which you're going to load in T-SQL using XQuery. Here's the DDL for the table.
CREATE TABLE [dbo].[Products](
    [ProductID] [int] NOT NULL,
    [ProductName] [nvarchar](40) NOT NULL,
    [SupplierID] [int] NULL,
    [CategoryID] [int] NULL,
    [QuantityPerUnit] [nvarchar](20) NULL,
    [UnitPrice] [money] NULL,
    [UnitsInStock] [smallint] NULL,
    [UnitsOnOrder] [smallint] NULL,
    [ReorderLevel] [smallint] NULL,
    [Discontinued] [bit] NOT NULL,
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED
(
    [ProductID] ASC
)
)
The source XML file matches the table, with the column data defined as elements, and looks like this.
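A representative fragment follows; the root and row element names come from the XQuery path used below, and the column elements mirror the table's columns. The values shown are illustrative.

```xml
<Products>
  <row>
    <ProductID>1</ProductID>
    <ProductName>Chai</ProductName>
    <SupplierID>1</SupplierID>
    <CategoryID>1</CategoryID>
    <QuantityPerUnit>10 boxes x 20 bags</QuantityPerUnit>
    <UnitPrice>18.00</UnitPrice>
    <UnitsInStock>39</UnitsInStock>
    <UnitsOnOrder>0</UnitsOnOrder>
    <ReorderLevel>10</ReorderLevel>
    <Discontinued>false</Discontinued>
  </row>
</Products>
```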
Now, using T-SQL, you need to declare an XML variable, load it from the source file, and then use XQuery to parse the file contents and load them into the Products table. Here's the T-SQL code.
DECLARE @messagebody XML;

SELECT @messagebody = BulkColumn
FROM OPENROWSET(BULK 'C:\Work\Products1000.XML', SINGLE_CLOB) AS X;
INSERT INTO [dbo].[Products]
SELECT a.value(N'(./ProductID)[1]', N'int') AS [ProductID],
       a.value(N'(./ProductName)[1]', N'nvarchar(40)') AS [ProductName],
       a.value(N'(./SupplierID)[1]', N'int') AS [SupplierID],
       a.value(N'(./CategoryID)[1]', N'int') AS [CategoryID],
       a.value(N'(./QuantityPerUnit)[1]', N'nvarchar(20)') AS [QuantityPerUnit],
       a.value(N'(./UnitPrice)[1]', N'money') AS [UnitPrice],
       a.value(N'(./UnitsInStock)[1]', N'smallint') AS [UnitsInStock],
       a.value(N'(./UnitsOnOrder)[1]', N'smallint') AS [UnitsOnOrder],
       a.value(N'(./ReorderLevel)[1]', N'smallint') AS [ReorderLevel],
       a.value(N'(./Discontinued)[1]', N'bit') AS [Discontinued]
FROM @messagebody.nodes('/Products/row') AS r(a);
With SET STATISTICS TIME ON, these are the times for a 1,000-row XML file: first to load the file into the XML variable, then to parse the XML and load it into the Products table.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 11 ms.

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 171 ms.
Not too bad, but it's only 1,000 rows. For comparison, let's look at the same process in PowerShell.
To begin, you need an XML schema file to define the data types for each of the columns. If you don't define a schema, the columns will all be typed as strings, and you won't easily be able to load the data into the table. Here are the contents of the XML schema file for the Products.xml file.
<xs:schema id="Products" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="Products" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="row">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ProductID" type="xs:int" minOccurs="0" />
              <xs:element name="ProductName" type="xs:string" minOccurs="0" />
              <xs:element name="SupplierID" type="xs:int" minOccurs="0" />
              <xs:element name="CategoryID" type="xs:int" minOccurs="0" />
              <xs:element name="QuantityPerUnit" type="xs:string" minOccurs="0" />
              <xs:element name="UnitPrice" type="xs:decimal" minOccurs="0" />
              <xs:element name="UnitsInStock" type="xs:short" minOccurs="0" />
              <xs:element name="UnitsOnOrder" type="xs:short" minOccurs="0" />
              <xs:element name="ReorderLevel" type="xs:short" minOccurs="0" />
              <xs:element name="Discontinued" type="xs:boolean" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
Now, this may look complicated, but you can let PowerShell do most of the work of generating this schema file. Start by connecting to SQL Server and defining a query against the Products table that returns no rows.
# Adjust the connection string for your server and database
$cn = New-Object System.Data.SqlClient.SqlConnection("Server=localhost;Database=Northwind;Integrated Security=SSPI")
$cmd = New-Object System.Data.SqlClient.SqlCommand
$cmd.CommandText = "SELECT * FROM dbo.Products WHERE 1 = 2"
$cmd.Connection = $cn
$da = New-Object System.Data.SqlClient.SqlDataAdapter
$da.SelectCommand = $cmd
Then, fill the DataSet object using the query, and use the WriteXmlSchema() method to write out the file.
$ds = New-Object System.Data.DataSet
$da.Fill($ds, "Products") | Out-Null
$ds.WriteXmlSchema("C:\Work\Products.xsd")
You'll have to modify the .xsd file to match the XML structure of your data file. The generated schema has a schema id and top-level element named "NewDataSet," which in this case needs to be changed to "Products." Then change the element named "Products" to "row" so the schema matches the data file. The column definitions will work as-is.
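One way to make those two renames is with a couple of string replacements. This is a sketch; the file path is an assumption, and the order matters, since "Products" must become "row" before "NewDataSet" becomes "Products":

```powershell
# Sketch: rename the generated identifiers in the .xsd file.
# The path is a placeholder; point it at wherever you wrote the schema.
$xsd = Get-Content "C:\Work\Products.xsd" -Raw
$xsd = $xsd -replace '"Products"', '"row"'          # table element becomes "row"
$xsd = $xsd -replace '"NewDataSet"', '"Products"'   # dataset id and root become "Products"
Set-Content "C:\Work\Products.xsd" -Value $xsd
```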
To load the data into the table, use the same method described in the article "Bulk Copy Data into SQL Server with PowerShell." To begin, you need the data in an ADO.NET DataTable. This works in your favor: the DataSet object, which contains the DataTable, has two useful methods, ReadXml() and ReadXmlSchema().
Next, you need to create a DataSet object and use the ReadXmlSchema() method to load the schema into the DataSet.
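A sketch of that step, assuming the schema file from the earlier step was written to C:\Work\Products.xsd:

```powershell
$ds = New-Object System.Data.DataSet
$ds.ReadXmlSchema("C:\Work\Products.xsd")
```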
Then, you can use the ReadXml() method to load the XML data into the DataSet. Because you've already loaded the XML schema, ReadXml() skips its own schema-inference step. You'll then load the lone table in the DataSet into the $dtProd variable.
$ds.ReadXml("C:\Work\Products1000.XML") | Out-Null
$dtProd = $ds.Tables[0]
Now you have a DataTable with your Products data, and you can use the SqlBulkCopy class to quickly load the data into the table.
$bc = New-Object System.Data.SqlClient.SqlBulkCopy($cn)
$bc.DestinationTableName = "dbo.Products"
$cn.Open()
$bc.WriteToServer($dtProd)
$cn.Close()
Now you need to measure the time it took to load the data; PowerShell provides the Measure-Command cmdlet for exactly this. To simplify the process, I put the preceding PowerShell code into a script called Load-XmlData.ps1 and executed the following command.

Measure-Command {.\Load-XmlData.ps1}
This returned some interesting results.
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 62
Ticks : 628365
TotalDays : 7.27274305555556E-07
TotalHours : 1.74545833333333E-05
TotalMinutes : 0.001047275
TotalSeconds : 0.0628365
TotalMilliseconds : 62.8365
This shows that the total time to load the 1,000-row XML file using T-SQL was 182ms (11ms to load the file plus 171ms to parse and load the table), while the total time for the PowerShell script to load the XML file into a DataSet and bulk copy the data into SQL Server was 63ms, roughly one third of the time.
If you bump up the size of the XML file from 1,000 rows to 10,000 rows, the T-SQL timing returns these results.
SQL Server Execution Times:
CPU time = 593 ms, elapsed time = 775 ms.

SQL Server Execution Times:
CPU time = 1607 ms, elapsed time = 1728 ms.
Running the same data through the PowerShell script using Measure-Command returns these times.
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 742
Ticks : 7429740
TotalDays : 8.59923611111111E-06
TotalHours : 0.000206381666666667
TotalMinutes : 0.0123829
TotalSeconds : 0.742974
TotalMilliseconds : 742.974
So the T-SQL solution ran in 2,503ms total (775ms plus 1,728ms), while the PowerShell solution ran in 743ms, more than three times faster.
As you can see, PowerShell is a better fit than T-SQL for loading large XML data sets into SQL Server.