Using an SSIS Script Component to Import Data to a Wide Table

How to work around SSIS's column set limitation

What is in this article?:

  • Using an SSIS Script Component to Import Data to a Wide Table
Downloads
142170.zip

Wide tables are a feature introduced in SQL Server 2008. Their primary advantage is implied by their name: A wide table can have 30,000 columns, as opposed to the 1,024 columns you can have in a non-wide table.

A wide table is defined as a table that has sparse columns and a column set defined. Sparse columns are columns that are optimized for the case when many rows will have NULL for a value for that column. Column sets are special untyped XML columns that represent all the populated sparse columns in a row as XML. Updating the sparse column causes the column set XML to be automatically updated, and updating the column set (through XML methods or some other means) causes the sparse columns it represents to be updated accordingly.

Having 30,000 or even 1,024 columns in a table is a lot of columns, so thoughtful review should be done of any design that requires that many in one table. However, there are at least two circumstances in which that many columns can be called for:

1. Sometimes information from other systems can come in a wide format, over which you have no control. I've personally worked with comma-separated value (CSV) files that had more than 7,800 columns.

2. Data warehouses often use denormalized versions of tables to improve performance, where querying a normalized table of many rows would take unacceptably long. This is, incidentally, a situation in which sparse columns could save considerable space.

There's at least one drawback to using a column set: SQL Server Integration Services (SSIS) won't let you target sparse columns directly when your destination table is a wide table. You can target the column set and populate it with XML representing the populated sparse columns, but you can't directly target the sparse columns that compose it.

You also can't add a column set to a table that already has sparse columns but not a column set. An error will result. Thus, it isn't possible to populate your sparse columns through SSIS, then add a column set. In any case, unless you have a column set, your table isn't a wide table and therefore can only have 1,024 columns.

Let's suppose you need to target a wide table in SSIS. Since you can't target the sparse columns, your only option is to populate the column set XML with an XML fragment representing the populated columns. How can you do that? An SSIS Script component will let you build an XML fragment, at the cost of some light programming. I'll walk you through a simple example, which involves the following steps:

1. Create a wide table.

2. Add a Script component.

3. Add a column set.

4. Map the column set.

5. Create an XML mapping document.

6. Make the XML mapping document an assembly resource.

7. Modify the source code.

Step 1: Create a Wide Table

You first need to create the wide table. Listing 1 shows the CREATE TABLE statement you can use to create the wide table I'll be discussing here. This very small wide table has only six columns, including the ID column and the column set. Figure 1 shows the source CSV file that contains the data for the wide table.

Figure 1: Examining the source CSV file for the wide table
Figure 1: Examining the source CSV file for the wide table 

Step 2: Add a Script Component

The next step is to create an SSIS package and add a Script component to it. I'll assume that you already know how to open SQL Server Business Intelligence Development Studio (BIDS), create a new SSIS package, create a Data Flow task, and set up source and destination connections on the Data Flow tab.

After you create your SSIS package, set up a Flat File connection manager to connect to the CSV file. Afterward, open up the Flat File connection manager and select the Advanced option. Check that your input columns' data types are appropriate for each column and that the OutputColumnWidth values are large enough to accommodate the data. You'll probably find that the columns are all being imported as text. In that case, you might want to edit them manually or use the Suggest Types option to have the package analyze the data and set data types and column widths. If you use Suggest Types, be sure afterward to set the data type for the ID column to four-byte signed integer [DT_I4]. Otherwise, you might get error messages about incompatible data types later on.

 »

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.