How to work around SSIS's column set limitation
Wide tables are a feature introduced in . Their primary advantage is implied by their name: A wide table can have 30,000 columns, as opposed to the 1,024 columns you can have in a non-wide table.
A wide table is defined as a table that has sparse columns and a column set defined. Sparse columns are columns that are optimized for the case when many rows will have NULL for a value for that column. Column sets are special untyped XML columns that represent all the populated sparse columns in a row as XML. Updating the sparse column causes the column set XML to be automatically updated, and updating the column set (through XML methods or some other means) causes the sparse columns it represents to be updated accordingly.
Having 30,000 or even 1,024 columns in a table is a lot of columns, so thoughtful review should be done of any design that requires that many in one table. However, there are at least two circumstances in which that many columns can be called for:
1. Sometimes information from other systems can come in a wide format, over which you have no control. I've personally worked with comma-separated value (CSV) files that had more than 7,800 columns.
2. Data warehouses often use denormalized versions of tables to improve performance, where querying a normalized table of many rows would take unacceptably long. This is, incidentally, a situation in which sparse columns could save considerable space.
There's at least one drawback to using a column set: SQL Server Integration Services (SSIS) won't let you target sparse columns directly when your destination table is a wide table. You can target the column set and populate it with XML representing the populated sparse columns, but you can't directly target the sparse columns that compose it.
You also can't add a column set to a table that already has sparse columns but not a column set. An error will result. Thus, it isn't possible to populate your sparse columns through SSIS, then add a column set. In any case, unless you have a column set, your table isn't a wide table and therefore can only have 1,024 columns.
Let's suppose you need to target a wide table in SSIS. Since you can't target the sparse columns, your only option is to populate the column set XML with an XML fragment representing the populated columns. How can you do that? An SSIS Script component will let you build an XML fragment, at the cost of some light programming. I'll walk you through a simple example, which involves the following steps:
1. Create a wide table.
2. Add a Script component.
3. Add a column set.
4. Map the column set.
5. Create an XML mapping document.
6. Make the XML mapping document an assembly resource.
7. Modify the source code.
Step 1: Create a Wide Table
You first need to create the wide table. Listing 1 shows the CREATE TABLE statement you can use to create the wide table I'll be discussing here. This very small wide table has only six columns, including the ID column and the column set. Figure 1 shows the source CSV file that contains the data for the wide table.
Step 2: Add a Script Component
The next step is to create an SSIS package and add a Script component to it. I'll assume that you already know how to open SQL ServerDevelopment Studio (BIDS), create a new SSIS package, create a Data Flow task, and set up source and destination connections on the Data Flow tab.
After you create your SSIS package, set up a Flat File connection manager to connect to the CSV file. Afterward, open up the Flat File connection manager and select the Advanced option. Check that your input columns' data types are appropriate for each column and that the OutputColumnWidth values are large enough to accommodate the data. You'll probably find that the columns are all being imported as text. In that case, you might want to edit them manually or use the Suggest Types option to have the package analyze the data and set data types and column widths. If you use Suggest Types, be sure afterward to set the data type for the ID column to four-byte signed integer [DT_I4]. Otherwise, you might get error messages about incompatible data types later on.