New features in Microsoft’s Business Intelligence Development Studio simplify modifying packages
In SQL Server Integration Services (SSIS), the Business Intelligence Development Studio (aka the designer) is perhaps the most visible new addition, and because the designer is how you'll generally interact with SSIS, it's also one of the most important. In "Making Package Magic," October 2004, InstantDoc ID 43805, I introduced the new Integration Services Import/Export Wizard and showed you how to use it to create a simple data-import package. This time around, I show you how to use the SSIS designer to load and modify the packages you create with the wizard. But first, let's look at some of the new features that make creating and editing packages simpler.
SQL Server 2000 data transformation services (DTS) allows only three options for precedence constraints: Success, Failure, and Completion. In addition, all the precedence constraints for the same task use an AND operator, which means that all precedent constraints for a common task must have their precedence conditions met before the task will execute. (For a brief explanation of how precedence constraints work in DTS, see the Microsoft article "DTS Package Workflow" at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dtssql/dts_elemwkflow_659z.asp.)
But SSIS provides a logical OR operator that lets the target container or task execute if any of the precedence constraint conditions are satisfied. This OR operator makes it easier to do things such as execute a task if one of several tasks fails. Figure 1 shows the Precedence Constraint Editor, in which you can choose constraint options and select whether SSIS uses an AND or an OR operator to process a package's constraints. You open the Precedence Constraint Editor by double-clicking a precedence constraint.
Figure 2 shows the workflow of tasks in a typical package. In the figure, you'll notice a Send Mail Task (SMTsk), which I've configured to send mail to me if either the SQL Task (SQLTsk) or the Data Flow Task (DFTsk) fails. The dotted lines in the figure indicate an OR precedence constraint, and the solid line indicates an AND relationship. In Figure 2's package, if either SQLTsk or DFTsk fails, SMTsk will execute. However, DFTsk will execute only if SQLTsk succeeds.
The Precedence Constraint Editor is also where you specify an expression. SSIS lets you use Boolean expressions to determine whether precedence constraints fire. You can also use expressions to see the execution result of the previous task or container. In Figure 2, you can see how the designer shows the expression @ResultCount>0 on the designer surface. In the Evaluation operation field, which controls how constraint and expression relate, you can select from four options: Constraint, Expression, Constraint and Expression, and Constraint or Expression. If you select Constraint, SSIS considers only the constraint to determine whether the next dependent container or task should execute. With the Expression option, Integration Services evaluates only the expression. With the Constraint and Expression option, both must evaluate to true for the precedence constraint to fire. And when you select Constraint or Expression, if either evaluates to true, the precedence constraint will fire. In Figure 2, the Data Flow Task will execute only if the SQL Task succeeds and the variable ResultCount is greater than 0. If either of those two tasks fails, the Send Mail Task will execute.
In Figure 2, notice the yellow indicator in the Send Mail Task. That icon means that a warning occurred during validation. Warnings tell the package writer that something might be wrong with the package or task, but that the problem probably won't cause the package to fail. In this particular case, I have an incorrectly formatted email address in the task. When I correct the email address, the warning indicator will go away. The red dot on the SQL Task indicates that there's a breakpoint on that task. Breakpoints are part of the suite of debugging features in SSIS 2005 that make it possible to perform real debugging of packages.
One of the first things you should do when you start the Business Intelligence Development Studio is right-click everything. Just about everywhere, you'll find context menus giving you quick and convenient ways to access options. Most of the menu items are self explanatory, but Figure 3 shows a few interesting context-menu options. In particular, notice the new Disable option, which lets you disable one task, a whole container, or multiple tasks simultaneously by selecting the tasks you want to disable and right-clicking one of them. Disabling means that when you execute the package, the disabled task or container won't execute. The default behavior for disabled tasks and containers is to return as though they succeeded. This feature is available for custom tasks without any special coding and is helpful when you're diagnosing troublesome packages. You can also use the Disable option with the ForceExecutionResult property (which you find in the property grid for all containers and tasks) to simulate certain conditions or isolate parts of a package for specific purposes, such as forcing sequences or execution paths. For example, you can force a task to simulate failure by setting the ForceExecutionResult property to Failed in the property grid and disabling the task. Sometimes, while you're building a package, a task isn't yet functional because you don't have all the inputs or information you need. In such a situation, you can disable the task until the needed inputs become available.
Source-control integrated support is another new option. Figure 4 shows the Business Intelligence Development Studio's Solution Explorer. Notice the icons next to the files, projects, and solution. The icons are visual indicators of each file's source-control status. Right-clicking any file brings up a context menu that lets you check out, check in, compare, and synchronize files—if you have Visual SourceSafe installed.
Creating the Solution
Some users never do anything more with the Import/Export Wizard than enter settings and run the results. But, as I mentioned in "Making Package Magic," the wizard can also generate packages, so it's a good starting point for building more complex packages for chores such as cleaning data, moving data from one schema to another, or getting your files in the right place for a load. Knowing how to modify existing packages is a valuable skill because it saves you from having to start from scratch every time you need a slightly different package than the one before. To start, let's to load a package into the designer. Then, we'll open the package and see how the settings from the wizard flow into the package. Finally, we'll modify the package to perform a slightly different function than its original function, which is a common approach many people take when building new packages.
You can download the complete sample package from "Making Package Magic" at InstantDoc ID 45092. To open the package, launch the Business Intelligence Development Studio from the SQL Server 2005 Start menu. The designer is a project-driven environment in which you create solutions, which provide a framework for managing all your projects, packages, and other files. Solutions are also the most common way to manage your Reporting Services and Analysis Services projects. Although you can edit packages without creating a solution, we'll want to create a new solution for this example.
To create a new solution, select New, Project from the File menu to bring up the New Project dialog box. To put your project into the solution, type a name and location for the project and click OK. Later, if you want, you can create another project and add it to the solution by right-clicking the solution and selecting Add, New Project.
Now we've created a new solution containing a new package, which will be in the Solution Explorer. Because we want to use a wizard-generated package, we can simply delete the new package that the designer creates. To load the package, right-click the SSIS Packages node in the solution, and select the Add Existing Package menu item. The resulting dialog box lets you load packages from SQL Server, the SSIS package store, or the file system.
Modifying the Package
In "Making Package Magic," we exported some tables from the AdventureWorks sample database and used the wizard to load the flat file. The wizard created the destination database and also did some work in the package to create a table to load into. We also set up source and destination connections and data-flow adapters to move the data from the flat file to the new table.
Figure 5 shows the work-flow portion of the package the wizard created, which includes a SQL Task and a Data Flow Task. The wizard uses the SQL Task to generate the table. The Data Flow Task is what actually loads the data from the flat file to the table. Figure 6 shows the data-flow portion of the wizard-generated package. The Data Flow Task contains a flat-file source adapter and an OLE DB destination adapter. This simple Data Flow Task is about as far as the wizard can take us in terms of starting a package.
But suppose we wanted to filter the customers in some way. For example, say the AdventureWorks people have an overstock of bicycle fenders and they want to target customers in rainy states such as Oregon and Washington. We need a way to filter customers who live in ZIP codes ranging from 97000 to 99000. We first need to delete the data flow between the Source and Destination adapters. Next, add a Data Conversion transform to convert the ZIP code from a string representation to an integer. Then, we can add a conditional split to separate the rainy states from the rest. The Conditional Split transform is the one that actually filters the rainy-state residents from the rest of the AdventureWorks customers. Then, in the Conditional Split transform, enter as the condition the following expression:
\[PostalCode as Integer\] > 97000 && \[PostalCode as Integer\] < 99000
This expression states that for all rows that have a PostalCode as Integer column value greater than 97,000 and less than 99,000, the Conditional Split transform will send those rows down the Rainy States output. The rows with customers who live outside the specified ZIP code range will get sent down the Default output. Figure 7 shows the resulting data flow.
You'll notice in Figure 7 that I've added a Row Count transform. A Row Count transform is an easy way to terminate data flows when you're not interested in the results. I've also added two data viewers, which appear as embedded screens in the figure. Data viewers let you view data as it flows from one transform to the next so that you can instantly see how the data is changing. Using the Data Conversion transform, I converted the PostalCode column from a unicode text column to an unsigned 4-byte integer. The problem with this conversion is that not all postal codes are convertible to integers—Canadian postal codes, for example, contain letters. However, for this simple example, the conversion works because I'm interested in the ZIP codes for only Oregon and Washington. The transform will raise an error for rows that contain postal codes with letters in them. To get around those errors, I clicked Configure Error Output and directed the transform to ignore errors and truncations—which is OK because I know my data better than the transform, and I know that I'm not interested in the rows that will cause errors.
Running the Package
When you execute the package, you get something like the view that Figure 7 shows. The data viewers pop up, and you can see all the data flowing from the Data Conversion transform to the Conditional Split transform and from the Conditional Split transform to the destination. Notice that the Data Conversion transform converted postal codes with letters into NULLs. You can also see from the row counts that AdventureWorks has 3146 customers living in Oregon and Washington. We could have made the selection of rainy states simpler by filtering on the StateProvinceID column for IDs 58 and 79, but the results wouldn't allow us the flexibility of further filtering—say, if we later decide to filter to only certain parts of Oregon and Washington or to filter urban versus suburban locations.
A Package of a Different Color
So, there you have it. We used the SSIS designer to load a wizard-created package and modify it to do something better tuned to our needs. You can use this pattern to save yourself some package-creation time, at least until you're more familiar with the designer and how to create your own package from scratch. Get to know the designer—I've only skimmed the surface of its functionality. And as always, please send your feedback to the SQL Server Business Intelligence team at firstname.lastname@example.org, or post your comments to my blog at http://sqljunkies.com/weblog/knight_reign. We welcome your suggestions for improvements and your success stories.