Use these 4 tips to create machine-independent SSIS packages
Picture this: You've just finished building a package in SQL Server Integration Services (SSIS). You've tested it with various inputs, and everything seems to work fine. You have error handling just right, you share packages that isolate common logic as subpackages, you've got event handlers that notify you of problems in the package or that handle error output from the dataflow task. Your dataflow task is screaming fast and everything is running without a hitch in your development environment. Then, you move your package to the production server and everything breaks. You're getting errors everywhere and nothing seems to work. Sound familiar?
SSIS packages are tightly bound to the environment in which they run. They reference folders and files on certain drives, connect to specific servers, listen for specific events, and perform other environment-bound functions. Although creating a simple package is fairly easy, it can be a challenge to write the package in such a way that when you to deploy it to another machine, it will continue to execute correctly. This is one of the most common challenges SSIS users must face. Although SSIS provides some tools that address these issues, it's not always obvious which approach is appropriate or how to otherwise apply these tools.
Let's look at how to use SSIS configurations and property expressions to address the location-dependent package problem. I'll present a general approach you can apply to simplify package deployment and an approach to the most common package-portability problems. By applying these concepts and practices to the situations that you encounter in your environment, you can reduce the incidence of packages that fail when deployed.
Problems with moving a package arise when the package references resources (such as folders and files) in locations that are specific to a given machine. The problem is that when you deploy the package to another machine, the hard-coded references to resources may no longer be valid. For example, on one machine, you might have a folder on the D drive where you place incoming flat files to be processed. When you move the package to another machine, that machine might not have a D drive, so you'd need to use a different location, say the Z drive, for those flat files. Sometimes there are simple solutions to these kinds of problems. For this problem, you might use the system "subst" command to create a new drive letter. But that kind of solution can be confusing and difficult to manage long-term.
These hard-coded references are the most common cause of the location-dependence problem. Even if you never move a package to another machine, you could still have the same problems because the environment changes around the package. A server is renamed, hard drives fail, shares change, and users leave the company. You need a way to easily update all your packages to reflect those changes. If references to such changing resources are hard-coded in the package and any of the resources changes, you'll need to update each package that references those resources with the new settings. If you have more than a few packages with such hard-coded settings, this management can become a nightmare.
What you need is a way to isolate the package from the environment in which it functions, or "keep it in the dark" so that it has no hard-coded references to resources. You also need a simple way to configure the package so that it will continue to function in a shifting environment. It would be even better if you could provide parameters through configurations that cause the package to adjust itself. I call these self-reconciling or self-healing packages. Such a package takes certain base settings as input and builds its own valid references to resources. Though the subject of package configuration and deployment is quite broad and could touch on many different solutions, this article provides a pattern you can use to isolate your packages from the environment while providing a simple mechanism for adjusting to environmental changes.
Package configurations are a way to modify properties of objects in a package at load time; they come in several flavors, including SQL, XML, Registry, Environment Variable, and Parent Package. You can access the Configuration Wizard for generating configurations in the Package Designer by right-clicking the Control-Flow Designer surface and selecting the Package Configurations menu item. Search for "Package Configurations" in Books Online (BOL) for more information.
Generating a configuration that changes a property on a task or connection manager is simple, and package configurations are flexible. In fact, I think that because they are so flexible, it's easy to get the idea that they're all you need to configure packages. That's rarely the case. For example, some people generate a configuration file for each package. As their packages multiply, so do their configuration files which they must modify every time the environment changes.
I suggest you use configurations in a simpler, focused, yet more powerful way. Use package configurations to describe your environment so that your packages can configure themselves. Configurations should be just part of the solution to the problem of package portability; use them to make packages location-independent.
Machine-Resident Configurations: Stage One
The way to describe a machine is to create a configuration that reflects the machine's environment. I call these configurations machine resident because they don't move with the package. When you build packages, you add a reference to the machine-resident configuration on the development machine. You also add certain standard variables to contain the values for machine-specific settings. For example, you could have a variable named User::TempDir that contains the directory where temporary files should be stored for that machine. You could have another variable named User::SQLServer that contains the name of the server to use. Another variable called User::Root Drive might point to the root drive where you store all your packages.
Each machine should have a resident configuration in the same location by the same name that configures these variables. This way, when you move a new package to a different machine, the package will automatically pick up the machine-resident configuration for that machine and configure these common variables.
Property Expressions: Stage Two
Property expressions are a way of automatically computing a property value from an expression. The expression can include one or more variables, so you can use the value of variables to influence the expression result and thereby influence the property value. For example, to generate the subject line of an email message during package execution, you could create a property expression and associate it with the Subject property of the Send Mail task like this:
successfully on " + (DT_WSTR, 20)GETDATE()
When the Send Mail task sends the email message,
the subject line will look like this:
The package :MyTestPackage completed successfully
on July 13, 2005 05:17:32.
This technique gets interesting when you use property expressions together with configured variables. For example, the Dataflow task has a property called BufferTempStoragePath. This property tells the Dataflow task where to spool buffers of data when it runs out of memory. It's a good practice to specify a location on a dedicated, high-performance disk drive. Developer machines are unlikely to have a separate disk for this purpose, so the temporary path will likely point to the C drive. However, on the production machine, you might have such a drive available and want to have this property point to a folder there. If you have a machine-resident configuration that configures the User::TempDir and User::RootDrive variables, you can use the following property expression for the BufferTempStoragePath property to automatically point the Dataflow task to the correct location to spool buffers. The expression
"\\" + "BufferTempStorage"
might evaluate to C:\Temp\Buffer-TempStorage on one machine, but when the package is moved to another machine with a different configuration, it might evaluate to K:\ PackageTempStorage\BufferTempStorage. Using property expressions in this way finishes the configuration of the package and is the second stage of the two-stage configuration.
Let's look at the folder-reference problem in a little more detail. Say you have a package that runs nightly. It inserts the data from each flat file that an automated process drops to a folder and, when the package has processed each flat file, it moves the file to another location. Even in this simple scenario, if you were to move the package, several things can go wrong. Most of them have to do with the hard-coded references to folders on a hard drive.
You have several requirements for this project. First, you want your package to deploy seamlessly from development, test, quality-assurance (QA), and production machines. Second, you have a policy that only signed packages may be moved to the production server. You can't change the package after it's signed. Modifying it requires new regression tests to double-check that nothing has changed; then you must get QA to sign off on it again and resign the package. And third, many packages use the same folder for similar functions and you want it to be easy to modify the location of the dropped files without a management headache. In other words, you'd like to be able to change the file location in one place and have all the packages automatically reflect that change.
How to Do It
First, on the Advanced tab in the Properties Control panel of My Computer, click Environment Variables and create an environment variable called RESIDENTCON-FIGURATION that contains the fully qualified name of your resident configuration. Specifying the location of the XML configuration in an environment variable is like setting up configurations. Hard-coding the location of the configuration file brings the same problems as hard-coding the location of data files or servers. Using an environment variable to point to the resident configuration, however, allows even the resident configuration location to vary from machine to machine. Using an environment variable to reference the location of a configuration in this way is called indirect configuration.
To verify that the environment variable is correctly set, use the set command on the command line to ensure that the correct fully qualified configuration file name has been set. It should look something like this:
On each machine where you build, test, or otherwise deploy packages, you create this environment variable. Every package you create should reference this environment variable with an indirect XML configuration. If the resident configuration location varies on any machine where packages are deployed, that's OK because the package looks in the environment variable to "indirectly" find the location of the resident configuration.
Next, right-click the Control Flow Designer surface and select Package Configurations from the context menu to bring up the Package Configurations Organizer dialog box that Figure 1 shows. After ensuring that the Enable package configurations checkbox is selected, click Add to bring up the Package Configuration Wizard that Figure 2 shows. (I'm assuming you've already created an XML configuration file that configures your variables and that the XML configuration is in the location to which the RESIDENTCONFIGURATION environment variable points.)
In Figure 2, notice that I've selected an XML configuration file configuration type. I've also selected the Environment Variable option, which tells the package that while loading configurations, it should go find that environment variable, retrieve the location for the XML configuration, and configure itself from the XML configuration it finds there. In this way, you meet the requirement that the package never change once in production. If the location of the configuration file was directly set in the package, and if the location of the configuration file ever changed for some reason, I'd need to edit the package to reflect its new location and re-sign it. This way, the location of the package configuration may change, but the package is insulated from that detail. This is one more way of keeping the package in the dark.
In the resident package configuration, I have one configuration. It configures the "User::FlatFileDropDir" variable in my package to contain the drop folder location, which is Z:\FFDROP. That's configuration step 1. Next, I have a multi–flat file connection manager that points to the files that need to be processed in the Data Flow task. To avoid one-stage configuration, I'll use a property expression to build the location from the configured variable. The property expression looks like this:
@User::FlatFileDropDir + "\\" + "*.txt"
This expression will evaluate to Z:\FFDROP\*.txt. Now, as long as this package is moved to a machine with the resident configuration, it will successfully find the dropped flat files.
To ensure packages consistently conform to the appropriate policy, build a template package and add all the commonly used variables and the configurations that configure those variables as described. Give the variables a distinctive namespace such as RESIDENT or CONFIGURED. Also, name the configurations with a convention so that those who are unfamiliar with the template can easily identify which configurations modify which variables. Then, every time you need to build a new package, use the template package as a starting point. This will help you ensure that all your packages are consistently location-independent.
Other Causes of Location Dependence
I've covered perhaps the most common and problematic causes of non-portable packages: hard-coded references. But you should be aware of several others as well. Here's a summary of the more common causes of non-portable packages, with suggestions for eliminating those causes.
- Hard-coded references to files, servers, and other resources: Eliminate such references as I explained.
- Incorrect package-protection level: For production, use server storage if possible.
- Directly referencing configuration files: Use indirect or SQL Server configurations.
- Using standard security: Use integrated security if possible.
- References to unavailable subpackages: Store packages on SQL Server.
- Using tasks or other components not installed on a machine: All components must be installed on the machine where the package that references them will run.
- Directly referencing parent package variables from sub-packages: Use Parent Package configurations if possible.
- Using a different user account to test the package than the one you use to execute it in production: When testing, use an account with identical permissions as the one you use in production. In production, use the Microsoft Agent SSIS subsystem to execute the package, using a proxy with the identical account. (For more information, see my August 2005 article "Security in SSIS," InstantDoc ID 46723.)
Creating easily deployable packages isn't difficult, but it does require some effort, forethought, and discipline. Every environment is different, so there's no way to articulate one overarching package-creation policy or strategy that will work for everyone. However, what I've explained here is a general approach you can use for building packages that automatically adjust to their environment. When you build packages without hard-coded references to resources, you keep them in the dark about where they're running, where resources and data are stored, where to find files, and even where configuration files are stored. For keeping packages location-independent, the less the package knows, the better.