A Strategy for Dealing with SSIS in Clusters

How to make configuring SSIS as a cluster resource a viable option

Using Microsoft Server Clustering to run multiple instances of SQL Server is common. For example, suppose you have an online instance, an extraction, transformation, and loading (ETL) instance, and a reporting instance. You can serve these SQL Server instances as clustered virtual machines (CVMs) on four physical machines in an active/active/active/passive configuration. (See Figure 1.) The passive machine observes the status of the other three machines, ready to assume their functions if necessary. (Note that CVMs use clustering technology that has been around since SQL Server 7.0. They aren't related to virtualization platforms, such as VMware ESXi or Microsoft Hyper-V.)

Each SQL Server instance takes care of different needs, so each instance should run SQL Server Integration Services (SSIS). However, complications arise when you use SSIS in a cluster because it is machine based rather than instance based like its predecessor DTS. In other words, in a cluster, SSIS runs on a physical machine, when most everything else runs on a CVM, as Figure 1 shows.


Figure 1: Sample CMV configuration (click to enlarge)



As a result, using SSIS in a cluster presents several challenges to a solution architect:

  • In a production environment, you have limited control over which physical machine an instance is running on. An instance can move from one physical machine to another very quickly in response to an event (e.g., hardware failure).
  • In clusters, it's common for SSIS packages to be scheduled using SQL Server Agent. SQL Server Agent is (and always has been) an instance-based tool. So, in a failover, SQL Server Agent will move, but SSIS will not.
  • Development environments often have multiple SQL Server instances running on a single physical machine but only one instance of SSIS.

Because SSIS isn't a cluster-aware service and doesn't support failover, Microsoft doesn't recommend configuring SSIS as a cluster resource (although they provide instructions on doing so in the "Configuring Integration Services in a Cluster" web page. However, I've come up with a two-part strategy that addresses the challenges, making it a viable option.

The Strategy, Part One


The first part of my strategy is to store production packages in the file system rather than in msdb. Although this is a major deviation from past SQL Server 2000 practice, it's an important one. Here's why.

SSIS determines the location of msdb by looking in C:\Program Files\Microsoft SQL Server\100\DTS\Binn\MsDtsSrvr.ini.xml. When SSIS is installed (regardless if the installation is local or clustered), Microsoft places the tag

<ServerName>.</ServerName>

in this file. This tag indicates that msdb can be found in the default instance of this physical machine. (Remember, SSIS is machine based.) However, there is no default instance on the physical machine. The instance runs on the CVM.

You could change the tag on the physical machine that prodcvm01\online typically runs on to

<ServerName>prodcvm01\online</ServerName>

You could even make the same type of change on the physical machines that your ETL and reporting instances typically run on. Everything will be fine as long as the CVM/physical machine relationship remains in its typical configuration. But what if there are serious hardware problems and the CVM with the online instance ends up running on the physical machine that typically hosts the reporting instance? Further, what value do you put into the MsDtsSrvr.ini.xml file on the passive physical machine?

To avoid these issues, here's what you can do:

  1. For each environment (e.g., production, development, quality assurance—QA), select one msdb.
  2. Create a package named Do Not Store Packages in msdb and store it in each selected msdb. This will serve as an eye catcher in case someone browses the msdb for packages. It has no other purpose.
  3. Point the MsDtsSrvr.ini.xml files on all four of the physical machines to the location of the one selected msdb. So, for example, if the msdb is in the online instance on the CVM named prodcvm01 in the production environment, the tag would look like

    <ServerName>prodcvm01\online</ServerName> on all of the physical machines in the cluster group.

The Strategy, Part Two


I'm a big believer in using a formalized, consistent directory structure in which I store all my database files (e.g., data files, transaction logs, scripts, dumps). I copy this structure to every disk (or LUN) on a database server. The root of the structure is simply SqlData, which is followed by the name of the instance being supported (e.g., D:\SqlData\Online\).

In that structure, I introduce an SSIS directory, which includes a subdirectory for every logical SSIS application supported on that instance. For example, D:\SqlData\ETL\SSIS\PartnerETL would be the directory for an SSIS application known as PartnerETL located on the ETL instance. In the application directories, I store SSIS packages as .dtsx files.

In the root of the SSIS directory, I maintain and centrally manage a number of configuration files that are commonly used by the SSIS applications. I maintain one configuration file for every database in our architecture. Applications reference these configuration files by simply "reaching back" to them. For example, the common connection to the Online database can be used by referencing

..\Online.dtsconfig

within an SSIS package.

I use SQL Server Agent jobs to schedule the execution of SSIS packages. SQL Server Agent provides a job step type of SQL Server Integration Services Package. There are two critical tabs when using this type of job step. In the General tab, which Figure 2 shows, you need to identify the package to be run.


Figure 2: Identifying the SSIS package to be run (click to enlarge)



In the Configurations tab, which Figure 3 shows, you need to identify the configurations needed by the package.


Figure 3: Identifying the configurations needed by the SSIS package (click to enlarge)



You can use the GUI to create the job in a development environment. It then can be scripted. If everything is done properly, you can use the script as is in your nonclustered environment. To run the script in a clustered production environment, you'll only need to change the drive letters. Because the package is stored on a LUN that's part of the cluster group and because the package is scheduled by SQL Server Agent (which is also part of the cluster group), the package will run regardless of what physical machine your CVM is located on.

The Strategy's Benefits


Besides effectively dealing with the challenges associated with implementing SSIS in a cluster, I've found that this strategy also has other advantages:

  • The centralized management of configuration files lets you use different connection strings in the various environments (e.g., development, production, QA). Further, SSIS packages can be moved between environments without alteration—and this is accomplished without the use of variables.
  • The process of "reaching back" promotes standardization in package development.
  • The architecture is flexible. If one SSIS application needs additional configurations (e.g., the LogDataPath in Figure 3), you can simply place that configuration file in the application directory instead of in the root of the SSIS directory. (Note that you would then have to modify that configuration file if the application moves between environments.)
  • When using clusters in a production environment, all the resources needed to run an SSIS application—including SQL Server Agent (which schedules the job) and the SSIS packages (which are located on a clustered file system)—will fail over as a unit.
  • In a development environment where you run multiple SQL Server instances with multiple SQL Server Agents on one machine, the agents can share a single SSIS instance. For example, D:\SqlData would contain three directories (Online, ETL, and Reporting), each of which would have its own SSIS directory. You'd configure the various agents to run the load it would be required to run in production. That is, the agent for the ETL instance wouldn't run jobs for the online instance in QA, even though both agents exist on the same physical machine.
  • The .dtsx and .dtsconfig files are simple XML files. Deploying and maintaining these as files (as opposed to data in msdb) necessitates the enforcement of source-code management best practices.

Discuss this Article 2

dpbakerdpbaker@...
on Oct 31, 2011
John, we would like to implement a similar strategy. I'm not sure based on your article how to change the msdtssrvr.ini (xml) file to accomodate the different istances we have available. We have two physical server Active/Active and mutiple instances. For example named instance (INSTA,INSTB,INSTC) would be development instances, and (INSTD,INSTE,INSTF) are Testing and finally(INSTG,INSTH,INSTI) are pre-porduction or QUALITY instances. We promote the packages from one instances to the next, so not sure how to make the configuration changes within the MSDTSSRVR.INI for this portion to maintain a proper change management system.
tranqy
on Feb 26, 2010
Great article. Thanks for posting it!

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.