Migrating is a daunting prospect, but there's a light at the end of the tunnel
How many SQL Server 2000 DTS packages are you responsible for? Perhaps 10? Maybe 50? When I asked this question at a conference last year, a few attendees shocked me by telling me that their environments boasted more than 2000 DTS packages! Was it irresponsible of me to wish them good luck in their migrations? Perhaps. But if you have several hundred DTS packages, you should know that you’re not alone. DTS is well known for its simplicity and usefulness, and in fact, many organizations adopted DTS for its ability to quickly and simply generate data processing.
Whether you have 10 DTS packages or several hundred, you’re probably preparing for an inevitable migration to SQL Server 2005 Integration Services (SSIS). However, in your testing and research, the pain points have become apparent: The product’s architecture differs from that of SQL Server 2000 DTS—making the migration more complex than a straight upgrade—and even after you migrate a couple packages, you might see only a little performance gain. And if you’re looking at dozens or hundreds of packages, you can expect several sleepless nights.
Your First Step
A good starting framework for a DTS-to-SSIS migration strategy includes first getting your DTS packages to SQL Server 2005. Don’t let your DTS migration hold up your relational engine upgrade to SQL Server 2005.
- Move your DTS packages to your SQL 2005 environment (through the setup.exe SQL upgrade or manually through Management Studio), and continue to run them as DTS packages.
- Use SSIS to build any new packages you need.
- Use the DTS Migration Wizard as a starting point for strategic packages that can take advantage of SSIS features or for packages that have trouble during the migration.
- Plan for a rolling strategy to rework all packages, leveraging the complete SSIS feature set in the redesign.
There’s a light at the end of the tunnel—a bright one. And getting there might not be as difficult as you think. In fact, now that SQL Server 2005 is more than a year old, more resources and knowledge are available to help you through the process. (For a helpful primer, see “Step Up to SQL Server 2005,” InstantDoc ID 47749, as well as this article’s sidebar, “Your First Step.” ) But don’t settle for merely getting your packages migrated to SSIS. Rather, prime yourself for taking full advantage of SSIS’s improved features and functionality.
When you’re ready to take on your DTS-to-SSIS migration, you’ll need to have at your disposal the tools and resources you need to make informed decisions and perform a clean migration. One such tool is the Upgrade Advisor, which can analyze your existing DTS packages and report about concerns that you must address before and after your migration. The most recent version of Upgrade Advisor is downloadable with the “Feature Pack for Microsoft SQL Server 2005 - April 2006” (http://www.microsoft.com/downloads/details.aspx?FamilyID=df0ba5aab4bd-4705-aa0a-b477ba72a9cb &DisplayLang=en).Even more valuable is the “SQL Server 2005 Upgrade Technical Reference Guide” (http://www.microsoft.com/downloads/details.aspx?FamilyID=3d5e96d9-0074-46c4-bd4f-c3eb2abf4b66&DisplayLang=en), which covers the entire SQL Server platform upgrade in 350 pages, with 38 pages dedicated to DTS package migration to SSIS—invaluable to understanding the nuances of the upgrade. Both of these resources cover the basics of using SQL Server’s built-in DTS Migration Wizard to convert DTS packages to SSIS. To invoke the DTS Migration Wizard, you right-click the SSIS Packages folder in Development Studio, as Figure 1 shows. (To prepare your packages for the DTS Migration Wizard, see the sidebar “Preparing for the Wizard.”)
One of the biggest challenges involved with migrating DTS packages to SSIS arises when the DTS Migration Wizard can’t convert a DTS component, such as a data pump that does more than copy column operations. In some cases, the converted SSIS package will retain a portion of the original DTS package, embedded in an Execute DTS Package task in SSIS. Other tasks, such as the Dynamic Property Task and some of the logic embedded in an ActiveX task, will need to be rewritten with SSIS functionality. Each of the migration nuances will be identified by the Upgrade Advisor tool or discussed in the upgrade planning and preparation resources.
Beyond just the migration resources that SQL Server 2005 provides in the DTS Migration Wizard and Upgrade Advisor, the software also includes support for DTS packages. For example, the \Management\Legacy folder in SQL Server 2005 Management Studio (SSMS) includes a container for DTS 2000 packages, as you can see in Figure 2. These packages might have been included with the SQL Server relational engine upgrade; you can also load DTS packages after the upgrade into a SQL Server 2005 instance.
With the DTS design tools installed for SSMS (available in the “Feature Pack for Microsoft SQL Server 2005 - April 2006” referenced earlier), you can modify DTS packages through SSMS and execute them on the server. Also, using the Execute DTS 2000 Package Task included in the SSIS Control Flow, you can run DTS packages alongside SSIS packages, so you can move forward with SSIS packages while managing your migration incrementally.
The Migration Wizard Output
The reason DTS-to-SSIS migrations pose challenges goes beyond the fact that SSIS is a new product. Fundamentally, DTS and SSIS have different architectures. By settling for just a migrated package without then extending or rewriting parts of the package to take advantage of SSIS functionality, you’re settling for less. To be sure, you’d see immediate manageability advantages, but when it comes to performance and scalability, DTS architecture limitations would still be evident.
Most DTS packages use an Extraction, Loading, and Transformation (ELT) model of data processing—a flip from true Extraction, Transformation, and Loading (ETL). In other words, data processing logic in DTS often relies on SQL Server to perform the business transformation logic, and to use SQL Server, the data first needs to be loaded into tables. This type of process can be relational database management system (RDBMS)-intensive. TSQL can provide valuable support in an SSIS-based ETL process, particularly for set-based operations, but when it comes to bulk operations and data transformations, DTS tends relies too heavily on SQL Server logic and is therefore often bound by disk I/O bottlenecks and synchronous processing. For example, consider this typical DTS package: Data is extracted from a flat file source where it’s landed to a staging table; a SQL Server update adds key relationships and repairs missing values; finally, the staged data is joined to a production table and loaded to the destination.
In this typical package, which Figure 3 illustrates, the extraction needs to be complete before performing the update, which then needs to be complete before the loading processes. You might have similar packages or even more complicated packages with many staging and SQL Server precedence requirements. Perhaps you even have hundreds of packages that perform operations that are similar to this example.
If you haven’t yet tested the DTS Migration Wizard, you might be wondering how much faster your DTS packages will perform after you migrate them to SSIS. The answer is that SSIS packages created by the DTS Migration Wizard most likely won’t perform any faster than the original DTS package. If you’ve already performed a few tests, you might still be wondering why your migrated DTS packages don’t perform much better in SSIS.
The answer is in the package architecture. When you run the DTS Migration Wizard, the output essentially generates a Control Flow–centric (or workflow-centric) SSIS package that contains Data Flows with minimal to no transformations. If your DTS packages were workflow packages that coordinated the execution of SQL Server tasks or copy-column data pumps, your migrated packages in SSIS will have the same architecture, relying on the underlying staging tables and RDBMS engine to perform the core of your logic. In both cases, DTS and SSIS are simply providing the coordination of the data copies and SQL Server–based logic. Similarly, if your DTS packages leveraged any transformation in the data pump, those data pumps would migrate as an Execute DTS 2000 Package task and therefore still run under the DTS runtime engine.
The DTS package that Figure 4 shows, when run through the DTS Migration Wizard, generates several Data Flows and a couple of Execute SQL Tasks, matching directly to the DTS Data Pumps and DTS Execute SQL Tasks, respectively. In this simple example, the DTS package runs in almost exactly the same time frame that the migrated SSIS package runs. This behavior is expected: Both packages execute the same steps in the same order and rely on the underlying relational engine to perform the transformation logic in an ELT-type process.
Although performance is an important consideration, it’s only one of the many criteria to look for in an ETL tool. From the start, it will be easy to apply the new SSIS feature set to your migrated DTS packages. Many of these new features don’t necessarily relate to performance but still provide great improvements in the areas of development administration and package control—for example, package configurations to share properties and connections, checkpoints to enable restartability, source control and debugging integration with Microsoft Visual Studio, environment-deployment tools, offline capabilities to ease development, conditional and logical-OR precedence constraints, built-in looping and Windows Management Instrumentation (WMI) integration, and better backend Analysis Services and SQL Server 2005 support.
Rethinking Your Data Processing in SSIS
When it comes to performance and data processing, SSIS can do much more than DTS while providing better scalability. However, when just relying on the output of the DTS Migration Wizard, SSIS’s core data-processing features don’t come into play (because of the nature of a DTS migration). This includes the SSIS Data Flow’s pipeline engine, which provides data transformations, cleansing, and performance beyond the capabilities of DTS’s staging and SQL Server–centric model. A few of the SSIS data flow advantages are:
- In-memory data association of heterogeneous sources using merge, union, and lookup capabilities
- Text and data mining to provide insight into data relationships and text-field contents
- Grouping and sorting for data aggregations and ordering
- Data routing and duplicating, which allow multiple destinations and filtered data inserts
- Data cleansing to match data based on similarity
- Additional adapters that integrate XML, binary, SQL Server Mobile, OLE DB, ADO.NET, and other sources and destinations
When you apply these advanced Data Flow features to your migrated DTS packages, you’ll unlock the muscle of SSIS. For example, suppose we redesign the aforementioned SSIS package to take advantage of the Data Flow in SSIS. Figure 5 shows a single Data Flow to handle the same logic present in the original DTS package and initial package following migration to SSIS.
This redesigned package uses the memory-based Lookup, Merge Join, Aggregate, Sort, and Derived Column transformations. This redesign reduces the synchronous steps involved in the prior architecture, and because of the memory-based architecture of the SSIS pipeline, also reduces the disk I/O. Overall, this redesigned sample package runs about 40 percent faster than the migrated package. Also compelling is that when you track the Physical Disk counters, the redesigned package uses 50 percent less disk I/O, and in fact also reduces the average processor utilization from 35 percent to 25 percent, allowing more headroom for other processes and packages to run on the same server.
So, are data-staging tasks no longer required? In many cases, staging data isn’t necessary, but a few situations merit their use. For example, staging can help fulfill data-validation requirements, provide a point-in-time capture of a volatile source, or help align sources in which the extraction times don’t overlap. SQL Server can also provide valuable functionality with set-based updates or deletes, recursive operations, and ETL auditing and administrative tasks. When deciding on staging data in your migrated packages, ask, “Which staging or SQL Server steps were used because of the DTS based architecture?” and “For these steps, how can I leverage the SSIS Data Flow?” It’s true, however, that the net effect will be a reduction in staging and sequential SQL Server tasks and lead to an overall increase in server scalability and a reduction in processing times.
Getting to SSIS
As you consider a migration, what are the repercussions of these developments for your DTS environment? First, moving from DTS to SSIS doesn’t have to be an overnight event—particularly if you have dozens or hundreds of packages in your environment. SQL Server 2005 supports DTS package design and execution, opening up the opportunity for a rolling migration with a temporarily mixed DTS-and-SSIS environment.
Second, you would do well to take advantage of the available resources that can help you plan your migration and prepare your DTS packages for the migration. Training is also available. Microsoft has just released its SSIS curriculum, and SQL Server partners also offer excellent material.
When it’s time to move your packages to SSIS, the DTS Migration Wizard provides a good first step in the process but might require some minor workarounds and modifications to get packages running in SSIS. However, once your packages are in SSIS, you should focus on two general areas for improvement: first, project, package, and control flow capabilities to enhance package development and administration, and second, Data Flow features to improve data associations, cleansing, and other transformations. Because SSIS supports improved development and testing features, it will make your changes easier and you’ll soon be able to leverage the improved SSIS features and architecture.
When you’re done, you’ll deserve to take a breather. The good news is that once your environment is migrated to SSIS, you’ll be better positioned for the future with a solid architecture and better tools to make administration and development easier. And take heart: Future upgrades will be much easier.