DTS Scripts and Tricks

Downloads
7836.zip

Put DTS through its paces

During the past few months, we've seen numerous newsgroup postings and received several email requests for additional coverage of Data Transformation Services (DTS) scripting techniques within the Package Designer. So let's examine several common scripting patterns that you can apply to package development, including invoking a package from within another package, accessing properties the Package Designer doesn't directly expose, and dynamically changing the source or destination data store. We'll show you how to overcome a subtle but show-stopping gotcha that Visual Basic (VB) developers need to watch for, and how to create a splitter transformation to normalize one row with many repeating columns into a table where each of the repeating columns becomes a row.

Package Within a Package


How many times have you developed a complex transformation in one package, then found that you need the same functionality in another package? In this situation, most developers duplicate the code by creating a new package and copying the original task's scripts and transformations. Although this process works, it creates two separate code bases, resulting in two points of maintenance. Instead, we propose leveraging the original package by calling it from within the second package.

You have two options for invoking a package from another package. The first option is to use an Execute Process task to directly launch DTSRun.exe. The second option is to use an ActiveX Script task to programmatically create, load, and execute the original package. To illustrate these options, we'll create a package called PackageA, which Screen 1 shows. PackageA contains an ActiveX Script task that returns a simple message box. Now let's create a second package, PackageB, and add an Execute Process task to launch PackageA. Screen 2 shows how you invoke the DTSRun executable. You pass the /S parameter to specify which server the package is stored on, the /E parameter to instruct DTSRun to use a trusted connection when connecting to SQL Server, and the /N parameter to identify which package to execute. Executing PackageB, as you see in Screen 3, produces a spawned command process in which PackageA executes and produces a message box containing PackageA's output.

An alternative to using the Execute Process task is using an ActiveX Script task to invoke a package, which gives you greater control by exposing the entire package, through the DTS object model, to customization. For this example, we'll implement the simple ActiveX Script task that Screen 4 shows to load and execute PackageA. This task uses VBScript to directly access the DTS object model. To start, the task declares a local variable, moPackage, to hold the package. Next, the Create Object function instantiates a DTS package object. Then the package's LoadFromSQLServer method loads the definition of PackageA from the SQL Server machine Marble. The task is now ready to execute PackageA. After package execution, the script reclaims the package object's memory by setting its value to nothing and sets the task's completion status to report a successful execution. This time, instead of spawning a new process, PackageA executes within PackageB's memory space and produces the expected message box, which you see in Screen 5.

Looking Under the Covers


We've noted in previous articles that the Package Designer hides many details of using the DTS object model directly. For example, the Package Designer handles the task-to-step associations that control the task execution flow. The Package Designer also hides package methods such as GetExecutionErrorInfo, properties such as StartTime and FinishTime, and details of the underlying task and connection objects. Wouldn't you like to have access to this information, or better, be able to dynamically change the information at runtime? To get access, you need to write directly to the DTS object model. In May and June 1999 ("Unleash the Power of DTS" and "Pump Up the Power of DTS"), we explored how to create packages with VB. You can reuse many of those concepts within the Package Designer.

The key to accessing this information is creating a reference to the package you want to access. To see how this process works, imagine that you've developed a package that imports a data file. Later, your project's requirements change, and now you need to process two more data files with the same format and similar data content. Rather than creating two new packages or adding two additional tasks to your existing package, you can reuse the existing package and dynamically reset the source connection to process the three files.

Let's first build a package to process one file. Listing 1 describes the schema that holds the destination data. Before creating the task, run this script to set up your database environment. This task creates a database called StagingDB and the table SourceData, which will be the destination for the transformation. We'll use the other table, VisitPhysician, in a later example. Now let's create the package.

The package, DataLoader, consists of one SQL Server connection and one bulk insert task. Screen 6 shows the SQL Server connection, StagingDB, which defines the database connection. The bulk insert task, which Screen 7 shows, contains the reference to the data file that we'll later modify. We pointed the source file to the first data set, C:\Projects\Article\SampleData01.dat. For the purpose of this example, the data's format is unimportant. (Subscribers can download the format file, SampleData.fmt, along with all the other sample code, at the link to this article at http://www.sqlmag.com.) At this point, save the package to your local SQL Server and run it once to test it.

To complete the example, a second package, DataLoaderDriver, consists of a single ActiveX Script task. The task, in Listing 2, page 48, builds on the programmatic package-within-a-package technique we discussed earlier. This time, the task goes a step further by referencing the package's Tasks collection, specifically the DTSTask_DTSBulkInsertTask_1 task, and changing its DataFile property to one of the filenames in the FileNames array. With this process, you can quickly and easily iterate through the file list and process each file without having to recode any of the original package. (If you try this example, be sure to change the pathnames of the data files and the SQL Server server name in the script to match your environment.) To extend this example, you can read the file list from a database, the Active Directory, an external file, or even the Registry. Where the list originates is irrelevant. You need only to understand how to modify the package's properties from within another package.

Another variation on this example lets a package modify itself. Here, too, you need to create a reference to the underlying package and then leverage your knowledge of the DTS object model. For example, from within a package, you access the package through the DTSGlobalVariables meta data. Consider the ActiveX Script in Listing 3, page 48. This script drives the task in a package called CreatorName. The script begins by using the package's GlobalVariables collection to gain access to the parent package. After you have access to the package, you can access or manipulate any of its attributes. In this example, the task simply uses a message box to display the package's creator.

DTS Threading Model


When developing with DTS and VB, you need to be aware of a subtle but potentially show-stopping difference in the threading models of the two tools. Briefly, threads let multiple tasks execute within one process. For example, in a multithreaded application, one thread might handle disk I/O while another handles network traffic. Multithreading allows parallelism within an application. In November 1999 ("DTS Error Handling Revealed"), we discussed the importance of choosing thread-safe data providers that fully implement the OLE DB Service Components interface, and the ramifications of using non-thread-safe providers. This warning extends to developers who are mixing DTS with VB.

The problem with mixing DTS and VB is that DTS is free-threaded (a variation of multithreading), whereas VB is apartment-threaded (a concept born of the COM architecture). The exact differences between the two and the definition of apartments are topics well beyond the scope of this article. For further reading on COM threading models, please refer to the Microsoft Developer Network (MSDN) Online Library under the topic Platform SDK/COM/COM Fundamentals/ Processes, Apartments, and Threads (http://msdn.microsoft.com/ library/psdk/com/aptnthrd_8po3.htm). However, the differences are serious enough that not handling them appropriately within your projects will lead to access violations at runtime. What makes this compatibility issue hard to diagnose is that Enterprise Manager and the Package Designer are free-threaded. Therefore, packages executing through the Package Designer won't experience the problem. Executing under VB, however, often generates this error:

<i>Run-Time Error -2177221499 (80040005)</i>

<i>Provider generated code execution exception:
     EXCEPTION_ACCESS_VIOLATION</i>

So, how do you avoid this access violation? It's simple. For any package that executes through VB or any package that references a custom task developed with VB, DTS is forced to run within the main thread. Programmatically, setting the step object's ExecuteOnMainThread attribute to TRUE does the same thing. Screen 8 shows how the Options tab of the Workflow Properties dialog box exposes this attribute within the Package Designer. The workflow properties belong to this package's Data Transform task. It's important to note that enabling the ExecuteOnMainThread attribute forces the associated tasks to run in serial rather than parallel, resulting in possible performance degradation (the performance degradation could be significant on an SMP machine).

Changing Columns into Rows


In relational database design, the first rule of normalization is to eliminate repeating groups. In an ideal world, all databases would follow this rule. However, in the real world, not all databases are relational and, even within many relational databases, repeating groups exist. This fact leaves database developers, especially in the data warehousing community, with the challenge of transforming repeating groups into normalized, relational tables, or to put it another way, splitting columns into rows.

At first glance, DTS might not seem to be a good candidate for solving this normalization problem. After all, as we've noted in several previous articles, transformation tasks are limited to a single operation or statement against the destination data store. To solve this problem, you'd potentially need many operations to properly convert one source row into several destination rows. DTS offers two possible solutions. The first solution uses a Data-Driven Query task with a stored procedure. With the data-driven query, DTS sends the entire source row to the server where the stored procedure resides, and then the stored procedure maps the columns into their appropriate INSERT statements. "The DTS Development Guide" (July 1999) covered the use of data-driven queries, so we won't go into detail here. Instead, we'll focus on the second solution to the splitter problem, using a transformation task's DTSTransformationStatus constants to control the flow of processing.

Within each transformation, you can control when the DataPump moves to the next row. By manipulating the DTSTransformationStatus value, you can force the transformation script to process the same row multiple times, thereby letting the transformation generate multiple INSERT statements from the same source row.

The following example illustrates this idea: At a hospital, multiple physicians are associated with a visit. The SourceData table, whose schema you see in Listing 1, represents the source system that supplies the data. The VisitPhysician table, whose schema is also in Listing 1, represents the destination in the warehouse.

First, build a package containing two connections, the first representing the sourcesystem and the second representing the warehouse. Next, add a Transform Task between the two connections and define a new ActiveX Script to handle the transformation. By default, DTS maps each source column to a destination column in column order. Before you add the transformation, you must delete the default column mappings by selecting and deleting each transformation line between the source and destination tables. Now, add the new transformation. As Screen 9 shows, when you're defining the transformation, select all columns in the source and destination tables and click New to create a new transformation of type ActiveX Script. Clicking New opens the ActiveX Script Transformation Properties dialog box. Place the VB Script from Listing 4 into this dialog box to implement the splitter.

In Listing 4, before the Main function declaration, establish and initialize a variable, nCounter, to keep track of where the code is in processing the row. The value 4 represents the number of output rows per source row. The function Main() begins by checking whether any rows are left to be inserted. When the function returns true, it first sets its return status to DTSTransformStat_SkipFetch. Here's the key to the splitter's functionality: By setting the status to SkipFetch, the transformation tells the DataPump not to get a new row from the source connection, but instead to resubmit the current row for additional processing. If nCounter equals 0, the transformation resets the nCounter value to 4 and sets the status to DTSTransformStat_SkipInsert. This time the transformation tells the DataPump not to write anything to the destination, but to move on to the next source row. The remainder of the function determines which PhysicianType to insert and sets the destination columns appropriately.

Using the data-driven query approach to solve this normalization problem offers a definite performance advantage over implementing the splitter by limiting the number of calls to the database. However, the splitter offers the flexibility of leveraging additional scripting and error-handling capabilities that aren't available to Transact SQL (T-SQL). Besides the splitter, you can use a similar technique to implement PivotTable-like functionality. A PivotTable goes in the opposite direction of the splitter—that is, where the splitter turns columns into rows, the PivotTable turns rows into columns. But we'll leave that example for a future article.

Being able to leverage common design and coding patterns is one of the hallmarks of a good developer. This month, we touched on several common scripting patterns that you can use on your DTS projects, but we've only scratched the surface. Send us email and tell us about your scripting tricks. We're always looking to expand our knowledge base and share that information with you.

Discuss this Article 5

Richard Burton (not verified)
on Oct 17, 2003
Hi I'm searching the web to try and find any information on the 'Parameters' value in DTS package task 'Execute Process. So far I havent' found much, the batch job I'm calling from DTS is failing and so is my knowledge..! Cheers
Anonymous User (not verified)
on Nov 3, 2004
To the Anonymous User, you should use global variables to set all values that are going to change when you move the package. Then, you simply have to update the global variables to whenever/wherever you move it.
Sambasiva Darbha (not verified)
on Jun 26, 2001
Hi I have been trying for quite sometime now, to execute a DTS Package saved on my server under Local packages, from the command prompt using a batch file. It doesn't execute. It opensup a separate window but it doesn't execute. The syntax i had used to get it executed is as follows cmd /c \\abcdefghi\MSSQL7\binn\dtsrun /S abcdefghi /E /N P1 I had tried all combinations of flags for instance , saving it as a file and running and using USERID etc. But it doesn't seem to work. The server configuration we have is as follows.. It's a Winnt CLUSTER of 2 servers , which i am accessing through their virtual Names. Some one told me that you cannot invoke the DTCRUN application on a virtual server. If this is true how do you schedule the package. Please help.
Rohan Perera (not verified)
on Jun 5, 2004
This article is very usefull for me. Thanks
Anonymous User (not verified)
on Oct 27, 2004
I want to create a DTS package in the development SQL Server and move it to Production SQL Server later. How can i change the SQL Server Name, Source and Destination names used in the package with out much intervention from the Production DB Administrator.? Is there any way I can create some scripts like SQL Stored Procedure and edit it later.

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.