Use breakpoints and checkpoints to more efficiently debug packages
| Executive Summary:|
The debugging and logging capabilities in SQL Server Integration Services (SSIS) are greatly improved over those found in DTS. In SQL Server Integration Services (SSIS), you can debug packages, Control Flow tasks, and Data Flow tasks. SQL Server Integration Services (SSIS) also includes logging capabilities that display information about a package after it's been executed.
It’s been more than two years since SQL Server 2005 shipped with SQL Server Integration Services (SSIS). However, many companies still haven’t converted their DTS packages to SSIS, possibly because the migration process can be painful or they don’t have the time to learn about a new product.
Those of you who have made the conversion know that SSIS definitely isn’t the “next version of DTS.” Debugging and logging are just two of many areas that have undergone a complete overhaul in SSIS. Let’s take a high-level look at SSIS’s debugging and logging capabilities. If you’re among those who are still using DTS, prepare to be impressed.
Debugging Packages During Development
SSIS is far ahead of DTS in the area of debugging, which is the process of identifying errors that prevent a package from being executed or producing the desired results. In DTS, debugging typically involves using MsgBox statements or the VBScript Stop command to simulate a breakpoint in scripts. Such statements had to be removed from the package before it was put into production. In contrast, debugging is built into SSIS, and nothing has to be removed when the package is moved to a production environment.
The problem with SSIS’s debugging tools is that they aren’t consistent across all types of tasks, so knowing when to use what tools can be a big help. Let’s look at SSIS’s debugging capabilities at the package level, followed by debugging within Control Flow tasks and Data Flow tasks.
During the development process, SSIS provides red or yellow icons within the SSIS designer that tell you when something is inherently wrong with the package. To view the message associated with an icon in a collection of packages, hover your mouse over the icon, as Figure 1 shows. These messages are typically related to data-source connections or data-type problems. Because it can take time to sift through the messages in the Output pane (which is located below the Error List pane in the SSIS designer) during debugging to see what caused a package to fail, SSIS lets you click the Progress tab from the designer to view an outline structure of the package, as Figure 2 shows, and see where the failure took place.
When you debug a package in aDevelopment Studio environment, you can see which task is running and how far it’s progressed by the background color of the task: Yellow indicates that the task is running, green indicates that the task completed successfully, and red indicates that the task completed with errors. Next to Data Flow tasks, you also get a count of the rows that have been processed by the task. You might notice that SSIS often runs multiple tasks simultaneously, whereas DTS runs only one task at a time. The pipeline nature of SSIS is one of the core architectural differences between it and DTS.
Debugging Control Flow Tasks
Control Flow tasks control the flow of the package. SSIS’s debugging tools for Control Flow tasks closely resemble those available in any respectable development environment. Breakpoints and the Debug windows can be especially helpful in debugging Control Flow tasks.
Breakpoints tell SSIS to pause execution at the indicated point in the package. When processing encounters a breakpoint in Debug mode, processing pauses, and the Debug windows give you access to additional information such as variable values, call stacks, and status messages. You can set breakpoints at the package, container, task, or Script task level. You can even set breakpoints to pause after a given number of encounters with an event, such as on the tenth iteration of a task in a For Loop construct. Figure 3 shows how you can interrogate an object to reveal its property values during a breakpoint session.
I find the Locals window, which is accessible from the Debug/Windows/Locals menu item, to be the most useful of SSIS’s Debug windows. It displays values for all the variables in the package and even lets you modify variable values, which can be useful when skipping ahead to a later point in the package or simulating certain conditions during package development.
Debugging Data Flow Tasks
Data Flow tasks control how and when data is manipulated in the package. The primary debugging tool available in Data Flow tasks is the data viewer. I use the data viewer during the development of Data Flow tasks to see what the data looks like in the pipeline as it flows from one task to another—usually just before the task that actually loads the data into its destination.
You can add a data viewer between two Data Flow tasks by right-clicking the connector between the tasks, selecting Data Viewers, and then selecting the data viewer you want to use. You can view the data in a grid, chart, scatter plot, or histogram. I usually view data in a grid, as shown in Web Figure 1, but I recommend playing with all four data viewers to get a feel for when you should use each one. Although I typically remove my data viewers before deploying a package in production, you don’t have to do so.
You can modify columns displayed by a data viewer either as you set up the data viewer or after setup. To modify a data viewer after you’ve created it, right-click the connector, select Data Viewers, highlight the data viewer, then click Configure.
Another Data Flow task debugging technique that I use frequently during package development and debugging is the RowCount task. The RowCount task relates only to Data Flow tasks. The RowCount task isn’t usually billed as a debugging tool but can be quite useful as one. The RowCount task simply counts the rows passed through the pipeline and puts the final row count into a variable. I almost always use the RowCount task as my initial data destination because it serves as a way of examining the data, via a data viewer, without actually loading the data anywhere. Because the Row- Count task carries no measurable overhead, it can also be used for baselining or to diagnose performance problems. For more information about using the RowCount task, see the Microsoft white paper “Integration Services: www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx).Techniques” (
Continued on page 2
Because you can’t foresee all the conditions that will ultimately occur in a production environment, SSIS provides powerful and flexible logging capabilities to display information about a package after it’s been executed. The challenge is to log enough information to help you quickly diagnose and minimize the impact of problems that might occur in production.
Several error log providers let you specify where log messages will be written—to text files, SQL Server Profiler, a SQL Server table, the Windows event log, or XML files. You can indicate for which events log messages will be written and the amount of information that’s written. The type of events that trigger log messages can vary based on the Control Flow task type. For instance, a Data Flow task can log events such as OnPipelineRowsSent and PipelineInitialization.
To add logging to a package, click Logging on the SSIS menu and select one or more error log providers (i.e., log entry destinations), which enable you to write to a target destination. Select the check box next to the events you want to log, then click the Details tab and specify the events you want to log. Next, click Advanced to specify the columns to be logged, otherwise all the columns will be logged. After configuring logging, you can view log events in real time during development runs by right-clicking in the Control Flow designer and selecting Log Events.
Because the error log identifies each logged task by name, I recommend implementing a naming standard that uniquely identifies the running SSIS package. Depending on the events you choose to log, the error log can grow fairly rapidly. Be sure to log only the events that you need and occasionally prune old log entries. Because SSIS doesn’t include a process to do this out of the box, you must manually prune logs if your error log provider doesn’t provide the functionality to do so. For example, if the error log provider is configured to send log messages to SQL Server, rows in the msdb.dbo.sysdtslog90 table can be deleted after a specified period of time. You can also create logging configurations as templates to provide a consistent strategy across packages that perform similar functions and make log management easier.
Checkpoints, another powerful SSIS feature, let you restart a failed package in production. SSIS package execution—especially extraction, transformation, and loading (ETL) package execution—can be lengthy. By its nature, ETL moves and transforms large amounts of data, which can be time-intensive. The failure of a package or task two hours into an ETL process could be catastrophic in a data warehouse system that’s required to be available by a set time. Checkpoints let you save the work that’s been accomplished so that when you restart the package after resolving the problem that caused it to fail, the process can pick up where it left off. Checkpoints aren’t enabled by default, however; they have to be turned on for each package.
Knowing how checkpoints work before you develop your packages is important because they can affect package design. Note that checkpoints can only be applied to Control Flow tasks. For example, a package can’t be restarted in the middle of a Data Flow task, which is considered to be a unit of work that’s either entirely successful or not. The lack of ability to checkpoint Data Flow tasks provides a good argument for componentizing packages to break logical tasks into Data Flow tasks or, ideally, to try to group packages together based on a Data Flow tasks’ packages. For example, you might modularize the packages by taking one big package and making into several smaller packages grouped together inside a controlling package. Then you could set a checkpoint on each module package.
Each package must be configured to be checkpointcapable on the Details tab of the Package window (shown in Figure 4), which can be accomplished in a SSIS process that includes parent and child packages. You can configure each package to create a checkpoint log to track execution information in case the package fails, as shown in Figure 4. If the entire process runs without error, the logs are removed when the ETL process is complete. Web Table 1 shows a high-level view of the properties you have to set for each package.
Debugging and Logging in SSIS
As you can see, SSIS provides more powerful debugging and logging capabilities than DTS, enabling you to debug packages more efficiently. Knowing when and how to use these tools can greatly reduce the time it takes to develop packages and resolve problems before and after the package is sent to production.