SQL Server Integration Services (SSIS) provides organizations with powerful data integration capabilities. To effectively monitor SSIS performance, troubleshoot errors, and understand data lineage in a production environment, you need to implement package logging and data auditing. I’ll show you how to use Business Intelligence Development Studio (BIDS) to set up a logging environment for managing, monitoring, and troubleshooting packages. I review SSIS’s built-in logging capabilities and then show you how to customize them. You’ll see how to incorporate basic data auditing in the package data flows. The examples I use include package templates along with a logging database and reports (for storing and viewing package execution results), which you can use as a starting point in your deployments.
The logging database and many of the SSIS tasks I use come directly from Project REAL. Project REAL is a reference architecture and implementation of an enterprise-class business intelligence (BI) solution based on SQL Server 2005. Microsoft and several partners used actual customer data to create this reference implementation. To learn more about Project REAL, see the Learning Path.
To download the sample code used in this article,click the "Download the Code" link at the top of the page. The .zip file contains four packages, two of which I discuss in this article, and two for your future use. I use the 1_ETL_Template.dtsx package to show you how to add a log provider to a package and then augment the log provider. I use the 2_ETL_Template.dtsx package to give you some Data Flow task tips.
Configuring Package Logging
Package logging refers to writing information about the execution results of a package, and its tasks. During package execution, SSIS can log many types of events (e.g., status, completion, error information about a task) to a variety of destinations, known as log providers. A log provider can be a text file, the SQL Server Profiler, a SQL Server relational database, a Windows event log, or an XML file. If necessary, you can define a custom log provider (e.g., a proprietary file format).
SQL SERVER MAGAZINE RESOURCES
For information about how to develop with BIDS:
“Building Better BI in SQL Server 2005”
“SQL Server 2008 Business Intelligence Enhancements”
For more information about using SSIS:
You configure logging at the package level by adding a log provider. You can choose to log only certain tasks within a package. When designing a package in BIDS, click Logging on the SSIS menu to display the Configure SSIS Logs dialog box. Use the Providers and Logs tab to select a provider to use for the container, as shown in Figure 1. When using the SSIS log provider for SQL Server, the Configuration column is where you specify a connection to a SQL Server database. All the log information is written to a table called sysdtslog90, which SSIS automatically creates in the database.
When you’ve added a log provider to your package, click the Details tab and select the events you want to log. Some of the event choices include OnError (which writes a log entry when an error occurs), OnPost Execute (which writes a log entry after a task or package finishes running), and OnVariableValueChanged (which logs an entry when a variable value changes). To prevent the log from becoming unnecessarily large and to avoid degrading performance, log only the events you need.
Opening the Logging Examples
Now let’s put the sample code for this article to use. Download it, if you haven’t already; then, open the LogAuditTemplates.sln file in BIDS from File, Open, Project Solution. This solution has two projects: LogAuditTemplates and LogAuditReports. In Solution Explorer, expand the LogAuditTemplates project and the Data Sources and SSIS Packages folders. Double-click the 1_ETL_Template.dtsx package, which you can see in Figure 2, to open it in the package designer. 1_ETL_Template.dtsx is a basic package that you can use as a template for building other packages.
SSIS Log Provider or SQL Server
The 1_ETL_Template.dtsx package has been configured to use the SSIS log provider for SQL Server and to store the log results (the sysdtslog90 table) in a database called ETL_LogAuditDB. Before you run the package, restore the database to an instance of SQL Server; you’ll find a zipped backup of the database in the file ETL_LogAuditDB.zip, which is included with the sample code. If you decide to restore the database to something other than a default instance on your local machine, you’ll need to update the connection information for the package’s ETL_LogAuditDB.ds and ETL_LogAuditDB OLEDb.ds data sources.
The log provider for the 1_ETL_Template.dtsx package is configured to log the OnError and OnPostExecute events. Limiting logging to these two events keeps the logged information to a minimum, yet allows you to calculate performance data for each task in a package and capture any errors that might occur.
Augmenting the Built-In Provider
As Figure 2 shows, the 1_ETL_Template.dtsx package contains two Control Flow tasks: SQL Audit OnPreExecute and SQL Audit OnPostExecute. These two tasks augment the built-in SSIS log provider by populating a row in a table named ExecutionLog. The ExecutionLog table serves as a header for all the detail rows the SSIS log provider writes to the sysdtslog90 table. The sysdtslog90 and the ExecutionLog tables are linked together by a field named executionid, which is a GUID generated by SSIS each time a package is run.
The ExecutionLog table provides a concise summary of package execution results (e.g., start time, end time, overall status) in a single row, along with other useful information. For example, the table stores package version information, which can be useful if troubleshooting problems resulting from a new version of an existing package.
The SQL Audit OnPreExecute and SQL Audit OnPostExecute Control Flow tasks are both Execute SQL tasks that call stored procedures in the ETL_LogAuditDB database. Most of the data passed to the stored procedures (e.g., PackageName, PackageID) comes from SSIS system variables. The sample uses the following three variables, as you can see in the left-hand pane in Figure 2.
• The LogicalDate variable lets you define a “logical” processing date, which might differ from the actual package execution date. For example, in an extraction, transformation, and loading (ETL) process you might need to load data from a specific time period (e.g., all of the new sales entries from a given fiscal period) or you might need to perform a net data capture based on the last successful execution date of the package. In these scenarios, a date value needs to be stored and then read/updated by the package during execution.
• LogID is an integer that uniquely identifies a package execution and is generated via an identity column in the ExecutionLog table.
• ParentLogID is simply the LogID of a “parent” package. In other words, if the package is invoked from another package, the ParentLogID value will be equal to the LogID of the calling (i.e., the parent) package. To pass the LogID from the parent package, you use a package configuration. To view the package configuration, select Package Configurations from the SSIS menu.
As I mentioned, the SSIS log provider for the 1_ETL_Template.dtsx package is configured to log OnError events. Additionally, the package contains a package-level OnError event handler consisting of an Execute SQL task named SQL OnError, which updates the ExecutionLog table if an error occurs.
Viewing Package Results
From the Solution Explorer in BIDS, right-click the 1_ETL_Template.dtsx package and select Execute Package. After the package runs, you can use SQL Server Reporting Services (SSRS) to view the logged events.
In Solution Explorer, expand the LogAuditReports project and the Reports folder. Double-click the PackageExecutionDetails.rdl report to display Report Designer. Click the Preview tab, select a Package Name and Execution Run value, and then click View Report. The report will look similar to the Package—Execution Details report in Figure 3. This report shows you when the package ran, who initiated the package and from what computer, the total package runtime, and the time it took to complete each task. If an error occurs, information about the error will also be listed in the message column.
Data Flow Performance and Data Auditing
The second package in the sample download, 2_ETL_Template.dtsx, incorporates performance tracking and data auditing (sometimes referred to as data lineage) into a Data Flow task. Open the package in the package designer and click the Data Flow tab. In this example, shown in Figure 4, I select a set of records from a source system (in this case, the AdventureWorks sample database) and then eventually output the records to a Row Count component. Let’s examine the two components in the middle, STAT Source and DER Audit.
STAT Source is a script component that measures the number of rows flowing through a Data Flow task, along with the minimum, maximum, and mean throughput per second. The results are stored in the StatisticLog table. You can include multiple STAT Source components in a Data Flow (be sure to give each component a unique name) to measure throughput at multiple points in the flow. In the LogAuditReports project, I’ve included another report, PackageProgressByExecutionTime.rdl, that provides a summary of package execution results by date and includes data from the StatisticLog table for packages that incorporate at least one STAT Source component.
DER Audit is a derived column component that adds two important pieces of information into the Data Flow—LogID and LastModifiedDate. This information is meant to be written to your destination source(s) so you have a way of mapping back to the package responsible for writing the data. If you use another means of inserting or updating data in a destination system, such as Execute SQL tasks, make sure to add these derived columns.
I’ve included two more packages for you to examine on your own. 3_ETL_MP_Template.dtsx is an example of a parent package that calls a child package. 4_ETL_Package-WithError.dtsx includes an intentional error in the data flow so you can see what an error looks like in the sample reports.
How do you use these packages in your environment? You can make a copy of the LogAuditTemplates project (updating/adding Data Sources as needed) and go from there. Or, you can add one or more of the individual packages to your existing SSIS projects and customize as needed. Note that each package contains a property named ID, which is a GUID that uniquely identifies the package. When you copy a package, this ID doesn’t change. To cleanly separate data in the ETL_LogAuditDB database, you should not only rename the package but also generate a new GUID. To generate a new GUID in BIDS, select the drop-down arrow in the ID column of the Properties pane of the Package Designer and select Generate New ID.
Finally, there are a few components in the ETL_LogAuditDB database I didn’t cover. You can use the ExecutionLog table to track performance of SQL statements run in a destination system. You can use the ProcessLog table to manage partitioned tables. The Project REAL ETL white paper discusses these components in greater detail.
To recap, I’ve reviewed SSIS’s built-in logging capabilities and then shown you how to augment them. I also incorporated performance tracking and data lineage into the package templates for auditing/troubleshooting purposes. These templates, along with the ETL_LogAuditDB database and sample SSRS reports, can serve as a starting point for ETL solutions you deploy in the future.