Use SSIS to retrieve data from a MySQL database and insert it into SQL Server tables
|Executive Summary:If you work with SQL Server Integration Services (SSIS), at some point you’ll probably need to access data from a MySQL database and integrate it into Microsoft SQL Server. Here’s an easy-to-follow solution that uses SSIS,Development Studio, MySQL connectors, and SQL and SQL Server ( ) scripts to retrieve data from two MySQL tables and insert that data into corresponding SQL Server tables. By performing the steps in the solution, you’ll become familiar with retrieving data from a MySQL database and some of the issues you might have to address when you retrieve that data..|
As you work with SQL Server Integration Services (SSIS), sooner or later you’ll have to retrieve data from a MySQL database. Fortunately, MySQL provides drivers that make the process relatively easy. And once you learn the basics of how to use those drivers, you’ll be able to integrate MySQL data almost as easily as SQL Server data. To help you get familiar with using SSIS with MySQL, I’ll show you an SSIS solution— MySQL_SQLServer—that retrieves data from two tables in a MySQL database and inserts that data into comparable tables in a SQL Server database. In our discussion here, I assume that you already know how to develop an SSIS package in Business Intelligence Development Studio, the pared-down version of Microsoft Visual Studio 2008 that ships with SQL Server 2005. I also assume that you’re familiar with how to define connection managers, control flow, data-flow elements, and Windows ODBC data sources. You can find information about MySQL at www.mysql.com. To download the generally available edition of MySQL, go to the Developer Zone.
Step 1: Connecting to MySQL
The first step in creating an SSIS solution that connects to a MySQL database is to install the driver needed to connect to that database. MySQL provides a number of drivers (called connectors) that let you connect to a MySQL database. To connect from SSIS, you can use the Connector/Net driver or the Connector/ODBC driver. The Connector/Net driver provides an ADO .NET interface to the MySQL database, and, as the name suggests, the Connector/ODBC driver provides an ODBC interface. In the MySQL_SQLServer solution, I demonstrate both connectors.
You can download the two connectors at www.mysql.com/products/connector. The site provides information about downloading and installing the connectors and points you to other references that explain how to use the connectors. I encourage you to review these materials before installing either connector. Because the site so thoroughly documents each driver, I won’t go into the driver details here. However, to create the MySQL_SQLServer solution, you’ll need to download and install both connectors. Follow the instructions on the MySQL website, and for both connectors, perform a default installation.
To use the Connector/ODBC driver, you must create a Windows ODBC data source after you’ve installed the driver. The data source should point to a MySQL database. For this article, I’m using the Bookstore database, which contains two tables, Authors and Books. I wrote an SQL script, MySQL_BookstoreDB, to create the database and populate the tables with a few initial rows; you can download this code, as well as the other files for the MySQL_SQLServer solution by clicking on the Download the Code link at the top of the article. You’ll need to run this script against a MySQL installation before you can implement the solution. Note that, for this solution, I installed MySQL 5.0 (a default installation) and SQL Server 2005 Developer Edition on the same Windows XP computer.
After you create the Bookstore database in MySQL and install the Connector/ODBC driver, you can create the Windows ODBC data source. To support my solution, I created a system Data Source Name (DSN) ODBC data source called mysql and selected MySQL ODBC version Driver (aka the Connector/ODBC driver; the driver is displayed with this name after it’s installed) from a list of existing drivers. Note that if you installed MySQL on a port other than the default, you’ll need to enter the correct port on the Connect Options tab in the Connector/ODBC Configure Data Source Name dialog box. For more information about creating an ODBC data source, see Windows Help. (You don’t have to perform any steps to use the Connector/Net driver; it’s ready to go as soon as you install it.)
Step 2: Defining Connection Managers and Control Flow
Now that you’ve set up your drivers and ODBC data source, you can create your SSIS solution. You can create the solution in either Business Intelligence Development Studio or the full version of Visual Studio 2005 or later. (Use the Integration Services Project template to create your SSIS package.) When creating the SSIS package, you should start by defining the connection managers. But before you do so, run the SQLServer_BookstoreDB script, which you can also download through the hotlink at the top of this page, to create the target database (Bookstore) in SQL Server. Then create a connection manager to the Bookstore database.
To test both MySQL drivers, you need to create a connection manager for each one. Let’s start by creating the connection manager for the Connector/ODBC driver or more specifically, for the mysql DSN. Intuitively, you’d think that you simply create an ODBC connection manager. However, this approach doesn’t work. Instead, you must use a Microsoft .NET Framework ODBC data provider, which is an ADO.NET connection type, to create the connection manager. To create an ADO.NET connection, right-click the Connection Managers pane in Business Intelligence Development Studio, then click New ADO.NET connection. In the Configure ADO.NET Connection Manager dialog box, click New. In the Connection Manager dialog box, select .Net Providers\Odbc Data Provider from the Provider drop-down list. Also select mysql as the system DSN. Because the mysql DSN includes the username and password, you don’t need to specify it here.
Now create the connection manager for the Connector/Net driver. Once again, you create an ADO .NET connection. However, this time you should use the provider .Net Providers\MySQL Data Provider. In the Connection Manager dialog box for this provider, you must enter the name of the MySQL database, server, user, and password. In our example, I specified Bookstore as the database and localhost as the server. In addition, I used the root user account to access MySQL, but typically you’d use a more restrictive account. Note that after you enter the password and close the dialog box, the password is no longer displayed.
The next step in creating the SSIS solution is to add two Data Flow tasks to the control flow, one for each MySQL driver. Each task will retrieve data from one of the tables in the MySQL database. For this solution, I used the Connector/Net driver to retrieve data from the Books table and the Connector/ODBC driver to retrieve data from the Authors table. After you add the Data Flow tasks, you can view them along with the connection managers you configured (MySQL_Net and MySQL_ODBC, in our example) on the Control Flow tab in SSIS Designer.
Step 3: Configuring the Books Data Flow
After you add the Data Flow tasks to the control flow, you can begin to configure the Books data flow. Start by adding a DataReader source component. To do so, open the component’s editor. Then, on the editor’s Connection Managers tab, specify the connection manager for the Connector/Net driver. In my solution, I named the connector MySQL_Net.
Next, on the Component Properties tab, add a SELECT statement to the SqlCommand property to retrieve the data from the Books table. I used the statement
select * from Books;
I left all other properties with their default values, as I did on the Column Mappings and Input and Output Properties tabs. However, there are a few items on the Input and Output Properties tab that are important to note.
If you expand the Input and Output Properties tab’s DataReader Output tree, as Figure 1 shows, you’ll see a list of input (external) and output columns. Look at the BookID output column. As you can see, the data type assigned to this column is a four-byte unsigned integer, which is consistent with the column created in MySQL. (An unsigned integer is one that doesn’t support negative numbers.) However, the BookID column in SQL Server won’t accept the value in this format, which means that you’ll need to convert the column to the correct data type. You’ll run into the same problem with the NumIn- Stock column, although this is a two-byte unsigned integer, rather than four-byte.
Now look at the Title output column’s properties. Notice that the data type is Unicode string. Although the MySQL column definition doesn’t specify Unicode, this is how the data is saved. However, this isn’t how the column was created in SQL Server. Consequently, you’ll need to convert this column, too. You don’t have to worry about the DateModified column’s data type. SSIS retrieves this as a timestamp column, which is compatible with SQL Server’s DATETIME column.
Because you have columns that are incompatible between the source and destination, you’ll have to convert those columns. To do so, your next step is to add a Data Conversion transformation to the data flow and connect the source to the transformation.
Continue to page 2
Now open the transformation’s editor. You’ll need to convert the BookID, Title, and NumInStock columns, as Figure 2 shows. As you can see, you must convert the unsigned integers to signed integers of the same size and convert the Unicode string to a regular string. When you convert the string, leave the length as 60. This is the column length in both the source and destination databases. Also assign new column names to the outputted data. I simply added the number 2 to the names.
Finally, after you’ve converted the data, you can send it to SQL Server. To do so, add a SQL Server destination component to the data flow and connect the transformation to the destination. In the component’s editor, select the connection manager for the SQL Server database and the Books table from that database. For this solution, I’ve named the connection manager BookstoreDB.
Now make sure that you correctly map the columns from the transformation to the destination. Look at the Mappings page of the destination’s editor, and you’ll see that I mapped the converted columns to their respective destination columns but left the Date Modified column with its default mapping.
That’s all there is to it. As with any SSIS solution, the key is to make certain that the source data is properly converted to the format necessary to be inserted into the destination database. You can see the final Books data flow in Figure 3. Now let’s look at the Authors data flow.
Step 4: Configuring the Authors Data Flow
As you did for the Books data flow, your first step is to add a DataReader source component, configure it with the correct connection manager (in this case, MySQL_ODBC), and define the SQL statement needed to retrieve data:
select * from Authors;
Now look at the Input and Output Properties tab. This is where things get interesting when using the Connector/ODBC driver. You can see the AuthorID column properties in Figure 4. Notice that, unlike the BookID column, the data type is now an eight-byte signed integer, although the AuthorID definition is the same as BookID. Thus, you’ll need to convert AuthorID to a four-byte signed integer data type. Now look at the FName column’s properties. As with the Title column, the data type is a Unicode string; however, notice that the length is 21, although the column definition in MySQL specifies 20. The MName and LName columns exhibit the same behavior.
When you find a discrepancy of this nature, keep an eye out for the driver adding an extra space before or after each value. To protect against this possibility, you can use a Derived Column transformation to trim the extra spaces. Figure 5 shows how you’d trim the three columns that contain a string value. Notice that you use a TRIM function and simply replace the column with the trimmed values.
After you create your derived columns for the string values, you can then add a Data Conversion transformation and connect the Derived Column transformation to the Data Conversion transformation. In the Data Conversion transformation, convert the AuthorID column to a four-byte signed integer and convert the three name columns to a regular string with a length of 20. You don’t have to worry about the Date- Born or DateDied columns. Although they’re defined as DATE columns in MySQL, SSIS imports them as timestamp columns. From there, add a SQL Server source component and be sure to map the converted columns to their respective destination columns. When you’re finished, open the Data Flow tab to view the Authors data flow that you’ve configured, as Figure 6 shows.
On Your Way to MySQL Integration
The solution I’ve shown you is, of course, a basic one, but it does give you an overview of how to retrieve data from a MySQL database and some of the issues you might have to address when you retrieve that data. Although I showed you how to use both MySQL drivers, you’ll no doubt use only one in your SSIS package, and for the most part, you’ll probably want to go with Connector/Net. However, I wanted to show you both connectors in case you encounter situations in which Connector/ODBC has already been implemented or you have another reason to use ODBC. Regardless of the solution you ultimately implement, you should now have a good idea of how to access a MySQL database from SSIS.