Whether we like it or not, the modern office information environment isn't a tidy place. We would all like to imagine paperless offices in which executives and workers access information through engineered software and in which MCSAs manage information that flows through MCSE-designed, -implemented, and -maintained networks. Instead, the office environment remains chaotic.
No company could afford to stay in business long if staffed by the large numbers of certified systems administrators and systems engineers that contemporary information systems theoretically require. Instead, most organizations rely on high-end office administrators who aren't trained as programmers or DBAs but who have taught themselves the programming skills necessary to maintain corporate information flow at acceptable cost. These self-taught programmers wear many hats and typically have a diverse skill set that includes two essential components: They can use Query Analyzer to understand and write short, structured queries, and they know how to write short routines in Visual Basic for Applications (VBA) that run in either Microsoft Access or Microsoft Excel.
If you're this type of programmer-administrator, the most common problem you face is probably ad hoc data flow from one environment to another in which data often originates from diverse data sources that change unpredictably over time. And in your environment, importing data from comma-separated value (.csv) files and Excel spreadsheets is probably one of your most common tasks. A typical problem begins when a mid-level manager sends new data as an email attachment, often in the form of a headerless .csv file, and instructs you to input the new data into the existing database. Although you can accomplish this task by using Data Transformation Services (DTS) or bulk copy program (bcp), in many cases integrating the import operation with your existing stored procedures would result in a more seamless solution. Unfortunately, T-SQL stored procedures don't support the ability to access external objects such as Excel spreadsheets—or do they? In this article, I show you one way you can enable your T-SQL stored procedures to import data from an Excel spreadsheet. And with a little work, you can extend this technique to access data in Microsoft Word, Microsoft Outlook, or any resource that uses a COM-compliant object library.
Why Not DTS?
A typical DBA might solve the problem of the headerless .csv file by using Notepad to enter the header row into the file, then using the DTS Wizard to import the file into a SQL Server table that the wizard creates. One problem with this process is that the wizard might not create fields of the right data type. Another problem is that this process is labor intensive and difficult to customize. More often than not, the DBA has to do the task manually each time it's required. And the DBA has to use the DTS Designer to maintain and update the DTS package over time. Although the DTS Designer is a flexible tool, it has a complex drag-and-drop interface that requires the user to click around in a maze of arrows and icons looking for places to change data characteristics and insert small SQL or VBA routines. In addition, the DTS Designer creates "hidden" routines whose code exists only in the Designer. This hidden code is difficult to debug, maintain, expand, and update. The process also means the DBA has to spend valuable time learning to use the Designer rather than focusing on programming tasks and end results.
The ideal solution to the problem of the headerless .csv file would be to create a SQL Server table with the appropriate fields, then import the .csv file's data with one wave of a magic wand. The fact is, you can; SQL Server lets you import the data programmatically from inside a stored procedure. This technique results in compact, generalized code that you can extend and repurpose easily. What's more, this technique uses the existing skills of the typical programmer- administrator, with no complex interface and no steep learning curves.
One little-used SQL Server feature is the capability to create a SQL Server Agent job step that runs a short VBScript or JScript routine. One reason DBAs don't often use this capability is that few know that those scripts can access the features of any COM-compliant application resident on the server. When you realize you can embed the scripts inside a SQL Server stored procedure, the entire world of component-object model programming opens up. And almost universally available Microsoft Office applications such as Excel are suddenly able to interact with your SQL Server database.
The listings I provide in this article are useful examples of a methodology that is, for most programmer-administrators, considerably easier to use than DTS. But it's important to realize that this technique isn't merely an easy replacement for DTS. You can use this technique to drive any COM-compliant application, including Word, Outlook, and Microsoft CRM. If a program's object model is available, you can manipulate it from inside a SQL Server stored procedure.
CSV to XLS
Listing 1 creates a stored procedure, Util_CSV_to_Excel, that contains a short embedded VBScript routine, which uses Excel to open a comma-delimited text file. The stored procedure adds to the .csv file a header row derived from the column names of a specific SQL Server table, then saves the .csv file as an .xls file for later import into the same table. All that's required for the code to work properly is that Excel and Microsoft Data Access Components (MDAC) be installed on the server. The SQL Server Agent service also has to be running, which it usually is on most SQL Servers. One big advantage of converting the .csv file to an .xls file before saving it in SQL Server is that you can give the Excel file to executives who might have their own Excel skill set and prefer data in that form. You could also export the data to Access instead of to Excel.
You can run the Util_CSV_to_Excel stored procedure from Query Analyzer, call it from another stored procedure (as I do in a moment), or run it from an external application such as Access. The stored procedure has five input parameters: @UserName, @TargetName, @SaveAsName, @TargetTable, and @JobName. The @UserName parameter is an arbitrary choice; you could generate a random string to take its place. But regardless, you need to supply a unique value from the calling entity to give the resulting temporary SQL Server Agent job a unique name. @TargetName is the name of the target .csv file, @SaveAsName is the name of the .xls file that the stored procedure will output, @TargetTable is the SQL Server table whose schema the stored procedure will use for the header row, and @JobName is an arbitrary non-unique root name for the job.
Once you declare and set the internal variables, the Util_CSV_to_Excel stored procedure checks the information_schema.columns count of the target table to find out how many columns the stored procedure will be adding to the header row that the code uses in the embedded VBScript routine's FOR ...NEXT loop. The VBScript code is embedded in a text variable defined as nvarchar(3200), the maximum size available for a SQL Server Agent JobStep object script. If you need more space than that, you have to create several job steps with successive layers of code. (For information about how to create a job step, see the Microsoft article "JobStep Object" at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sqldmo/dmoref_ob_j_3gdw.asp.) From here, the task is familiar to any VBA programmer who has had to make one Office application work with another. The component-object models for all Office applications are available in their respective VBA Help systems, and Microsoft provides further information on its Microsoft Office Developer site at http://msdn.microsoft.com/library/default .asp?ur=/library/en-us/dnanchor/html/odc_ancofficedev.asp.
After Listing 1's code uses ADODB to upload the schema into a Recordset object, it iterates the column names into the Excel worksheet cell by cell and saves the worksheet as an .xls file. The VBScript component of the stored procedure, with the T-SQL variables replaced by literal text, would work if you pasted it into a SQL Server Agent job and ran it. All that remains is to create and run the job, which you can do by using the prefabricated stored procedures in the msdb database. Information about how to use these procedures is available in Query Analyzer's T-SQL Help. The one thing your program needs to do during the job creation is set the scripting language to VBScript. This option is in the Enterprise Manager SQL Server Agent scripting tool but, for some reason, got left out of the canned stored procedure. SQL Server Agent job steps are in plain sight—in the msdb sysjobsteps system table (an obvious name, that). All you have to do is update the scripting language name field called database_name for your job ID, which SQL Server output when you created the job with the sp_add_job procedure. Be careful—I have only one job step in this example job. If you do something more complicated, you'll need to pick the right step.
You need to note one other thing about this stored procedure: When you set up the ADODB connection, "Initial Catalog" is the database name that appears in the connection from which you'll load the Recordset. Initial Catalog is hard-coded in Listing 1 because the Util_CSV_to_Excel stored procedure is looking at the server and database on which it resides. Alternatively, you might want to further generalize the code by feeding the catalog name in as a parameter or even storing it in a SQL Server table for later lookup.
Now that you have a stored procedure that takes a .csv file and makes it into an Excel file with a header row derived from the table schema, it's time to insert the Excel file's data into the SQL Server table where it belongs. Before you can do that, you have to know the answers to two questions. One, of course, relates to the common problem with all deployed and unsupervised code: What happens if the stored procedure starts and never stops—running wild, as it were? The other is, Did the stored procedure do its job? Finding the answers to these questions is complicated by the fact that the scripts launched from SQL Server Agent job steps run asynchronously; they come from the job, and your code then moves on to whatever step comes next.
Listing 2 shows a simple way of learning the answers to both these questions. You supply the stored procedure in Listing 2 with the same @JobName you supplied to the stored procedure in Listing 1. Then, you give the code a numerical countdown (a primitive loop) that tells it whether to cancel the job (as callout A in Listing 2 shows) and lets the code know what you want it to do once the countdown has elapsed. For example, in Listing 2, I've specified that the return value 1 means the job has failed. If the code in Listing 2 gets a return value of 1 from the SELECT statement at callout B, it stops the job. You can accomplish this kind of job verification in several ways. But the simplest way is to use a loop that checks on the job however many times you told it to in the last_run_outcome field of your job step in the msdb sysjobsteps table. Listing 2's stored procedure tells you whether the job has stopped and, if so, whether it failed or succeeded. Once the timeout value you specify is exceeded, you have the option of canceling the job by using the msdb sp_stop_job stored procedure. One reason for letting the job run is so that you can investigate DBA-style to see whether a problem exists or whether you simply need to increase the timeout value.
Finally, you're ready to insert the Excel file you created into the SQL Server table you created for it. Listing 3's stored procedure, Util_Excel_to_Table, accomplishes this task by attaching the Excel table as a linked server, using SQL Server's convenient built-in stored procedures. Listing 3's code then uses the simple INSERT statement at callout A to move the data from the file to the table. Notice that the INSERT statement treats the Excel file as if it were part of a distributed transaction. The statement uses a made-up (and meaningless) server name separated from the table name (the name you earlier gave your worksheet) by three dots because the other two pieces of the four-part name aren't required.
In effect, this stored procedure calls the procedure in Listing 1 to create the Excel file, then calls Listing 2 to wait until the job is finished. When the job does finish, Listing 3's stored procedure moves the data to its destination. When the task is complete, the stored procedure drops the server link, then uses sp_cmdshell to run the MS-DOS Del command with a mask you supplied as an input parameter.
Because Listing 3 is the starting point for program execution, the input parameters require some explanation. @UserName is a unique identifier; the code appends it to the job-step name to distinguish the job step when multiple sources in the network are calling the same routine. The parameter you input for @UserName doesn't have to be an actual username, merely something that won't be duplicated between calling entities. @JobName is the name of the job the code will create. @SourceFile is the .csv file that contains the data you're importing. @Target-File is the .xls file the code is creating. @Target-Table is the SQL Server table that supplies the header-row data and into which you're importing the data. @KillMask is the list of temporary or source files the code will delete when the job is done. If you want to retain the files permanently, you have to delete this parameter and the command using it (in this case, the @KillCmd variable) from the stored procedure. If you were running Listing 3 from Query Analyzer, for example, the command line might be as follows:
I used the techniques in this article to automate a fairly simple task set, but you can use them to perform any of the million-and-one routine DBA tasks we all know and loathe. You can use these techniques to automate any task associated with the COM-compliant applications on your environment's network, including most of your network-administration tasks. You can also reverse these procedures to export SQL Server tables and views to Excel or Access, for example, and use SQL Server's built-in email capability to send your output as attachments to an automated distribution list. You're limited only by your imagination and the needs of your job.