SQL Server Integration Services transforms data
With SharePoint’s growing popularity in the business world, it’s become an important repository of data. However, this data is often isolated from the rest of an organization’s data system. But by using the native web-service calls in Windows SharePoint Services (WSS), you can access SharePoint list data and use SQL Server Integration Services (SSIS) to transform the resulting dataset from an XML source to a relational database table.
This import routine can be automated via SQL Server Agent and scheduled to run at regular intervals to keep the SharePoint list and relational database in sync. That way, SharePoint list data need not be a data island in the midst of an otherwise integrated data environment.
SharePoint Data Storage and Access
Before we walk through the step-by-step process to integrate SharePoint data with SQL Server, let’s look at list-data storage and retrieval in SharePoint. SharePoint pages are ASP.NET applications whose content resides in a back-end SQL Server database called the content database. Other SharePoint–related data, such as list data and document library attachments, is also stored in the content database.
Although server administrators might be able to access the SharePoint content database and the underlying tables, Microsoft doesn’t recommend or support accessing SharePoint data this way. However, Microsoft does support other ways of accessing this data: by using the SharePoint Service object model or by using SharePoint Services web services. The object-model approach is more development intensive and requires familiarity with a myriad of object-model namespaces, hierarchies, classes, and enumerations. The web-services approach provides a more straightforward alternative. Although it still requires some coding, it’s fairly lightweight in comparison and is the approach we’ll use.
SharePoint Web Services and SSIS
SharePoint Services web services include methods for accessing data on a website, such as for working with lists or site data, as well as methods for customizing content such as meetings, imaging, document workspaces, or search. The web services provide their functionality through the /_vti_bin virtual directory, which maps to the %PROGRAMFILES%\Common Files\Microsoft Shared\web server extensions\12\ ISAPI physical directory in the file system.
The Lists web service for example, which is what you would call to access list data, can be accessed from the path servername/sitename/_vti_bin/lists.asmx. The web-service method calls often require input parameters, typically in plain text or CAML format. CAML is an XML–based markup language used exclusively with SharePoint technologies. As with any web-service method call, the result set is in XML format.
SSIS provides a platform to build a data integration application. Although it uses a Visual Studio (VS) 2005 interface to build an application, it features drag-and-drop usability for rapid development. It also comes standard with a wide array of tools known as Control Flow tasks to connect to various types of data sources and destinations.
Two such Control Flow tasks that we can use for our solution here are the Web Service task and the Script task. Both these objects can connect to a web service to retrieve data. To illustrate this integration, I’ve created a scenario that uses a WSS site containing a list with some pre-populated data.
I created the site by using the Absence Request and Vacation Schedule Management application template, which is available for download from Microsoft (technet.microsoft.com/ en-us/windowsserver/sharepoint/bb407286.aspx). I populated the list, named Absences, with some sample data.
Setting Up a Project and Configuring a Script Task
To create a new SSIS project, open SQL ServerDevelopment Studio. BIDS is a subset of VS 2005 and is included in the default installation of SQL Server 2005 Standard Edition or higher. Create a new project of type Business Intelligence Projects using the Integration Services Project template.
In the project window, expand the toolbox on the left side. This displays a long list of Control Flow items, pre-packaged SSIS items that can be combined to form a data-integration application and dragged and dropped into the project.
The Web Service task is one such item on this list and seems to be the obvious choice. But it can’t support web services that accept complex input parameters such as CAML queries, and to query a SharePoint list, we need to be able to provide a CAML fragment as an input parameter. For that matter, it can’t be used to call a majority of SharePoint web services. (It is, however, compatible with some of the simpler web-service methods such as Webs.)
Instead, we’ll use the Script task, a flexible and extensible SSIS task item that lets you write .NET code and provides a bridge between SSIS objects and external applications. In the scenario here, we want the Script task to call up the SharePoint Lists web service, read certain columns of data from a specific list in a SharePoint site, and store the resulting data in a SQL Server database destination.
Follow the steps below to add a Script task to your SSIS project and configure it.
Step 1: Set up SSIS variables needed in the script. The variables are used to pass parameters to the web-service method call, as well as to capture result data from the call. Table 1 shows some variable names and default values.
Be sure to set default values and replace contents in <SiteURL> and <ListName> to match your environment. <SiteURL> refers to the address of the SharePoint site root that contains the list.
Step 2: Drag and drop a Script task item from the toolbox into the Control Flow tab of the SSIS package. Double-click the Script task item you just added to open the Script Task Editor window and set the script variable properties as follows:
• ReadOnlyVariables: listURL, listName
• ReadWriteVariables: wsOutput
Take note of other properties in this window. The default script language is Visual Basic .NET. This means you can take advantage of powerful features of the .NET platform in your code, such as the web-service framework.
Step 3: Click the Design Script button in the Script Task Editor window to invoke the script editor. The Visual Studio for Applications (VSA) window pops up, which is where you write the code for the SSIS task.
Step 4: To leverage the .NET web-service framework in the script, you need to add appropriate references first. A web-service reference file is a separate .NET class file.
You can generate the class file in a separate VS project and import the file into the SSIS script task. To do this, start up VS and create a new Visual Basic .NET Windows Application project by navigating to Project, Add Web Reference.
In the Add Web Reference window, which Figure 1 shows, specify the URL of the SharePoint site where the list data resides and provide a web reference name (for example, wsLists), then click Add Reference. VS generates the necessary class files to create the infrastructure to call the SharePoint Lists web service.
Step 5: Save and close the Visual Basic (VB) project, then reopen the SSIS project. Browse the folder structure of the VS project in Windows Explorer. Open the folder location Web References\Lists in the project directory, and you will see the files that VS generated automatically.
To import the class file, right-click the project in Solution Explorer and select Add from the context menu, followed by Existing Item, then navigate to the directory containing reference.vb.
Step 6: Back in SSIS script editor, add the following references to the project: System.Web.Services and System.Xml. Open the Reference.vb file you imported in Step 4 above, and add the following code at the namespace declaration towards the top:
Imports System.Web.Services<br> Imports System.Xml <br> Imports Microsoft.SqlServer.Dts.Runtime
Note that the namespace of the Reference.vb file is called “wsLists” (or whatever name you specified in Step 4). Scroll down in the code and locate the Public Sub New() procedure. Replace the line of code that starts with “Me.URL” as follows:
Me.Url = Dts.Variables(“list <br> URL”).Value.ToString()
Remember, the SSIS variable, listURL, specifies the location of the SharePoint list. The above line of code will tell the web service where to look for SharePoint list data.
Step 7: Now that you’ve set up the references and variables, it’s time to write the actual VB code to call the web service. Open the ScriptMain file in the VSA project window.
Replace contents of the code block Public Sub Main() with the code in Listing 1.
The global variable wssListService refers to the wsLists web-service namespace, as defined in the Reference.vb class. Note that this Main subroutine calls another function, CallWebService, and collects the output of the function in a string variable, outputXML. Finally, it sends the output to the SSIS variable, wsOutput.
Step 8: Add code from Listing 2 for the function CallWebService. The GetListItems method of the wsLists Web service returns an XML dataset that matches the query parameters. The query parameters specified in the above method call are
• listName: string value representing the name of the SharePoint list to be queried.
• ndViewFields: CAML fragment of the fields to be retrieved from the list. Each field to be included in the output should be in its own FieldRef element.
• ndQuery: CAML fragment of the query conditions (similar to “where” clause in SQL). This parameter is optional.
• stringRowLimit: specifies the number of rows to be returned in the query. This is also optional.
Now that you’ve set up the Script task to extract data from the SharePoint list via the web-service method call, it’s a good idea to verify that the script task performs as expected. Remember, the extracted XML data is stored in an SSIS variable, wsOutput. One way to verify it will work is to inspect the value in the variable wsOutput populated by the Script task in debug mode, using the following steps.
Step 1: Go back to the Public Sub Main() code block you created above. Insert a debug breakpoint on the line with code Dts.Variables(“wsOutput”).Value = outputXML. Save and close the VSA window and return to the SSIS package window.
Step 2: Execute the package by going to menu Debug, Start Debugging.
Step 3: The VSA window will pop up automatically, revealing the code inside the Script task and pausing at the breakpoint you inserted on Step 1 above.
Step 4: Go to the VSA window menu Debug, Windows, Locals. The Locals window appears at the bottom of the workspace, listing all variables in current scope, including outputXML.
Step 5: In the Locals window, go to the value column of outputXML variable and invoke XML Visualizer by clicking the eyeglass icon menu. Export the text in the XML Visualizer window to a text editor such as Notepad. Save the text as an XML file.
Step 6: Back in VSA, close the XML Visualizer window and continue the code execution from the breakpoint. When the package has finished executing, go to menu Debug, Stop Debugging.
Creating an Inline Schema
The XML document you created in Step 5 above contains a sample of SharePoint list data in a fully valid XML format. This document, however, doesn’t contain an inline schema.
Now you will need to supply sample data as well as a schema file (XSD) that defines the structure of the XML data. You can then use an XML– compatible application such as Microsoft Office Excel 2007 or Microsoft Office InfoPath 2007 to create a schema based on the sample data. The steps below illustrate how you can use Excel to generate a schema from the XML output document that you created earlier.
Step 1: Open the XML file using Excel 2007. When prompted, opt to open the XML file as an XML table.
Step 2: Excel will notify you that the XML document doesn’t have a schema and that Excel will create one for you. Click OK.
Step 3: The schema generated by Excel isn’t visible in the UI. However, it’s accessible programmatically through Visual Basic Editor. To launch a Visual Basic Editor window in Excel, press Alt+F11 (or in ribbon bar go to Developer, Visual Basic).
Step 4: Locate the VBAProject tree on the left of the Visual Basic Editor workspace and double-click ThisWorkbook.
Step 5: In the code area for ThisWorkbook, create a procedure as follows and execute it.
Sub ShowSchemaText() <br> Sheet2.Range(“A1”).Value = <br> ThisWorkbook.XmlMaps(1).Schemas(1).XML <br> End Sub
This extracts the XML property of the first schema contained in the first XmlMaps collection and places the corresponding text in cell A1 of Sheet2. Since the workbook currently has only one XML document, we can safely assume the first items in both XmlMaps and Schemas collection are the items we want.
Export the text value in cell A1 of Sheet2 to a text file and save it with an .XSD extension. Listing 3 shows the XSD schema file contents for the Absence list I used in this example. In the next section you will see how to transfer the data in this variable to a SQL database table using the Data Flow Task item and XML Source adapter.
SSIS Data Flow Task
The Data Flow task is the most common of the SSIS task items, as it’s capable of facilitating the transfer of data from a variety of source and destination types. In our particular situation, the source data is of XML type whereas our destination is a SQL Server database table.
Two of the many data adapters built in to the Data Flow task—XML Source adapter and SQL Server Destination adapter—read the XML data from the SSIS variable and write to the SQL Server destination table, respectively. To set up such a Data Flow task, follow these steps:
Step 1: Drag and drop a Data Flow task item from the toolbox into the Control Flow tab of the SSIS package. Connect the Data Flow task item to the existing Script task item using the precedence constraint (the green line) that originates from the Script task.
Step 2: Go to the Data Flow tab of the SSIS package, which will be empty. From the SSIS toolbox on the right, drag and drop the XML Source adapter (located under Data Flow Sources), the Data Conversion adapter (under Data Flow Transformations), and the OLE DB Destination adapter (under Data Flow Destinations), to the empty area.
Step 3: Double-click the XML Source adapter component to set its properties. In the Connection Manager tab in the XML Source Editor in Figure 2, you can see the following properties have been set:
• Data access mode: XML data from variable
• Variable name: User::wsOutput
• XSD location: <XSDFilePath>, where <XSDFilepath> corresponds to the path of the XSD schema file you created earlier using Excel.
Step 4: Create the destination table manually for the data in SQL Server, if the table doesn’t exist, by executing the CREATE TABLE SQL script elsewhere (not in the SSIS package). I used the SQL script in Listing 4, to create my destination table in Adventure- Works database.
Step 5: In the Data Flow tab, connect the XML Source adapter to the Data Conversion transformation adapter using the green arrow (Data Flow Path) that originates from the XML Source adapter. The reason for using the Data Conversion adapter here is to convert the two date columns in the source XML data (Start Date and End Date columns) from string values to date/time values.
In the Data Conversion Transformation Editor, which Figure 3 shows, I set the data type for these two input columns to data type database timestamp (DT_DBTIMESTAMP\], which corresponds to the SQL datetime data type.
Step 6: Back in the Data Flow tab, connect the Data Conversion Transformation adapter to the SQL Server Destination adapter using the Data Flow Path. Open the OLE DB Destination Editor window and set the Connection Manager properties to point to the destination table you created in Step 4.
In the Mappings page, map the input columns (i.e., output columns from the Data Conversion transformation) to appropriate output columns (i.e., the SQL Server destination table), as Figure 4 shows.
Destination SQL Server
You have now set up the SSIS package to import data from the SharePoint list to the SQL Server destination table. Save the package and give it a test run by hitting F5. If everything goes well, you will see all the package items highlighted in green upon execution.
You should also see the destination table in SQL Server populated with data from the SharePoint list. Keep in mind that this SSIS package is currently set up to import all rows of data from the source list each time it is executed, whether the rows of data already exist in the destination table or not. This can create duplicate rows of data in the destination table each time the package runs.
One solution is to truncate contents of the destination table at the beginning of package execution. Another solution is to use the Slowly Changing Dimension SSIS transformation item, which imports only new rows of data and updates changes to existing rows of data but is beyond the scope of this article.
For now, just add the Execute SQL task item before the Script task in the Control Flow tab of the SSIS package. Set the connection property to connect to the destination database, and type in the following SQL command under the SQLStatement property, which Figure 5 shows:
"TRUNCATE TABLE Absences"
Automating the Import Process
The SSIS package you created above can be run on a regular basis so that any data changes in the source (i.e., the SharePoint list) are reflected in the SQL Server table.
Using the SQL Server Agent, you can set up a job to execute the package at set intervals, such as hourly or daily. To set up a SQL Server Agent, follow these steps:
Step 1: Using SQL Server Management Studio, connect to the SQL Server server that contains the database and the destination table we’re working with.
Step 2: In Object Explorer, right-click SQL Server Agent and select New Job. Type in a job name under General page.
Step 3: Go to the Steps page and add a new step in the New Job Step screen, which Figure 6 shows. Ensure that the package path points to the location of the .dtsx file you created earlier.
Step 4: Go to the Schedules page and add a new schedule for the job. Select the Recurring schedule type to repeat the task on scheduled intervals. Click OK to save.
No More Data Islands
You have now scheduled the SSIS package to be executed automatically during regular intervals, as set up in the task schedule. As I hope you can see, the web-services approach to integrating SharePoint data with SQL Server is relatively painless, and the result is well worth it, ensuring that SharePoint is no longer an isolated data island in your system.LISTING 1: Code to Call the Web Service
Public Sub Main()
Dim taskResult As Integer = Dts.Results.Success
Dim stringListName As String = Dts.Variables("listName").Value.ToString
Dim outputXML As String
wssListService.Credentials = System.Net.CredentialCache.DefaultCredentials
If stringListName.Length 0 Then
outputXML = CallWebService(stringListName)
Dts.Variables("wsOutput").Value = outputXML
taskResult = Dts.Results.Success
Throw New ApplicationException("Invalid SharePoint List")
taskResult = Dts.Results.Failure
Catch ex As Exception
Dts.Events.FireError(0, String.Empty, ex.Message, String.Empty, 0)
taskResult = Dts.Results.Failure
Dts.TaskResult = taskResult
LISTING 2: Code to Call the Function CallWebService
Dim outputNode As Xml.XmlNode
Dim xmlDoc As Xml.XmlDocument = New System.Xml.XmlDocument()
Dim ndQuery As Xml.XmlNode = xmlDoc.CreateNode(Xml.XmlNodeType.Element,"Query","")
Dim ndViewFields As Xml.XmlNode = xmlDoc.CreateNode(Xml.XmlNodeType.Element,"ViewFields","")
Dim ndQueryOptions As Xml.XmlNode = xmlDoc.CreateNode(Xml.XmlNodeType.Element, "QueryOptions","")
Dim stringRowLimit As String = "2000"
Dim outputString As New System.Text.StringBuilder()
ndQuery.InnerXml = "<query></query>"
ndQueryOptions.InnerXml = ""
ndViewFields.InnerXml = "<fieldref name="Title"></fieldref>" + _
"<fieldref name="Start_x0020_Date"></fieldref>" + _
"<fieldref name="End_x0020_Date'/">" + _
outputNode = wssListService.GetListItems(listName, String.Empty, ndQuery, _
ndViewFields, stringRowLimit, ndQueryOptions)
LISTING 3: XML Schema
<xs:element minOccurs="0" maxOccurs="unbounded" name="row">
<xs:attribute name="ows_Title" type="xs:string" use="optional" />
<xs:attribute name="ows__ModerationStatus" type="xs:unsignedByte" use="optional" />
<xs:attribute name="ows__Level" type="xs:unsignedByte" use="optional" />
<xs:attribute name="ows_EventDate" type="xs:string" use="optional" />
<xs:attribute name="ows_AVMStatus" type="xs:string" use="optional" />
<xs:attribute name="ows_ID" type="xs:unsignedByte" use="optional" />
<xs:attribute name="ows_owshiddenversion" type="xs:unsignedByte" use="optional" />
<xs:attribute name="ows_UniqueId" type="xs:string" use="optional" />
<xs:attribute name="ows_FSObjType" type="xs:string" use="optional" />
<xs:attribute name="ows_Created" type="xs:string" use="optional" />
<xs:attribute name="ows_AVMAbsenceType" type="xs:string" use="optional" />
<xs:attribute name="ows_EndDate" type="xs:string" use="optional" />
<xs:attribute name="ows_Employee_x0020_Name" type="xs:string" use="optional" />
<xs:attribute name="ows_FileRef" type="xs:string" use="optional" />
<xs:attribute name="ows_MetaInfo" type="xs:string" use="optional" />
<xs:attribute name="ows_Description" type="xs:string" use="optional" />
<xs:attribute name="ows_fAllDayEvent" type="xs:unsignedByte" use="optional" />
USE \[AdventureWorks\]<br>GO<br>CREATE TABLE \[dbo\].\[Absences\](<br> \[ID\] \[int\] IDENTITY(1,1) NOT NULL,<br> \[Title\] \[nvarchar\](255) NOT NULL,<br> \[EmployeeName\] \[nvarchar\](255) NULL,<br> \[AbsenceType\] \[nvarchar\](255) NULL,<br> \April 22, 2009 12:00 AM \[datetime\] NULL,<br> \[EndDate\] \[datetime\] NULL,<br> \[Description\] \[nvarchar\](255) NULL<br>) ON \[PRIMARY\]<br>GO