R2 Parallel Data Warehouse (PDW) isn't just about the appliance. It's also about the users who will query and analyze the data stored in it. But users can't simply log on to the appliance and access the data. Client connectivity tools and the tools to query PDW need to be installed on their computers.
I'll walk you through how to install and configure the client connectivity tools and the primary query interface. I'll also show you how to connect to the PDW Administration Console and how to configure some SQL Server(BI) tools to work with PDW.
Installing the Client Connectivity Tools
In PDW 1.0, the client connectivity tools and the tools to query and manage PDW are separate from the SQL Server 2008 R2 toolset. You must install the PDW drivers, then separately install the Nexus Query Chameleon tool for graphical query access.
The first step is to install the client libraries. You'll find the client library setup programs on the Landing Zone of the appliance at C:\Program Files\Microsoft SQL Server Parallel Data Warehouse\Redist. A 32-bit package (ClientTools-x86.msi) and a 64-bit package (ClientTools-amd64.msi) are included. Note that the Landing Zone is the component node of a PDW appliance where you can stage files to load into PDW and run SQL Server Integration Services (SSIS) packages to load data into PDW.
Setting up the client libraries is simple. For the standard setup, start the setup program, click Next in the Welcome dialog box, accept the license agreement, and click Next again. For the type of setup, select Typical and click Install. You'll be prompted to elevate to administrative rights. After you do so, the setup program copies and registers the client libraries for PDW. When setup completes, click Finish to exit the wizard. If you want to install both the 32-bit and 64-bit drivers, simply follow these steps for both the x86 and x64 packages.
After you have installed the client tools components, you can make connections using the provided drivers (OLE-DB, ODBC, and ADO.NET). If you plan to use SSIS with BIDS in your PDW appliance, you'll need to install the PDW SSIS Destination Adapter. You'll find the setup file for a 32-bit installation (SSISSQLPDWDest-x86.msi) and the 64-bit installation (SSISSQLPDWDest-amd64.msi) on the Landing Zone. If you don't plan to use BIDS on your client computer, you don't need to install the destination adapter.
At this point, you can use the command-line query tool for PDW, dwsql.exe. Much like osql.exe or sqlcmd.exe, dwsql.exe lets you query PDW from a command-line environment. You query PDW using the Dynamic SQL (DSQL) query language, which is similar to but not quite the same as. The DSQL syntax, including connectivity parameters and supported language constructs, is available in PDW Books Online, which you receive with PDW.
Besides running queries from the command line, you can use the graphical query tool, Nexus Query Chameleon. Before you can use it, though, you need to install it.
Installing the Primary Query Interface
Nexus Query Chameleon is the primary query interface you'll use for graphical queries against PDW. You install this query tool from the same location as the other files on the Landing Zone. The files are NexusSetup32.msi and NexusSetup64.msi for 32-bit and 64-bit environments, respectively. Unlike the client components, you must set up the appropriate Nexus client for your computer (i.e., if you have a 64-bit Windows installation, you must install the 64-bit version of the Nexus client).
When you start the setup program, you'll get the traditional Welcome screen. Click Next, accept the EULA, accept the default file location and visibility choices, and click Next to confirm the installation. You'll be prompted to elevate to administrative permissions during the installation, after which the setup will complete.
Now that you have the Nexus client installed, you can connect to your PDW installation. If you're the first person connecting to the appliance, you need to use the standard SQL Server security login of "sa" as your username and the password for your PDW installation. The hardware vendor will have set the password for you during the appliance installation.
If you're not the first person using the appliance and someone has created an account for you, a login must have also been created for you on the appliance (with the CREATE LOGIN statement) and you must have been granted connect permissions (with the GRANT CONNECT statement). For the Nexus client to work fully, you should have three more permissions granted to your account. All three permissions can be granted with the GRANT SELECT ON OBJECT statement.
- GRANT SELECT ON OBJECT::information_schema.schemata TO <login>
- GRANT SELECT ON OBJECT::sys.databases TO <login>
- GRANT SELECT ON OBJECT::sys.types TO <login>
Configuring the Query Client
The next step is to start the freshly installed Nexus query client. On your Windows Start menu, you'll find the shortcut to the Nexus Query Chameleon in the \Coffing Data Warehousing\Nexus by Tera-Tom folder. You should also find a shortcut created on your desktop. Note that the Nexus tool is a third-party query tool and completely separate from SSMS and the other query tools provided with the other SQL Server editions.
When you open Nexus Query Chameleon for the first time, a wizard will guide you through the initial setup. In the wizard's opening screen, click Next. You'll be prompted to add a Data Source Connection, as shown in Figure 1. The Nexus query tool connects to many different kinds of databases, so its configuration options are not specific to PDW.
In the Source Type drop-down list, select SQL Server PDW as your source type. Because this is probably the first time anyone has configured PDW on your computer, click the Add New button to create a new ODBC Data Source Name (DSN). This action will launch the ODBC Data Source Administrator. Click Add, select Microsoft SQL Server 2008 R2 Parallel Data Warehouse ODBC Client, and click Finish.
This process launches the Microsoft SQL Server 2008 R2 Parallel Data Warehouse ODBC Client Setup dialog box. Give your data source a user-friendly name (I used MyPDW) and a description if desired, then enter the PDW connection information. For this, you need either the computer name or the IP address of your PDW control node. Your administrator will specify the port number. (The system defaults to port 17000.) The default system database is dwsys, but you'll likely be assigned a particular database to use by default. Figure 2 shows the completed ODBC client setup for my test PDW system.
Click Test Connect to verify that you entered the information correctly. When prompted for your PDW username and password, enter your security credentials and click OK. You should get the message that the connection test was successful. Click OK to dismiss that dialog box and OK again to get back to the ODBC Data Source Administrator screen. Your User DSN has now been created and tested, so click OK one more time to return to the Add Data Source Connection dialog box. Select your newly created ODBC DSN in the drop-down list (you might have to click the Refresh button to see the new entry), then enter your username and password. If this is your first time ever logging on to the PDW appliance, select the Alter Login after connection check box because all passwords are set by default to expire on first usage. You should now see a screen like that in Figure 3, completed and ready for you to proceed.
Click Save. If you selected the Alter Login after connection check box, you are prompted to enter your new password. Enter your new password, and then click OK. Congratulations-you have now successfully connected to your PDW appliance.
There will be a tree of your available systems on the left side of your screen, as shown in Figure 4. If you expand a PDW system, you'll see a list of the available databases. If you expand a particular database, you'll see the tables and views in it.
As you can see in Figure 4, I entered a query to find out how many rows are in the Sales_Item table in the Sample_Retail_5X database. When you enter a query, the Nexus query tool will highlight the System field on the toolbar. (I circled it so you can find it more easily.) You need to select the active system you're using for the query (MyPDW in this example). The Database field will then default to the system database dwsys.
You run a query by clicking Execute on the toolbar. When I ran my query, the USE command changed the database context to the specified database (Sample_Retail_5X), just like in a typical SQL Server database query. The number of rows in the Sales_Item table was then counted. As you can see on the Results 1 tab, the table has more than 7.6 billion rows.
Accessing the PDW Administration Console
The PDW Administration Console is a website hosted on the control node of the appliance. The console provides a variety of information about your PDW system. To access the console, open Microsoft Internet Explorer (IE), then navigate using a secure connection to the IP address of your PDW control node. You'll be prompted to log on to the appliance, just as when you were using the Nexus query tool. Once connected, you'll see a screen like that in Figure 5.
You have access to a significant amount of information in the PDW Administration Console from the following tabs:
- Sessions-This tab provides information about the active sessions on your appliance. You have the ability to disconnect any session if you have the administrative rights to do so.
- Queries-This tab shows queries that are currently running and provides a history of recent queries. You can also access showplans from this tab, which I'll discuss shortly.
- Loads-This tab contains information about current and historical data-loading sessions. For example, you can find out the commands that were run in a past data-loading session, the steps processed during that data load, the start and end times of the load, and the tables affected.
- Backups/Restores-This tab contains the history of every backup and restore operation that has occurred on the appliance. You can drill down to see the database affected, the backup or restore mode, who ran the backup or restore operation, and the commands used for that operation.
- Appliance State-This tab lets you quickly check the health state of the appliance. If anything is in a trouble or error state, you'll get a graphical indication. You can drill down to see more detailed health-state information.
- Alerts-This tab reflects any Critical Alerts or Warning Alerts, again with the option to drill down into detailed diagnostics information about the error or warning.
- Locks-This tab gives you a view of all locks affecting the appliance, letting you drill into the details of the lock. You also can cancel a lock by terminating the query causing it.
- Performance Monitor-This tab lets you configure custom graphs of information about disks, memory, processes, processors, and more. After you select the graphs you want to monitor, they'll refresh automatically, providing you with a custom troubleshooting console for your PDW appliance.
Looking at Showplans
A key use of the PDW Administration Console is to look at showplans, which can help you optimize queries and database designs. Showplans are automatically captured for queries run in PDW. I ran a more complicated query and received the showplan in Figure 6. You can drill down into these steps, down to the level of finding out what operations were performed on each PDW node to accomplish the query. When queries are run against a distributed table, you'll see that operations occur on every PDW node, as Figure 7 shows.
With the showplan information in hand, you can begin the process of tuning and optimizing both your database design and your queries to improve performance.
Using SQL Server's BI Tools with PDW
As you might expect, you can use SQL Server's BI tools in PDW. Let's step through the main BI tools and look at how you can connect to PDW with them.
SQL Server Analysis Services (SSAS). To configure SSAS to work with PDW, start a new SSAS project in BIDS. Right-click the Data Sources folder and select New Data Source to launch the Data Source Wizard. Navigate to the Define the Connection dialog box and click New to bring up the Connection Manager dialog box. From the Provider drop-down list, select the ADO.NET provider named Microsoft SQL Server 2008 R2 Parallel Data Warehouse.
Next, you need to enter the IP address of your PDW installation for the Host parameter. Be sure the port is set to 17000 (or whatever your server is configured to use). Specify which database to use in the Database property, which is at the top of the dialog box under Advanced selections. You also need to enter your username for User ID and your PDW password. Click the Test Connection button to confirm connectivity, and click OK. Click Finish to complete the Data Source Wizard.
At this point, you'll proceed through the Data Source View Wizard as you would with any SQL Server installation. You can then build your cubes and dimensions on top of PDW.
SSIS. Before you can configure an SSIS package to work in PDW, you need to enable the SSIS PDW SQL Server Destination connector. Open BIDS and select the Choose Toolbox Items option on the Tools menu. Go to the SSIS Data Flow Items tab. Find and select the SQL Server PDW Destination check box. Click OK to enable this connector in your graphical interface.
If you right-click Connection Manager in your SSIS package, you'll see that your PDW connector is already available. To use it, you simply drag the SQL Server PDW Destination icon from the Data Flow Destinations list to the Data Flow design surface, just like you would any other data destination.
(SSRS). Configuring SSRS to work with PDW is similar to the processes used for configuring SSAS and SSIS. Connectivity is through the ADO.NET provider. After you define the connection, you are good to go, just as you would be with any other SQL Server connection. Keep in mind that programmability through stored procedures in is not the same as it was in SQL Server 2008 R2, so your reports will not necessarily work unchanged between SQL Server systems and PDW systems.
The Microsoft SQL Server PDW team has released a series of connectors to simplify connectivity between PDW and several third-party data products. At the time of this writing, connectors are available for the Apache Hadoop, Informatica PowerCenter, MicroStrategy BI Platform, and SAP BusinessObjects Data Integrator. With these connectors you can interoperate between PDW and the other BI products to allow optimal integration and speed of connectivity.
PDW Is Also About the Clients
Basic connectivity to the PDW appliance requires that you install and set up client drivers and client connectivity components. I explained how to set up that connectivity. I also explained how to install and set up the Nexus Query Chameleon on the client computer so that you can run queries. Finally, I showed you how to access the PDW Administration Console so that you can view and analyze showplans and other BI data. By taking advantage of the showplans, you can optimize the queries you run against your PDW appliance.