Getting Started with Parallel Data Warehouse

A peek at SQL Server 2008 R2's new edition

This summer Microsoft will release the SQL Server 2008 R2 Parallel Data Warehouse (PDW) edition, its first product in the massively parallel processing (MPP) data warehouse space. PDW combines MPP software acquired from DATAllegro with parallel compute nodes, commodity servers, and disk storage. PDW lets you scale out enterprise data warehouse solutions into the hundreds of terabytes and even petabytes of data for the most demanding customer scenarios. In addition, because the parallel compute nodes work concurrently, queries against tables containing trillions of rows often return results in only seconds. For many customers, the combination of very large data sets and fast query response times against them is a game-changing opportunity for competitive advantage.

The simplest way to think of PDW is as a layer of integrated software that logically forms an umbrella over the parallel compute nodes. Each compute node is a single physical server that runs its own instance of the SQL Server 2008 relational engine in a shared-nothing architecture. In other words, compute node 1 doesn't share CPU, memory, or storage with compute node 2.
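As a rough illustration of what shared-nothing storage means, the Python sketch below models each compute node as an object that owns a private list of rows, so every row lives on exactly one node. The `ComputeNode` class, the `place_rows` helper, the sample `sales` rows, and the hash-on-key placement rule are all assumptions made for this example; they are not PDW interfaces.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical stand-in for one compute node with private storage."""
    name: str
    rows: list = field(default_factory=list)  # held by this node only


def place_rows(rows, nodes, key):
    """Assign each row to exactly one node by hashing its key column.

    Hash placement is an assumption chosen for illustration.
    """
    for row in rows:
        # crc32 is deterministic across runs, unlike hash() on strings.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % len(nodes)
        nodes[bucket].rows.append(row)
    return nodes


# Four simulated compute nodes and a small made-up fact table.
nodes = [ComputeNode(f"compute{i}") for i in range(1, 5)]
sales = [{"order_id": i, "amount": 10 * i} for i in range(1, 101)]
place_rows(sales, nodes, key="order_id")

for node in nodes:
    print(node.name, len(node.rows))
```

Because no row appears on two nodes, each node can scan its own slice without coordinating with the others, which is the property the shared-nothing design relies on.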

Figure 1 shows the architecture for a PDW data rack.

The smallest PDW will take up two full racks of space in a data center, and you can add storage and compute capacity to PDW one data rack at a time. A data rack contains 8 to 10 compute servers from vendors such as Bull, Dell, HP, and IBM, and Fibre Channel storage arrays from vendors such as EMC, HP, and IBM. The sale of PDW includes preconfigured and pretested software and hardware that's tightly configured to achieve balanced throughput and I/O for very large databases. Microsoft and these hardware vendors provide full planning, implementation, and configuration support for PDW.

The collection of physical servers and disk storage arrays that make up the MPP data warehouse is often referred to as an appliance. Although the appliance is often thought of as a single box or single database server, a typical PDW appliance consists of dozens of physical servers and disk storage arrays working together under the orchestration of a single server called the control node. The control node accepts client query requests, then creates an MPP execution plan that can call upon one or more compute nodes to execute different parts of the query, often in parallel. The retrieved results are sent back to the client as a single result set.
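The control-node flow described above resembles a scatter-gather pattern, which the following Python sketch imitates on a small scale. It is only a toy model: `node_partial_sum`, `control_node_query`, and the sample rows are hypothetical names and data invented for this example, and threads stand in for separate physical servers.

```python
from concurrent.futures import ThreadPoolExecutor


def node_partial_sum(rows):
    """Work done on one compute node: aggregate only its local rows."""
    return sum(r["amount"] for r in rows), len(rows)


def control_node_query(node_data):
    """Hypothetical control node: fan the work out, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(node_partial_sum, node_data))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    # The client sees one combined result, not one result per node.
    return {"total_amount": total, "row_count": count}


# Four simulated compute nodes, each holding a disjoint slice of the table.
all_rows = [{"order_id": i, "amount": 10 * i} for i in range(1, 101)]
node_data = [all_rows[i::4] for i in range(4)]

result = control_node_query(node_data)
print(result)
```

Each simulated node aggregates only the rows it holds, and the merge step combines the partial sums and counts into the single result set that goes back to the client.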


Discuss this Article 2

afomchenko
on Apr 12, 2011
From the list of new commands, it is evident that PDW divides data on the same principles as a conventional federated distributed database: tables are split horizontally. However, all management of distributed queries has shifted to the primary PDW node. Moreover, the old-fashioned idea of manual data distribution has been embedded into the T-SQL syntax. It is still crucially important to design the database structure properly; otherwise, all your investment will be wasted. Given this, it would be very interesting to compare the performance of a federated distributed database and Parallel Data Warehouse, because in theory they are quite similar. Since it is not easy to get access to a production Parallel Data Warehouse solution, I am still curious how much faster it is than the predecessor technology, based on the federated distributed database architecture, on the same amount of hardware. For more information: http://sqlconsulting.wordpress.com/2011/04/13/inside-of-parallel-data-warehouse/
phani.sub
on Aug 6, 2010
The article looks excellent. Reading through it, I have a question: can we move an existing database to PDW? What are the prerequisites? For example, I have a huge fact table that doesn't have partitions implemented; how can I store it in PDW? Please shed some light on this.
