Market Watch: Scalable Data Warehousing

Exploring the two types of scaling solutions


The more technology pervades people’s daily lives, the more data is generated. Back in 2007, IDC estimated that 45GB of data existed for each person on the planet. Because most industry experts agree that the volume of data doubles every two years, that number is likely much higher now. Many companies want not only to capture this data but also to record additional value-added attributes about it. For example, almost every activity that people (and organizations) perform can be tied to a geographic location, which has spurred companies to add spatial data.
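The doubling claim can be turned into a rough extrapolation of the per-person figure. This is only a back-of-the-envelope sketch. It assumes the per-person volume grows at the same rate as the total (that is, it ignores population growth) and takes t to be the calendar year:

```latex
D(t) \approx 45\ \text{GB} \times 2^{(t - 2007)/2},
\qquad
D(2011) \approx 45\ \text{GB} \times 2^{2} = 180\ \text{GB}
```

Under those assumptions, the per-person figure would quadruple within four years. This is why the 2007 estimate likely understates current volumes.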

Another trend of the past few years is that business leaders increasingly recognize the value of data-driven decisions. When data is cleansed and accurate, the information gleaned from it truly deserves the label “actionable information.”

Both the data explosion and the embrace of data-driven decision making are creating a business challenge: how to store and analyze massive volumes of data while keeping the total cost of ownership (TCO) to a minimum. This challenge is providing momentum to the scalable data warehousing industry.

The goal of scalable data warehousing is to expand a company's data warehouse easily and cost-effectively, thereby increasing the solution's overall ROI. Scalability is especially important in today’s economy because enterprise hardware is by no means cheap. Companies therefore need data warehouse hardware and software platforms that scale with their analytic needs without a complete retooling. In response, the scalable data warehousing industry has produced two main types of solutions: data warehouse appliances and data warehouse reference configurations.

So what’s the difference between a data warehouse appliance and a data warehouse reference configuration? When explaining the difference to clients, I like to use the LEGO analogy: An appliance is a castle that’s been preassembled from a set of LEGO pieces with a bit of “special sauce” added in, whereas a reference configuration is a similar (but not the same) set of LEGO pieces that you have to assemble into a castle. Let’s take a closer look at each type of solution.

 

Data Warehouse Appliances

A data warehouse appliance is a turnkey solution. Appliances come with hardware and software preconfigured for data warehousing workloads. Several data warehouse appliances use massively parallel processing (MPP) hardware. By using MPP hardware in a shared-nothing architecture, in which each server has its own processors, memory, and storage, you can create a data warehouse infrastructure in which multiple servers (i.e., nodes) cooperate to process large quantities of data and queries. To scale out a data warehouse appliance, you simply purchase additional racks (n nodes per rack).
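To make the shared-nothing idea concrete, here is a minimal Python sketch, not any vendor’s implementation. It assumes rows are hash-distributed on a chosen column, that each node aggregates only its own slice, and that a coordinator merges the small partial results. The `Node`, `Cluster`, `load`, `add_nodes`, and `sum_by` names are hypothetical, and scaling out is modeled simply as adding nodes and redistributing rows.

```python
from collections import defaultdict
from zlib import crc32


class Node:
    """A shared-nothing node: it stores and scans only its own rows."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []

    def partial_sum(self, key_col, value_col):
        # Aggregate locally over this node's slice of the table.
        totals = defaultdict(float)
        for row in self.rows:
            totals[row[key_col]] += row[value_col]
        return dict(totals)


class Cluster:
    """Hypothetical coordinator: distributes rows by hash and merges partial results."""

    def __init__(self, num_nodes, dist_col):
        self.dist_col = dist_col
        self.nodes = [Node(i) for i in range(num_nodes)]

    def _owner(self, row):
        # A deterministic hash of the distribution column picks exactly one node.
        key = str(row[self.dist_col]).encode("utf-8")
        return self.nodes[crc32(key) % len(self.nodes)]

    def load(self, rows):
        for row in rows:
            self._owner(row).rows.append(row)

    def add_nodes(self, count):
        # Scale out: add nodes (e.g., one more rack), then redistribute all rows
        # so that placement matches the new node count.
        all_rows = [row for node in self.nodes for row in node.rows]
        start = len(self.nodes)
        self.nodes.extend(Node(start + i) for i in range(count))
        for node in self.nodes:
            node.rows = []
        self.load(all_rows)

    def sum_by(self, key_col, value_col):
        # Each node computes its partial aggregate (sequentially here; in an
        # MPP system the nodes work concurrently). Only the small per-node
        # subtotals reach the coordinator, which merges them.
        merged = defaultdict(float)
        for node in self.nodes:
            for key, subtotal in node.partial_sum(key_col, value_col).items():
                merged[key] += subtotal
        return dict(merged)


# Usage: 100 hypothetical sales rows spread over 4 nodes, then scaled to 8.
sales = [
    {"order_id": i, "region": ("east", "west", "north")[i % 3], "amount": float(i)}
    for i in range(1, 101)
]
cluster = Cluster(num_nodes=4, dist_col="order_id")
cluster.load(sales)
before = cluster.sum_by("region", "amount")
cluster.add_nodes(4)
after = cluster.sum_by("region", "amount")
print(before == after)  # True: adding nodes changes data placement, not the answer
```

The point of the design is that the expensive work (scanning and aggregating rows) happens on the node that owns the data, so adding nodes adds scan capacity. Production MPP systems must also handle joins across nodes and run nodes concurrently, which this sketch leaves out.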

Although there’s some disagreement as to which company created the original data warehouse appliance, it’s generally accepted that Netezza (now an IBM company) was responsible for generating the initial mainstream interest. Netezza currently offers the TwinFin and Skimmer appliances.

Other data warehouse appliances that you’ll find in the marketplace include:
