BI Without the Data Warehouse

In 2002, a study from the University of California, Berkeley suggested that it had taken humans 300,000 years to generate 12 million terabytes of data. Now we can generate that much data in only two or three years. In fact, we have more data than we can handle. We seem to think that somehow all of this data is going to help us make better business decisions; however, it’s doing nothing but taking up space until we can turn this data into useable business information. Business intelligence (BI) and data warehouses, by definition, are all about turning data into useable business information. Regardless of the data source—raw transactional data or extracted, transformed, and loaded data warehouse data—BI’s sole function and purpose is to turn that data into useable information to help make business decisions.

Recently, the subject of BI without the data warehouse has been discussed in the SQL Server community. Initially I thought this idea was terrible; however, after doing some research I’m finding that there's some validity to this approach, with caveats.

Analyzing Raw Transactional Data


With OLTP databases already approaching or exceeding terabytes in size, duplicating the data in a data warehouse (or staging database plus data marts) and transporting data from multiple data sources to a centralized repository (i.e., the data warehouse), which eats up network bandwidth and time and consumes endless CPU cycles in transformation processes, seems like a budget-buster of an idea. If you can trust the data in your OLTP database and non-relational, legacy systems, then creating what’s essentially a virtualized data warehouse environment could save you time and rescue your budget.

Why would you want to analyze raw transactional data before it has been verified and validated? There's one situation in which you’d want to do exactly that—if you worked for an enterprise-level organization in which there’s a discrepancy between how management believes the business operates and how the business actually operates. Analyzing raw transactional data can help you expose the gap between perception and reality.

Traditionally, the data in a data warehouse has been cleansed, decoded, reorganized, reordered, and basically sanitized before it's provided to the business community. But by using raw transactional data, the bad and the ugly data are exposed with the good data during analysis. The business decision-makers will have an opportunity to evaluate what’s broken and why, as well as have a chance to fix it.

I’m not saying that this approach will work for every company, nor is it necessary for every BI project. If your firm is resistant to a “king has no clothes” type of message, then you probably won’t want to suggest this approach for BI analysis, unless you want to be demoted to the mailroom. However, if your company is a little more open-minded, or if you and your IT staff need to fix a discrepancy or anomaly in a transactional or non-relational data source, then this approach might work really well.

BI without the data warehouse is also a valid approach for continuous data loading for dynamic data elements (e.g., clickstream data, streaming stock market results, real-time construction project management data rendered as dashboards or scorecards), assuming that you’ll stretch the definition of BI to include dashboards. By eliminating the data warehouse step in this process, you can analyze the data flow immediately. You no longer have to wait days or weeks for data warehouse content to refresh. Having immediate access to the data as it’s being captured could give your company a competitive edge. Of course, you must ensure that your OLTP source systems have enough horsepower to manage data streaming in and data streaming out, with potentially complex processing occurring in between.

Why This Approach Works


There are many reasons to resist building a traditional data warehouse. ROI and delivery-point and delivery-time objectives can be very hard to quantify. Funding for any major IT project depends on a clear business case and a solid bottom-line contribution to the balance sheet. Data warehouses don’t lend themselves easily to these conditions. A data warehouse project is invariably large and complex, it might not have a defined business case, and it might be a perpetual “work in progress” or “project black hole” that consumes resources and careers. A study done by the Gartner Group in 2002 stated that “80 percent of enterprises implementing data warehouses will not properly plan for their implementation efforts, and will underestimate the costs related to the data acquisition tasks by an average of 50 percent—leading to cancelled projects or data warehouses delivered with inaccurate or incomplete data.” I doubt that we’re doing much better today, and with statistics like those presented in the study, it’s tempting to forego the data warehouse portion of the project and just use raw source data for BI analysis.

Discuss this Article 1

dcomingore
on Sep 13, 2008
While anyone who has implemented an enterprise-class data warehouse would agree with your statements about data warehouse justification and efficient delivery there are many good reasons why enterprises continue to pursue data warehousing for their decision support needs. For SMBs, I believe a non-DW approach is a better candidate. Why? Enterprise OLTP systems are usually allready pegged on hardware consumption Enterprise OLTP systems entail more complex business logic which usually has a net result of more 'dirty data' or atleast the perception of such Regardless of the business size, it can be quite challenging to perform cross business process analysis via muliple source OLTP systems. For all but the smallest companies and budgets the data warehouse is still king. And the failure rate of data warehousing projects is somewhat attributable to the lack of true DW architects available to match business requirements to design and technology implementations.

Please or Register to post comments.

IT/Dev Connections

Las Vegas
September 30th - October 4th

Paul ThurottOur Experts will show you:
• Common SQL Server
Problems
• Best Practices for T-SQL
• SQL Server Integration
Services
• Database Development

Come See Michael Otey & Tim Ford in Person!

Early Registration Now Open

From the Blogs
May 21, 2013
blog

A Common Misconception about MAXDOP

Out of the box, SQL Server is (and has been) able to take advantage of multiple processors/cores without any effort on behalf of administrators....More
May 9, 2013
blog

My ISO 8601-Compliant Signature 2

My family recently just "officially" announced that we're in the process of adopting a child from South Africa. We're quite excited, of course, but there's a ton of paperwork to do—along with the need for gobs of signatures....More
May 8, 2013
blog

Use SSIS for ETL from Hadoop

In this blog post, Mark Kromer walks you through using SSIS as a way to use ETL techniques using Microsoft's Hadoop on Windows (HDInsight) as a source using Hive connectors...More
SQL Server Pro Forums

Get answers to questions, share tips, and engage with the SQL Server community in our Forums.