Why data warehouses fail--and what you can do about it
Data-warehousing projects are prone to failure. How prone is subject to debate. Various studies show wildly different percentages of failure. For example, according to the Data Warehousing Information Center article "The Case Against Data Warehousing" (http://www.dwinfocenter.org/against.html), the failure rate is between 10 and 90 percent. Failures are variously defined as cost or time overruns, inability to deliver key objectives, or cancellation of the entire warehousing project. Regardless of how you define failure, data-warehousing projects frequently fail, and when they do, the failure can affect more than just that project. Often the cost of failure is high, not just in terms of money and lost productivity, but in the loss of the IT department's credibility with the rest of the organization. Given these risks, why would anyone want to implement a data warehouse? The benefits of warehouses, such as the ability to support faster, more informed decisions and to let knowledge workers answer their own questions, can far outweigh the costs of building the warehouse.
I can't offer you a silver bullet to ensure data-warehousing success, but I've set up data warehouses at more than a dozen companies. From this experience and interviews with several other data-warehousing consultants, I've assembled this description of common problems that cause data warehouses to fail. If you're aware of these problems, you can work to mitigate their effects before they derail your data-warehousing project. Although this article doesn't cover every problem that can occur, you can use it as a checklist to gauge risks in your data-warehousing project.
Corporate politics is a major reason for data-warehouse failure and unfortunately also one of the toughest problems to resolve. For example, some organizations can become enmeshed in turf battles over not just resources but also the vision of the project. As Don Awalt and Brian Lawton explain in "Data Warehousing: Back to Basics" (February 2000, InstantDoc ID 7833), warehousing projects work when they support a clearly established vision or mission that the organization defines from a business perspective. As you define this vision, some members of the organization might not support the vision because it conflicts with their groups' agendas, requires resources from their groups, or they feel the vision somehow threatens them. Sometimes the solution to this conflict is more politics—you might work with managers to rebalance the resources the data-warehousing project requires.
Other times you need less politics. You might cut through red tape by enlisting the support of a business champion. Good business champions often come from an organization's sales, manufacturing, finance, or human resources areas. These business areas typically have the most pain and therefore the most to gain from warehousing. For example, manufacturing organizations are always looking for better ways to identify and track quality problems by differentiators such as product, shift, and assembly line. Data warehousing can provide rapid access to this kind of information. Champions can help by acquiring proper resources and coordinating end users and people outside IT to answer questions and work on the project.
To get a champion's support, you need to clearly define the problems the warehouse will solve. To do this, target a real problem that is causing pain for someone high up in the organization. If you can show high-ranking people how the project will alleviate their pain, you can enlist their support as business champions for the project. Business champions who have sufficient clout in the organization can knock down many barriers. Champions are also less likely to see a data-warehousing project as a budgetary black hole if that project is going to relieve real pain for the champion's business unit. For example, if you have a plan to eliminate a specific financial problem and the CFO is your business champion, you can surmount nearly any barrier. Warehousing projects that IT champions alone are rarely successful. Simply put, a data warehouse shouldn't be an IT initiative—it should be a business initiative. Focusing on solving a real business problem helps narrow the scope of the project and motivate business users to participate in the project, from defining the warehouse's goals to evaluating client tools.
Inadequate User Involvement
Although it might be hard to believe, some IT departments think they can build a warehouse without user input because they believe they know what the warehouse should accomplish. Some IT departments view data-warehousing projects as database projects—which is partially true—but data warehouses are useless if end users can't work with the data. IT departments don't always build warehouses that users can easily access or that meet crucial business needs. Unless you understand exactly what questions users will be asking, you can't design a cube that provides the right answers.
To make sure you're building what users need, involve them early in the process and keep them involved so that they can verify the warehouse is meeting those needs. Have users evaluate client tools, then choose the tools that will deliver the proper mix of power and flexibility.
As part of involving users, you must also establish a proper training plan. Training should cover how to use the client tools, include education about cube design, and show users how to navigate cubes to answer ad hoc questions.
Before turning users loose in a data warehouse, teach them how to formulate questions. Ask them to figure out what they want to see and how they want to see it. The structure of the question should be: "I want to see <what> by <this> and by <this>." The <what> they want to see is the fact or measure, and anything after the word "by" is a dimension. For example, a user might ask to see the sales for 2004 by product and quarter. This simple exercise really helps users understand how to navigate a cube. Remember that you're likely to have various kinds of users, including developers, analysts, and end users, and the training will be different for each group.
The Mother of All Data Warehouses
You can put your data-warehouse project in jeopardy if you decide that right away you need to build a complete enterprise warehouse that addresses all your organization's data needs. Creating an enterprise data warehouse presents problems of scope and time. For example, one company that I worked with attempted to build an enterprise data warehouse from scratch because the people involved believed that was the right way to do it. Initially, the company's budget of $2 million seemed sufficient, but 18 months later, it had detailed documentation, no warehouse, and no money left.
Focusing your efforts will help ensure your success. Most companies start by building individual data marts. The processes involved in building a data mart are the same as those for building a data warehouse, but marts are smaller and typically focus on one department. By using the data-mart approach, you can pick a specific problem and solve it quickly. Solving one major problem in 3 months is usually far more successful than trying to solve all your organization's problems and delivering the benefit at the end of 2 years.
Data marts aren't without problems, of course. A major challenge with data marts is making sure you can later combine two or more marts into a warehouse. You have to design the data marts so that similar dimensions in different marts are identical in structure. When you use the same dimension structure across marts, the dimensions are called conformed dimensions. Creating conformed dimensions requires that you create one version of the truth for the organization. In other words, you need to clean up your data so that all departments are using one customer list, one product list, and so forth. As I explain in a moment, cleaning up data can become a significant project on its own. But spending extra time cleaning up data for your first data mart will help ensure that you produce conformed dimensions, will make the building of future marts easier, and will make the eventual rollup into a warehouse easier. Because the first few marts don't have to solve all your organization's problems—just the problems you want those data marts to solve—you typically don't have to define all the dimensions before you build the first mart. For example, your company might want to eventually build a warehouse that includes sales, marketing, manufacturing, human resources, and finance operations. But if the first data mart focuses on manufacturing, you don't need a dimension for sales promotions.
The first data mart you develop should be a quick-hit success: Plan to deliver the completed project in no more than 3 months. If that sounds like heresy, consider that several third-party vendors have built their businesses around delivering rapid data-warehousing solutions. These vendors start with nothing and build a data mart to solve one business problem. The jobs typically last 3 to 6 weeks, and although the vendors put limitations on such jobs, they're successful because they deliver a quick solution. If you want to implement a warehouse and your first deliverable is 12 to 24 months out, you should seriously reconsider your approach.
A cube is the basic building block of a data mart or data warehouse. Because data warehousing is "cool" and most companies have little experience building warehouses, I often see IT departments trying to build cubes that contain every scrap of their organization's data. This is a real problem from an end user's perspective; such cubes are notoriously difficult to navigate.
Cubes have two uses: reporting and analysis. Reporting cubes are often necessarily more complex than analysis cubes, but typically, only IT report builders and analysts use reporting cubes. Analysis cubes are usually far less complex. For example, one of the world's largest vendors of Consumer Packaged Goods (CPG) data resells its data in cubes containing just four dimensions. These cubes are simple for users to navigate yet provide valuable data for the companies that buy them.
There are no set rules for when an analysis cube is too complex, but you can follow some guidelines for designing cubes. First, limit the number of dimensions in an analytical cube to eight. Most nontechnical end users find cubes difficult to navigate when the cubes contain more than eight dimensions. That's not to say a cube with 10 dimensions is automatically bad, but including too many dimensions can greatly reduce usability and user acceptance. I've worked with several companies that complained that users weren't working with the company's data warehouse. When I examined their cubes, I discovered that some had more than 25 dimensions. The users simply found such large cubes too difficult and confusing to work with, and the warehouses sat unused.
One way to create easily navigable cubes is to build simple cubes and link them to make larger cubes. Analysis Services lets you create virtual cubes, which are similar to views in SQL Server. You can use these virtual cubes to link two or more real cubes, or you can create virtual cubes that present simplified views of a more complex cube. For example, you might create a physical cube with 15 dimensions to use for reporting, then create two virtual cubes that contain just a few of the physical cube's dimensions. You can let users access only the virtual cubes, which helps enforce security and ensures that users deal only with the simpler cubes for analysis. Users should tell you how they want to see the data structured and what values they want to calculate.
Incorrect or Inadequate KPIs
Even when their cubes aren't too complex for general use, some warehouses fail because they don't provide the proper key performance indicators (KPIs) as the cube measures. KPIs vary from company to company, but they often include such metrics as sales, cost of goods sold, gross profit, profit margin, inventory turns, and customer retention. Defining and using the correct KPIs for your warehouse might seem like an obvious requirement, but without appropriate end-user input and executive sponsorship, the cubes that IT departments build typically include whatever data is available, not necessarily the data users need. You can't measure the success of a warehouse implementation if you don't understand the business requirements; you need to be able to compare these requirements to the results your warehouse delivers. Knowing your organization's KPIs helps you create an effective, usable warehouse.
For example, a company's data might include the date a customer receives products and the date the customer needed those products, but the KPI of real interest is on-time delivery. How do you determine the on-time delivery percentage for each customer and for a region as a whole? Can you compare a particular customer's on-time delivery average with the average for all customers in his or her industry? Another example of a crucial KPI is customer retention. Your data might contain the dates of your organization's first and last contact with a customer, but users care about whether they've retained the customer. Make sure you give them what they really want.
In one real-world case, the president of a small insurance company complained that he couldn't get a specific 500-page report from the warehouse. After asking a few questions, I discovered that the president was flipping through the report and looking for customers who had significantly cut back on their insurance coverage, as reflected by a drop in premiums. What the president really needed was a metric of the percentage change in premiums from one time period to another. Unfortunately, the builders of that data warehouse had never asked what the data was for. They just cubed up the existing payment information the president requested—without first discovering what the president really needed—and moved on to the next project.
One of the best ways to identify the KPIs that will deliver the most value to the company is to determine what factors the organization uses to calculate bonuses for upper management. Often, a manager's bonus depends on the manager achieving goals that are important to the company's business. So, for example, if a manager's incentive pay is based on increasing sales and maintaining a specific level of customer retention, you'd better build cubes that make it simple for that manager to instantly identify and use those numbers.
Limited or Poor Access to Data
Related to the problem of delivering the right data is the problem of delivering the data to the right people. You can build a warehouse that contains all the necessary KPIs and is easy to use—but still fails miserably. There are various schools of thought on making data-warehousing information available. Many companies expect nothing more from a data warehouse than the ability to produce the same static reports that they create from relational databases. Although this approach is possible, it misses the point of building a warehouse and merely creates a very expensive data repository. To get the maximum benefit from a data warehouse, users typically want to be able to explore, mine, or otherwise analyze data and answer business questions. You need to use appropriate client tools to provide the appropriate level of analytical capabilities to various users, even when those users are geographically dispersed.
Enlist users to help you select a client tool to access the warehouse. Analysis Services has great tools for building and maintaining warehouses, but it has no real tools for client access. Microsoft does offer Microsoft Excel and Data Analyzer for this purpose and more recently has added a powerful new tool with the release of SQL Server 2000 Reporting Services. For most users, who simply need canned reports with basic drilldown capability, Reporting Services gives a good mix of static reports and some limited analytic functionality. You can expand certain items in a report, in effect drilling down to see more detail. And you can filter on certain items. So, for example, you can choose to see data for all products or just certain product lines.
Providing access to reports with some drilldown capability is sufficient for many users. However, as you move higher in the organization, people have less time to play around with data. Some client tools use a scorecard- or dashboard-style UI. A scorecard or dashboard is a screen that shows KPIs in an easy-to-grasp format. If you think about your car's dashboard, you know what this interface can look like. With one quick glance, you can see your speed, engine RPMs, oil temperature, and fuel level. Many client tools let you build a KPI scorecard. The point is to give users a quick visual overview of the most important KPIs. If any of the KPIs isn't measuring as expected, users can drill into the information and discover the cause of the problems.
The users who must delve into the data in greatest detail are analysts. A scorecard isn't meaningless for analysts because it provides a good overview of the data, but analysts typically spend most of their time immersed in more detailed data for complex analyses. To give analysts the details they need, you might consider a powerful, full-featured client application such as Business Objects Crystal Reports, Cognos PowerPlay, or ProClarity Analytics Platform. (For an overview of third-party SQL Server products, see Michael Otey's Windows & .NET Magazine article "Pump Up SQL Server 2000" at http://www.sqlmag.com, InstantDoc ID 42408. The article is free if you register on the site.) Make sure you involve your organization's analysts in the client-tool selection process, but don't rely solely on analysts to help you build a warehousing solution. If you do, you'll likely end up with a design that's too complex for the average user to navigate. Finally, remember that having real end users test the tools you select is crucial to the eventual acceptance and use of the warehouse.
Poor Data Quality
As I said earlier, one essential step to a successful data warehouse is to create a single version of the truth. Users must be able to rely on the validity of the data in the warehouse. If the data is suspect, the warehouse project is destined to fail because users will refuse to use it.
Companies have various strategies for validating data. In some companies, the strategy is to simply ignore dirty data and leave it in the warehouse. In other companies, there is a data "owner," often the business champion or someone in the champion's group, who is ultimately responsible for the data's accuracy. For example, if the CFO is the data owner, someone in the finance department validates all warehouse updates against the accounting system and gives a green light only if everything checks out. Storing several years of history in the warehouse also helps with user acceptance. When users can see that the historical data is in line with their expectations, they're more likely to have confidence that the current data is correct as well.
Don't think that you have to wait until the warehouse is complete before you verify the data. Look at all data coming from various source systems, identify data problems, and fix them at the source when possible. If you can't fix data problems at the source, try to fix those problems during the extraction, transformation, and loading (ETL) process. For example, data-collection screens that let users type a city name into a free-form text field will ultimately lead to errors, such as spelling the city of Ann Arbor as Ann Arbour, Anne Arbor, or AnnArbor. Standardizing such data-entry details makes the warehousing process easier and also provides better data throughout the organization. As you identify incorrect or incomplete data, you can work to fix it while you're designing and building the ETL process.
Another essential factor in the success of data warehouses is proper budgeting for projects. Even with a quick-hit data-mart project, funding estimates tend to be low—often much too low. Initial budgets frequently leave out funding for unforeseen expenses. For example, data-cleanup efforts are notoriously tricky and difficult to predict. Often, organizations don't know about bad data until the data is extracted and ready to load into the warehouse.
When one hospital I worked with loaded its first cube, the pharmacists discovered entries that showed dispensed medications without the name of a prescribing physician. The hospital claimed that the source systems wouldn't allow the entry of a prescription without a prescribing physician. However, the pharmacists checked the data, and sure enough, a bug in the entry system was allowing a small group of users to enter prescription information without the physician's name. Solving this unforeseen problem cost the project time and money but ultimately led to solving other data-related problems.
Don't forget to budget for all parts of the project, including training, software licenses, and hardware. A data mart costs less than a full-blown data warehouse, but the first mart you create might cost more than subsequent marts because of the learning curve. Be sure that your first project is properly funded, or you might run out of money before the mart can deliver any business value.
Less Risky Business
Data-warehousing projects are typically risky propositions. Like any large IT project, their success or failure hinges on several interrelated factors. Data warehouses are rarely, if ever, fully successful when IT drives them alone, so get a business champion on board. You need to reach into the business to understand the KPIs and how to best deliver that information to the users. And remember that users need tools that are easy to use and support varying levels of analytic capabilities. Data warehouses need not fail if you work to avoid the most common mistakes, and your organization can begin to reap the benefits of being able to make informed decisions faster.
| Data Warehousing Information Center|
"The Case Against Data Warehousing," http://www.dwinfocenter.org/against.html
Don Awalt and Brian Lawton
"Data Warehousing: Back to Basics," February 2000, InstantDoc ID 7833
Mark D. Scott and David Walls
"7 Steps to Data Warehousing," February 2000, InstantDoc ID 7834
Mark D. Scott and David Walls
"Step-by-Step Data Warehousing," January 26, 2000, InstantDoc ID 8047