In "Implementing Supertypes and Subtypes" (November 2006), I discussed using supertypes and subtypes for data modeling and how to implement the design objects. Now you’re probably waiting for me to give you a real reason for adopting the supertype-subtype design concept. I will.

Whether you realize it or not, your company is failing to capitalize on business opportunities and is losing revenue because it doesn’t have a single source of accurate data, or data truth. And you’re well aware of the time you and your colleagues spend trying to reconcile the discrepancies in reports that you’re producing. Master Data Management (MDM) is a method you can use to target incomplete, inaccurate, and fragmented data that’s stored in various data stores in your enterprise.

What Is Master Data?

MDM is a hot topic in today’s market. I first googled “Master Data” and found 3.2 million entries, then “Master Data Management” and found 975,000 entries. There are seemingly hundreds of software companies selling MDM software. Some companies advertise that by using their products you’ll be able to connect to your legacy systems and fully integrate your core data across the entire enterprise with a mere click of your mouse.

I have news for you: Reconciling disparate data sources isn’t easy. When data warehousing was first introduced, DBAs experienced a revelation: To create a true enterprise data warehouse (EDW), DBAs had to develop common definitions of core data that was used by many of their companies’ applications. By trying to develop these definitions, DBAs found more definitions for customer than they ever expected and struggled with common data definitions and sources.

Many EDW projects fell by the wayside because DBAs couldn’t come to a consensus on a unified view of corporate data. Now, many corporations are revisiting these concepts to try and reconcile their disparate data sources and are using new buzzwords: master data and MDM. DBAs now realize that every piece of data in their company doesn’t need to be categorized, cleansed, and made uniform across the enterprise. Instead, there can be, and probably should be, departmentally based variations in data to support the different functions performed within each department. What we do know is that there is core data—people, places, and things—that needs to be tracked across the enterprise. This is master data.

You can think of master data as meta-data and real data combined. Master data consists of consensus-driven data and data definitions applied consistently across an enterprise. It’s consensus-driven because master data requires more than the IT or data warehouse groups to devise the metadata layer and to determine which set of customer data or inventory data is the real or base data. To gain consensus across the enterprise, you have to involve all the data stakeholders—business decision makers and IT staff—whose job depends on the validity of the data.

Managing Master Data

Managing master data has given rise to the MDM movement. Because MDM is consensus-driven and enterprise-wide, it shouldn’t surprise you that the governing body in an organization that determines what constitutes master data and how to manage it needs to be staffed with cross-functional individuals from the technology and business sides of the enterprise. “Cross-functional” means that these individuals have experience in many facets of business and technology. No one person is expected to know it all, but the more experience each member has, the better suited that person is to sit on the data governance board.

Figure 1, shows a typical enterprise data scenario at a very high level. The enterprise has legacy applications, existing applications, and new applications, and each application feeds from and writes to its own set of data. This is the genesis of informational silos. All the data stores feed into the Historical-Analytical System, which becomes a dumping ground for data. The analysts use the Historical-Analytical System to do two things: determine which data is most valuable, and use the data in this system to reconcile the discrepancies between the informational silos.

Figure 2 is a logical representation of what this environment would look like if it were under MDM control. In the center is the Master Data Repository. There are many ways to implement MDM control. One scheme is to maintain the data in the Master Data Repository and serve up the data to requesting applications on an as-needed basis and store the nonmaster data (for example, transaction details) locally. Another implementation scheme is to synchronize the master data content between the Master Data Repository and the applications. In either scheme, applications write master data to the Historical-Analytical System as the data is modified, whereas applications write transactional data from the local data stores to the Historical-Analytical System according to specific Historical-Analytical System requirements (not shown on Figure 2 to retain clarity). Master data is fairly static; it behaves like the slowly changing dimensions defined in Ralph Kimball et al.,The Data Warehouse Lifecycle Toolkit (Wiley, 1998). Transactional systems pinpoint business activity—sales activities, production-line operations, real-time activities—and usually generate volumes of data.

Using Supertype-Subtype Entities to Map Your Master Data Scheme

In Figure 3, I’ve laid the supertype-subtype entities on top of the structure in Figure 2 and removed most of the activity arrows (I did that only for clarity). Conceptually, the supertype entity belongs with the Master Data Repository, whereas each of the sub-type entities is part of one application. To illustrate, let’s look at a banking operation and imagine how it would operate in the scenario Figure 3 illustrates.

In a conventional organization similar to the one Figure 1 depicts, each department manages its own data internally, creating the ubiquitous informational silos. Let’s say for the sake of argument that there are separate applications for checking accounts, investment accounting, home mortgage loans, and commercial loans. At the end of the day, each department sends the data it collected to the Historical-Analytical System and analysts attempt to use this data to compile a complete financial picture of each customer. This data consolidation process isn’t always accurate because each application assigns a different identifier for the same customer. Additionally, the applications might not fully synchronize with one another, so the Historical-Analytical System might contain redundant data. Bank managers know they need to understand a customer’s financial activity, but with this fragmented picture of the customer they’re really limited in what they can hope to achieve.

Implementing an MDM Solution

The bank decides to implement an MDM solution with a Master Data Repository and a supertype-subtype master data schema. The supertype entity, nested in the Master Data Repository, is Person. The bank purposely didn’t make the supertype Customer because customer doesn’t reach high enough—a bank employee or a vendor can also be a customer. The bank is trying to eliminate the informational silos and has embraced the concept of master data at the highest level. Therefore, all people associated with the bank—employees, customers, and vendors—will be listed in the Supertype table. Each of the subtype entities will contain additional information that’s relevant to the associated application. For example, the subtype for home mortgage loans will contain connections to a person’s home address, and the subtype for commercial loans will contain connections to the customer’s business address. In each case, a person has a single identifier by which she or he is known in the Master Data Repository and in each of the applications. At the end of the day, when the applications transfer their transactions to the Historical-Analytical System, the analysts will have an accurate and complete view of each customer.

Getting an unfragmented view of customers is especially important in the customer relationship management (CRM) arena, in which companies are vying for customer loyalty. Automatic rewards programs are designed to reinforce customer loyalty. Because qualification programs for rewards are based on the amount of business the customer does with the company, companies need to be able to consistently track customer activity across all departments of the company. Without being able to keep track of a customer’s activities, a company is powerless to differentiate services according to customer loyalty.

Don’t Miss Any More Business Opportunities

Incomplete, inaccurate, and fragmented data costs companies every day in lost opportunities, diminished revenues, and time spent trying to reconcile discrepancies found in the corporate data stores. Even Microsoft has jumped on board the MDM train. According to a report in DM Review (dated July 2006), Microsoft has selected a third-party software package, Initiate Systems’ Initiate Enterprise Integrator, to help it create an internal global data infrastructure. Microsoft has cited that it needs to integrate its internal data with external data into a single, trusted source. Doesn’t that sound familiar?