Many BI efforts--especially enterprise BI--go awry at one critical point--the management of master data. And yet most SQL Server shops are flying blind because historically Microsoft has offered nothing to help manage master data.

What is Master Data?

Any data that's truly important to your business that's usually stored in more than one place: that's my over-simplified way to identify master data. For example, customer data is usually important: A customer relationship management (CRM) system might be considered the “home of record” for customer data, but there's probably a ton of customer data in the neighboring ERP and invoicing systems. Folks are often horrified to realize how inconsistent that critical data is. It's easy to figure out how this inconsistency in data happens: A salesperson logs a change in customer information in CRM (i.e., the customer's office location changed) but has no access to make the change to the ERP or invoicing systems; therefore, shipments and bills continue to go to the old office location. In this case Customer is master data, and it demands proper management. A lot of IT shops have some form of MDM though they never think of it as such--intelligence built into data integration or apps that look in other systems to sync data that’s created elsewhere but needed locally.

Take note that I'm pushing master data management (MDM), not selling you on extraction, transformation and loading (ETL). ETL is not a holistic solution; it cleanses only the data warehouse data not the source data. MDM holistically manages the data that flows to all source systems and the data warehouse. The riskiest component of a data warehouse or BI project is generally the ETL process because you have to combine redundant data from multiple data sources, build business rules that define winners (or merges) of data conflicts, and add intelligent processes and code to merge and cleanse data before loading it into the data warehouse. And after you load it into the warehouse, you still have the dirty, conflicting data mucking up the systems from which you extracted it.

Application That Could Manage Your Master Data

Wouldn’t life be simpler if you built or implemented an application that could manage your master data? And I do mean an application, as opposed to a mix of one-off data integration processes or doing swivel-chair computing involving multi-system data entry for each change (or--heaven forbid--just putting up with dirty, conflicting data). Then you could take a standardized, holistic approach to MDM that lets you proactively manage your data assets to benefit your source systems as well as your data warehouse and BI. Effectively managed master data lets users carefully draft specifications for entities and attributes that are then available as a subscription of sorts to other systems that need it, including your data warehouse ETL processes. You will still have your share of heavy lifting in your ETL but hopefully it will be in dealing with your facts and your non-master data.

If you're like me, you don’t relish the thought of building another application to perform the aforementioned tasks: This functionality should be part of existing tools and platforms. Many players, especially in the BI space, claim they've been covering MDM for a long time, but when you pull back the covers, most of those solutions are more focused on metadata or are only effective on technologies from the same vendor. In 2007, Microsoft acquired an MDM company, Stratature, and is rebuilding the Stratature applications as a robust MDM solution that will be hosted in a future Microsoft Office SharePoint release in late 2008. You can learn more about this solution now, and see the product roadmap, at the MDM site at microsoft.com/mdm.

If you want to get deeper into MDM, check out the Microsoft white paper “The What, Why, and How of MDM.” Also, read “Master Data Management: A method for reconciling disparate data sources,” January 2007.