How many of you are DBAs, classic developers, or BI developers? I would imagine most of you are in the first two buckets. Indexing, TSQL, SQLCLR, Service Broker, SQL Server Agent, clustering, replication, log shipping these and their related topics are what make you get up and go to work each day. Business Intelligence and Data Warehousing (DW/BI) tends to be a foreign subject to many people who work with the above on a daily basis. With a little luck Mark and I will be killing two birds with one stone via a new series of posts focused on the SQL Server DBA and Developer.
Let’s first set some ground rules and context. Our goal here is to help educate DBAs and developers in gaining a better understanding of DW/BI in the ‘Microsoft World’. We will explore SSIS/SSAS/SSRS including the RDBMS from a decision support perspective. Some of the content will be more introductory while other posts might be more in-depth. There is definitely a ‘content sweet spot’ whereby professionals who are familiar with BI can still learn some new things and yet not be so detailed that those who are brand new to BI become lost quickly.
In today’s post I want to start us off right and level set the audience a bit by drilling down into each SQL Server component with a brief description of how the component supports DW/BI scenarios. I’ll then discuss DW/BI methodologies keying on the two most popular ones in use today. If you have ideas on how we can provide better content for our fellow DBAs and classic developers please feel free to send me an email at firstname.lastname@example.org.
Note While DW/BI is traditional used for business analytics and data there is nothing stopping you from applying the same concepts and related technologies for IT pro usage! Common examples of BI for IT include custom DBA performance data warehouses, network administration data marts, and many more.
SQL Server Relational Database Engine
The SQL Server relational database engine supports our tasks in creating a variety of decision support and related repositories. The names and classification of these repositories are pretty popular and you’ve probably heard all of them including: Staging, Operational Data Store (ODS), Data Mart, and [Enterprise] Data Warehouse (EDW). The ODS, Data Mart, and EDW are all consumer-facing repositories; they are our proverbial ‘dining room’. Conversely, staging databases are the BI professional’s ‘kitchen’, business users should never go into the kitchen.
Note Please see my article “SQL Server 2008 Business Intelligence Enhancements” for an in-depth listing of fairly recent DW/BI advances of the SQL Server stack
Note One thing I’ve been seeing some recent confusion on is using the entire SQL Server data platform for BI. Decision makers sometimes believe that SQL Server for BI is an ‘all or nothing’ decision and this is absolutely not true. You can use SSAS on top of Oracle, SSIS to integrate SQL Server data into a DB2 data mart, the list of examples can go on and on. The bottom-line is that each and every SQL Server component can be used independently from the others.
SQL Server Integration Services (SSIS)
SSIS provides us with an Extract, Transform, and Load (ETL) toolset to bring in data from multiple operational systems in the business. SSIS routines are called packages and can call one another providing the ability to have ‘master’ and ‘slave’ packages. SSIS has a concept called data flows which represent how data flows from source to destination. Data flows contain data sources, transformations, and destinations. For ETL purposes there are a handful of transformations that are used more than others including the Lookup, Aggregate, and the Split/Join transformations.
SQL Server Analysis Services (SSAS)
SSAS provides us with OLAP and Data Mining (DM) capabilities. The easiest way to understand OLAP is to understand the needs of BI. People want to view their data at aggregated levels and then on occasion drilldown into the leaf-levels. When you have a small set of data you can aggregate data on the fly easily enough using TSQL. However, when you need to aggregate massive volumes of data, at run-time, TSQL is not the best solution. So what are OLAP Cubes? They are large-yet-sparse collections of data pre-aggregations potentially including indexes and leaf-level data. MDX is the query language we use to both enhance and query OLAP cubes with.
While OLAP is valuable the end user must manually look at a report, dashboard, or sometimes even the OLAP cube directly for identifying hidden trends and patterns. With DM, we pre-define a particular trend or pattern we are interested in seeing and then apply various models on top of the structure to compare results using different algorithms. The key point to take away here is that DM does the detective work for you, auto-pilot mode if you will.
SQL Server Reporting Services (SSRS)
SSRS provides us with an excellent reporting platform. Notice the word I use ‘report’, as I mentioned in my post “Selecting the Right Microsoft Reporting & Data Visualization Tool”, SSRS is primarily for developing reports. However, SSRS can also be used for creating effective dashboards and scorecard solutions. This is the one component that is always the easiest to explain from a 10000 foot perspective, people understand reports generally speaking. In fact, many times when speaking with prospective customers they genuinely believe they have business intelligence already when they actually have traditional operational reports.
Kimball, Inmon, What Are You Talking About!?
You’ve probably heard these two names mentioned quite a bit. These are in fact real people’s last names which are Ralph Kimball and Bill Inmon. These two gentlemen collectively represent an unbelievable amount of data warehousing knowledge (some say they are the fathers), each with his own idea about how someone or a team should go about implementing enterprise business intelligence solutions. BI professionals get into heated debates about which methodology is best including hybrid mash ups of both.
I’ll confess right now, I am a Kimball practitioner, I own and have read every one of his books. Some hard copies being so worn that I’ve since replaced them with iPad Kindle versions. I’ll do my best not to sway you in either direction but I will add why I personally choose Kimball momentarily. So, what are the major differences between the two methodologies? There are many differences but I will cite the biggest two:
1. Kimball’s Bus Matrix Methodology is a ‘Bottom Up’ Approach where as Inmon’s Corporate Information Factory (CIF) Methodology is a ‘Top Down’ Approach. Meaning, with Kimball you build incremental/conformed data marts to form an EDW. With Inmon you build the entire EDW right off the bat.
2. Kimball’s Bus Matrix Methodology recommends a denormalized data warehouse design where as Inmon’s CIF Methodology recommends a 3NF design.
Note Many more details on the deltas between the two methodologies can be found at http://www.nagesh.com/publications/technology/173-inmon-vs-kimball-an-analysis.html
There are many reasons why I choose Kimball over Inmon but without a doubt the number one reason is because I can deliver ROI faster to our valued customers. If I had to tell every prospective customer that it would be an entire year (for example) before they could realize value I probably would not have too many customers. For those of you who are Inmon practitioners I absolutely welcome your comments but let’s do keep it nice please.
That’s a rap. You should now have a pretty good understanding of how the SQL Server components are employeed for business intelligence needs. You also now understand what the ‘BI geeks’ mean when they are spouting about Kimball or Inmon philosophies. I hope you found this post helpful and encourage you to check back for additional detailed posts on DW/BI for the DBA + Dev.