Comparing business intelligence approaches of old BI and new BI.
Recently, I've been encountering the terms "old BI" and "new BI" to refer to the classic DW/BI approach to business intelligence and the newer Big Data Analytics approach, respectively. In fact, I've now started to use similar language in my conversations to differentiate between what I now call "traditional BI" and "Big Data Analytics" to demonstrate a split or inflection point in our industry.
I just found "old BI" to be too much of a pejorative term, which is not the intent. After all, traditional nightly ETL from sources into an EDW with dashboards is still the most valid and effective approach for 90 percent of business intelligence (BI) requirements.
Big Data Analytics Gaining Traction
It's this "new BI" Big Data approach, or simply the "Big Data Analytics" approach to BI, that is gaining more and more traction and interest. However, for most cases, "old BI" techniques will still be utilized. Here is a nifty example of my attempt at promoting both the "old BI" approach and what I was at the time calling the "new" approach to BI, in which the only real difference is that OLAP models become in-memory columnar data stores that store data more effectively for aggregation.
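To make the columnar point concrete, here is a minimal Python sketch of why a column layout suits aggregation. It is only a toy illustration, not how any particular in-memory engine is implemented; the sample rows and the function names (to_columns, total_sales_row_store, total_sales_column_store, sales_by_region) are invented for this example.

```python
from array import array

# Invented sample fact rows: (month, region, sales amount).
ROWS = [
    ("2012-01", "East", 100.0),
    ("2012-01", "West", 150.0),
    ("2012-02", "East", 50.0),
    ("2012-02", "West", 25.0),
]


def to_columns(rows):
    """Pivot row tuples into one sequence per column (a toy column store)."""
    months, regions, sales = zip(*rows)
    return {
        "month": list(months),
        "region": list(regions),
        # A typed array keeps the measure as a contiguous block of doubles.
        "sales": array("d", sales),
    }


def total_sales_row_store(rows):
    """Row layout: every whole row is visited just to read one field."""
    return sum(row[2] for row in rows)


def total_sales_column_store(columns):
    """Column layout: only the 'sales' column is scanned."""
    return sum(columns["sales"])


def sales_by_region(columns):
    """Group-by over two columns, leaving the 'month' column untouched."""
    totals = {}
    for region, amount in zip(columns["region"], columns["sales"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_sales_row_store(ROWS), total_sales_column_store(cols))
    print(sales_by_region(cols))
```

Both totals come out the same; the difference is that the columnar version only reads the columns the query needs, which is the property that matters once the fact table no longer fits comfortably in a row-by-row scan.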
At the time that I wrote that, early in 2012, I also felt that Mobile BI and Cloud BI were going to become even more important in most enterprise BI architectures. Anecdotally, I certainly continue to see cloud and mobile as part of both large enterprise hybrid on/off prem implementations and more pure cloud and mobile SSAS offerings. But they haven't taken over the industry as much as Big Data has. Big Data seemed to creep into corporate BI solutions and steal the air from the room, and this is where many of us have focused our energies for the past several years now.
So that brings me back to "new BI" or Big Data Analytics, which I also see as a hybrid approach, but hybrid in terms of traditional ETL + DW together with Hadoop and MPP. This also means that as BI professionals, we need to focus more on providing data sandboxes, data discovery capabilities and analytics across big data, all of which require new technologies and approaches. To me, the differentiators between "old BI" and "new BI" include the movement away from heavily pre-defined, waterfall-ish EDWs and dashboards and toward Agile, free-flowing, semi-structured data. That data requires data stores that can support maximum flexibility, agility, in-memory processing (i.e. breaking through the IOPS and IO scan barriers) and distributed parallel processing power. The data we are analyzing is only going to get bigger and stranger, as opposed to smaller and easier.
Which now means that I have a "new BI" reference architecture picture below! This demonstrates the more fluid movement of data from semi-structured and streaming sources that are not traditional sweet spots for highly structured ETL and RDBMS schemas: social media, log files, sensor data, etc. I've taken a few best practices from recent implementations where MapReduce is used as an "ETL accelerator" that pre-aggregates complex unstructured big data, which is then fed into a more traditional database: an MPP database distributed across parallel nodes. The data integration tools must speak natively to more than traditional RDBMS sources. They must understand metadata and optimizations for HDFS, Hive, Impala and NoSQL stores like MongoDB and Cassandra.
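For readers who want to see the "ETL accelerator" idea in miniature, here is a rough Python sketch of the pattern: a map step parses messy semi-structured records, a reduce step pre-aggregates them, and only the small aggregate is loaded into a relational table. Everything in it is an assumption made for illustration: the sample log lines, the source_summary table and the function names are invented, plain Python stands in for a real MapReduce job on Hadoop, and an in-memory SQLite database stands in for the MPP database.

```python
import json
import sqlite3
from collections import defaultdict

# Invented semi-structured input; real input would be files in HDFS.
RAW_EVENTS = [
    '{"source": "web", "user": "a", "bytes": 120}',
    '{"source": "sensor", "device": "t1", "bytes": 30}',
    '{"source": "web", "user": "b", "bytes": 80}',
    'not valid json',
    '{"source": "sensor", "device": "t2"}',
]


def map_phase(lines):
    """Emit (source, bytes) pairs, skipping records that cannot be parsed."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate garbage instead of failing the whole load
        if not isinstance(record, dict) or "source" not in record:
            continue
        # Missing measures default to 0 rather than rejecting the record.
        yield record["source"], record.get("bytes", 0)


def reduce_phase(pairs):
    """Pre-aggregate to one (event_count, total_bytes) pair per source."""
    totals = defaultdict(lambda: [0, 0])
    for source, size in pairs:
        totals[source][0] += 1
        totals[source][1] += size
    return {source: tuple(values) for source, values in totals.items()}


def load(aggregates, conn):
    """Load the small aggregate into a structured table (SQLite stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_summary ("
        "source TEXT PRIMARY KEY, event_count INTEGER, total_bytes INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO source_summary VALUES (?, ?, ?)",
        [(s, count, size) for s, (count, size) in sorted(aggregates.items())],
    )
    conn.commit()


def run_pipeline(lines, conn):
    """Map, reduce, then load: the detail rows never reach the database."""
    load(reduce_phase(map_phase(lines)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_EVENTS, connection)
    for row in connection.execute("SELECT * FROM source_summary ORDER BY source"):
        print(row)
```

The design point is simply that the schema-on-read work (parsing, tolerating bad records, defaulting missing fields) happens before the database, so the structured side only ever sees clean, already-aggregated rows.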
In "new BI", we are going to keep detail data in HDFS clusters with aggregated business-value OLAP layers that must scale to this new data complexity. And finally, the data visualization tools in the presentation layer must enable data discovery, predictions and drill down into the detail grain which will also require native understanding of Big Data & NoSQL data stores.
I'm sure that eventually (perhaps soon), we'll reach another inflection point where hybrid Big Data Analytics or "new BI" techniques become the norm instead of the exception. So it's definitely a good idea for database and BI pros to begin looking at the "new BI" technologies in this reference architecture, such as new in-memory and columnar database capabilities, NoSQL, Hadoop and Big Data Analytics visualizations.