Providing a business intelligence (BI) solution that lets end users create analytics on their own and requires minimal IT intervention, low back-end maintenance, and a short learning curve for IT is a challenge. My solution, which is designed for small-to-midsized businesses (SMBs), involves empowering end users with true self-service BI on the client side.
How the Solution Came About
My organization is a Microsoft shop using enterprise software. In 2010, our only BI option was SQL Server Analysis Services (SSAS), because it was bundled with SQL Server. SSAS generates an OLAP cube that can be queried with Multidimensional Expressions (MDX). SSAS is a comprehensive but complex tool with a steep learning curve; creating effective cubes correctly takes considerable expertise. The manpower investment to create, maintain, and query an OLAP cube exceeded our requirements and was too costly to pursue, so it was eliminated as a possible BI solution.
By 2012, I found two SSAS alternatives that followed the self-service BI paradigm:
- Server-side SQL Server 2012 Tabular Business Intelligence Semantic Model (BISM)
- Client-side Microsoft Power Pivot 2012 for Microsoft Excel 2010 using the xVelocity (formerly VertiPaq) engine
Both solutions leverage the Data Analysis Expressions (DAX) query language. Unlike MDX functions, DAX functions use a syntax similar to that of Excel functions, which lets Excel power users master DAX significantly faster than MDX.
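To illustrate the Excel-like feel of DAX, here is a minimal sketch of two measures. The table and column names (FactSales, DimDate) are hypothetical, not taken from our model:

```dax
// A simple measure; SUM works much like Excel's SUM:
Total Sales := SUM(FactSales[SalesAmount])

// A time-intelligence measure; still plain function calls rather than
// MDX's tuple-and-set syntax:
Sales Last Year := CALCULATE([Total Sales], SAMEPERIODLASTYEAR(DimDate[Date]))
```

An Excel power user can read both formulas on first sight, which is exactly why the DAX learning curve is so much shorter than the MDX one.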
During pre-sales discussions with Pragmatic Works, I requested a comparison of the server-side and client-side approaches. I then contracted with the company for an onsite Power Pivot training class that used our organization's own data as part of the training.
Table 1 shows the comparison, which is from an enterprise perspective rather than an SMB perspective.
| | Client-Side BI Solution (Power Pivot) | Server-Side BI Solution (Tabular BISM) |
|---|---|---|
| Size limits | Power Pivot has a 2GB limit for the size of the Excel file. | Tabular models don't have a hard upper size limit. They use partitions to manage the processing of large data volumes. |
| Refresh and management | Power Pivot is limited to daily scheduled data refreshes in SharePoint and data usage checks. | Tabular models can be managed with SQL Server Management Studio (SSMS). |
| Security | Power Pivot doesn't support row security or dynamic security. Security for a workbook is a binary choice: either users can read the workbook or they can't. | Tabular models can be secured using row security or dynamic security. |
| Development tool chain | Power Pivot is limited to the extensibility model provided in Excel (which doesn't include source control or build configuration) and lacks the deployment functionality of the server-side solution. | Tabular models live in the Visual Studio shell and benefit from all of its services, such as integrated source control, build integration, and Microsoft Team Build integration. The editing environment can be extended with Visual Studio extensions, such as the DAX Editor. Because the models live in Visual Studio, build and deployment are naturally separated; after building, the deployment wizard can change connection strings, retain role members, and so on before deploying model changes to production. |
The client-side approach might be viewed as negative because of Excel's upper boundary constraints and other limitations. However, we found that the upper boundary constraints were workable and that a client-side solution would give end users the flexibility we wanted to make available. Remember that the client-side approach is aimed at SMBs, which typically work with fewer than 5 million rows of data and have BI needs that often can be met only with a solution like this.
The client-side approach offers several other advantages:
- The BI features (e.g., perspectives, hierarchies, date tables, and key performance indicators, or KPIs) are available through Excel on the client side, eliminating the need to move to SQL Server 2012 Tabular BISM and SQL Server Data Tools (SSDT).
- Excel provides the ability to create all kinds of rich models. You can share these models by using Power Pivot for SharePoint or putting them in shared folders in the file system.
- It isn't necessary to move to tabular models if you don't need them. (In our organization, we don't need to use a formal development process and we don't need to scale out, secure, or manage the model after deployment.)
The client-side approach has a few cons, but they're what I call "the cost of doing business." One cost is upgrading the client hardware, but this cost is minimal. A harder-to-quantify cost is the end users' learning curve: implementing measures, calculated columns, and relationships, and testing and debugging results in the Excel client. It's a small price to pay for true self-service BI on the client side.
It didn't take long for us to determine that using a client-side approach was our best option, so I designed the client-side self-service BI solution. As Figure 1 shows, it uses a cube topology (star schema) in an OLTP database on the back end.
On the client side, Excel 2010's xVelocity engine creates an in-memory cube. With this setup, developers can assist the end users on an as-needed basis rather than performing the BI analyses themselves. Let's take a closer look at the various components in the solution, including the extraction, transformation, and loading (ETL) process, the OLTP database and star schema on the back end, the client-side Excel and Power Pivot tools, and the OData service.
The ETL Process
Our organization preserves the state of its information by publishing data by billing cycle. The data is scrubbed prior to loading and is the basis for all our reporting. Preserving the data by billing cycle lets end users reproduce identical results for that point in time.
The published data is loaded into the star schema in our OLTP database (i.e., destination database) using the T-SQL MERGE command and a linked server pointing to the source data. When new data is loaded into the star schema, some of the tables are truncated, whereas others are upserted. Upserting is inserting rows into a table if they don't already exist and updating the rows if they do exist.
The use of the MERGE command simplifies the ETL process. When the source data is loaded into the destination table, the load script:
- Creates a linked server that points to the source database server
- Uses the specified join criteria to identify the source-to-destination column mappings, which ensure uniqueness and determine whether to perform an update or an insert
- Reviews the OUTPUT of the merged data, grouped by action type ($action): insert, update, or delete
Figure 2 shows a sample MERGE command. In it, I noted where these three events occur.
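As a rough illustration of the upsert pattern (the table, column, and linked-server names below are hypothetical, and the linked server is assumed to exist already), a MERGE statement against the linked source might look like this:

```sql
-- Upsert a dimension table from the linked source server.
MERGE dbo.DimCustomer AS dest
USING (SELECT CustomerKey, CustomerName, BillingCycle
       FROM SRCSVR.SourceDb.dbo.Customer) AS src
    ON dest.CustomerKey = src.CustomerKey      -- mapping that ensures uniqueness
WHEN MATCHED THEN                              -- row exists: update it
    UPDATE SET dest.CustomerName = src.CustomerName,
               dest.BillingCycle = src.BillingCycle
WHEN NOT MATCHED BY TARGET THEN                -- row is new: insert it
    INSERT (CustomerKey, CustomerName, BillingCycle)
    VALUES (src.CustomerKey, src.CustomerName, src.BillingCycle)
OUTPUT $action, inserted.CustomerKey;          -- review rows grouped by action
```

The OUTPUT clause's $action column is what makes it easy to audit how many rows each load inserted versus updated.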
OLTP Database and Star Schema
As I previously mentioned, the client-side self-service BI solution uses an OLTP database organized as a star schema. I used CA ERwin Data Modeler to create the baseline star schema, which Figure 3 shows. In the star schema, all the dimension tables have foreign key relationships to the fact tables.
The key is getting the relationships right rather than optimizing the star schema, because Excel's xVelocity engine does the heavy lifting when the data is imported on the client. The dimension and fact tables are backed up every monthly billing cycle.
In the SQL Server database, I developed various computed columns for commonly used data so that each client doesn't have to re-create the same calculations. When the client loads the data from the database using the star schema, the main relationships are immediately available to the user, which helps reduce the learning curve for people who are new to Power Pivot.
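A computed column pushes a common calculation into the table definition once, instead of having every Power Pivot user rebuild it. A minimal sketch, with hypothetical table and column names (the article doesn't show the actual ones):

```sql
-- Precompute a commonly used value in the fact table; PERSISTED stores the
-- result so it isn't recalculated on every read.
ALTER TABLE dbo.FactUtilityBill
    ADD TotalCost AS (ConsumptionKwh * RatePerKwh) PERSISTED;
```

When the client imports the table, TotalCost arrives ready to use, just like any other column.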
Excel and Power Pivot
Excel's xVelocity engine is an important component in the client-side self-service BI solution. It uses the same in-memory columnstore technology behind SQL Server's columnstore indexes to create an in-memory cube against which to pivot. (If you're unfamiliar with this type of index, see MSDN's Columnstore Indexes web page.)
We upgraded to 64-bit Excel on the client machines to get around the memory constraints of the 32-bit version, which cap the number of rows Power Pivot can hold in memory. We also increased the machines' memory to a minimum of 8GB, a small price to pay for the flexibility gained.
Power Pivot is added to Excel through a plug-in. (If you're unfamiliar with Power Pivot, check out the resources listed in the Learning Path at the end of the article.) The data loaded into Power Pivot reflects the design of the star schema model, as Figure 4 shows.
Unrelated tables added to Power Pivot can be related through the drag-and-drop feature in Diagram View. Handling unrelated tables this way is far more intuitive for non-IT folks.
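Once a relationship exists, DAX formulas can traverse it. For example, a calculated column in a fact table can pull an attribute from a related dimension (the table and column names here are hypothetical):

```dax
// Calculated column in the fact table; RELATED follows the relationship
// (created via drag-and-drop in Diagram View) to the dimension table:
= RELATED(DimFacility[FacilityName])
```

This is the payoff of loading the star schema's relationships up front: end users get lookup-style behavior without writing VLOOKUP formulas or asking IT for help.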
OData Service
In the future, we plan to create an OData service using WCF Data Services. WCF Data Services (formerly known as ADO.NET Data Services) lets you both create and consume OData services for the web. The services can be leveraged by using the Entity Framework with the OLTP database.
External users will access the data through the OData service. Our plan is to use proxy accounts and users' email addresses to create a custom context to access data based on our security scheme. We will leverage a T-SQL table-valued user-defined function (UDF) using the LDAP-authenticated email address as part of the extended membership to define the user's custom context. All queries will use the T-SQL table-valued UDF to return the data that the user is allowed to see.
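The per-user filtering described above could be expressed as an inline table-valued UDF. The following is a sketch under assumed names (fn_FacilitiesForUser, UserFacilityAccess, and the columns are all illustrative, not the actual schema):

```sql
-- Return only the fact rows the caller is entitled to see, keyed by the
-- LDAP-authenticated email address supplied by the OData service.
CREATE FUNCTION dbo.fn_FacilitiesForUser (@Email NVARCHAR(256))
RETURNS TABLE
AS
RETURN
(
    SELECT f.*
    FROM dbo.FactUtilityBill AS f
    INNER JOIN dbo.UserFacilityAccess AS a
        ON a.FacilityKey = f.FacilityKey
    WHERE a.UserEmail = @Email
);
```

Every query issued by the service would then select through the function, e.g. SELECT * FROM dbo.fn_FacilitiesForUser(@CallerEmail), so the security scheme is applied in one place.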
A Better Approach for SMBs
SMBs have different needs than enterprises when it comes to BI solutions. In SMBs, using SSAS to create, maintain, and query an OLAP cube is often too costly in terms of manpower investment. A better approach is to design and implement a self-service BI solution on the client side. With this solution, end users can perform their own analytics, with minimal IT involvement. The end users are happy because they don't have to wait for IT to run their BI analyses. The IT folks are also happy because they have more time to perform other job duties. But perhaps most important, this solution is very flexible. Anyone with Excel, the Power Pivot plug-in, and a variety of data sources (e.g., flat files, spreadsheets) can perform BI.
Peter Heller is a computer specialist (level 3) at NYC DCAS in New York City. He is also the project manager of DCAS's Energy Cost, Control, and Conservation (EC3) project, a job for which he was nominated for New York City's Project Management Leadership Award in 2012. The Project Management Leadership Award is part of the city's Excellence in Technology Awards Program (ETAP).
SQL SERVER PRO RESOURCES
For more information about Power Pivot, see:
"A Walkthrough of PowerPivot for Excel 2010"
"How to Create PowerPivot Applications in Excel 2010"
"Introducing PowerPivot for SharePoint"
"Taking PowerPivot to the Enterprise"
"Understanding PowerPivot and Power View in Microsoft Excel 2013"
"What's New in Microsoft SQL Server 2012 PowerPivot"
For more information about BI in SQL Server, see:
"Power BI: Digging Into Your Business Data for Profits"
"SQL Server 2012 Business Intelligence Enhancements"
SQL Server BI Blog
For more information about DAX, see:
"Data Analysis Expressions (DAX) Measures In PowerPivot"
DAX Formulas for PowerPivot: A Simple Guide to the Excel Revolution
Microsoft PowerPivot for Excel 2010: Give Your Data Meaning
Practical PowerPivot & DAX Formulas for Excel 2010