In the world of data warehouse specific databases, column-oriented databases (as opposed to the classic row-based databases) are becoming very popular. One of the most popular pure-play column-oriented databases is from Vertica and Oracle databases with Exadata storage includes hybrid columnar compression, which maintains the row structure, but compresses the data very effectively with common values in columns.
The next release of SQL Server (SQL Server 2012, 11.0, or Denali, if you like) will include Microsoft’s first release of their database engine to include columnar compression, which is being surfaced to database developers and DBAs as columnar indexes. The technology that Microsoft is using inside SQL Server for the column-oriented compression and indexing is based on the same Vertipaq technology that is currently in use in PowerPivot and being expanded upon in Denali as a secondary Analysis Services engine that will provide an alternative to the MDX/UDM engine.
Much of what I am describing here is fully expanded upon in this whitepaper by Microsoft’s Eric Hanson which I highly recommend that you download and read. It is brief and very effective. What I want to also highlight in terms of the Microsoft implementation of columnstore indexes is that the data on the data pages in a columnstore index are stored as pure column storage, not as a hybrid row and column model.
The example columnstore index from that whitepaper is reproduced for you here:
- CREATE COLUMNSTORE INDEX cstore on [dbo].[catalog_sales]
Basically, what they are doing is adding ALL rows from the fact table of their star schema data warehouse data model in this index. The query optimizer will make the determination of using the columnstore index or another index or heap for you. I’ve also copied a screenshot of how you can access columnstore indexes on your fact table from the SSMS GUI in SQL Server Denali here:
That dialog is available under indexes in your database tables folder in SSMS. You can only have 1 columnstore index per table.
This is from CTP3 of SQL Server Denali, so you can try this out today. The Microsoft whitepaper states that using the column oriented index on your tables can improve your query performance against that fact table up to 100X and possibly even more. The idea is that you can start to use columnstore indexes on your fact tables in your SQL Server data warehouse so that you won’t need to pre-aggregate certain data. Instead, the increased speed times will allow you to perform those aggregations in your queries. This is where the benefit to your business intelligence solutions are noticed. Particularly, if you want to avoid building cubes for aggregations and avoid the processing delays of dealing with a UDM model, then this may be a solution to try out.
Notice that from the whitepaper, this query includes a grouping and aggregate function and is what they are using for their sample timing:
- select w_city, w_state, d_year, SUM(cs_sales_price) as cs_sales_price
- from warehouse, catalog_sales, date_dim
- where w_warehouse_sk = cs_warehouse_sk
- and cs_sold_date_sk = d_date_sk
- and w_state in ('SD','OH')
- and d_year in (2001,2002,2003)
- group by w_city, w_state, d_year
- order by d_year, w_state, w_city;
If you’d like to keep more of your BI logic in T-SQL, away from MDX and MOLAP, then columnstore indexes on your data warehouse may be the ticket to speed these types of queries to the point to where they are viable for your solution.
I also recommend this YouTube video that Eric Hanson uploaded and walks you through the creation of a columnstore index. The whitepaper I linked to above will also detail for you some of the limitations of columnstore indexes in SQL Server Denali that you will want to become familiar with including how to update them (they are read-only) and they must be partition-aligned.