For ease and speed of analysis, data is gathered from various sources and placed in data model. The data model serves two purposes. First, it presents the data to the business user in a manner that they are familiar with. No weird table and field names to decipher. No complex language required to query the data. Second, the data model stores the data in such a way that it can be retrieved and navigated quickly.

Power Query

Power Query makes it possible to extract data from a wide variety of locations and assemble it into a data model.  Along the way, the data can be manipulated from its native form into a format that is more conducive for analysis. Power Query can extract data from a wide range of sources, including:

  • Text files including comma-separated values (CSV) and XML
  • Excel spreadsheets
  • Databases including SQL Server, Oracle, and IBM DB2
  • SharePoint lists
  • Hadoop
  • OData feeds
  • Active Directory
  • Microsoft Exchange
  • Internet sites including Facebook

Once the data is extracted, it can be transformed into a format appropriate for analysis. Transformations include:

  • Splitting data into multiple columns
  • Pivoting and unpivot data
  • Replacing values (i.e. replacing “N/A” with an empty value)
  • Change upper case to title case
  • Parse XML
  • Data type changes
  • Merge data from multiple data extractions into a single table

The transformed data can be loaded into an Excel spreadsheet or into a PowerPivot data model. (See the section on Power Pivot.)

As an example, Figure 1 shows U.S. state population data as found on the Internet. The data is exactly what is needed, but the format of the data does not work well for the type of analysis desired.

Figure 1

A series of data manipulation steps is applied to change this data to the desired format as show in Figure 2. The data manipulation steps can be seen in the Applied Steps window on the right.

Figure 2

Power Query is an add-in for Excel. It requires Office 2010 Professional Plus with Software Assurance, Office 2013 Professional Plus, Office 365 ProPlus, or Excel 2013 Standalone.

Power Pivot

Power Pivot is the data model architecture available within Excel. Power Pivot also has its own capability to extract data from external sources. However, Power Pivot cannot access quite the variety of sources that Power Query can. Power Pivot is also limited as to the data manipulation it can do while loading data. That said, if little data manipulation is required, it is possible that Power Query will not be required for a given data model and Power Pivot will be able to perform the data extraction

Power Pivot is extremely powerful, once the data is loaded and the modeling process begins. Millions of rows of data can be stored in a format that allows aggregation and retrieval in seconds. Aggregation in Power Pivot is accomplished through measures. Measures know how to sum, average, or count elements while paying attention to any filtering or grouping the user applies to the data during analysis.

Figure 3 shows the state population data in a table inside a Power Pivot model.

Figure 3

The Population item, which is selected, is a measure defined within this table. The formula for the Population measure is show in the formula area below the ribbon. More complex measures, which can provide additional analysis such as year over year comparisons and year-to-date values, can be created using the DAX expression language.

Power Pivot is an add-in for Excel. It requires Office 2010, Office 2013 Professional Plus, or Office 365 ProPlus. Power Pivot can also be used within SharePoint 2010 Enterprise Edition or SharePoint 2013.