Microsoft PowerPivot for Excel 2010 is a data analysis tool that lets end users load, model, and explore data—without direct involvement or assistance from IT. In this article, I walk you through the key capabilities of PowerPivot to facilitate analysis of the US Federal Government’s historical debt. Specifically, I use PowerPivot to
- load data tables from multiple sources
- manage relationships between data tables
- use the new Data Access Expression (DAX) functions to perform lookups among related tables and create calculated columns
- create a pivot table and pivot chart
- create calculated measures
If you want to follow along, you’ll need Excel 2010 and PowerPivot for Excel, which you can download from www.powerpivot.com. You can download the sample projects using the Download the Code Here button at the top of the article page.
Open the USGovtData_SQLMag_v0.xlsx workbook in Excel 2010 and select the USHistoricalDebt worksheet. The tblUSHistoricalDebt table in this worksheet was sourced from the US Treasury and represents the US Federal Government’s historical annual debt from fiscal year 1790 through 2010; monthly debt information is available from September 2007. In the Excel ribbon, click the PowerPivot tab, select the USHistoricalDebt worksheet, and click inside the tblUSHistoricalDebt table. Then, click the Create Linked Table button located in the PowerPivot tab. The PowerPivot window will open with the tblUSHistoricalDebt table loaded, as shown in Figure 1. (Note that I have adjusted the formatting of the Date and Amount columns in Figure 1.)
In the PowerPivot window, click the From Database-From Access button located in the Get External Data section of the Home tab. (Note that if you’re running Windows XP, the PowerPivot window doesn’t have a ribbon; instead, a set of menu items is exposed. The functionality and available options are still the same, but you’ll need to adapt my instructions a bit to select the appropriate menu items.) In the Table Import Wizard window, select the USGDP.accdb Access database, click Open, click Next, and make sure the default option Select from a list of tables and views to choose the data to import is selected. Then, select all three tables and click Finish. The Table Import Wizard will show the results of the import, as shown in Figure 2. Click the Details link (located in the Data Preparation row) to see how the two foreign key relationships from the database are also detected and imported. Then you can close the wizard.
The Date, USHistoricalCPI, and USHistoricalGDP tables are now loaded into PowerPivot. The Date table, as the name implies, is used for time-based analysis and calculations. USHistoricalCPI contains Consumer Price Index (CPI, CPI-U) data from 1790 to present and can be used to adjust dollar amounts for inflation. USHistoricalGDP contains Gross Domestic Product (GDP) information for the US economy. (GDP data is often used as a basic measure of a country's economic performance.)
Managing Relationships Between Data Tables
To analyze data from multiple sources, it’s important to establish relationships among tables. With the PowerPivot window still open, select the Design tab in the ribbon and click Manage Relationships; the two foreign key relationships from the database are listed, as Figure 3 shows.
USHistoricalCPI and USHistoricalGDP are both related to the Date table; you’ll also want to create a relationship between tblUSHistoricalDebt and Date. In the Manage Relationships window, click Create to create a relationship between tblUSHistoricalDebt and the Date table based on the IdDate column, as Figure 4 shows.
Note that PowerPivot also has the ability to automatically create relationships between tables. When analyzing data in a pivot table, PowerPivot detects if unrelated tables are used at the same time and presents the user with the option of creating a relationship between the tables.
Using DAX to Create Calculated Columns
The tblUSHistoricalDebt table represents US debt in terms of current dollars; because we have data that spans more than 200 years, we’ll want to account for inflation for accurate comparisons across time. In economic analysis, it’s common to pick a given year as a basis for calculations. For example, the GDPAmount_2005_Billions column in the USHistoricalGDP table represents GDP in terms of 2005 dollars. Our goal, then, is to express US debt in terms of 2005 dollars as well.
The USHistoricalCPI table contains a CPI value (CPI-U) we can use to adjust for inflation. Using the September 2005 CPI-U value of 198.8 as the basis, the formula for adjusting a given year’s dollar amount to 2005 dollars is Year Amount x (198.8 / CPI-U for the year). (I chose September because it’s currently the last month of the US government’s fiscal year.)
PowerPivot introduces a new formula language called Data Analysis Expressions (DAX) that extends the data manipulation capabilities of Excel. The DAX syntax is similar to Excel formulas, providing a combination of functions, operators, and values. We’ll use DAX to create a calculated column representing US debt in terms of 2005 dollars.
In the PowerPivot window, select the tblUSHistoricalDebt table and click the Add Column column header. In the formula area, enter the following:
An in-depth explanation of this formula is outside the scope of this article. At a high level, the formula returns the lowest (i.e., MIN) matching CPI-U value from the USHistoricalCPI table based on the fact that tblUSHistoricalDebt and USHistoricalCPI are indirectly related via their shared relationship to the Date table. Double-click the CalculatedColumn1 column header and rename the column CPI-U, as shown in Figure 5.
We now have a CPI-U value for each row. When we eventually create a pivot table and chart for analysis, we don’t need the end user to see this duplicate CPI-U value because CPI-U is already part of the USHistoricalCPI table, so you’ll want to right-click the CPI-U column header and select Hide Columns-From Pivot Table.
Next, add another calculated column to the tblUSHistoricalDebt table by clicking the Add Column column header. Enter the following formula:
Then, rename the column Amount_2005. I recommend, when typing this formula, you try clicking (instead of typing the name of) the referenced columns.
Creating a Pivot Table and Pivot Chart
Close the PowerPivot window and save your workbook. In the PowerPivot tab, click the PivotTable-Chart and Table (Vertical) button. Accept the default option of inserting the pivot table into a new worksheet. You’ll see that two worksheets are inserted into the workbook: Sheet1 and Data for Sheet1 Chart 1. The Data for Sheet1 Chart 1 worksheet stores data for, and lets you manipulate, the pivot chart independent of the pivot table. Select the pivot table in Sheet1, and, in the PowerPivot Field List, drag the tblUSHistoricalDebt Amount field to the Values list box. Drag Century, Decade_Desc, and CalendarYear from the Date table into the Row Labels list box. Your pivot table should now look similar to Figure 6.
Note that in my pivot table, the Summary (i.e., aggregation) of the Amount value has been changed from the default Sum option to Maximum. (I accomplished this by clicking the Amount column in the Values list box of the PowerPivot Field List, selecting the Summarize By menu item, and selecting Max). Also, in the pivot table itself, the Amount column has been formatted as currency by editing the Value Field settings. (Right-click the Amount column header in the pivot table and select the Value Field Settings menu item.) Finally, I collapsed the Decade_Desc field for the time being to hide all of the years.
Let’s stop here and look at the numbers. In the 1790s, the United States had a national debt of $83 million. (For the curious history buff, expand the decade 1830-3; the year 1834 represents the United States’ closest attempt at becoming debt free.) The debt jumped to $2.7 billion in the 1860s (financing the Civil War), to $27 billion in the 1910s (World War I), and to more than $40 billion in the 1930s (because of the Great Depression and subsequent government programs). World War II pushed the debt to more than $260 billion in the 1940s, where it remained at a slow but steady increase until the 1970s. From 1970 to October 2009, the debt has increased to nearly $12 trillion. (Expand the decade 2000-0 to see how our debt has nearly doubled in the past 10 years.) Looking at these numbers, it would appear the US debt is increasing at an exponential rate.
Now, let’s place these numbers on the chart for a graphical representation. Click inside the (currently blank) chart and then, in the PowerPivot Field List, drag the tblUSHistoricalDebt Amount field to the Values list box (and as before, change the summary to Maximum). Then, drag the Decade_Desc field from the Date table into the Row Labels list box. The chart visually confirms the alarming rise in the national debt.
The next step in our analysis is to look at the debt in terms of 2005 dollars (i.e., to adjust for inflation). To do so, add the Amount_2005 field to both the pivot table and chart, making sure you change the summary aggregation to Maximum. Even with inflation factored out of our analysis, the rise in our debt (especially in the past 40 years) is dramatic. With that said, the rise in the US debt might not be so problematic if the size of our economy were also growing at a similar (or more rapid) pace.
Creating Calculated Measures
Add the GDPAmount_2005_Billions field to the pivot table and change the summary aggregation to Maximum. As the name implies, GDP is expressed in billions (and in 2005 dollars). To keeps things consistent, let’s convert the US debt into billions as well. In the PowerPivot tab, click the New Measure button. In the Measure Settings window, enter the following values:
Table Name: tblUSHistoricalDebt
Measure Name: Amount_2005_Billions
Formula: = tblUSHistoricalDebt\\[Maximum of Amount_2005\\] / 1000000000
After you click OK, a new field (i.e., a new measure) is added to both the PowerPivot Field List and to the pivot table. Visually comparing the debt amounts to the GDP amounts over time is a bit of work, so we’ll add a second calculated measure to express the debt as a percentage of the GDP. Click the New Measure button again and enter the following values:
Table name: tblUSHistoricalDebt
Measure Name: DebtAsPercentageOfGDP
Formula: =if( ISBLANK(USHistoricalGDP\\[Maximum of GDPAmount_2005_Billions\\] ), BLANK(), tblUSHistoricalDebt\\[Amount_2005_Billions\\]/ USHistoricalGDP\\[Maximum of GDPAmount_2005_Billions\\] )
I added this new measure to my pivot chart, as well as Amount_2005_Billions and Maximum of GDPAmount_2005_Billions. I then made a series of customizations to the chart, including
- customizing the series chart types by right-clicking a value in the chart legend and selecting the Change Series Chart Type menu item
- creating a secondary axis for DebtAsPercentageofGDP by right-clicking the value in the chart legend and selecting the Format Data Series menu item
- adding data labels to the DebtAsPercentageofGDP series by right-clicking the series in the chart and selecting the Add Data Labels menu item
- filtering the chart to display data from 1900 to present by clicking the Decade_Desc filter in the chart and selecting the desired decades; alternatively, you can add a Report Filter or take advantage of the new Slicers feature to filter data by century.
The resulting pivot chart is shown in Figure 7. Unfortunately, the chart confirms that our national debt is growing faster than our GDP (i.e., debt was 39 percent of our GDP in the 1970s and is now 82 percent of our GDP).
Sharing PowerPivot Data via SharePoint
We’ve loaded, modeled, and analyzed data—without leaving Excel. Eventually, you might want to share your analysis with others (who might or might not have Excel 2010). With SharePoint 2010 and SQL Server 2008 R2, it’s possible to publish Excel workbooks (with PowerPivot data) to a SharePoint document library. Other users can then view the workbooks in a browser via Excel Services. And because a PowerPivot data model is actually an in-memory version of SQL Server Analysis Services, our published workbook becomes a data source that other applications (e.g., SQL Server Reporting Services, PerformancePoint Services) can consume.