Looking Back to SQL Server 2000
Microsoft has had data mining algorithms embedded within Analysis Services since the SQL Server 2000 days. I recall playing with the Poisonous Mushroom Decision Trees demo more than 10 years ago, learning how to apply predictive models to retail customer segmentation and fraudulent item return scenarios. Although SQL Server Analysis Services (SSAS) Data Mining didn't get significantly enhanced in the latest SQL Server releases, the Add-Ins to Excel have continued to be upgraded to 64-bit, to work with the latest versions of SQL Server, and also to be used with both Excel 2010 and 2013. If you're not familiar with the base SSAS Data Mining features, please see my SlideShare decks on this topic: Predictive Analytics with Excel and Predictive Analytics with SQL Server. Microsoft also has free Data Mining Tutorials.
What I really like about SSAS Data Mining is the ease of building predictive models and embedding predictive queries (DMX) into applications or reports. In the past, I've embedded predictive queries to identify check fraud, healthcare co-payment amounts, customer segments, and several other use cases. I have a public live example of combining Microsoft SQL Server Analysis Services Data Mining DMX predictive queries with Tableau to provide inspiration on what is possible.
The base SSAS Data Mining offering is nice but note that it hasn't been heavily invested in by Microsoft for several years now, causing PMML and other features to fall behind. I've run into customers experiencing compatibility issues with SQL Server 2012 Data Tools (SSDT) and challenges getting timely support for this aging base offering. For those reasons and many more, I've started recommending Predixion.
Predixion Insight: vNext Answer for SQL Server Data Mining
Predixion is a Microsoft partner solution that supplements the base SSAS Data Mining offering with many features that have long been on my wish list. The Predixion Insight solution also has exceptional support and far greater capabilities than the base Microsoft solution. Recently, I had the honor of walking through the latest Predixion Insight release directly with Jamie MacLennan, co-founder and CTO at Predixion Software. MacLennan formerly was the development manager of SSAS and also was the development lead of the SQL Server Data Mining platform that I've been discussing. Here's a peek at what the latest version of Predixion Insight brings to the table.
Predixion Insight has an Excel add-in, a Server, and an optional cloud offering. Although Predixion Insight is a supplement to the base SSAS Data Mining features, starting with version 3.0, it also works with R and Mahout via machine a learning library plug-in—a key differentiator and a warmly welcomed enhancement that I'll dig deeper into a little later on.
The user-friendly Predixion Excel add-in adds two new tabs, INSIGHT NOW and INSIGHT ANALYTICS. The INSIGHT NOW tab contains the enhancements to the base Table Analysis Tools. Things like Analyze Key Influencers, Detect Categories and Market Basket analysis reside here but have been improved upon. For example, the Analyze Key Influencer reports are more detailed and the report presentation is nicer.
Most of the great, robust features data miners and analysts will use are located in the INSIGHT ANALYTICS tab. Here's where you'll immediately notice the sheer breadth and depth of enhancements over the base Microsoft offering, including
- being able to use PowerPivot as a data source—a wonderful data profiling feature that I could see using on non-predictive projects
- predictive, easier ability to use external data sources
- an option for in-database model scoring
- better sampling, discretization, and labeling capabilities including a new predictive analytical expression (PAX) function that provides better binning
- added features for normalizing the data or adding calculated fields with a statistical function library
- a link to the Predixion Marketplace where you can share or get already developed predicted models to fast track your project
- a link to the the Predixion Server or cloud to centralize, share, and collaborate on models— a feature rich Insight Workbench for developing predictive models
The list is simply too long for one blog. Refer to the online documentation to get a much better idea of all the goodies Predixion Insight 3.1 offers.
One of my favorite features that I longed for in the base Microsoft offering was base statistical information for predictive model variables and variable interactions. Predixion Insight shows variable distributions, core stats, and correlation relationships with other variables in a predictive model with a single mouse click.
Other enhancements that I really appreciated include the improved model performance, exploration, and live predictive "what-if" analysis.
One of the biggest new features is a new Machine Learning Semantic Model (MLSM) that allows data scientists and predictive modelers to change their work flow from creating predictive models to creating predictive applications. MLSM packages all of the necessary data transformations, predictive modeling logic, sampling, validation, and model selection generated while creating a predictive solution into a reusable application. That's a big deal for automating and speeding up predictive model development. In addition, the MLSM also empowers predictive model sharing and collaboration amongst a team.
How to Embed Predixion Insight
The only area that I thought was more difficult than the base Microsoft offering was the APIs for embedding prediction queries. To achieve similar DMX capability in Predixion Insight, there are a few options.
1. Predixion’s API can be invoked with a row of data (singleton), batch of data, or a pointer to an external data source (e.g. a database table or a Hadoop hdfs:// store). From an application, the prediction query is invoked pretty similarly to Analysis Services DMX—an ADO.NET like connection—a query with parameters and results. The query is an XML representation of the request. From a modeling tool, such as Excel, the query is presented as a Visual Macro with a tabular format can be easily copied to SSIS ETL or other applications. You can execute a singleton query, in "real-time" mode from Excel by means of the VBA API, or from a .NET application. A VB API real time predictive call looks like this:
‘ Specify the target MLSM and Model
pred.Application = "Bike Buyer Demo Application"
pred.Model = "BikeBuyer_Classification"
‘ Add singleton inputs
pred.Inputs.AddField "Age", 35
pred.Inputs.AddField "Gender", "M"
‘ specify desired output
pred.Outputs.Add ("PredictProbability([Purchased Bike], 1)")
‘ execute prediction and collect result
Set result = pred.Execute
All the scoring is done in the Predixion code, without calling into the machine learning library used in training. If all the MLSM-transformation-code-plus-model-scoring belongs to Predixion, it can then be encapsulated and moved out of Predixion for embedding into applications.
2. Alternatively, Predixion Insight has In-Database Scoring with SQL CLR Stored Procedures. The MLSM can be downloaded from the Predixion server and brought into a process running .NET or Java for true real-time scoring, without the price of the network latency—another enhancement over base Analysis Services DMX since that's simply not possible. Scoring code can also run inside certain databases, such as SQL Server (via SQLCLR), Greenplum (Java UDFs) or Hadoop (Java), and the result is in-database scoring, with no latency. If a developer wants to change the execution of a query from Server-side to Local, all that needs to be added is a few lines of code:
PredixionCommand cmd = icmd as PredixionCommand;
cmd.ScoringExecutionMode = PredixionCommand.ScoringQueryExecutionMode.LocalExecution;
// Download the MLSM and execute it locally
cmd.CachedExecutionPlanExpiration = new TimeSpan(0, 0, 10);
// the MLSM should be checked for updates every 10 seconds
cmd.CommandText = query;
Easy predictive querying and embedding is an important capability. Other features for embedding Predixion include:
- an API for VBA
- components for SSIS ETL packages
- updated PMML for exchanging models versions 2.0 through 4.0 of the PMML standard
- ODBC connectivity that allows querying of Predixion job results
If you want to check Predixion out yourself to see how much it really does enhance base SSAS Data Mining, download a Predixion Insight Developer trial and go through the walk-throughs. Predixion Insight is compatible with the latest Microsoft technologies, including Office 365, Excel 2013, Windows 8, and SQL Server 2012.