Microsoft provides many products in support of business intelligence and analytics. Among these, SQL Server Analysis Services (SSAS) supports both the OLAP solutions that are central to business intelligence, and a rich set of data mining features that
allow users to build advanced statistical models of their data and use those models to categorize data and make predictions based on that data.
If SQL Server data mining does not meet your needs, there are other modeling tools available to you, including extensions to Excel, third-party add-ins for Excel, T-SQL libraries, and of course code samples that you can customize. These may require some programming.
This page provides an answer to some of the frequently asked questions about SQL Server Data Mining, as well as links to other resources. If you are looking for the official content, please see the
This page is intended to provide a place where people who are enthusiastic (or even lukewarm!) about predictive analytics can share information, tools, samples, and other resources. So please feel welcome to contribute.
Note that there is a separate Wiki Community page for SQL Server Analysis Services and one for
PowerPivot. And don't miss the DAX Resource Center if you are looking for formula tips!
For information on visualizations, check out the Wiki Community page for
Please feel welcome to add links to the following sections, create new articles, or or recommend articles that can be referenced from this Wiki.
Articles included here should touch on at least one of these subjects:
Consider this an annotated bibliography of my favorites. There are probably other great books so add yours! These aren't technical books, but are targeted more at someone just getting interested in predictive analytics
who wants a good grounding in the techniques and pitfalls.
 Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
Michael Berry and Gordon Linoff.
Even if you don't work in any of the subtitled fields, you will find this book readable, applicable, and even enjoyable. They explain many of the common algorithms, but also explain the groundwork that must be laid before you do any kind of analysis, and walk
you through understanding results and how yo can apply them in real-world scenarios.
 The Practice of Business Statistics: Using Data for Decisions
David Moore, George McCabe et all
W.H. Freeman, 2003.
Forget everything you learned in that college statistics class? Keep this book on your bookshelf (or better still, download their free library of samples) and you'll never be caught flat without an explanation of lurking variables or covariance. The explanations
are clear, the graphs are excellent, and they painstakingly explain concepts and misconceptions.
 Data Preparation for Data Mining
Morgan Kaufman, 1999.
This book is an essential for all aspects of the data preparation process. It covers working with different types of dates and times, discrete vs. continuous data (what to bin and why and when), all kinds of aggregations and conversions, how to apply them and
when, and how data preprocessing affects the results (=quality) of models. The author provides clear explanations of techniques such as differencing and normalization and missing data inference, and discusses the pros and cons of different approaches in the
context of modeling.
 Data Mining with SQL Server 2008
Jamie MacLennan, Bogdan Crivat, ZhaoHui Tang
If you are using SQL Server data mining, you'll want this book, period. Jamie is a great writer and makes the concepts behind all the algorithms easy to understand. Bogdan has provided a huge amount of technical detail that will help you in creating custom
DMX or maybe your own algorithms. Providers, data sources, AMO, XMLA, walkthroughs of each algorithms shipped with SQL Server... it's all here.
 DM Add-ins videos
This page lists a series of short videos by UE writers Michele and Mary that highlight the wizards in the Data Mining Add-ins for Excel. The videos were created for SQL Server 2008, but the examples are equally applicable to SQL Server 2008 R2 and 2012.
This videos demonstrates the Key Influencers tool and explains how it compares to Decision Trees.
 Building or viewing models using BI Development Studio.
This is a nice demo that shows how you might start in the Data Mining Add-ins for Excel and move into BI Development Studio:
This video describes how to create and use lifts charts and cross-validation. These are important tools for testing models and for validating the representativeness of your data set. You can use these tools either in BI Development Studio (SQL Server Data
Tools) or in the Data Mining Client for Excel.
 Using data mining with PowerPivot
Data Mining MVP Mark Tabladillo created a series of videos on data mining with PowerPivot. He shows three different model types to illustrate how data mining complements and reinforces traditional BI.
Not a video, but a detailed walkthrough, with sample data, from Analysis Services PM Kasper deJonge. It demonstrates how to use results from a data mining model in PowerPivot and Power View:
 Data Mining Webcasts
Here are a few of the webcasts available on the Microsoft site.
Intro to the add-ins:
Your first DM Project:
Building Adaptive applications with DM:
algorithms: part of a series by MVP Lynn Langit: https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032516178&culture=en-us
Testing and Validating your data mining models:
Developer's Perspective on getting started with DM:
Using Data mining in data quality workflow: https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032359360&culture=en-us
Application Integration with the Excel DM ribbon:
This list of webcasts is older, but most are still available and still useful.
The official TechNet product page for SQL Server Data Mining is here. We try to put links to downloads and new videos here as soon as possible.
This site created by the original SQL Server Data Mining team has many resources. Be sure to look through the tips and samples!
The data mining forum on MSDN is very active with lots of helpful folks answering questions.
First, check out the Microsoft Case Study site. It's searchable.
EB Games uses SQL Server Data Mining, and although a formal case study has yet to be published, have discussed their use of data mining in the press.
INRIX Traffic Services leverages sophisticated statistical analysis techniques developed by Microsoft Research to aggregate and enhance traffic-related information from hundreds of public and private sources, including traditional road sensors and the company’s
unique network of over 800,000 GPS-enabled vehicles and cellular probes.
This section provides answers to some of the questions asked most often.
The data mining features in SQL Server Analysis Services are tightly integrated with both relational and OLAP data sources, to take advantage of the superior speed and functionality of both the database engine and multidimensional engine for retrieving data
and creating and storing calculations. Tight integration with other SQL Server products also makes it easy to build predictive analytics solutions in Visual Studio, and to integrate predictive modeling with your processes for data acquisition, data cleansing,
reporting, and BI analytics.
But you *definitely* don't need a cube to get started.
That said, an OLAP cube is a great source of data, since it is already cleaned, formatted, and aggregated in interesting ways. Also, if you have created a mining model based on OLAP data, you can save the results as a data
mining dimension. A data mining dimension is an OLAP table that stores the patterns and results in a format that you can use to explore correlations, drill through to details, and get new insight. You can also use the DM dimension as a source, like any dimension
This summary compares the two methods of building a data mining model, relational and from a cube:
We recommend you start with relational models.
Typically you will create some kind of view for data mining, to make sure the data is at the right granularity, and to ensure that the data has attributes that are useful for modeling.
For example, if your database captures transactions time-stamped in seconds, you might want to roll up the data into days, weeks, or weeks months, to make it easier to capture patterns over longer periods. If you are analyzing items that occur together and
you don't have enough items, you might try collapsing them into categories.
Remember, the point of modeling is to arrive at an abstraction that is just good enough to be useful.
To see an example of how you might prepare relational data for modeling, take a look at these views in the AdventureWorksDW sample database:
Looking at these views, and how they aggregate and simplify data, will help you understand how you can create variables that are useful for clustering, forecasting, or market basket analysis using data in a traditional
The introduction to the data mining section of Books Online provides an overview of the data mining process, and how data mining techniques can be applied to common business scenarios.
For each business problem or opportunity that you identify as a candidate for data mining, you need to find appropriate data, and identify an analytical method, or algorithm, that can be applied to the data to generate patterns and insights. The algorithm that
you use -- and you might employ several -- depends on the type of question you are asking and the result you need.
Are you exploring relationships, do you have a specific problem to analyze, or do you want to generate recommendations based on past data?
All these are common uses of data mining.
See this section of Books Online for an introduction of the algorithms provided in SQL Server Data Mining, and the types of problems each is best suited for.
Almost all of the algorithms provided by Analysis Services are available in the add-ins, except for linear regression and sequence clustering.
Table Analysis Tools
Analyze Key Influencers - Naive Bayes
Detect Categories - Clustering
Fill From Example -
Forecast - Time series
Highlight Exceptions - Clustering
Goal Seek - Logistic regression
What If - Logistic regression
Prediction Calculator - Logistic regression
Shopping Basket Analysis -- Association
Data Mining Client for Excel
Classify - Decision trees
Estimate - Decision trees
Cluster - Clustering
Associate - Association rules
Forecast - Time series
And here's one little-known feature in the add-ins!
A set of VBA functions is provided to let you use any models you create directly from Excel.
For example, if you have already created a customer segmentation model, and you add a new table of potential customers into Excel, you could use each row of customer data as input to the model, and get back a prediction as to the best segment for each customer.
See the Help file installed with the add-ins for more information on the functions.
There's no real difference.
Predictive analytics is a newer term, which has become more popular and seems to be more closely aligned with the type of exploration and analysis undertaken for business intelligence.
Data mining over the years has become confused with "data scraping" (which is simply data acquisition) and also seems to have become conflated with data capturing and surveillance.
However, there are some kinds of predictive analytics that are not data mining. These might include:
Use other tools for these types of analytics.
This section provides simple explanation about some of the concepts that can be confusing when you first get started.
Over the years many people have done samples, demos, and tutorials.
Here are some examples to get you started:
This section includes resources to take you from the "I've seen the demo" stage to putting models into production.