How to Use the SQL Server Data Mining Add-ins with PowerPivot for Excel

How to Use the SQL Server Data Mining Add-ins with PowerPivot for Excel

Introduction

This article has the following goals:

  • Explain what the add-ins are, how they differ, and how each is intended to be used.
  • Describe how to install PowerPivot for Excel and the Data Mining Add-ins for SQL Server 2008 on the same computer. Provide troubleshooting and configuration tips.
  • Describe performance issues and other considerations; suggest ways to optimize the use of PowerPivot in conjunction with data mining.
 This topic is a how to.
Please keep it as clear and simple as possible. Avoid speculative discussions as well as a deep dive into underlying mechanisms or related technologies.

Table of Contents


Overview of Data Mining

The Data Mining Add-ins for SQL Server 2008 is a free download that can be used with either Excel 2007 or Excel 2010. When you install the data mining add-ins, you can connect to an existing instance of  SQL Server 2008 Analysis Services and use the data mining algorithms and services provided by that server to perform data mining on the data in your Excel workbook and other supported data sources.

The Data Mining Add-ins contain two sets of tools: The Table Analysis tools, which let you perform analysis by using wizards and your data in Excel, and the Data Mining Client for Excel, which provides an easy-to-user interface for building data mining models. Although the add-ins make it easy to get started in data mining, the add-ins are also popular among experienced data mining developers who want to quickly and easily try out various multiple models before committing to development and deployment of a model in Business Intelligence Development Studio.

PowerPivot for Excel is an add-in for Microsoft Excel 2010 that lets you combine data from disparate sources in an in-memory relational model. You can then create sophisticated analyses based on formulas, and present the results in interactive PivotCharts and PivotTables.

PowerPivot for Excel is a tool that is designed for rapid exploration and analysis; however, it does not support the types of analysis that the Data Mining add-in does: namely prediction, what-if analysis, goal-seeking analysis, or other types of analytics that require inference of patterns from data. Therefore, the two add-ins are best seen as complementary tools that support different aspects of analysis.

Back to top


Installation

To install and use the Data Mining Add-ins for Excel, you must have the 32-bit version of Excel 2010. If you use the 32-bit version of Excel 2010, you must also install the 32-bit version of PowerPivot for Excel.

The PowerPivot for Excel add-in is available in both 32-bit and 64-bit versions, and each version requires the corresponding version of Excel. That is, you cannot install a 64-bit version of Excel and then the 32-bit version of the PowerPivot add-ins.

Therefore, if you want to use PowerPivot in the same workbook as the Data Mining Add-ins, you must use the 32-bit version of Excel 2010, the 32-bit version of PowerPivot for Excel, and the SQL Server 2008 Data Mining Add-ins for Excel.

If you need to install the 64-bit version of Office 2010 to optimize the PowerPivot experience, we recommend that you set up the data mining add-ins in a separate environment.

Note: There are no current plans to release a separate 64-bit version of the Data Mining Add-ins, because of a dependency on the 32-bit Office Web controls.

Prerequisites

.NET Framework: If you are running a version of Windows other than Windows 7, you will need to download and install .NET Framework 3.5 SP1.

SQL Server 2008 Data Mining Add-Ins for Excel: The Data Mining add-ins can be used together with the following versions of Excel:

  • 32-bit version of Microsoft Office 2010
  • 32-bit version of Microsoft Office 2007

Note: There is also a previous version of the Data Mining Add-ins for Excel that is designed to be used with SQL Server 2005. SQL Server 2005 does not support new data mining features such as automatic creation of training and testing data sets, model filters, and the new time series algorithm. If you have SQL Server 2008, you should use the 2008 version of the Data Mining Add-ins. You can still use data stored in a SQL Server 2005 database as a data source.

PowerPivot for Excel: The PowerPivot add-in requires Office 2010.

Excel Starter Edition: The following limitations apply to Excel Starter edtion:

  • Neither the Data Mining Add-ins nor the PowerPivot add-ins is supported for use with the Excel Starter edition.
  • In Excel Starter edition, you can view PivotTables but not modify them.
  • There is no 64-bit version of Excel Starter.

Side-by-side installation: You can install Office 2007 and Office 2010 on the same computer, but the versions must both be 32-bit versions of Office (x86). In other words, you cannot install 64-bit and 32-bit versions of Office on a single operating system.

Installing the Data Mining Add-ins

The following link provides download and installation instructions for the Data Mining add-ins for Excel and for SQL Server 2008.

Data Mining Add-Ins for Office 2007

Installing PowerPivot for Excel

The following link provides download and installation instructions for the PowerPivot client for Excel.

PowerPivot for Excel

Performance and Memory Considerations

This section describes performance differences between the 32-bit and 64-bit versions of Excel 2010 that might affect your decision of whether to use the Data Mining add-ins and PowerPivot on a single computer.

Memory: The limit for memory usage in the 32-bit version of Excel is 2GB.  If you need to use more than 2GB of memory to work with your model, you must use the 64-bit version of Excel with the 64-bit version of the PowerPivot add-in. Remember that actual working memory might be less than 2GB, because your workbook and your PowerPivot model must share working memory with Excel’s own processes, runtime stacks and other data structures, and any other add-ins included with Excel besides PowerPivot. Typically Excel and PowerPivot cannot use all the system resources.

Performance: In general, there is little performance difference between 32-bit and 64-bit environments on the PowerPivot client. If you are deciding whether to use a 32-bit version to accommodate use of the Data Mingin add-ins, performance should not be considered a major factor.

Data limits: If you are using the 32-bit version pf PowerPivot data to enable concurrent use of the Data Mining add-ins, you may not be able to add as many rows to your tables as you would in the 64-bit edition. If your data fits comfortably within the 32-bit address space, the workbook should open and save without problems. However, some workbooks may become too large to open on a 32-bit system. When using the 32-bit edition of PowerPivot and Excel 2010, the largest workbook with embedded PowerPivot data that you can open is about 300-400MB. If you open a workbook that requires more memory space than is available, the following message appears:

Embedded Analysis Services Engine: Could not load embedded PowerPivot data.

To resolve this problem, you can try removing unnecessary columns from the workbook. For example, text columns typically take up far more space than numerical data, because they are longer, tend to have more unique values, and are less easily compressed. You might also change the filters on the data source to import only the rows that are needed for the current analysis.

For tips on maximizing compression, and determining which columns are using the most memory, see the links in the Resources section.

Publishing workbooks: If you publish your PowerPivot enabled workbooks to a SharePoint server, you must plan for the fact that the integrated instance of Analysis Services on the SharePoint server is always 64-bit. Therefore, for you to publish workbooks developed in a 32-bit environment, the server administrator would have to provide a 32-bit version of any data providers used by your workbook.  If there are no 64-bit versions hosted on the server, then both 32-bit and 64-bit workbooks will use the 32-bit providers, and there will be no conflict.

If there are existing 64-bit providers that are preferred for use with PowerPivot, you might not be able to use the instance of PowerPivot hosted on the server to perform scheduled data refresh. In that case, you can use the 32-bit data providers included with the client to do all data refreshes manually on your own computer, and then upload the refreshed workbook to SharePoint.

At the time of this writing, most vendors have only a 32-bit version available, and have not published a 64-bit version yet. However, you should see your system administrator to determine whether there is any potential conflict with 64-bit providers.

Back to top


Technical Considerations

This section briefly discusses aspects of product architecture, and provides various technical tips that might help you better understand how to integrate the two-addins.

Data Stores for the PowerPivot and Data Mining Add-ins

To effectively use data mining with PowerPivot, it is important to understand that the data in a PowerPivot-enabled workbook can be regarded as separate but potentially related data stores:

Data stored by the PowerPivot engine: The data is loaded into memory from various relational or flat file sources, and is saved to disk as necessary, but is not part of the traditional Excel workbook.

Data that you work with in Excel worksheets, in ranges or tables: Data that is stored in the traditional Excel working set can be added to your PowerPivot data store by creating linked tables. Data that you do not explicitly add to the PowerPivot data store remains in the Excel workbook and is not accessible to PowerPivot.
Data from external sources: Both a standard Excel workbook and a PowerPivot-enabled workbook can contain data obtained by other data sources. In Excel, you get data from other sources by using the options in the Data tab. Data that you obtain this way is stored in the workbook as a local copy, until you choose to refresh it by using the menu options in Excel.

When you are using the PowerPivot add-in, you can also obtain data from external sources by using the import wizards on the Home tab of the PowerPivot window. Data that you add to the workbook by using the PowerPivot window is imported into memory and becomes a part of the PowerPivot data store. You can access this data to build PivotCharts and Pivot tables from the PowerPivot modeling window.

Note, however, that PivotTables that you build in PowerPivot are not identical to PivotTables built via traditional Excel. You use different menu options to create them—one in Excel, one in PowerPivot --and the PivotTables use different data stores. Moreover, data from external sources is loaded into each data store by using different providers.
   

Combining PowerPivot and Data Mining Data

Although both PowerPivot and Excel data are found within a PowerPivot-enabled workbook, the Data Mining Add-Ins work only on Excel tables, and cannot access the data stores used by PowerPivot. This is because PowerPivot uses a highly compressed in-memory relational data store, and does not use the same object model as data that you manipulate by native Excel applications.

The Data Mining Add-Ins can use external data that you connect to by using Excel’s data connection features, and can use data in Excel tables, regardless of whether that source data comes from a cube, Excel data in ranges, or tables linked to external data sources such as Access or SQL Server.
However, Linked Tables can be used by a PowerPivot model, while the associated source tables remain in Excel. This is really the only way to “share” data among PowerPivot applications and the Data Mining add-ins.

In summary:

  • Data that is stored in a VertiPaq data store cannot be directly accessed by the Data Mining add-ins. In other words, data that you import into your workbook by using the PowerPivot wizards cannot be used for data mining.
  • Data mining uses the traditional Analysis Services engine, running in MOLAP mode, for accessing and processing data. PowerPivot for Excel uses the new VertiPaq data store.  The VertiPaq engine currently does not support data mining.
  • You can use data from an Analysis Services cube as a data source for building a data mining model. However, when you do so, the data must be accessible as an Excel table. In other words, you export a flattened slice of cube data to Excel for use in building a model.
  • You can use Linked Tables to “move” tables of source data and data mining results into the PowerPivot cache.

Saving and Publishing PowerPivot Workbooks

The following additional restrictions may apply when you save a PowerPivot-enabled workbook, or when you publish the workbook to a SharePoint site:

  • When you create a data mining model by using the Data Mining Add-ins as client, the actual model is never contained in the workbook, but is either temporary or saved on an Analysis Server server running in MOLAP mode. Therefore, when you publish the workbook to SharePoint, users might not have access to the underlying storage server. Any results that were created in Excel based on the data mining model are stored in the Excel workbook and will be accessible to all users.
  • When you save the workbook, Excel assumes that the file should be saved with the default file name extension, .XLSX. However, in some regions, the default file name extension is set to ODF (Open Document Format), which does not support PowerPivot data. If you save the workbook to ODF or to another format that does not support PowerPivot (such as CSV, TEXT, or older versions of Excel), your PowerPivot data will not be saved with the file and will be inaccessible even if you reopen the workbook.
  • When you publish a PowerPivot enabled workbook that contains charts and reports based on a data mining table or model, the reports should be considered static. The data sources used in the mining model cannot be automatically refreshed, so if the data changes and you need to update the model, you must manually refresh the data in the workbook, and then rebuild the model and republish the workbook. 

Working with Cubes

Many people conceive of PowerPivot data as a cube that sits in a workbook. Although it is true that the PowerPivot relational store is based on a highly specialized and optimized version of the Analysis Services engine, the PowerPivot for Excel client is not intended as a replacement for existing tools for browsing or building cubes. PowerPivot does not support many operations that are possible with traditional cubes, including data mining and certain kinds of queries and processing. For working with cubes and building complex data mining models, we recommend that you use BI Development Studio.

If you import data from a cube into a PowerPivot workbook, the data is always imported in flattened form. It makes sense to flatten data if you are working with a small subset, but if you add too many levels of grouping the resulting table can be too large to work with easily and can exceed the memory limit described earlier.
If you import data from a cube into Excel by using the Analysis Services provider, for use in building a traditional PivotTable, note that you cannot use that data for data mining. You must create a separate data connection, using the Connections button in the Data Mining toolbar, to access cube data for data mining, and the data must be added to Excel as a flattened data set.

Thus, it is possible in a PowerPivot-enabled workbook to have separate three connections to the same cube data source: one connection used for a traditional PivotTable experience in Excel, one connection used to obtain data for further manipulation by the PowerPivot engine, and one connection to get data for use in creating data mining models.

This is not as strange as it sounds, when you consider the richness and complexity of cube data, and the fact that cubes cannot be altered by any of these clients. With each tool, you have a different purpose in mind and manipulate the data in different ways to provide new insights.

Working with Nested Tables

Nested tables are basically equivalent to one-to-many joins that embed related data in a single cell, and are a powerful feature in SQL Server data mining because they enable multiple attributes for each case. These structures are easy to create in PowerPivot by using relationships or DAX formulas, and therefore PowerPIvot seems like an ideal fit for data mining. However, the PowerPivot relational store cannot be accessed directly by the Data Mining Add-ins.
 
Therefore, to create mining models that use nested tables, you must either flatten the data from the nested tables and save the data to an Excel table, or use the following workaround.

  1. In the Data Mining Query Wizard, on the Select Source Data page, select the option, External data source.
  2. Leave the text box for Data source name empty, and paste in the text of a nested table query.

This workaround can be used in any dialog box where you define an external data source.

Note: Although there is no hard limit to the number of nested tables that can be specified in a data mining query, if you add too many nested tables, the number of attributes may increase and exceed the amount of memory needed for processing the model.

Back to top


Integrating Data Mining with the PowerPivot Add-in

This section describes some strategies for integrating use of the Data Mining Add-ins with PowerPivot for Excel. Regardless of whether you decide to use both add-ins in a single workbook, or keep the add-ins on separate computers to take advantage of 64-bit processing capabilities, understanding how the capabilities of the two add-ins reinforce and complement each other will help you make the most of your data.

This section discusses the following questions:

  • If PowerPivot and data mining are both intended for BI, how does BI differ from data mining?
  • Why can’t we use PowerPivot data for data mining? Isn’t more data always better?
  • If PowerPivot for Excel does not support data mining, how can the the two add-ins co-exist? Can we use them together at all?

How BI is Different from Data Mining

In most cases, when people talk about business intelligence, no prediction or hard math is involved. Instead, intelligence is derived from structuring data or adding relationships to make data easier to browse and discover. Whether that data is browsed and manipulated in a cube or in a spreadsheet, the intelligence comes from the knowledge worker or analyst who is face to face with the data.

Data mining, on the other hand, is optimal for finding patterns that are not readily apparent via exploration. The algorithms provide in SQL Server 2008 Data Mining have all been developed and used by academia and the broader data mining community for their power to discover insights that might otherwise go unnoticed. Such patterns might be discoverable by traditional BI, but typically are not. Patterns can missed for many reasons:

  • They involve many interactions, which are too complex for easy detection in a graph.
  • The size of a statistically significant effect can be hidden in large data sets, or obscured by other interactions or trends in the data.
  • Visualization is not possible, or is too complex to made sense of. Often people need to simplify a problem to understand basic patterns and infer meaning.

Another important way that data mining provides extra insight is by returning statistics that quantify relationships found in the data and that help put trends into perspective. “Browsing BI” can also give you measures of probability, of course, but only if you take the effort to create the calculations. With data mining, each algorithm generates a rich set of statistics based on mathematical analysis, and these statistics can also be used to make projections.

In short, data mining excels at identifying statistically meaningful trends in complex data, and supports those insights by providing statistics that help you interpret the meaning of the patterns that are found.

However, data mining is similar to traditional BI in that the user must contribute much to the analysis. In fact, users of data mining tools must be especially vigilant for bad data that can affect results. For example, as you beginto work with these tools, you will encounter problems such as overfitting and collinearity that were previously the concern only of statisticians.

Users of the data mining add-ins must also spend time understanding the results, lest you use conclusions without understanding them. Fortunately one of the best tools for interpreting the insights of data mining is Excel, and the PowerPivot add-in enhances Excel’s ability to investigate, back up, and support the findings of data mining. The walkthrough provides one example of how you might do that, and provides links to other examples.

How Much Data is Enough?

 When people think of data mining, they often think it requires millions of rows of data, or terabyte-size data stores, too complex to be easily viewed or understood. In that sense, PowerPivot, with its capacity to quickly load and filter millions of rows of data, seems a natural fit with data mining. After all, isn’t more data always better?

In practice, more data is not always better. Jamie MacLennan, architect of SQL Server data mining, explains: “Algorithms learn from data and, generally, feeding them with more data will improve the accuracy. This improvement, however, is asymptotic. If the training data is statistically representative of the problem space, then a classifier will stop learning after a certain limit: new data points will support existing patterns and will help reducing the risk of over-training,  but will not add more information to the model.”

Another danger with too much data is that the data can contain errors that skew the results, but you cannot afford to isually inspect millions of rows. There are of course statistical tools to assess the success of a model, such as cross-validation and classification matrices (both provided in SQL Server 2008), and scoring methods such as lift charts and ROC curves. However, no mathematical solution can completely solve the following common problems:

Bad data: Duplicate values, missing values, and entry errors. These values can create clumps of bad data or appear as outliers, all of which can distort statistical patterns, or cause a model to fail entirely.

Noisy data: Data contains many discrepancies that “throw off” the hunt for a pattern. And when you combine heterogenous data sets, discrepancies from each set of data can lead to a huge number of unique values. Ideally you want to collapse these messy values into a single attribute, or to categorize the discrepanicies to find patterns. But such cleaning and categorization is costly.

Bad distribution: Data from combined sources might contain too many cases of one kind, or too few cases of the target type. To sample data correctly you might need to merge sources first.

Too much information: Even in smaller data sets, algorithms can find multiple paths or patterns. In large data sets, the problem is worse. Although most data mining algorithms provide automatic feature reduction, to get the best results you must often collapse values into categories. Choosing the right level of categorization generally requires manually reducing the number of categories, or choosing only the most meaningful categories.

Collinearity: A second type of “extra information” that poses a particular problem for data mining is when the data contains multiple columns that might be closely related or even derived from each other, which can lead to false correlations. With PowerPivot, it is very easy to create calculated columns, which could exacerbate this problem.

As this short list of pitfalls illustrates, obtaining a large volume of data that you can work with at lightning speed does not enable instant hands-free data mining, even if data is fairly clean.

Put another way, the inability to access millions of rows via PowerPivot does not prevent you from doing data mining; rather, PowerPivot provides you with a tool that can work efficiently with data that you prepare as a candidate for data mining. No matter which tool you use, you must be selective about the data you use. Finding patterns in the data remains an art that is part data preparation and understanding, part statistics, and part interpretation.

Scenario: Using PowerPivot to Reinforce and Complement Data Mining Insights

There are several approaches that you can take when building data mining models and exploring the data with PowerPivot and Excel.

  • One approach is to load all of the related data into PowerPivot, build some PivotTables and charts to explore the data, and through analysis identify data for data mining. Then you would selectively copy just the target data into Excel, and build a data mining model.  This approachis represents a good practice for data mining, since you are likely to want to build multiple related models and compare them.
  • Another approach is to start with a smaller data set that you trust in Excel, build a data mining model to discover insights, and then share the data into PowerPivot via Linked Tables, to continue exploring the insights of the data mining model.

MVP Mark Tabladillo has published a series of videos that demonstrate the first approach, using PowerPivot to collect and summarize data, and then using the Data Mining Add-ins to discover patterns. See the Resources section for links to the video of the presentation and to the sample workbooks.
The second approach is demonstrated in the following walkthrough.

Introducing the Sample Data

The data used for this scenario is from the workbook of sample data that was provided together with the SQL Server 2008 Data Mining Add-ins for Office 2007. This sample data set was constructed so that you can easily experiment with different model types.  For this example you will use the data on the Associate tab, together with the Market Basket tool, to create a mining model that analyzes products and then makes recommendations. After generating the recommendations you will use PowerPivot to add more data, to enrich the analyses and support the recommendations.

Creating the Data Mining Model and Data Mining Reports

Click anywhere inside the table on the Associate tab and use the Shopping Basket analysis tool, on the Table Analysis tools tab, to create a Market Basket analysis. (See the Help in the Data Mining Add-ins for instructions on how to do this.)

The Shopping Basket analysis tool creates two reports: one report shows the bundles, meaning the set of products that are frequently purchased in a single customer transaction.

The other report shows recommendations that you can make to customers who buy one item of a bundle. The recommendations are based on probabilities calculated by the data mining model, given the data in your Excel table.

After viewing these reports, you decide the recommendations are potentially of value, and you intend to suggest that the company’s e-commerce site be modified to automatically suggest bundles for customers. To support this proposal, you will provide some additional analysis, by calculating the profit margins for each bundle, and showing the sales for the selected and recommended products.

Using the Data Mining Reports with PowerPivot

The results of all data mining wizards are stored as Excel tables. Therefore, you can simply convert each table to a Linked Table to view the data mining results in PowerPivot. However, conversion to a Linked Table alone will not enable you to calculate sales. You will need to add some columns, add some related data, and create some relationships.

Step 1. Import sales data

Import FactInternetSales, which is the table containing detailed sales data for the products, by using the PowerPivot Table Import Wizard. For reference purposes, also import DimProduct, a table that contains a detailed list of products with IDs.

Ideally, at this point you would link your list of bundles to the sales table and be done! However, the table of sales data contains detailed product names, not model names, and product IDs were left out at some point.   Moreover, there is duplicate data in some tables, so that you cannot create a relationship between the sales data and the bundles. Therefore you need to add some new columns, and look up relationships.

A good way to work with complex data in PowerPivot is to sketch out a diagram of how all the data should be related, so that you can identify the data that is missing. In this case you look at the tables you have, and determine that you need to make the following adjustments to perform all the calculations for the analysis:

  1. Look up IDs. For better data mining results, the original analysis collapsed several categories of discontinued and related products. This created logical groupings of products and improved better forecasting. However, as a result, the names in the report from the data mining wizard do not always match the detailed names in the product list. Therefore you need to create some lookups, to get the product IDs based on model names.
  2. Normalize the data. PowerPivot does not support many-to-many relationships. Therefore, you can’t match the list of models to the models in the products table. Instead, you need to set up your list of potential bundles, selected products, and recommended products in such a way that each table has no duplicate values. Moreover, when correctly connected, the tables should form a kind of chain that links models to sales of individual products.
  3. Aggregate where needed. Some models represent multiple products. For example, the Sport-100 model includes several products with different IDs, each product ID representing a different size and color of helmet. Because your sales data is more granular than that, to show sales for the Sport-200 model, you need to create a summary that counts all sales of Sport-100 products as the same product.

Data mismatches of this sort are very common when you combine data from different sources. Even if data originally comes from the same source, data is often modified after exporting it from the database, or the data is summarized in such a way that you cannot easily retrieve details such as ID numbers.  Fortunately, the tools in Excel and in PowerPivot make it easy to recombine the data without a lot of tedious copying and pasting.

For this scenario, to relate the bundles recommended by the data mining model to the detailed sales data, you will work with the data in Excel, and then convert those tables to Linked Tables.

Step 2. Create Excel lists and tables

In Excel, copy the information from the data mining report into separate tables for recommended products and selected products, ensuring that each table has unique entries:

  • Copy the list of Selected Products to a new Excel worksheet and remove duplicates. Convert to table. Add product IDs.
  • Repeat for the list of Recommended Products.

Next, create the master list of all model names, both selected and recommended.

  • Copy the list of Selected products to another worksheet.
  • Append the list of Recommended products to that worksheet as well.
  • Remove duplicates.
  • Add an arbitrary set of temporary IDs to the model list by using the Fill Series option.

To get the product IDs, you can use various approaches, depending on the complexity of your data. For example, the product names did not match the product names in AdventureWorks, because some product names were truncated, and others were collapsed into a single category. If you need to work with a lot of data, you could use fuzzy lookup via Integration Services or even regular expressions. However, in this case, the list was quite short, so we used the DimProducts table as a source for IDs, and simply filtered that table using the PowerPivot filter options  to show only the related products. We then copied the related product information into Excel, and used an Excel formula (similar to the following example) to look up the product ID based on model name:

=IF(MATCH([@ModelName],<lookup array>,0), LOOKUP(@ModelName),<lookup array>, <lookup result>),“not in list”)

Step 3. Share tables into PowerPivot as Linked Tables

Once you have created the tables that contains bundles, models, and products, and each table has IDs that can be used to connect them to other tables, you are ready to convert the lists to Excel tables.

But wait! At this point we highly recommend that you rename the tables, to give them a more descriptive name. By default, Excel names new tables as Table1, Table2, etc., and so forth, and PowerPivot will simply use those names when you create the Linked Tables. This can make it hard to manage relationships. (Don’t worry if you forget to rename your tables -- you can always delete the Linked Table, go back to Excel and rename the table with a descriptive name, and then re-add the Linked Table.)

Your PowerPivot data should now contain five tables: two from the AdventureWorks database, and three from the data mining reports.

Imported from AdventureWorks Tables based on data mining reports in Excel
FactInternetSales RecommendedProductList
DimProduct SelectedProductList

MasterProductList

However, if you tried to build a PivotTable using this data, you would quickly realize that the way the tables are set up will give you an error or wrong results.

Step 4. Fix relationship problems in your data

You used the table, DimProducts, to look up product IDs based on model name. However, when you add your model list and list of recommended and selected products to PowerPivot you discover an issue. Namely, PowerPivot does not allow multiple relationships between tables, because PowerPivot would not know which “path” to use when calculating the summary sales figures. Therefore, to ensure that there is one and only one relationship connecting each product to the detailed sales data, delete DimProduct. (You should have already obtained the IDs so you do not need DimProducts anymore.)

By the same token, you can’t connect both lists of products (selected and recommended) to a single table of sales details (FactInternetSales), because to do so would create multiple relationships that could be traversed when calculating aggregates. The PowerPivot restriction on having one and only one relationship between columns in any two tables also applies to indirect relationships. 

You return to your diagram of how the data is related and quickly see the problem: FactInternetSales, which is the table with all the sales data, is being used by multiple “chains” of tables.

To prevent problems in your PowerPivot formulas, you add another copy of the FactInternetSales table. Then you can use one copy of the table to calculate sales for selected products, and the other copy of the table to calculate sales for recommended products.

Now you have all the data you need to create your reports. To convince your Web development team that it is worth implementing the recommendations from the data mining model, you will create a PowerPivot PivotTable that summarizes sales for each of the targeted products.

Step 5. Create calculations in PowerPivot

Once the data has been set up correctly, it is easy to create calculations that represent key decision factors such as the sales margin per bundle, and the sales for each model or product.

You might decide to create PivotTables that show the sales for selected and recommended products, but remember that these are essentially two separate sets of data: there is a list of Recommended products that is connected to one copy of the sales data, and there is a list of Selected products that is tied to another copy of the sales data. If you try to “mix”the two in one PivotTable, you will get odd results.

In this case, because this analysis is static – the bundles will never change– you can get all of the results in one table, by creating the sum of sales as a calculated column inside the product list.

  • In the Linked Tables containing the lists of recommended and selected products, create a calculated column that calculates product margin for each row.
  • Create a calculated column that sums the sales of each product, for example: =SUMX(RELATEDTABLE(copyFactInternetSales),copyFactInternetSales[SalesAmount])

The following screenshot shows the summary of sales for each model in the selected and recommended product lists. 

 

Once you have these results, you might decide to push bundles that have a higher combined profit margin, or you might decide to prioritize bundles based on past sales of the linked products. By putting all this information together in one workbook, you have leveraged the value of data mining, and provided your sales team with the data needed to support the proposed product linkages.

Back to top


Resources

This section provides links to additional resources.

Information about system requirements, architecture, and performance:

Tips on maximizing compression, and determining which columns are using the most memory:

How much training data is enough?

PowerPivot deployment whitepaper:

Mark Tabladillo presentation of integrating PowerPivot  with data mining:

PowerPivot team blog:

More information about installation issues:

Back to top


Appendix: Troubleshooting Installations of PowerPivot with Data Mining

You can install the PowerPivot add-in and the data mining add-ins in any order. However, in some cases one of the add-ins might fail to initialize, or some menu items might not appear. Also, uninstalling and reinstalling the add-ins often does not fix this problem. This section describes common problems and some suggestions for resolving the issues.

If you have a hard time determining the cause and likely solution, you can use the following procedure to get a less generic error message that might be useful for troubleshooting.

To display alerts from the PowerPivot engine

  1. Right-click My Computer or Computer, and select Properties.
  2. Click the Advanced tab or Advanced System Settings link.
  3. Click Environment Variables.
  4. Under User variables, click New to create a new user variable.
  5. For Variable name, type VSTO_SUPPRESSDISPLAYALERTS. For Variable value, type 0. (zero) Click OK.
  6. In Control Panel, select Programs and Features (or Add/Remove
    Programs), select PowerPivot, and select Repair.
  7. Open Excel again. This time an error message should display.

Problem: Installation of the PowerPivot add-in disables the data mining add-in or some of its components

You open Excel and either PowerPivot or the data mining add-ins fails to display; however, there is no error message indicating the cause. In general, your first step should be to check the list of COM add-ins that are used by Excel to see whether the add-in you want is installed.

To check whether the add-in is in the list of disabled items

  1. On the Excel File menu, select Options, and thec click Add-Ins.
  2. Check the lists for Active Application Add-Ins, and Disabled Application Add-Ins.
  3. The following add-ins are used for PowerPivot and for data mining.
Data Mining Client SQLServer.DMCClientXLAddIn
Table Analysis Tools SQLServer.DMXLAddin
PowerPivot for Excel Microsoft.AnalysisServices.Modeler.FieldList.vsto

If the item is in the Add-ins list, it is installed but not necessarily enabled.

  • Add-ins that are installed and loaded are displayed in the upper part of the list.
  • Add-ins that are installed but disabled are displayed in the part of the list titled, Inactive Application Add-ins.

To enable the add-in

  1. In the Manage dropdown list, choose Disabled Add-ins, and then click Go.
  2. Select the disabled add-in, click Enable, and then click OK.
  3. Click Close.
  4. Check the check box next to it to the add-in you want to use.
  5. Close and then re-open Excel again.
  6. If Data Connections have been disabled, click Enable.

Problem: Component cannot be enabled via Excel Add-ins Manager

Sometimes, even when you enable the add-in by using the Excel Options dialog box, the add-in still does not load properly. In this case you might be able to re-enable the data mining or PowerPivot add-ins by using one of the following procedures.

To modify the registry entry that controls add-in load behavior

  1. Open a Command Prompt window.
  2. Type regedit, and then press Enter.
  3. In the Registry Editor, locate the following key:

 CURRENT_USER\Software\Microsoft\Office\14.0\Excel\Resiliency\DisabledItems

Because this key is stored as binary data, you must view the value data for the registry key to verify whether the entry applies to the disabled add-in you want to re-enable. The value data contains a full path to the add-in, as well as the title of the add-in. If you find the name of the add-in you want to use, delete the registry key.

  1. Open Excel, open Add-in Manager, and enable the add-in.
  2. Close Excel and open the file again. The add-in should now be enabled.

Problem: Critical error in Data Mining Add-in when starting Excel

When starting Excel, a message appears saying that the Data Mining add-in has a critical error, and will be unloaded. You have the option to unload the add-ins; however, even if you choose to not unload the add-ins, the add-ins are unavailable.

If you use the Excel Add-In Manager to check the status of the add-ins (SQLServer.DMCClientXLAddIn and SQLServer.DMXLAddin(, you find that both are unchecked and therefore not loaded. However, checking these options does not restore the data mining add-ins.

In this case, the problem can be with mscoree.dll, a component required by the Data Mining Add-ins. In certain security contexts, registration or loading of mscoree.dll can be blocked. Herefore, to use the Data Mining add-ins, you must enable mscoree.dll and then restart Excel.

To register mscoree.dll

  1. Open Excel, and from the File menu, select  Options.
  2. In the Manage dropdown list, select COM Addins. Click Go.
  3. Select mscoree.dll Click OK.

Error message: Managed add-in loader failed to initialize

If you installed Office 2010 and then downloaded the PowerPivot add-in, but PowerPivot menu options are not available, you might not have installed VSTO, which is a set of components that Office uses for automation and programmability extensions. To install VSTO, you must run Office setup.

To install VSTO

  1. From Control Panel, choose Programs and Features (or Add/Remove Programs).
  2. Select Office 2010 in the list, and click Change.
  3. Select Add or Remove Features.
  4. Click Installation Options.
  5. Under Microsoft Excel, make sure that “.NET Programmability Support” is installed.

Error message: PowerPivot was disabled because it prevented the normal operation of Excel

You open Excel, and find that one of the add-ins was missing. When you try to re-enable the add-in, a warning appears explaining that PowerPivot was disabled because it prevented the normal operation of Excel.

To re-enable PowerPivot

  1. In the Manage dropdown list, choose Disabled Add-ins, and then click Go.
  2. Select the disabled add-in, click Enable, and then click OK.
  3. Click Close.
  4. Check the check box next to it to the add-in you want to use.
  5. Close Excel, and then re-open Excel.
  6. If data connections have been disabled, click Enable when prompted. The PowerPivot menus should be enabled.

Back to top


See Also

 

Sort by: Published Date | Most Recent | Most Useful
Comments
Page 1 of 1 (1 items)