I have a question from a statistician on my team who is exploring the Data Mining add-in for Excel 2010, running against a SQL Server 2012 database:
"There is a model score calculated when using the Accuracy Chart tool for models with a continuous output – how is this calculated? I’ve ruled out R-squared and mean-squared-error, but can’t figure out what it’s actually doing. "
In a Lift Chart in Data Mining, the value for Score helps us compare models by calculating the effectiveness of the model across a normalized population, and a higher score is better. The Score is the log scatter score for a scatter plot of (x=Actual Value,y=Predicted Value) where each point has an associated probability. Here is how it works:
0) Each point on a scatter plot corresponds to a test case and has the form (a,b(M)), where a is the actual attribute value for the case and b(M) is the value predicted using the model M;
1) First define score(a,b(M)) for *one* individual scatter point. To do this, compare (a,b(M)) to the best prediction we can make without using any model, which is marginalMean, so
score(a,b(M)) = likelihood ( b(M), given a, given M) / likelihood( marginalMean, given a)
2) Then average out all the scores across the entire scatter plot;
Technically it was simpler to average them as a geometric mean:
score = ( Product[ score(a,b) | forall (a,b) in the scatter plot])^(1/n), where n is the number of points in the scatter plot.
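The two steps above can be sketched in Python. This is only an illustration, not the add-in's actual implementation: it assumes both likelihoods are Gaussian densities centred on the actual value a, with the numerator using a per-case predicted standard deviation and the denominator using the marginal standard deviation. The function names and parameters (log_normal_pdf, model_score, predict_stdevs and so on) are hypothetical. The geometric mean is computed in the log domain to avoid underflow when multiplying many small ratios.

```python
import math

def log_normal_pdf(x, mean, stdev):
    """Log of the density of N(mean, stdev) evaluated at x."""
    z = (x - mean) / stdev
    return -0.5 * z * z - math.log(stdev * math.sqrt(2.0 * math.pi))

def log_point_score(actual, predicted, predict_stdev, marginal_mean, marginal_stdev):
    # ASSUMPTION: both likelihoods are Gaussian densities centred on the actual value.
    log_numerator = log_normal_pdf(predicted, actual, predict_stdev)
    log_denominator = log_normal_pdf(marginal_mean, actual, marginal_stdev)
    return log_numerator - log_denominator

def model_score(actuals, predictions, predict_stdevs, marginal_mean, marginal_stdev):
    """Geometric mean of the per-point likelihood ratios (step 2)."""
    logs = [
        log_point_score(a, b, s, marginal_mean, marginal_stdev)
        for a, b, s in zip(actuals, predictions, predict_stdevs)
    ]
    # exp(mean of logs) equals (product of scores) ** (1/n)
    return math.exp(sum(logs) / len(logs))

if __name__ == "__main__":
    actuals = [1.0, 2.0, 3.0]
    # A "model" that always predicts the marginal mean scores exactly 1.
    print(model_score(actuals, [2.0, 2.0, 2.0], [1.0, 1.0, 1.0], 2.0, 1.0))
    # Perfect predictions score above 1.
    print(model_score(actuals, actuals, [1.0, 1.0, 1.0], 2.0, 1.0))
```

Under these assumptions, a score of 1 means the model does no better than always predicting the marginal mean, and larger values mean the model's predictions are more likely than that baseline.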
For detailed information, please see Shuvro’s answer in the following thread:
My colleague reviewed your reply and Shuvro's, and has the following follow-up:
If I’ve understood the calculation correctly, there are two properties of the model score that concern me.
- The score for an individual point has a denominator equal to pdf( N(a, marginalStdev), marginalMean). Doesn't this mean that the denominator will be smaller for points in the extremes (i.e. where the actual value isn't close to the mean of all actual values), and thus that an identical |predicted value - actual value| error will give a higher score in the extremes than in the middle? In effect, values away from the centre are considered less important to predict well, and a model which predicts the centre very well but is rubbish away from it might be (falsely) preferred over a model which predicts decently well for all values.
- The PredictStdev value is used for the pdf of the numerator. If I’ve understood correctly, this is based on how well the model fits the training data. Therefore, a model which fits the training data well could be preferred over a model which predicts the test set better, but fits the training data less well. This seems to be absolutely contrary to the purpose of having an independent test set, which is to guard against overfitting of the training data.
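Both concerns can be reproduced with a few made-up numbers. The sketch below is an illustration under assumptions, not the add-in's code: it takes the numerator to be pdf( N(a, PredictStdev), b ) and the denominator to be pdf( N(a, marginalStdev), marginalMean ), as understood above, and uses a hypothetical marginalMean of 0 and marginalStdev of 1. The helper names are invented for the example.

```python
import math

def normal_pdf(x, mean, stdev):
    """Density of N(mean, stdev) evaluated at x."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))

def point_score(actual, predicted, predict_stdev,
                marginal_mean=0.0, marginal_stdev=1.0):
    # ASSUMPTION: Gaussian likelihoods centred on the actual value;
    # the marginal mean and stdev are illustrative values only.
    numerator = normal_pdf(predicted, actual, predict_stdev)
    denominator = normal_pdf(marginal_mean, actual, marginal_stdev)
    return numerator / denominator

# Concern 1: the same absolute error (0.5) at the centre and in the extremes.
centre_score = point_score(actual=0.0, predicted=0.5, predict_stdev=1.0)
extreme_score = point_score(actual=3.0, predicted=3.5, predict_stdev=1.0)

# Concern 2: same actual value; one model has a small PredictStdev but a
# larger test error, the other a large PredictStdev but a smaller test error.
overconfident_score = point_score(actual=0.0, predicted=0.3, predict_stdev=0.2)
cautious_score = point_score(actual=0.0, predicted=0.1, predict_stdev=1.0)

print("centre vs extreme:", centre_score, extreme_score)
print("overconfident vs cautious:", overconfident_score, cautious_score)
```

With these numbers the point in the extremes scores far higher than the point at the centre despite the identical error, and the model with the smaller PredictStdev scores higher despite the larger test error. Whether the add-in behaves this way depends on whether the assumed formulas match what it actually computes.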
Are there any published references which support the use of this model score rather than a standard metric such as R-squared or MSE?