Answered by:
MDX Correlation function
Question

I am performing analysis of a survey, and need to compute the correlation of the responses (scores) between questions. For example, does a high score to question 1 correlate with a high score to question 2. (All my questions have possible scores of 1, 2, 3, 4, or 5, with 5 being the best score and 1 the worst).
My underlying data table has three relevant fields: person id, question number, and score. In the cube, Person id is a link to a corporate heirarchy (a linked dimension with multiple levels), and question number links to the question dimension. Score is a measure that aggregates as sum. I also have a calculated measure named "Average" defined as [measures].[score]/[measures].[Record Count]. This cube works just fine for all my purposes, except for calculating the correlation between questions.
To do so, I created a copy of the Question dimension and called it "Correlated Question". Then, I have tried to define the correlation as:
Correlation({[Corporate Hierarchy].[Person Dim Id].members},([Question].currentmember,[Measures].[Score]),([Correlated Question].currentmember,[Measures].[Score]))
with the intent of displaying this with question on rows and correlated question on columns.
Can someone help me understand what I am doing wrong, or provide me with some code snippets that use the correlation function ?
Tuesday, July 31, 2007 10:01 PM
Answers

Recently I've implemented something similar (actually the square of the Correlation, aka RSquared) for survey results. A couple of points that might help:

As Darren mentioned, the Survey section of the "manytomany revolution" whitepaper is instructive  in your scenario, you might not need to set up a manymany Question/Answer dimension (though that could give you more flexibility). But I think you'll still need a second copy/alias of the fact table/named query, which relates to your "Correlated Question" dimension rather than to the "Question" dimension (in the paper, there is a named query fact alias for each copy of the QuestionsAnswers dimension). All other dimensions would relate to both fact tables/measure groups. In the Correlation function, you would then use: ([Question].currentmember,[Measures].[Score1]) and ([Correlated Question].currentmember,[Measures].[Score2]).

It looks like you're trying to compute correlation over [Corporate Hierarchy].[Person Dim Id].members, which will work if that's the grain of the survey fact table (in other words, if there is only 1 survey per [Corporate Hierarchy].[Person Dim Id] member). But the set should probably be: [Corporate Hierarchy].[Person Dim Id].[Person Dim Id].members, to avoid including the [All] member. In my case, in order to compute correlation at the fact table granularity, I'm using intermediate cube measures with a custom correlation calculation (based on a standard correlation formula), to improve performance. This approach might be useful with large volumes of survey data.
Wednesday, August 1, 2007 7:06 AM 
All replies

You don't actually describe what your issue is, but my guess would be that you are only seeing correlation coeffients for each question with itself and not with other questions. If this is correct it would be because you have a regular relationship between "Question", "Correlated Question" and the fact table.
The problem is that with 2 regular relationships Correlated Question 1 only ever has a score for Question 1.
Have a look at the manytomany revolution whitepaper here http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/7/Default.aspx
It has a whole section on survey analysis and if you set up your cube with a similar structure you should be able to get the correlation function working.
You might also want to look into the Data Mining features of SSAS at some point as they might allow for more sophisticated analysis than simple correlations between 2 questions.
Wednesday, August 1, 2007 12:51 AM 
Recently I've implemented something similar (actually the square of the Correlation, aka RSquared) for survey results. A couple of points that might help:

As Darren mentioned, the Survey section of the "manytomany revolution" whitepaper is instructive  in your scenario, you might not need to set up a manymany Question/Answer dimension (though that could give you more flexibility). But I think you'll still need a second copy/alias of the fact table/named query, which relates to your "Correlated Question" dimension rather than to the "Question" dimension (in the paper, there is a named query fact alias for each copy of the QuestionsAnswers dimension). All other dimensions would relate to both fact tables/measure groups. In the Correlation function, you would then use: ([Question].currentmember,[Measures].[Score1]) and ([Correlated Question].currentmember,[Measures].[Score2]).

It looks like you're trying to compute correlation over [Corporate Hierarchy].[Person Dim Id].members, which will work if that's the grain of the survey fact table (in other words, if there is only 1 survey per [Corporate Hierarchy].[Person Dim Id] member). But the set should probably be: [Corporate Hierarchy].[Person Dim Id].[Person Dim Id].members, to avoid including the [All] member. In my case, in order to compute correlation at the fact table granularity, I'm using intermediate cube measures with a custom correlation calculation (based on a standard correlation formula), to improve performance. This approach might be useful with large volumes of survey data.
Wednesday, August 1, 2007 7:06 AM 

Thank you Darren for a very good white paper.
I did successfully implement the correlation function (and verified the results). The real key was:
Deepak Puri wrote: But I think you'll still need a second copy/alias of the fact table/named query, which relates to your "Correlated Question" dimension Many thanks again.
Wednesday, August 1, 2007 5:58 PM