none
Request for advice on DW structure for survey data

    Întrebare

  • Hi there,

    I'm posting this in the hope that someone out there can give me a bit of advice with my DW design. I'll try to describe the business scenario, and also what I'm trying to achive.

    The DW is going to to hold survey response data, which is being imported from the source system as a .csv file (I have no control over this). In the source data there is one row per survey response, and one column per survey question which a) may or may not be answered, b) will have different answer types (i.e free text, drop-down list of choices, true/false, number range etc). The admins of the survey application may also add new questions without my knowledge, which will result in new columns in the .csv file. My ETL will need to handle this eventually, but for now this is not the priority issue.

    What I'm struggling to come up with is the correct design for the 'Question & Answer' dimension table and how this links back to the fact table. I suppose in the star schema world I should have a single dimension of dimQuestion which contains all questions and all answers, which gets incrementally loaded with new questions and answers as they are found in the source data. Then I need to link the surrogate key for the unique question & answer combination back to the column in the fact table, for the relevant question.

    I was just wondering if anyone has experience of a similar scenario? And whether this design is the right way to go? I've had a go at it but when I come to build the dimQuestion dimension in SSAS, I seem to get multiple 'copies' of it - one for each question. This is clearly not the desired result, what I would like is a single dimension containing the list of questions, and then create measures based on the responses.

    I'd be grateful for any advice on the best sort of structure for this kind of data.

    Thanks
    Matt


    • Editat de Matt FTW 15 martie 2012 10:15
    15 martie 2012 10:14

Toate mesajele

  • I faced a similar scenario few months ago. I ended up designing 3 tables:

    DimSurvey - 1 row per survey

    DimSurveyItem - 1 row per question

    DimResponse - 1 row per distinct response value

    FactItemResponse - 1 row per response,questios, respondant in a giving date.



    I faced a similar scenario few months ago. I ended design 3 tables:

    DimSurvey -1 row per survey<o:p></o:p>



    DimSurveyItem- 1 row per question<o:p></o:p>



    DimResponse- 1 row per distinct response value<o:p></o:p>



    FactItemResponse- 1 row per response, questions, respondent in a giving date. <o:p></o:p>



    In my case the source sent the in XML, so it new questions were added, new nodes got appended to the xml fragment. Our ETL and an step to parse the file and normalized in such a way that will fit the granularity of the tables above. I would imagine you would need to do something similar for the csv files as columns are
    added. <o:p></o:p>



    I don'treally understand the issue you are having in SSAS. If you record only distinct item question in DimQuestions, I don't see how that can produce duplicates.<o:p></o:p>



    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more


    16 martie 2012 01:02
  • Hi Rafael,

    Thanks for your response and advice! I will have a go with this structure and see if it produces the desired results. Can I just clarify how you structured the fact table: are you saying that you have one row per survey question, and then a column each for SurveyItemKey, ResponseKey, DateKey etc? So each survey will have multiple rows in the fact table: one per question answered. Is that correct?

    At the moment I have one row per survey, with a column for each question. I think this is perhaps where I am going wrong.

    Thanks again
    Matt

    P.S tbh I don't really understand the SSAS problem either. I think once I have the structure correct and clear in my mind I will soon be able to sort that out.

    19 martie 2012 09:59