Many to Many Dimension - Does it make sense to use in this scenario?

Proposed Answer Many to Many Dimension - Does it make sense to use in this scenario?

  • Tuesday, May 29, 2012 12:16 AM
     
     

    Hi there!  Hoping that I can get some advice here.

    I have a fact table which is linked to both time (hours - minutes - seconds) and days (years - months - weeks - days).

    Some of my calculations (cumulative sums with period to date function) must take place at the day level (minutes/hours).  Now I've been able to produce my sums accurately but the calculation start and end times are not at the times within the day that I desire which is only normal because my time dimension table starts at the first second in the day and ends at the last second in the day.

    If I would like the ability to generate my calculations based on specific segments of the day does it make sense to bridge a table between my time dimension table and a session dimension?

    In this case a session would define a segment of time.  Example, times from 09:30 AM - 4:00 PM will map to Session 1.  Times from 4:00PM - 09:30 AM will map to Session 2.

    So the overall desired structure is a Fact Table which points to Date and Time dimensions.  And an intermediate table which connects a Session dimension with my Time dimension.

    Appreciate your comments in advance.

All Replies

  • Tuesday, May 29, 2012 5:40 AM
     
     Proposed Answer

    Can you not simply add an attribute to the time dimension that would map the time component to the Session? This should give you much better performance, unless you have to take into account the day of week as well?

    Looking at your example, you are not getting any advantage from using a Many to Many. If you had Session 3 that was say 2:00PM to 6PM then I would possibly consider the Many to Many.

  • Wednesday, May 30, 2012 2:49 AM
     
     

    Can you not simply add an attribute to the time dimension that would map the time component to the Session? This should give you much better performance, unless you have to take into account the day of week as well?

    Looking at your example, you are not getting any advantage from using a Many to Many. If you had Session 3 that was say 2:00PM to 6PM then I would possibly consider the Many to Many.

    I've done what you said, I took a Named calculation using some simple case statements and added a reference dimension.  This works if you do not have any overlapping dimensions.

    But if you do have overlapping dimensions and I could see this being possible down the road, would that justify the Many to Many case?

  • Wednesday, May 30, 2012 3:53 AM
     
     Proposed Answer

    Unless you need a great deal of flexibility in assigning segments and these segments are changing frequently, I would avoid using a M2M to address these problems. If you can't define the segment as an attribute of the Time dimension directly as Michael suggested as it dependent on the day or other dimension, I would create a separate dimension in your cube space and either modify your ETL or encompass your logic in a view to associate each fact with the Segment dimension. This approach, of course, only applies if you segment definitions aren't going to need to be modified frequently. If these definitions are not relatively static and would frequently change, the M2M approach that Marco Russo and Alberto Ferrarri discusses would be a better approach. http://www.sqlbi.com/articles/many2many/

    HTH, Martin


    http://martinmason.wordpress.com

    • Proposed As Answer by Michael Riemer Wednesday, May 30, 2012 7:57 AM
    •  
  • Wednesday, May 30, 2012 5:50 AM
     
     

    Can you not simply add an attribute to the time dimension that would map the time component to the Session? This should give you much better performance, unless you have to take into account the day of week as well?

    Looking at your example, you are not getting any advantage from using a Many to Many. If you had Session 3 that was say 2:00PM to 6PM then I would possibly consider the Many to Many.

    Day of the week is important, for the reason that.

    A session can belong to multiple parts of the day but the session itself is a different session.

    Eg. Start at 4:30PM 5/29/12 end at 09:29 AM on 5/30/12, start again at 5/30/12 at 4:30PM.  Here we have times that fall on the same day.  But really they should not be included with each other since these are two distinct sessions.

  • Wednesday, May 30, 2012 7:57 AM
     
     

    I would look at the approach Martin has proposed above and see if you can't associate the facts with individual sessions.

    When you say, that the evening session should be different to the morning session (which is from the previous evening) how would you identify the fact that it is different? I think that is all the more reason to use Martin's approach above.

  • Wednesday, May 30, 2012 2:41 PM
     
     

    I would look at the approach Martin has proposed above and see if you can't associate the facts with individual sessions.

    When you say, that the evening session should be different to the morning session (which is from the previous evening) how would you identify the fact that it is different? I think that is all the more reason to use Martin's approach above.

    You are right.

    What should the grain of the session dimension be. Should I have 1 entry per session i.e. similar to Date dimension.  Or should I have it based on day of week.  The benefit I see of doing the former is that it will be much easier for me to compute things like lead and lag.

  • Wednesday, May 30, 2012 3:47 PM
     
     

    Based on your requirement with Lead and Lag around the actual sessions, I would probably go with one session per day as you mention, although I would consider how that actually gets used.

    If there is a means of getting the data into something like day of the week, it "may" provide better performance, in that your dimension will be smaller, but that could make your lead/lag calculations more difficult.

    I would also consider how likely the requirements around the sessions are to actually change in future too.

  • Wednesday, May 30, 2012 4:51 PM
     
     

    Based on your requirement with Lead and Lag around the actual sessions, I would probably go with one session per day as you mention, although I would consider how that actually gets used.

    If there is a means of getting the data into something like day of the week, it "may" provide better performance, in that your dimension will be smaller, but that could make your lead/lag calculations more difficult.

    I would also consider how likely the requirements around the sessions are to actually change in future too.

    Sessions will definitely never change.  One thing though, if I need to have 3 sessions instead of 2 e.g. One session runs from 9:00 AM - 3:00PM, other runs from 9:00 AM - 4:PM would I need M2M or could this be handled some other way?

  • Wednesday, May 30, 2012 4:57 PM
     
     

    If sessions will definitely never change, I would avoid a M2M dimension and implement a separate Session dimension. M2M are nice but not to address these requirements. A session ID should be assigned to each fact record during your ETL or implementing logic in a SQL view.

    HTH, Martin


    http://martinmason.wordpress.com

  • Wednesday, May 30, 2012 9:56 PM
     
     

    If sessions will definitely never change, I would avoid a M2M dimension and implement a separate Session dimension. M2M are nice but not to address these requirements. A session ID should be assigned to each fact record during your ETL or implementing logic in a SQL view.

    HTH, Martin


    http://martinmason.wordpress.com

    Martin and Michael this works great.  Thank you!

    I have things setup so that my facts are joining on a key defined by Date,Session and Type.  The type is simply a parent so that I can break my session into parts that over the course of the session.

    So hierarchy is:

    Date

    Type

    Session

    A couple of questions.  Is it possible to have a default session so that when I slice by type only one session is chosen before another.  Second, would it make sense to link my Session Dimension to my Date Dimension?  Can my fact dimension be related to my Date dimension and my Session dimension be related to my Date dimension?