none
Many small reference tables: should they be dimensions or attributes RRS feed

  • Question

  • Hi Folks, I'd like your opinion on best practices and design patterns, here.

    Consider a buncha commingled data . . . you know the kind of thing . . . there's a main table with numbers and strings, and several lookup tables.  

    Some of the lookup tables only have a few entries, and maybe a single code/value pair.  Indeed, in the main table, there are a number of strings that could be changed into codes, and externalised as lookup code/value tables.  As this modelling process proceeds, we would end up with a main table containing numbers (going to be our Fact Table), some lookup tables that have a lot of entries and are clearly candidates for building hierarchical Dimensions, and some other stuff such as (say) sex, marital status, ethnic, and other code/value pairs.  

    Now, in a Kimball relational modelling scenario, I'd likely build a junk dimension to hold the non-associated code-value pairs . . . I don' wanna build no centipede after all!  But is that valid for SSAS?  After all, a 'cube' is actually the sparse intersection of numerous attribute hierarchy coordinates.  

    1.  I could build many little dimension tables, put the codes into the main Fact table, build many little dimensions in SSAS, then and trust in SSAS to bring it all together for me.  That would give me many little dimensions.

    2.  I could just forget about the code/value pair business, and translate all codes to strings (say 'Male', Female') right there in the main fact table (essentially considering them to be collapsed dimensions), and let SSAS create attribute hierarchies in a Fact Dimension created from the main table, with the numbers populating an associated Measure Group. That would give me one big Fact Dimension with a buncha sliceable, diceable, and sortable attributes.

    What do you think?  Do you prefer 1 or 2?  Or is there a 3?

    What do you think is best practice?


    Donna Kelly

    Tuesday, July 16, 2013 4:20 PM

All replies

  • Hi Donna,

    I am trying to involve someone more familiar with this topic for a further look at this issue. Sometime delay might be expected from the job transferring. Your patience is greatly appreciated. 

    Thank you for your understanding and support.

    If you have any feedback on our support, please click here.

    Best Regards,


    Elvis Long
    TechNet Community Support

    Friday, July 19, 2013 1:40 AM
    Moderator
  • Hi Donna,

    Junk dimensions are certainly valid for SSAS. The table representing your junk dimension may contain several low cardinality attributes with the rows in that table representing the unique combinations occurring in your fact tables. In SSAS, you'd create a single dimension from that table with the tables surrogate key being the Key attribute of the SSAS dimension. You'd set the Visible property for that surrogate key to false (used only to relate to your measure groups) and each of the Junk dimensions attributes would be directly related to the Key. I'd rename the Junk dimension to be more representative of the types of attributes your Junk dimension contains such as Misc. Demographic Attributes rather than naming it Junk dimension.

    HTH, Martin 


    http://martinmason.wordpress.com

    Friday, July 19, 2013 2:54 AM
  • Hello Donna,

    Thanks for your post.

    If the fact record is not large, you can use option 2 to build a large fact dimension that holds a lot of attributes, these attributes are not related to each other.

    If the fact record is large, however, for example over 10 to 20 million rows, you may run into either the 4GB limit issue, or processing performance issue. Since every attribute needs to go against a large table, processing so many attributes with Select Distinct ... From ... for each attribute can take time. If you use the ProcessGrouping setting=ByTable, it can cause processing run out of memory.

    Hope this helps.

    Thanks.


    Meer Al - MSFT

    Tuesday, July 23, 2013 9:51 PM
  • Meer Al . . . 4GB limit?  Surely that only applies to 32 bit?  I could not see any such limitation in Maximum Capacity Specifications for SQL Server http://msdn.microsoft.com/en-us/library/ms143432.aspx

    Martin:  junk dimensions are not really what's at issue here.  These  attributes are not a bunch of unrelated bits and pieces to be bundled together for UX simplicity, but on reflection are actually dimensional attributes such as Marital Status of Person.  I think I've just answered my own question!

    In fact, I've gone down the option 2 path, and this seems to work well.  The only concern I have is with regard to dimension maintenance.  If something like Marital Status or indeed any code/value pair is externalised as an SCD 1, then maintenance of the text field is straightforward.  But if it is embedded within the fact table . . .and eventually processed into a Fact Dimension, then maintenance is not so straightforward.

    Ummm.  Need to think about this one a bit more . . .


    Donna Kelly

    Monday, July 29, 2013 3:35 PM
  • Hello Donna,

    Perhaps, I should have been more clear, I was referring to 4GB string store limitation, specific to Analysis Service. More discussions here:

    http://connect.microsoft.com/SQLServer/feedback/details/396299/analysis-services-reports-filestore-error-when-saving-dimensions-greater-then-4gb-in-size

    Hope this will clarify.

    Thanks.


    Meer Al - MSFT

    Monday, August 5, 2013 5:29 PM
  • I'd like to highly discourage option 2, embedding attributes directly in the fact table. Going down that path is, at least IMHO on a technical level, the principal reason many data warehouse projects become abysmal failures. If nothing else, that approach violates mathematical principals foundational to the dimensional approach in the first place. Marital Status isn't an actor involved in your business process, is it? No, it's really a customer that's involved and Marital Status is just one possible customer related attribute that analysts need to slice and dice by. Therefore, Marital Status should be included in the Customer dimension and not the fact or any other fact table. If customer related attributes are directly associated with a fact record, you're going to eventually create a horrible mess for your analysts to navigate...and an almost impossible situation for you to extend when you also need to associate Sales with Orders and Procurement or other processes associated with your business domain. Or the ability for you to incorporate additional sources into your Sales data mart.

    I can't stress enough how important it is to read and have a thorough understanding of Kimball's The Data Warehouse Toolkit. I've read it and reread it several times and every time I read it, I learn a little bit more. It's not complete enough to give you a complete arsenal to attack every dimensional modeling problem you'll face, but at least if you follow the core principals, you'll not create a major problem for someone else to clean up.

    HTH, Martin


    <a href="http://martinsbiblog.spaces.live.com" target="_blank">http://martinmason.wordpress.com</a>

    Monday, August 5, 2013 8:13 PM