Answered enterprise data warehouse

  • Thursday, January 24, 2013 9:03 PM
     
     

    Hi.  We are faced with either creating a new star db that is partially sourced by our current star or baking it into our existing star.  The new data and modified look of old data would be of interest to a very specialized audience.  

    Because the new star would be partially sourced by data from outside the current star, any attempt to keep their shared dims in only one of the dbs would probably lead to disaster.

    I'd always hoped the existing star would be our "enterprise" data warehouse.   But my instincts are telling me to keep the "new" stuff separate even if redundancy occurs.

    The thing that bothers me is that this would seem to fly in the face of Kimball's and perhaps Inmon's best practices.

    The reasons my gut tells me to keep them separate are many but here are a few:

    1) Until the new stuff matures, I'd hate to upset the apple cart (current star) that sources our largest cube and largest audience.

    2) The complexity that would result in putting everything in the same star might be a good reason not to.  This may be true as well for other specialized marts that come up in the future. 

    3) The audiences of the two stars are different.  This means that I might treat an emergency in one different from the other, ie recovery priorities and approaches could be different.

    4) the volume of data that could result in the current star (in spite of partitions and/or indexes) might be an argument against combining the stars.  For example, the new star requires what we call zero records with valid biz keys.   99% of the company doesn't care about these.  Also, our etl would run longer and might therefore keep our largest audience waiting too long when we get behind.  Also restores and backups would take longer.

    I'm trying to contact a mentor that used to be with Kimball but until that happens or as an alternative path, can the community comment on my dilemma?  I suppose the sum of the two stars could still be considered an enterprise wh but who knows.   I'd like to avoid re reading parts of the two "bibles" if at all possible.   


    db042188

All Replies

  • Wednesday, January 30, 2013 2:35 PM
     
     Answered

    In the Kimball practice, you would create a different Star Schema (not database necessarily) and use conformed dimensions where you are able.

    The 2 Star Schemas together (and any additional star schemas added) would be you EDW.

    Thomas


    TheSmilingDBA Thomas LeBlanc MCITP 2008 DBA

    • Marked As Answer by db042188 Wednesday, February 06, 2013 5:28 PM
    •  
  • Wednesday, January 30, 2013 5:40 PM
     
     

    Thx Thomas.  If I prefer to create the new measure group in a separate db (relational star), would Kimball scold me for recreating some of the dim tables there also...with a fair amount of dim duplication...but completely independent surrogate keys for loose coupling reasons?

    Is it possible that this kind of decision is where silos emerge and company's start defeating the purpose of attempting to have an enterprise data warehouses?


    db042188

  • Wednesday, January 30, 2013 9:21 PM
     
     Proposed Answer

    Well, you really need to re-think creating the same Dimension table in different star Schemas because not only of duplcaition of data, but is one ETL package is updated and someone forgets the about updating the second one, there could be problems with data integrity.

    If the business user starts to see problems with the data, they will stop trusting and relying on IT for BI reports, and start requesting exports of data and use womething like Excel and calculations in Excel rather that a centralized EDW.

    Thomas


    TheSmilingDBA Thomas LeBlanc MCITP 2008 DBA

    • Proposed As Answer by TheSmilingDBA Monday, February 04, 2013 4:43 PM
    •  
  • Thursday, January 31, 2013 2:02 PM
     
     
    Thx.  Its an interesting tradeoff.  Overcomplicating (and perhaps breaking) the 1st or introducing duplicate effort.  We do use reusable subpackages in our etl, so the duplicate effort may not be a big problem.  And we know there will be quality in the dims.   Users will trust both. 

    db042188

  • Monday, February 04, 2013 9:10 PM
     
     

    I offered this opinion to our reluctant group.

    Duplication of data seems to be a necessary byproduct (sometimes) of the storage technology chosen. 

    If we were using a different data source technology, perhaps we wouldnt need to duplicate data but I'm sure other challenges would arise.  At the moment we are happy with what we have. 

    Performance and quality are paramount.  If duplication is necessary to achieve these things in some technologies, I dont see a problem.   In my mind, quality includes being able to support different service level agreements on what might look like the same data but isnt.  


    db042188

  • Thursday, February 07, 2013 10:18 PM
     
     

    Go for it!!!

    Thomas


    TheSmilingDBA Thomas LeBlanc MCITP 2008 DBA