none
Dynamic loading of Staging tables to DW tables RRS feed

  • General discussion

  • Hello everyone,

    I've been musing this idea and I'm not sure if it's brilliant or just daft, it would probably be described as unconventional! I thought I'd put it forward here to see if anyone can tell me why it's bad and why I shouldn't be doing it, or indeed agree!

    So I've created an abstraction from the operational system and loaded this into a bunch of staging tables, these are all in the staging schema. So I have Staging.issues Staging.Jobs etc, these tables are all heaps. I have then created a DW schema and have duplicated the tables dw.issues dw.jobs etc, these are indexed as necessary and contain a primary key (some don't but I'll get onto why later!... I said this was unconventional!).

    I then dynamically check sys.objects to see what tables are in the staging schema, if the table doesn't exist in the dw schema it simply does an 'select into dw.[newtable]' thus creating the dw table (obviously this won't have any indexes, these would need to be added manually). If a dw.[table] doesn't have a primary key it truncates the dw.[table] and inserts the entire staging.[table] into the dw.[table].

    If the dw.[table] has a Primary Key(s), the rows that exist in the staging.[table] (based on the keys) are deleted from the dw.[table] and the staging.[table] is inserted (therefore dealing with updates and new rows)

    I know that tables not having a primary key is bad (there won't be too many of these) but given these are just going to be full table loads the ID column will still have an index on it, I don't see it as that bad!? Of course I could create a control table for full loads vs delta loads but that would mean another thing to maintain, other suggestions to identify a full load vs delta are welcome.

    So that's the theory, and it means that to create new DW tables we simply need to add more to the ETL that creates that staging tables and we're done. Does this make sense?

    This feels like a pretty good solution, but I'm sure I'm missing something, it feels too easy!

    Any thoughts welcome on this.

    Cheer, Rob.



    • Edited by RobFarley Wednesday, April 13, 2016 2:21 PM
    Wednesday, April 13, 2016 2:15 PM

All replies

  • Well, I've written this now and it works really well, would be great to hear any comments around this just to sense check the concept
    Wednesday, April 20, 2016 9:19 AM
  • Yes, it sounds like it will work.

    In a data warehouse though, sometimes loading a DW.[table[ is not as simple as copying from a stage to the table.

    The T in ETL stands for transform. Which means some of the data might need to be transformed before put into DW.table like a range of sizes instead of the sizes. Transforming a single character to a more descriptive character, etc.

    Also, the stage might not be in the format of a Dimensional Model.

    You could also look into SQL Server replication to do the same thing is this is just a copy of some OLTP tables. Might be simpler and you can add your reporting specific indexes to the new tables that are replicated in the DW not OLTP.

    Thomas


    TheSmilingDBA Thomas LeBlanc MCITP 2008 DBA

    Tuesday, May 17, 2016 2:03 PM
  • Correct Rob,
    that is "too easy" and does not implement many of the features you would want from a data warehouse that is going to be useful over time.

    Some Egs.

    1. Incremental updating of rows that have changed while discarding rows that have not changed.

    2. You seem to have not created generated integer keys for your dimension tables. We actually also generate integer keys for fact tables. Not many people understand why.

    3. Transforming data values and correcting them along the way. For example error processing to mark rows in error in the staging area to stop them from flowing forward in to the data warehouse if they are in error.

    4. You seem to not have a dimensional model in your "data warehouse".

    All in all. You are doing what was done 25 years ago as in "copy operational data in to a relational database and call it a data warehouse". We discouraged this sort of thing even back then. I can not believe that 25 years later we actually have to tell people it is a bad idea to just copy operational data to another place and call it a data warehouse....the fails with even the smallest of projects.

    The work that is needed to be done to create a proper data warehouse is substantial and I have built the worlds best tools to do that over the last 20 years.

    The end user accessible database should be dimensional.

    We have a way of putting all history in to a dimensional model so no need any more for an archive (such as data vault) and analytical layer (dimensional) if you can live with a few limitations.

    Dimensional can be cubes or star schemas...I prefer star schemas.

    So yes....you are right to feel like you are missing something...you are missing quite a lot. 


    Peter Nolan

    Tuesday, June 7, 2016 1:56 PM
  • Check out the exam 70-463 content topics:

    https:/www.microsoft.com/en-us/learning/exam-70-463.aspx

    Designing and implementing dimensions and fact tables should provide you with more insight to data warehousing.

    Tuesday, June 7, 2016 2:21 PM