locked
What purpose does a staging database serve relative to the data warehouse? RRS feed

  • Question

  • As Im getting my feet wet with my first exposure to building and populating a data warehouse, I am seeing a lot of references to the use of a "staging" database.  Where would this be beneficial? 
    Friday, August 23, 2013 1:59 AM

Answers

  • Hi,

    A staging database is used as a "working area" for your ETL.  Olaf has a good definition: A staging database or area is used to load data from the sources, modify & cleansing them before you final load them into the DWH; mostly this is easier then to do this within one complex ETL process.

    One of the objectives of the staging area is to facilitate restartability and minimise the impact the extraction has on your source system.  You extract data from your source system only once and store a copy of it in your staging database.  Should your ETL fail further down the line, you do not need to impact your source system by extracting the data for a second time.  If you store the results of each logical step of your transformation in staging tables, you can restart you ETL from the last successful staging step.

    Hope this was useful :)

    Regards,


    Marius  (Microsoft BI Solutions Architect)
    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Twitter Icon
    • Marked as answer by shiftbit Monday, August 26, 2013 2:07 PM
    Monday, August 26, 2013 6:49 AM
  • A staging database or area is used to load data from the sources, modify & cleansing them before you final load them into the DWH; mostly this is easier then to do this within one complex ETL process.

    Olaf Helper

    [ Blog] [ Xing] [ MVP]

    • Marked as answer by shiftbit Friday, August 23, 2013 10:46 AM
    Friday, August 23, 2013 2:07 AM

All replies

  • A staging database or area is used to load data from the sources, modify & cleansing them before you final load them into the DWH; mostly this is easier then to do this within one complex ETL process.

    Olaf Helper

    [ Blog] [ Xing] [ MVP]

    • Marked as answer by shiftbit Friday, August 23, 2013 10:46 AM
    Friday, August 23, 2013 2:07 AM
  • Olaf is completely correct.

    In most Data Warehouse situations data can come from all manner of sources.  In one instance I work with we receive numerous csv files which have excess columns, records we're not interested in etc.  Therefore you would load these files in full into a staging database.

    Once in the staging database we then run data cleansing, remove the records and fields that we are not interested in, correct some formatting, and add IDs relevant to the company...  then the resulting dataset is loaded into the data warehouse.

    Hopefully that makes sense.


    Friday, August 23, 2013 5:55 AM
  • Okay, kind of what I had suspected.  However, its probably not a real show stopper if you don't have one, at least until the data flow and processing times are getting large.  In the case of a smaller company, using an SSIS package scheduled as a job to do the work of pulling the data in directly from the sources, doing a few transformations, then loading the DW should be fine.

    Friday, August 23, 2013 10:51 AM
  • Hi,

    A staging database is used as a "working area" for your ETL.  Olaf has a good definition: A staging database or area is used to load data from the sources, modify & cleansing them before you final load them into the DWH; mostly this is easier then to do this within one complex ETL process.

    One of the objectives of the staging area is to facilitate restartability and minimise the impact the extraction has on your source system.  You extract data from your source system only once and store a copy of it in your staging database.  Should your ETL fail further down the line, you do not need to impact your source system by extracting the data for a second time.  If you store the results of each logical step of your transformation in staging tables, you can restart you ETL from the last successful staging step.

    Hope this was useful :)

    Regards,


    Marius  (Microsoft BI Solutions Architect)
    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Twitter Icon
    • Marked as answer by shiftbit Monday, August 26, 2013 2:07 PM
    Monday, August 26, 2013 6:49 AM