locked
About Delta Load and ETL Process RRS feed

  • Question

  • Hello!

    I got some questions:

    When you do the loading process from stage to the fact and dimension table and I have questions about the surrogate key.

    ----------------------

    1.
    A question about the surrogate key.

    When you do the loading process from stage table to the fact and dimension table and does it mean that you also load the surrogate key from stage to the dimension table in relation to new rows?

    Or do you create new surrogate key in dimension table by using the sql code Identity for the table? (https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-2017)?

    Which approach is correct?

    2.
    When you do the delta load, does it mean insert and update rows in the fact and dimension table?


    3.
    Is the ETL process for the delta load and full load exactly the same that you need to retrieve all the data, all the data!, from the source system?

    ----------------------

    Other information:
    *I'm newbie in ETL and Business Intelligence
    *I'm using only T-SQL, no SSIS.

    Thank you!
    Friday, November 16, 2018 2:05 PM

Answers

  • 1) Your mixing models. Depending on the audit philosophy of your dimensional model, you use the keys from your source data. Surrogate keys created in your staging area do normally not matter, cause they do not help in an audit case, cause an dimensional model user has no access to the staging area and maybe only to the real source data.

    2) "Delta load" means that you only look at modified data in your source in comparison to your last load. Thus you only insert, update or delete data in your dimensional model, which is affected. In opposite to push all data from the source to your dimensional model.

    3) It depends on the source data and how it is provided. For some data provider you need to a full extract always. sometimes you can ask the source provider for modified data only.

    • Marked as answer by Sakura Data Sunday, November 18, 2018 4:11 PM
    Friday, November 16, 2018 3:46 PM
  • No. Normally the dimensional surrogate keys are created during the load in the dimensional model. Cause the staging area is not a dimensional model. Thus keys from here are normally useless.
    • Marked as answer by Sakura Data Sunday, November 18, 2018 4:11 PM
    Sunday, November 18, 2018 3:04 PM

All replies

  • 1) Your mixing models. Depending on the audit philosophy of your dimensional model, you use the keys from your source data. Surrogate keys created in your staging area do normally not matter, cause they do not help in an audit case, cause an dimensional model user has no access to the staging area and maybe only to the real source data.

    2) "Delta load" means that you only look at modified data in your source in comparison to your last load. Thus you only insert, update or delete data in your dimensional model, which is affected. In opposite to push all data from the source to your dimensional model.

    3) It depends on the source data and how it is provided. For some data provider you need to a full extract always. sometimes you can ask the source provider for modified data only.

    • Marked as answer by Sakura Data Sunday, November 18, 2018 4:11 PM
    Friday, November 16, 2018 3:46 PM
  • 1) Your mixing models. Depending on the audit philosophy of your dimensional model, you use the keys from your source data. Surrogate keys created in your staging area do normally not matter, cause they do not help in an audit case, cause an dimensional model user has no access to the staging area and maybe only to the real source data.

    Thank you for your help!

    Maybe I need to provide more information.

    Do you create the surrogate key, note natural key, in the stage table and transfer the same surrogate key in the dimension table?

    Thank!

    Saturday, November 17, 2018 9:04 PM
  • No. Normally the dimensional surrogate keys are created during the load in the dimensional model. Cause the staging area is not a dimensional model. Thus keys from here are normally useless.
    • Marked as answer by Sakura Data Sunday, November 18, 2018 4:11 PM
    Sunday, November 18, 2018 3:04 PM