How to manage a SCD populated from more than one data source?


  • I have a dimension type 2 where a row of data is populated from 2 different data sources. E.g. DIMCUSTOMER has a Customer_Number stored on both data sources.

    However the DOB is held on data source 1 and the Address is held on data source 2.

    Do I run the update on the DIMCUSTOMER for the first data source and then the second data source:


    CUST1234    250285    ADDR123        Version 1

    CUST1234    250285    NEWADDR123        Version 2    

    CUST1234    250284    NEWADDR123        Version 3    

    ..or do I need to make the updates all as one?

    mercredi 4 juillet 2018 17:34

Toutes les réponses

  • You need to make updates all at once

    Use a merge join or lookup task in SSIS to merge the data based Customer_Number at first and bring as a single resultset containing Customer_number, DOB and Address. Then use this to lookup against the target table (DImCustomer) and then do the Type 2 processing based on whether there was a match (modified) or not (new)

    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    mercredi 4 juillet 2018 18:40
  • Thanks

    I would like to embed some aspect of data lineage so that I can trace back to source. In the example above Version ID would be a FK to a version log reference table which would contain the name of the source, when it was executed etc.

    This is because there can be many future source systems that would also provide data for the DIMCUSTOMER.

    Hence it would have been good to see Version 2 linked to source 1 and Version 3 linked to source 2 say.

    mercredi 4 juillet 2018 21:17