none
Incremental load

    Question

  • Hi,

    We need to implement incremental load in database. A sample scenario is, there is a view (INCOMEVW) which is build on top of a query like

    CREATE VIEW INCOMEVW

    AS

    SELECT CLIENTID,COUNTRYNAME,SUM(OUTPUT.INCOME) AS INCOME

    (SELECT EOCLIENT_ID AS CLIENTID,EOCOUNTRYNAME AS COUNTRYNAME,EOINCOME  AS INCOME FROM EOCLIENT C INNER JOIN EOCOUNTRY CT ON C.COUNTRYCODE=CT.COUNTRYCODE

    UNION ALL

    SELECT ENCLIENT_ID AS CLIENTID,ENCOUNTRYNAME AS COUNTRYNAME,ENINCOMEAS as INCOME  FROM ENCLIENT EC INNER JOIN ENCOUNTRY ECT ON

    EC.COUNTRYCODE=ECT.COUNTRYCODE) OUTPUT

    GROUP BY  CLIENTID,COUNTRYNAME

    This is a sample view. As of now there is a full load happening from the source(select * from INCOMEVW) and loads to target table tbl_Income.

    We need to pick only the delta and load to the target table using a staging. The challenge is,

    1) If we get the delta(Insert,update or deleted rows in the source tables EOCLIENT,EOCOUNTRY,ENCLIENT,ENCOUNTRY, how to load the incremental to single target table tbl_Income.

    2) How to do the Sum operation with group by in incremental load?

    3) We are planning to have a daily incremental load and thinking to create the same table structure as source with Date and Flag column to identify the date and whether that source row is an Insert or Update or Delete with the flag. But not sure how to frame something like this view and load to single target with Sum operations.

    Any suggestion??

    Regards

    Andrew

    • Moved by Kalman Toth Sunday, December 22, 2013 5:22 PM Better fit
    Sunday, December 22, 2013 3:10 PM

Answers

  • Hello Andrew,

    In your case, you could make use of the Change data capture functionality (about the SSIS task you can read here). This would allow you to capture all the changes from one or more specific tables (insert, delete, updates) and perform incremental load on a hourly/daily/monthly basis. 

    By using CDC combined with a SSIS package, you can get all your changed data from the production, do one or more operations on the respective data (sums, for example), and finally put the modified information in the destination.

    One small warning, CDC does not work on views, so you actually have to implement it on tables. 

    Warm regards,

    Razvan


    Per aspera ad astra!
    journeyintobi.com

    Monday, December 23, 2013 6:37 AM
  • Hi Andrew,

    As per question:

    1 and 2.) 

    Since you are using view , which have aggregate function in them, there is no easy to pass the diff records and then consolidate them into aggregate function. 

    In case my suggest would just load your base table used in the view to the stage environment and create view in the stage enviroment to populate in the Destination database or DW.

    3.) You can use both option either a date column or flag to track the record got change(Insert/update or delete). 

    Each has own pro and cons, In date you need to track the last modification date, where as in flag you need to always update the flag in source after successful load.

    You can always goes with CDC(Change data capture as suggested by Razvan. ) only thing to take care here is that CDC is only available in enterprise edition only. 


    Regards Harsh

    Monday, December 23, 2013 7:08 AM

All replies

  • I am moving it to data warehouse.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012


    Sunday, December 22, 2013 5:21 PM
  • Hello Andrew,

    In your case, you could make use of the Change data capture functionality (about the SSIS task you can read here). This would allow you to capture all the changes from one or more specific tables (insert, delete, updates) and perform incremental load on a hourly/daily/monthly basis. 

    By using CDC combined with a SSIS package, you can get all your changed data from the production, do one or more operations on the respective data (sums, for example), and finally put the modified information in the destination.

    One small warning, CDC does not work on views, so you actually have to implement it on tables. 

    Warm regards,

    Razvan


    Per aspera ad astra!
    journeyintobi.com

    Monday, December 23, 2013 6:37 AM
  • Hi Andrew,

    As per question:

    1 and 2.) 

    Since you are using view , which have aggregate function in them, there is no easy to pass the diff records and then consolidate them into aggregate function. 

    In case my suggest would just load your base table used in the view to the stage environment and create view in the stage enviroment to populate in the Destination database or DW.

    3.) You can use both option either a date column or flag to track the record got change(Insert/update or delete). 

    Each has own pro and cons, In date you need to track the last modification date, where as in flag you need to always update the flag in source after successful load.

    You can always goes with CDC(Change data capture as suggested by Razvan. ) only thing to take care here is that CDC is only available in enterprise edition only. 


    Regards Harsh

    Monday, December 23, 2013 7:08 AM