none
Copy from staging table to multiple tables RRS feed

  • Question

  • We are using an SSIS package to fast load our data source into a staging table for processing.

    The reason we are using a staging table is that we need to copy the data from staging to our actual DB tables (4 in total), and the insertion order matters as we have foreign key relationships.

    When copying the data from our staging table, should we enumerate through all the records and use an insert-select method for each row or is there a more efficient way to do this?

    Friday, October 10, 2014 12:38 PM

Answers

  • You can insert the new site data with a MERGE statement (or insert when not exists) in a "EXECUTE SQL STATEMENT".

    Then you can update the stage tables with a "EXECUTE SQL STATEMENT" with the ID of the SiteID and then in a "DATA FLOW TASK" just bulk insert without any transformation (SOURCE->DESTINATION).

    Another thing you can do is after merging to use a cache transform and then inside the "DATA FLOW TASK" perform a LOOKUP for the SiteID and send the output match to the DESTINATION.

    Or you can just join on Site Name to get SiteID and be done with it :)


    "If there's nothing wrong with me, maybe there's something wrong with the universe!"



    Friday, October 10, 2014 1:19 PM
  • Since your only dependency is site ID a join on source and bulk insert (fast load) on destination would be best.

    "If there's nothing wrong with me, maybe there's something wrong with the universe!"

    Friday, October 10, 2014 2:13 PM

All replies

  • You can use mutlicast to replicate the rows for multiple outputs.

    If you have foreign key dependencies that prevent you from inserting the data at all tables simultaneously you can insert all rows with bulk insert in sequence... based on the dependencies.

    Without the schema of your tables it's difficult to be more precise, but I assume you have some IDENTITY or SEQUENCE fields on some tables that need to be also inserted on some other table.

    In that case you can split the job to e.g. to 10 workers (using a conditional split) and each one will get a tenth of your data to process.

    Send some dumbed-down example showing of what you actually want to accomplish so you can get a better approach.


    "If there's nothing wrong with me, maybe there's something wrong with the universe!"

    Friday, October 10, 2014 12:52 PM
  • Because you're already staging the data the best option would be to call a stored procedure to do the insertion in sequence to all the tables honouring the FK relationships.

    You can use OUTPUT clause to achieve this requirement which will capture the generated id values and use it to insert to child tables to satisfy the FK reference

    See logics used below

    http://visakhm.blogspot.in/2010/04/using-xml-to-batch-load-master-child.html

    http://visakhm.blogspot.in/2014/09/t-sql-tips-multifaceted-merge-statement.html


    Please Mark This As Answer if it helps to solve the issue Visakh ---------------------------- http://visakhm.blogspot.com/ https://www.facebook.com/VmBlogs

    Friday, October 10, 2014 12:58 PM
  • Our raw data source is a mdb file and we are using SSIS to fast load into SQL Server, and we are looking to transform the data set into 3 tables (using a stored proc):

    Site (SiteID, Name)

    Gas (ID, Date, Time, GasFIeld1, GasField2....., SiteID)

    GenSet (ID, Date, Time, GenSetField1, GenSetField2....., SiteID)

    Each record in our raw data source contains a Name field which identifies the Site. We only need to add a new site to the Site table if it does not already exist. This is already coded and working using insert-select and NOT EXISTS.

    We now need to iterate over all records and extract a subset of data for the Gas table and extract a subset of data for the GenSet table and link each row with the associated SiteID, using Name field.

    The insertion order should be Site table first then remaining tables.

    Are you saying it would be better to transform this data using SSIS and not to use a staging table and stored procedure?

    Friday, October 10, 2014 1:14 PM
  • You can insert the new site data with a MERGE statement (or insert when not exists) in a "EXECUTE SQL STATEMENT".

    Then you can update the stage tables with a "EXECUTE SQL STATEMENT" with the ID of the SiteID and then in a "DATA FLOW TASK" just bulk insert without any transformation (SOURCE->DESTINATION).

    Another thing you can do is after merging to use a cache transform and then inside the "DATA FLOW TASK" perform a LOOKUP for the SiteID and send the output match to the DESTINATION.

    Or you can just join on Site Name to get SiteID and be done with it :)


    "If there's nothing wrong with me, maybe there's something wrong with the universe!"



    Friday, October 10, 2014 1:19 PM
  • Thanks for your help, I have considered a Join. Would this be more efficient than creating a function to get site id for site name?
    Friday, October 10, 2014 1:27 PM
  • Our raw data source is a mdb file and we are using SSIS to fast load into SQL Server, and we are looking to transform the data set into 3 tables (using a stored proc):

    Site (SiteID, Name)

    Gas (ID, Date, Time, GasFIeld1, GasField2....., SiteID)

    GenSet (ID, Date, Time, GenSetField1, GenSetField2....., SiteID)

    Each record in our raw data source contains a Name field which identifies the Site. We only need to add a new site to the Site table if it does not already exist. This is already coded and working using insert-select and NOT EXISTS.

    We now need to iterate over all records and extract a subset of data for the Gas table and extract a subset of data for the GenSet table and link each row with the associated SiteID, using Name field.

    The insertion order should be Site table first then remaining tables.

    Are you saying it would be better to transform this data using SSIS and not to use a staging table and stored procedure?

    I would prefer using staging + sp appproach here as that would involve set based logic and would be faster performance wise

    Please Mark This As Answer if it helps to solve the issue Visakh ---------------------------- http://visakhm.blogspot.com/ https://www.facebook.com/VmBlogs

    Friday, October 10, 2014 1:30 PM
  • Since your only dependency is site ID a join on source and bulk insert (fast load) on destination would be best.

    "If there's nothing wrong with me, maybe there's something wrong with the universe!"

    Friday, October 10, 2014 2:13 PM