none
What technique to use to update production table from staging table RRS feed

  • Question

  • Hi I have a production table in SQL Server 2005 that has approx 500,000 records---every 6 hours this table needs to be truncated and filled

    The basic SSIS package uses a script compentant and imports the data into a staging table which has the same structure as the production table. I have a final Execute SQL Task that Truncates the production table and does a Insert into production-select * from stage table.

    Takes around 30 seconds to run this last Execute SQL Task--problem is there is a risk that our webservice will query this table during the Execute SQL Task and return incorrect results.

    Q1: In this last Execute SQL Task  if I used a BEGIN TRANSACTION and COMMIT TRANSACTION; will this be any quicker ?

    Q2: In this last Execute SQL Task- would it be better to use a RENAME TABLE technique in TSQL--any code examples ??

    Q3:Is there any way in TSQL I can compare EVERY FIELD in two tables ie Stage and Production which have identical data structures and figure out a way to update only the records that changed? Is SQL Server Replication the best way to do this

     

    thanks kindly

    Dave

    Friday, March 9, 2007 4:01 AM

All replies

  • 1: Probably not.

    2: You could use sp_rename. You'd have to drop the other table first.

    3: The checksum function might help with this, but the built-in SQL Server function tends to have a high collision rate (same checksum value for different input values). You also need a primary key to compare on. So you'd do a join between the tables on the primary key, run all the columns from each table into its own checksum function, and include a WHERE clause looking for checksum values that don't match. Not sure how good performance will be. It helps if you can modify the table definition to include the checksum value as a column, so you don't have to do the calc each time.

    Hope this helps.

    Friday, March 9, 2007 4:57 AM
    Moderator
  • Would replication help here?  Just curious, as I don't know the answer to that....
    Friday, March 9, 2007 5:56 AM
    Moderator
  • Thanks Im really interested in the checksum technique but have no idea of how to implement this; I have complete control of the table design If I make up a simple example like this can you please show me the TSQL to achieve (do a join between the tables on the primary key, run all the columns from each table into its own checksum function, and include a WHERE clause looking for checksum values that don't match.)

    Stage table
    --------------------
    S_Foxprofile1
    S_ID (AutoInt) PK
    PhoneNum (BigInt)
    Account (Varchar 50)

    Prod Table
    --------------------
    P_Foxprofile1
    P_ID (AutoInt) PK
    PhoneNum (BigInt)
    Account (Varchar 50)

    Finally this must be a very common ETL problem to update a production table with only the data that has changed from its mirror stage table with all fields in scope.
    I was hoping for a common design pattern in the Microsoft world for this.

    thanks kindly Dave

    Friday, March 9, 2007 9:15 PM
  • Oh the Replication idea was probaly not correct --- the Checksum idea is a really good lead
    Thanks Im really interested in the checksum technique but have no idea of how to implement this; I have complete control of the table design If I make up a simple example like this can you please show me the TSQL to achieve (do a join between the tables on the primary key, run all the columns from each table into its own checksum function, and include a WHERE clause looking for checksum values that don't match.)

    Stage table
    --------------------
    S_Foxprofile1
    S_ID (AutoInt) PK
    PhoneNum (BigInt)
    Account (Varchar 50)

    Prod Table
    --------------------
    P_Foxprofile1
    P_ID (AutoInt) PK
    PhoneNum (BigInt)
    Account (Varchar 50)

    Finally this must be a very common ETL problem to update a production table with only the data that has changed from its mirror stage table with all fields in scope.
    I was hoping for a common design pattern in the Microsoft world for this.

    thanks kindly Dave

    Friday, March 9, 2007 9:21 PM
  • Have you read the sticky post on the first page of this forum???  Checking to see if a record exists, if so do XXX if not do YYY?


    Friday, March 9, 2007 9:26 PM
    Moderator
  • Also, I just created a blog entry in regards to using the Konesan's Checksum Transformation.

    http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
    Friday, March 9, 2007 10:41 PM
    Moderator
  • Thanks Phil,

    Would you mind ziping up that SSIS project in your blog entry and emailing to me [removed e-mail address to keep spam bots away - Phil Brammer]

    I have also thought of a much simpler idea --stick to idea of loading data into a stage table that mirrors production and then have a final execute SQL task that does as outlined here http://www.mssqltips.com/tip.asp?tip=1023

    example of using a CHECKSUM is to store the unique value for the entire row in a column for later comparison. This would be helpful in a situation where all of the rows in a table need to be compared in order to perform an UPDATE.

    SET NOCOUNT ON
    -- SQL Server 2005
    USE ProfilerTest;
    GO
    UPDATE a
    SET TextData = 'Not Applicable'
    FROM dbo.ProfilerResults3 a
    WHERE a.RowCheckSum = CHECKSUM(RowNumber, EventClass, ApplicationName);
    GO
    
    SELECT * 
    FROM dbo.ProfilerResults3;
    GO
    
    Sunday, March 11, 2007 8:37 PM
  • I would be careful when using the built-in SQL Checksum function. HashBytes might be a better choice. From Books Online:

    "CHECKSUM satisfies the properties of a hash function: CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. For this definition, null values of a specified type are considered to compare as equal. If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM."

    http://msdn2.microsoft.com/en-us/library/ms189788.aspx

    I've also had good experiences using the Checksum transform from SQLIS.com (http://www.sqlis.com/21.aspx).

    Monday, March 12, 2007 1:19 PM
    Moderator
  •  Mr Pro Tools wrote:

    Thanks Phil,

    Would you mind ziping up that SSIS project in your blog entry and emailing to me 



    Sent.

    Phil
    Monday, March 12, 2007 7:27 PM
    Moderator