Updating a column in a table that contains 50 million rows

  • Question

  • I'm looking for some performance assistance on updating a column value in a table that contains approximately 50 million rows. I have a permanent table in another database that holds the key column and the value to be set. My query is listed below, but I'm afraid it will run for quite a while. Any suggestions would be appreciated.

     

    update a
    set    a.column2 = b.column2
    from   mytable as a
    join   mytable1 as b
      on   a.column1 = b.column1;

     

     

    There is a one-to-one relationship between the two tables.

    Wednesday, February 27, 2008 1:16 AM

Answers

  • I suggest that you craft criteria that would allow executing the query in batches of about 1 million rows each.

    (Even a TOP (1000000) with a WHERE clause that picks rows not yet changed would do.)

     

    The pressure on the transaction log is going to be very large. If possible, switch the database to the simple recovery model and then run a script that executes 50 batches of 1 million rows each.

     

    One 50-million-row operation is likely to be much slower, and to require far more transaction log space, than doing this in batches.
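
    A minimal sketch of that batching approach, assuming mytable.column2 starts out NULL and mytable1.column2 is never NULL (any other WHERE test that identifies not-yet-processed rows would work just as well); the table and column names are the ones from the question:

    -- optionally, before the load: ALTER DATABASE ... SET RECOVERY SIMPLE
    declare @rows int;
    set @rows = 1;

    while @rows > 0
    begin
        update top (1000000) a
        set    a.column2 = b.column2
        from   mytable as a
        join   mytable1 as b
          on   a.column1 = b.column1
        where  a.column2 is null;   -- assumes NULL marks a row that has not been updated yet

        set @rows = @@rowcount;

        checkpoint;                 -- under the simple recovery model this lets log space be reused
    end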


    Wednesday, February 27, 2008 8:26 AM
    Moderator

All replies

  • For one thing, add a WHERE condition so that you update only the rows where the value actually changes. Can you run a COUNT(*) over this join where the column differs, so you can see how many rows would really be changed?
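
    For example, something along these lines, reusing the names from the question (the extra NULL checks are only needed if the column is nullable):

    select count(*)
    from   mytable  as a
    join   mytable1 as b
      on   a.column1 = b.column1
    where  a.column2 <> b.column2
       or (a.column2 is null and b.column2 is not null)
       or (a.column2 is not null and b.column2 is null);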

    Wednesday, February 27, 2008 1:29 AM
    Moderator
  • 99% of the rows would be updated, so a WHERE condition isn't much use in this case.

    Wednesday, February 27, 2008 1:31 AM
  • 1% of 50 million rows is still 500,000 unnecessary rows. There's not much else you can do to optimize - assuming that you have indexes on the key fields and on your WHERE clause's fields, it's about as lean as it's going to get.
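
    For illustration only, the kind of indexes being assumed here (the index names are made up):

    create index IX_mytable_column1  on mytable  (column1);
    create index IX_mytable1_column1 on mytable1 (column1) include (column2);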

     

    Wednesday, February 27, 2008 3:51 AM
    Moderator
  • This type of update tends to be only as fast as your drives are capable of.

     

    As with any server, there are limits and bottlenecks in different places (memory, CPU, I/O bandwidth, etc.).

    The main goal is never to exceed any of those resource limits, because once you do, things start to pile up and queue, and performance drops through the floor.

    So, as Arnie suggests, you need to find the best chunk size that fits within your given resource limits, and loop through the entire working set without hitting any hard bottlenecks.

    That's about as fast as it can be done.

     

    /Kenneth

     

    Wednesday, February 27, 2008 9:24 AM
  • I agree with both Arnie and Ken, but another line of thought might be to drop the column and always read the data from the other source.  I suppose you are probably cornered and this is not an option.  Oh well.
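
    If that were an option, it would amount to something like this (a sketch only; the view name is made up, and everything that reads column2 would have to be pointed at the view):

    alter table mytable drop column column2;
    go

    create view mytable_with_column2
    as
    select a.column1,   -- plus whatever other columns of mytable are needed
           b.column2
    from   mytable  as a
    join   mytable1 as b
      on   a.column1 = b.column1;
    go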

     

    Wednesday, February 27, 2008 12:54 PM
    Moderator
  • I would also suggest the WITH (UPDLOCK) table hint:

    update mytable WITH (UPDLOCK)
    ...
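
    In the batched form suggested earlier, the hint goes on the target table reference, for example (same made-up names and NULL assumption as above):

    update top (1000000) a
    set    a.column2 = b.column2
    from   mytable as a with (updlock)
    join   mytable1 as b
      on   a.column1 = b.column1
    where  a.column2 is null;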
    Wednesday, February 27, 2008 3:24 PM