DML operations on a table with more than 500 million rows

  • Question

  • Hi, I am working on improving the performance of Insert, Delete, and Update operations on a huge table that has over 500 million records. These operations are done most of the time on the clustered index field along with other fields. Please let me know the steps and things to look for to improve performance.

    Thanks....

    Friday, March 2, 2012 12:03 AM

Answers

  • A varchar column may not be the best choice for a clustered index. The best column for a clustered index is an identity column. Also, changing 1 million records at once will not perform well. For changes of that scale it is better to work in batches of, say, 10K rows at a time.
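
    As a rough sketch of the batching idea (the table name, column, and predicate below are placeholders, not taken from this thread), each statement touches only a small slice of the table so the transaction log and lock footprint stay manageable:

    ```sql
    -- Hypothetical example: purge rows in 10K-row batches instead of one huge DELETE.
    DECLARE @rows INT = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (10000) FROM dbo.BigTable
        WHERE LastModified < '20110101';       -- placeholder predicate

        SET @rows = @@ROWCOUNT;                -- stop once no more rows qualify
    END;
    ```

    The same pattern works for updates by replacing the DELETE with an UPDATE TOP (10000) statement.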

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Friday, March 2, 2012 4:08 AM
    Moderator
  • Consider partitioning as well. Check for index fragmentation, too, and rebuild or reorganize the indexes if necessary - especially if the index key columns are affected by the update statements.
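
    For the fragmentation check, a minimal sketch (the table name is a placeholder) could query sys.dm_db_index_physical_stats and then reorganize or rebuild:

    ```sql
    -- List fragmentation for all indexes on the (hypothetical) table.
    SELECT i.name AS index_name,
           ps.avg_fragmentation_in_percent,
           ps.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigTable'),
                                        NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

    -- Common rule of thumb: reorganize at roughly 5-30% fragmentation, rebuild above that.
    ALTER INDEX ALL ON dbo.BigTable REORGANIZE;
    -- ALTER INDEX ALL ON dbo.BigTable REBUILD;  -- heavier; ONLINE = ON needs Enterprise Edition
    ```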

    Regards,

    Zoli

    Friday, March 2, 2012 6:26 AM
  • Hi Naomi, thanks very much for your response. I do understand that a varchar datatype would not help, but that's the way it has been set up. Is there anything else I could do to keep the clustered index up to date?

    Regards.....

    It's only a rough figure, but comparing just 5 characters can be up to 5 times as slow as comparing a single integer. And one thing the DBMS constantly does with primary keys is compare them.

    Since you have 500 million rows, the key has to be up to 9 characters long, so it takes up to 9 times what one int would take*. An int takes only 4 bytes and could still hold 4 times the current number of rows (around 2,147 million).

    If my guess/calculation isn't totally wrong, the automatically created index on the primary key alone has to be around 4 gigabytes in size* (500 million rows x roughly 9 bytes of key data; an int key would be about half that), so storing (especially in RAM) and processing it, or any other index containing the PK, could be an issue.

    I don't say it's the primary culprit, but with foreign key constraints and similar things it probably accounts for a relevant part of the cost of any insert, update, or delete.

    And if this table has a varchar key, what about the tables referenced by this table? Do they also have char-based keys?

    So, are you 100% certain that you couldn't change the way the key is stored/processed inside the DBMS, without changing how it is shown to/accepted from the clients?

    *This assumes the key started at '0'. Otherwise, things might be worse. Much worse.
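
    To ground the estimate, the actual per-index size can be checked directly (a sketch with a placeholder table name):

    ```sql
    -- Report the size of each index on the (hypothetical) table, in MB.
    SELECT i.name AS index_name,
           SUM(ps.used_page_count) * 8 / 1024.0 AS size_mb   -- pages are 8 KB
    FROM sys.dm_db_partition_stats AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    WHERE ps.object_id = OBJECT_ID('dbo.BigTable')
    GROUP BY i.name;
    ```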

    Friday, March 2, 2012 9:51 AM

All replies

  • How many rows do the DML operations affect at once, and what kind of clustered index do you have?

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Friday, March 2, 2012 3:16 AM
    Moderator
  • Hi Naomi, each time it varies from a hundred thousand to a million records. This particular table has a business key, which is a varchar, as the clustered index, and this column keeps changing. How do I maintain this clustered index column, once I am done improving the performance of insert, delete, and update operations on this table, for faster retrieval of data?


    Thanks....
    Friday, March 2, 2012 4:03 AM
  • Hi Naomi, thanks very much for your response. I do understand that a varchar datatype would not help, but that's the way it has been set up. Is there anything else I could do to keep the clustered index up to date?

    Regards.....

    Friday, March 2, 2012 4:44 AM
  • Hi Naomi, each time it varies from a hundred thousand to a million records. This particular table has a business key, which is a varchar, as the clustered index, and this column keeps changing. How do I maintain this clustered index column, once I am done improving the performance of insert, delete, and update operations on this table, for faster retrieval of data?

    You should use an INT (or BIGINT) IDENTITY surrogate key for the PRIMARY KEY.

    Make the current (varchar) key a UNIQUE index.

    Put the clustered index on the column where it is most needed: on range searches, if any.

    Optimization article: http://www.sqlusa.com/articles/query-optimization/
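
    A minimal sketch of that layout (all object names are placeholders, and it assumes the existing clustered index on the varchar business key has already been dropped):

    ```sql
    -- 1. Add a narrow surrogate key.
    ALTER TABLE dbo.BigTable ADD Id INT IDENTITY(1, 1) NOT NULL;

    -- 2. Cluster on the column most used for range searches (placeholder column).
    CREATE CLUSTERED INDEX CIX_BigTable_LastModified
        ON dbo.BigTable (LastModified);

    -- 3. Make the surrogate column the primary key (nonclustered here).
    ALTER TABLE dbo.BigTable
        ADD CONSTRAINT PK_BigTable PRIMARY KEY NONCLUSTERED (Id);

    -- 4. Keep the varchar business key unique, but nonclustered.
    CREATE UNIQUE NONCLUSTERED INDEX UX_BigTable_BusinessKey
        ON dbo.BigTable (BusinessKey);
    ```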


    Kalman Toth, SQL Server & Business Intelligence Training; SQL SERVER BEST PRACTICES

    Friday, March 9, 2012 12:39 PM
    Moderator