Indexing on large tables with heavy inserts and updates and no unique columns

    Question

  • Hi Team,

    we have some large tables (200-500 million rows) which are

    1. filled from an external source (heavy inserts),

    2. and then continuously updated.

    These tables do not have a really good column for a clustered index: no unique columns, no ever-increasing columns, no IDENTITY.

    So a clustered index is not really an option, also because of the heavy inserts: since the records arrive in no particular (and certainly not increasing) order, the table would probably suffer from page splits.

    But for the updates, we definitely need indexing on these tables.

    => Is a non-clustered index a good solution for such a case?

    As a compromise: to avoid page splits during the heavy inserts and keep inserting fast, while still supporting the updates.

    Friday, July 13, 2018 8:00 PM

Answers

  • You can, of course, add a new identity column, and this might be a good idea.  In any relational database every table must have a unique key, or else you are not in third normal form - actually not even in first normal form.  Adding an identity is a crude way to get there, but so it goes.

    Of course the best indexing scheme depends on use.  Are just the newest rows the ones that are updated the most?  If so, you might look at partitioned tables.

    Regarding the page splits: hey, if they split, they split!  If you add the identity you may have much less of a problem with that, but in the worst case you can *lower* the fill factor to get fewer splits, or else just let them happen.  In any case you will want to rebuild the indexes now and then as a maintenance job.
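
    As a rough sketch (the table and index names here are made up, not from the thread), lowering the fill factor and rebuilding might look like:

    -- Hypothetical names: rebuilding with a lower fill factor leaves
    -- free space on each page to absorb out-of-order inserts.
    ALTER INDEX ix_BigTable_SomeCol ON dbo.BigTable
        REBUILD WITH (FILLFACTOR = 80);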

    Finally, if you have application logic that inserts a row now and updates it later, you may also not have a fully normalized data model.  If this is something like opening and closing price levels for a stock, you might instead have an opening row and a closing row, with no updates at all.
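
    Purely for illustration (this table and its columns are hypothetical, not from the thread), that insert-only shape could look like:

    -- Hypothetical insert-only design: a close is a new row,
    -- not an update to the opening row.
    CREATE TABLE dbo.PriceEvent
    (
        Symbol    varchar(10)   NOT NULL,
        TradeDate date          NOT NULL,
        EventType char(5)       NOT NULL,  -- 'OPEN' or 'CLOSE'
        Price     decimal(18,4) NOT NULL
    );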

    Josh

    Saturday, July 14, 2018 1:43 AM

All replies

  • Not even a composite key is possible?

    Please use 'Mark as Answer' if my post solved your problem and 'Vote as Helpful' if a post was useful.

    Friday, July 13, 2018 8:35 PM
    Not even a composite key is possible?

    Not really. Even with 3-4 columns it's still not unique.

    And taking more columns would make the index too wide.

    Friday, July 13, 2018 8:43 PM
  • If you need indexing and a clustered index is not possible, then a non-clustered index is your only choice for this...

    What columns will your queries use to identify which rows to Select, Update and Delete?  There's your index starting point.
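
    For example (purely illustrative names, not from the thread):

    -- Hypothetical: a nonclustered index on the columns your
    -- WHERE clauses use to locate rows.
    CREATE NONCLUSTERED INDEX ix_BigTable_Lookup
        ON dbo.BigTable (FilterCol1, FilterCol2);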

    Also, this Stack Overflow thread is a good read:

    https://stackoverflow.com/questions/4332982/do-clustered-indexes-have-to-be-unique


    22 years of database experience, most with SQL Server. Please 'Mark as answered' those posts that helped you.


    • Edited by KevinBHill Friday, July 13, 2018 8:55 PM
    Friday, July 13, 2018 8:52 PM
  • Not even a composite key is possible?

    Not really. Even with 3-4 columns it's still not unique.

    And taking more columns would make the index too wide.

    Just want to point out that a CLUSTERED INDEX need not be unique, so if you can identify the right columns and create a clustered index, it would be better than not having one. For example:

    -- a clustered index does not require a unique column
    create table test(c1 int)
    go

    -- insert five duplicate values
    insert into test values (1)
    insert into test values (1)
    insert into test values (1)
    insert into test values (1)
    insert into test values (1)
    go

    -- succeeds even though c1 is not unique
    create clustered index cx_test on test(c1)


    Please use 'Mark as Answer' if my post solved your problem and 'Vote as Helpful' if a post was useful.

    Friday, July 13, 2018 8:57 PM
  • Hi,

    You should have a clustered index on the columns used as conditions in the UPDATE statements. As for the heavy inserts, try to reduce them if you cannot stop them altogether. Come up with a plan: for example, do not insert rows as they arrive; instead, store them in a staging area for hours to several days, then do a bulk insert after dropping and recreating the index, as sketched below. Good luck.
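
    A rough sketch of that staging pattern (all names here are hypothetical):

    -- Hypothetical staging pattern: accumulate rows in a staging
    -- table, then move them into the big table in one batch.
    DROP INDEX ix_BigTable_SomeCol ON dbo.BigTable;

    INSERT INTO dbo.BigTable (somecol, col1, col2)
    SELECT somecol, col1, col2
    FROM   dbo.BigTable_Staging;

    TRUNCATE TABLE dbo.BigTable_Staging;

    CREATE INDEX ix_BigTable_SomeCol ON dbo.BigTable (somecol);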

    Saturday, July 14, 2018 3:37 AM
  • Hi

    What is the goal? To improve INSERT/UPDATE operations? What about SELECT queries?


    Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Sunday, July 15, 2018 6:36 AM
    Answerer
  • Hi

    What is the goal? To improve INSERT/UPDATE operations? What about SELECT queries?

    Yes, to improve INSERT and UPDATE.

    SELECTs mostly filter on the same columns that are being updated, so the condition is the same.

    Sunday, July 15, 2018 10:17 AM
  • OK, the first step would be removing unused indexes.  Then, looking at the UPDATE query's WHERE condition, create an index on that column (somecol) and INCLUDE all the columns you are going to update, as in the example below:

    UPDATE tbl
    SET    col1 = 0.0285714286,
           col2 = 0.0285714286
    WHERE  somecol = '20019378'
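
    The matching index might look like this (a sketch reusing the same placeholder names):

    -- Sketch: seek on somecol; the updated columns ride along
    -- as included (leaf-level) columns.
    CREATE NONCLUSTERED INDEX ix_tbl_somecol
        ON tbl (somecol)
        INCLUDE (col1, col2);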

    For the inserts I would suggest having an ID column (probably with the IDENTITY property) and an index on that column.
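
    A minimal sketch, assuming a hypothetical table name:

    -- Hypothetical: an ever-increasing identity column keeps new
    -- rows appending at the end of this index.
    ALTER TABLE dbo.BigTable ADD id bigint IDENTITY(1,1) NOT NULL;

    -- Clustered here so the base data is ordered by id; a
    -- nonclustered index is also possible.
    CREATE UNIQUE CLUSTERED INDEX cx_BigTable_id ON dbo.BigTable (id);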


    Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Sunday, July 15, 2018 10:34 AM
    Answerer
  • It is not necessary to have a clustered index, or a unique clustered index.

    You could create a non-unique clustered index on one or more columns. Bear in mind that SQL Server adds a hidden 4-byte uniquifier to make each key unique, so if you do cluster on a non-unique key, do not have more than roughly 4 billion duplicates of a given key value. This won't happen in the actual table, but it could happen in an unusual intermediate situation.

    I would definitely consider partitioning if you could find some suitable key that would compartmentalize the inserts and updates. You can even drop the clustered index after creating the partitioned clustered index; see the sketch below.
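
    A minimal partitioning sketch, assuming (hypothetically) a date column that compartmentalizes the data:

    -- Hypothetical: partition by month on an assumed LoadDate column.
    CREATE PARTITION FUNCTION pf_LoadDate (date)
        AS RANGE RIGHT FOR VALUES ('20180101', '20180201', '20180301');

    CREATE PARTITION SCHEME ps_LoadDate
        AS PARTITION pf_LoadDate ALL TO ([PRIMARY]);

    -- Cluster on the partition scheme; per the note above, this
    -- clustered index could even be dropped afterwards, leaving
    -- a partitioned heap.
    CREATE CLUSTERED INDEX cx_BigTable_LoadDate
        ON dbo.BigTable (LoadDate)
        ON ps_LoadDate (LoadDate);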

    Definitely have an index, either clustered or nonclustered, on the UPDATE SARG (search argument), per Uri's example.


    jchang

    Sunday, July 15, 2018 7:40 PM