locked
Poor optimization with partitioned tables RRS feed

  • Question

  • I have a simple issue regarding a straight forward optimization.
    The example below uses PRIMARY twice, but using other FGs does not alter things

    CREATE PARTITION FUNCTION [T1_PFN](smallint) AS RANGE LEFT FOR VALUES (1)
    CREATE PARTITION SCHEME [T1_PTN] AS PARTITION [T1_PFN] TO ([PRIMARY],[PRIMARY])
    create table T1 ([rowid] bigint identity (1,1) not null,
                     [ptnid] smallint not null,
                     [data] varchar (100)
     CONSTRAINT [PKeyT1] PRIMARY KEY CLUSTERED
    (
        [rowid] ASC,
        [ptnid] ASC
    ) ON [T1_PTN] ([ptnid]))
    ON [T1_PTN] ([ptnid])
    GO
    insert into T1 ([ptnid], [data]) values (1,'Some data')
    insert into T1 ([ptnid], [data]) values (2,'More data')

    select max(rowid) from T1


    If you "Display Estimated Execution Plan" for the select statement you will see a clustered index scan on PKeyT1 with an "Estimated number of rows" = 2 and "Ordered" = False.

    The same table without the partitioning will show for the same query "Estimated number of rows" = 1 and "Ordered" = True.

    Pretty obviously, if I insert 6.7 billion rows into this table, it may take some time to return a value :(

    I would think that this could be optimized in a far better manner. Somewhat like the plan for: select * from T1 where rowid in (1,2)

    I really do have a table with 6.7 billion rows and I really do need a timely answer to this style of query.

     

    Philip

    • Moved by Tom Phillips Tuesday, August 16, 2011 3:05 PM possibly better answer from TSQL forum (From:SQL Server Database Engine)
    Tuesday, August 16, 2011 2:03 PM

Answers

All replies

  • You need to consider the execution plan on the populated table.

    It is a stretch to make projection from 2 rows to 6 billions.

    Can you post the plan after table population? Thanks.

     


    Kalman Toth, SQL Server & Business Intelligence Training; sqlusa.com
    Tuesday, August 16, 2011 6:51 PM
    • Proposed as answer by Naomi N Tuesday, August 16, 2011 7:47 PM
    • Marked as answer by Kalman Toth Tuesday, August 23, 2011 4:30 AM
    • Unmarked as answer by Philip Lewis Tuesday, August 23, 2011 8:44 AM
    • Marked as answer by Philip Lewis Thursday, August 25, 2011 1:58 PM
    Tuesday, August 16, 2011 7:25 PM
  • No, it's not a stretch. It will do exactly the same thing because the optimizer cannot do it properly
    Wednesday, August 17, 2011 7:53 AM
  • Thanks for that.
    Wednesday, August 17, 2011 7:54 AM
  • Folks,

    The optimization answer is trivial, I simply did not include it.

    However it presupposes information not available to the original query, and subsequently requires dynamic SQL based on an interrogation of meta data values. Not really a very acceptable workaround for such a trivial query.

     

    select max(rowid) from
     (select max(rowid) as rowid from T1 where ptnid=1
      union
      select max(rowid) from T1 where ptnid=2) t

     

    I was really more interested in knowing whether the optimizer is in fact incapable of this sort of query, whether I had missed something obvious.

     

    For me, this is just another example that demonstrates that SQLServer is really a very unsuitable tool for large databases. No news there.

     

    philip

     

     

     

    Wednesday, August 17, 2011 8:03 AM
  • I shall indeed do this. Not that I have any expectations whatsoever.

     

    philip

    Wednesday, August 17, 2011 8:04 AM
  • Sorry, I did not realise this forum is not threaded. I will quote in future
    Wednesday, August 17, 2011 12:12 PM