none
CLR Aggregate performs an expensive sort

    Question

  • I've found that my SQL CLR Aggregates have a SORT in the execution plan which makes them VERY expensive.

    I've tried using the IsInvariantToOrder attribute however this has had no effect.  If it wasn't for the sorting the aggregate would be far cheaper than the T-SQL equivalent.

     

    Is this a feature of CLR aggregates and if not, how do I stop sorts happening?

    Wednesday, November 03, 2010 9:16 PM

Answers

  • Let me try this when I get home, because, although its possble that SQLCLR UDA would need a sort (because its implementation really is opaque to the engine and optimizer) and T-SQL version (which I assume is a function because T-SQL doesn't support UDAs) does not, it sure seems odd. I may ask to see the T-SQL function as well.

    As an unrelated aside, Denali CTP1 (released yesterday) is supposed to have more optimizations for UDAs (I don't currently know the specifics) so it will be a good test case.

    Cheers, Bob

     

    • Marked as answer by Ai-hua Qiu Thursday, November 11, 2010 8:00 AM
    Wednesday, November 10, 2010 2:20 PM
    Moderator

All replies

  • IsInvariantToOrder doesn’t do anything in the current implementation because the query processor is not able to guarantee sorted input on any kind of aggregate currently. It’s been there for ‘future use’ since 2005. It’s going to sort if it needs to do the equivalent of a streamed aggregate and there no index on the columns that the aggregate uses for input. Have you attempted adding an index on the grouping (aggregate input) columns?
     
    Cheers, Bob
     
    "PuzzledDBA" wrote in message news:957a5b94-c7d0-49d8-928d-7cccff14b0c6...

    I've found that my SQL CLR Aggregates have a SORT in the execution plan which makes them VERY expensive.

    I've tried using the IsInvariantToOrder attribute however this has had no effect.  If it wasn't for the sorting the aggregate would be far cheaper than the T-SQL equivalent.

     

    Is this a feature of CLR aggregates and if not, how do I stop sorts happening?

    Wednesday, November 03, 2010 9:32 PM
    Moderator
  • It was the first thing I did.

    My function is doing basic linear regression to find m in y=mx+c.

    I'm running the CLR and T-SQL versions side by side against exactly the same table so the differences are pronounced.

    If I tell both aggregates to run against the entire table with no GROUP BY then there are marginal performance differences, slightly in favour of the T-SQL version.

    The instant I add the GROUP BY the T-SQL version wins hands down taking 9% of the batch vs 91% of the batch.

    I've tried adding an ORDER BY to the T-SQL version but this has little detrimental affect because it is sorting the aggregated record set where as the CLR appears to be sorting the base recordset first.

    The table is very simple

    CREATE TABLE dbo.test_table (

    GroupingColumn TINYINT NOT NULL ,
    Col1 INT NOT NULL ,
    Col2 INT NOT NULL

    )

    CREATE INDEX idx_Test_Table_GroupingColumn ON dbo.test_table(GroupingColumn)

    I haven't put a clustered index on the table because it is only sample data used to test the veracity of the CLR function.  In the real world my data would have a primary key and a clustered key (I don't always make the primary key clustered if there is a more appropriate field).

    It is quite probable that in the real world my grouping column would not be the clustered key in any case.

    Mechanically the CLR aggregate does exactly what I want it to do, its purely the inclussion of a SORT in the execution plan that is ruining my day.  The SORT accounts for 94% of the cost of the CLR version of the query.

    As ever, very grateful for any help.

    PS.  I couldn't find your blog entry on blitable data types re the use of SqlDecimal.  Any chance of posting the link?

    Thursday, November 04, 2010 12:39 PM
  • Sorry for the delay, I'm in the midst of traveling.

    If you could send me the exact query that you're using and some small sample of data (here or to my email) I'll attempt to recreate/troubleshoot the sort. You may very well be right that that's the way it works, but if a T-SQL equivalent requires no sort.... hmm.

    Re: the blog postings...It may well be that the blog posting got purged in one of our blog reorgs. Or I didn't actually blog on that subject, but wrote it in a book or article instead. Sorry. So here goes with a short explanation....

    Only blittable types can use Format.Native. List of blittable types: http://msdn.microsoft.com/en-us/library/75dwhxf7.aspx. Note that System.Single (real in T-SQL) and System.Double (float in T-SQL) are on the list of blittable types, but System.Decimal is NOT.

    .NET System.Decimal is a variable precision decimal, based on IEEE 754 http://en.wikipedia.org/wiki/Decimal_floating_point. T-SQL DECIMAL/NUMERIC are fixed-precision decimals where the precision must be declared at variable/column declaration time and doesn't change. They have different value spaces too. .NET System.Decimal maximum precision is 28 http://msdn.microsoft.com/en-us/library/system.decimal.aspx and SQL Server decimal is 38.

    System.Data.Sql.Types.SqlDecimal attempts to act like T-SQL's decimal. But most .NET routines that you might call expect System.Decimal, producing rounding and value space idiosyncracies.

    Hope this helps, Cheers, Bob

     

    Friday, November 05, 2010 10:09 PM
    Moderator
  • Thanks Bob, I generated a test table in AdventureWorks and populated it as follows CREATE TABLE dbo.test_table( col1 FLOAT NOT NULL, col2 FLOAT NOT NULL ) GO INSERT INTO dbo.test_table SELECT TOP 30000 ROW_NUMBER() OVER(ORDER BY s1.id desc),ROW_NUMBER() OVER(ORDER BY s1.id ASC) FROM sysobjects,sysobjects s1 I deliberately chose 30,000 records so I could copy/paste them into Excel and produce a trend line and see the formula for the line. The next step was to confirm that the CLR and T-SQL versions provide the same value. select dbo.Slope(col1,col2) from test_table select (COUNT(*) *SUM(col1 * col2) - (SUM(col1) *SUM(col2))) / (COUNT(*) * SUM(col1 * col1) - (SUM(col1) *SUM(col1))) from test_table Having done that I simply added a column to my test table and populated it with the modulo of Col1 % 10 alter table test_table add GroupingColumn tinyint not null default(0); update test_table set GroupingColumn =cast(col1 as decimal(9,0)) % 10; I then added my GROUP BY clause to my first two queries. I have found that putting a clustered key against the columns participating in the GROUP BY eliminates the sort however as I can't predict what factors our data analysts are going to want to GROUP BY this is a bit of a non-starter. My aggregate CLR was as per the thread http://social.msdn.microsoft.com/Forums/en-US/sqlnetfx/thread/54e7a593-7735-4e5e-a2d6-c144cfeaf6af/#b3adbe2e-434d-43f8-be38-77b7d7548871 Nothing really special in it.
    Monday, November 08, 2010 7:37 PM
  • Let me try this when I get home, because, although its possble that SQLCLR UDA would need a sort (because its implementation really is opaque to the engine and optimizer) and T-SQL version (which I assume is a function because T-SQL doesn't support UDAs) does not, it sure seems odd. I may ask to see the T-SQL function as well.

    As an unrelated aside, Denali CTP1 (released yesterday) is supposed to have more optimizations for UDAs (I don't currently know the specifics) so it will be a good test case.

    Cheers, Bob

     

    • Marked as answer by Ai-hua Qiu Thursday, November 11, 2010 8:00 AM
    Wednesday, November 10, 2010 2:20 PM
    Moderator
  • I just got a chance to look at this some more. The reason behind the sort is the use of the StreamAggregate iterator (which requires sorted input) vs. the HashAggregate (which does not require sorted input) in the T-SQL query plan. So I tried specifying OPTION (HASH GROUP) hint in the SQLCLR query on this produced a "8622 - could not produce a plan with this hint" error. This seems to be no reason except for the use of the SQLCLR aggregate for this plan+hint to fail, so I submitted a suggestion or this on Connect (http://connect.microsoft.com/SQLServer/feedback/details/629920/allow-option-hash-group-with-sqlclr-udas).

    WRT my previous posting, it didn't work any differently in Denali CTP1.

    As you said one workaround is a clustered index on group by column. Another is covering index that includes all the columns needed with group by column first. But, as you said, if the grouping column is a choice of many and not know (because query submission is adhoc) this is not an option.

    Cheers, Bob

    Thursday, December 09, 2010 12:57 AM
    Moderator
  • Sorry for this old thread revival,

     

    I have the exact same problem, did you find a solution for that Bob ?

    I tried adding more indexed view in my queries, but that's not possible, the sort clr aggregate is still very consuming

     

    I also send a mail to Eric Hanson from the other thread, but I cant find you mail Bob :)

     

    Charles

    Tuesday, September 27, 2011 9:10 AM
  • Hi Charles,

    I didn't get a resolution, but did put in a Connect suggestion here: https://connect.microsoft.com/SQLServer/feedback/details/629920/allow-option-hash-group-with-sqlclr-udas . It has 8 votes so far (and won't be in Denali).

    Cheers, Bob

    Tuesday, September 27, 2011 6:36 PM
    Moderator