none
SQL Server Possible Order By Bug

    Question

  • Hello, 

    We noticed some odd behaviour of the ORDER BY clause inside the following pieces of code: 

    -- Batch 1:

    DECLARE @i INT = 0

    SELECT @i = @i + a

      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) x

      ORDER BY a

    SELECT @i

    GO

    -- Batch 2:

    DECLARE @i INT = 0

    SELECT @i = @i + a

      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) x

      ORDER BY 1

    SELECT @i

    Does anyone know why in Batch 2 example only the last value of the UNION is added to the @i variable ? 

    Thursday, November 21, 2013 8:16 AM

All replies

  • http://vyaskn.tripod.com/differences_between_set_and_select.htm

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, November 21, 2013 8:30 AM
    Answerer
  • Hello!
    The ORDER BY leads to a wrong execution plan.

    There is a connect item describing a similar problem with varchar.
    http://connect.microsoft.com/SQLServer/feedback/details/657590/optimizer-error-when-tertiary-weights-is-used-internaly
    The "order by" provides a guarantee on the order of rows returned to the client. However it does not provide a guarantee on the order of evaluation of computed values.

    In your case the ORDER BY is not necessary for the select and in SSMS 2012 it is marked with a red line, because it seems not to be right in this case.

    What do you think ORDER BY should do for you?

    Have a nice day,
    Christoph
    --
    Microsoft SQL Server MVP - <a href="http://www.insidesql.org/blogs/cmu">Blog</a>

    Thursday, November 21, 2013 8:31 AM
  • This is because they have two different execution plan. In the next figure it's illustrated that in first batch SQL Server First sort and then compute:


    sqldevelop.wordpress.com

    Thursday, November 21, 2013 8:41 AM
  • A quick question,

    When I put @I=@I+A, why it add all the values from the derived table.


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Thursday, November 21, 2013 9:41 AM
  • Dear "Uri Dimant SQL Server MVP",

    I have tested the problem reported above. Your answer is useless. Kindly please keep yourself away of the keyboard if you do not understand the problem.

    • Edited by Cristi Boboc Thursday, November 21, 2013 11:39 AM
    Thursday, November 21, 2013 11:14 AM
  • Dear Sharma,

    It is supposed to do exactly that sum.

    The problem is why, when using ORDER BY 1, it gives other result than the sum?

    The addition of integers is not subject to any particular order, it should be the same, isn't so?


    • Edited by Cristi Boboc Thursday, November 21, 2013 11:41 AM
    Thursday, November 21, 2013 11:22 AM
  • I have carefully tested the problem above and I have found no answers yet.

    But I can add information to clarify the problem:

    Because integer numbers are supposed to give the same sum if added in no matter what particular order, the ORDER BY above should NOT modify the result.

    The strange thing is that it does change the result.

    It looks like all the records are skipped except the last one !

    Before giving any quick answer, kindly please try to understand deeply the problem and test it in various ways, not only the one in the example above. The meaning is very deep and obvious at the same time.

    Thursday, November 21, 2013 11:33 AM
  • Dear Christoph.

    Exactly this is the problem, the ORDER BY here should have no impact, as the addition of integers is not sensitive with respect to the order of the numbers.

    Then why is it possible to have two different results, depending on the ORDER BY clause?


    • Edited by Cristi Boboc Thursday, November 21, 2013 11:44 AM
    Thursday, November 21, 2013 11:43 AM
  • hi bobocc

    I must say have not yet read the question, and already I do not feel that I want to try to help and answer. What if I'm wrong and I put an answer that you don't like? It's not that someone pays for the support we give. This volunteer at the expense of our own time, and our only proceeds is a good feeling when we know we help people or at least we tried, and someone appreciate it.

    * I didn't check the link yet... 


    [Personal Site] [Blog] [Facebook]signature

    Thursday, November 21, 2013 12:21 PM
    Moderator
  • Thursday, November 21, 2013 12:22 PM
  • Hello Pituach.

    Basically your question has much sense. Nobody should be blamed for something we may dislike or like.

    But this is not a matter of taste when someone uses its titles of specialist near the answer. And it is also a bit of respect that should be granted to the author of the problem, why bother him or her with completely useless words? His time isn't precious? The fact that here nobody is payed may be a reason to do our worst and populate the internet pages with anything we want? I felt the need to react because I am confident that the need was the reason of the first post. I was not able to help with an answer, but I have tried to add some clarifying ideas and some breaks for useless or insufficiently verified ideas.

    Thursday, November 21, 2013 12:39 PM
  • This is not a bug.  As Saeid Hasani points out, the two queries have different execution plans.  One of them assigns the variable after the sort, the other before.  Assigning values to variables in a SELECT statement are not guaranteed to occur after the ORDER BY.  Both queries return the correct values based on their execution plans.  Adding values in a variable dynamically in this way is not a supported feature, and is in fact a side-effect of the implementation.  If you want to build up a list in a variable then use concatenation methods such as FOR XML (example: http://sqlandme.com/2011/04/27/tsql-concatenate-rows-using-for-xml-path/) or for summing numeric values use aggregation functions such as SUM.

    @bobocc - Uri's link was intended to show the correct way of assigning values to variables with regard to queries that return multiple values.  Whether you feel this was a useful answer or not, responses such as the one you gave are NOT welcome in forums such as this.

    Thursday, November 21, 2013 12:49 PM
  • Bobocc,

    it is great, that you try to answer questions and help understanding a problem.

    The proposed link in Uri's posting answers some (other) questions about set and select and can be helpful, too.

    Please let Zgondea Paul decide, which answers were helpful for him. You can vote for helpful answers as well.

    Have a nice day,
    Christoph
    --
    Microsoft SQL Server MVP - http://www.insidesql.org/blogs/cmu

    Thursday, November 21, 2013 12:55 PM
  • Dear bobocc 

    It seems that you did not read properly the article I posted above.. I was about to expand the  answer but seen Mike's repply tells everything I had in my mind..I would not call it useless  or insufficient because of matter that people think about assigning the values to the variables in that way... But does not matter

    I am sure you will find the answer... Good luck.

    Uri Dimant SQL Server MVP

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Thursday, November 21, 2013 1:00 PM
    Answerer
  • It looks like a bug but... :-)

    Saeid is right and the question is why the Execution is different :-)

    and the answer is that using "order by 1" we use "ORDER BY position number". This is not like using "order by (select 1)". the number 1 is the position 1. In the select in our position 1 we have @i, there for the server Preferred first do the computed and then the "order by" (the sort ). "The order by does not guarantee the order of evaluation of computed values" (From Microsoft Team).

    * It is great Question (I added to my list of strange Q for interviews)


    [Personal Site] [Blog] [Facebook]signature

    Thursday, November 21, 2013 1:03 PM
    Moderator
  • Hello Mike.

    Is there a "right way" and a "wrong way" to initialize variables?

    What about this link: http://technet.microsoft.com/en-us/library/ms187330.aspx? Isn't it official? Does not express the official point of view of Microsoft about the usage of variables inside the SELECT?

    I do believe it is a bug, as also was stated by Cristoph. This post was definitely not about concatenation of the strings, which you brought into discussion for I do not know which reason.

    I disagree with the fact that such kind of "side effect" as you call it is not a bug. Then why at compile time the side effects encountered in functions are completely forbidden and this is not?

    I always thought that the result should be correct or an error should be thrown.

    If the discussion goes into better or worse ways to do things I always have thought it should be subject to some best practice documents.

    Are you absolutely sure your post is OK and I deserve what you have said about my post? My feeling is that you jumped on me too soon, and I take in advance all the blame for my feeling if wrong.

    Thursday, November 21, 2013 2:11 PM
  • Hi HimanshuSharma

    SQL Server as most database servers, is working on data set (one set of records). Therefor, when we use a query like the above, then operation is done on each one of the records in the data set. This is a well known optimal  way to to the job, that could be done using loop or cursor if we want (AND WE DONT!!!) to do it record by record.


    [Personal Site] [Blog] [Facebook]signature

    Thursday, November 21, 2013 2:47 PM
    Moderator
  • What about this link: http://technet.microsoft.com/en-us/library/ms187330.aspx? Isn't it official? Does not express the official point of view of Microsoft about the usage of variables inside the SELECT?

    This BOL article supports what I was saying.  At no point does it mention assigning an expression to a variable referencing itself (e.g. @i = @i + a).  It only states that "If the SELECT statement returns more than one value, the variable is assigned the last value that is returned."  At no point does it state that the "last value" is computed after or before the ORDER BY, or row-by-row.  So yes, assigning variables in a SELECT statement is supported, but the self-referencing form is not documented and is therefore unsupported.

    In both queries above, the last value is assigned to @i, as per the link above.  It just so happens that in the first case the order of operators in the plan means the value of @i is computed for every row, whereas in the second query it is only calculated once after the ordered dataset is returned.  Both results are correct, albeit frustratingly different.

    In cases like this, if you can't find specific documentation or examples then you must assume it is undocumented and unsupported, and avoid using it.

    My issue with your post had nothing to do with who was right or wrong, it was purely based on the fact it was rude and insulting in its manner.  Everyone is entitled to their opinion.  However rude and/or insulting responses are not necessary.  Constructive arguments supporting your own point should be used.

    Thursday, November 21, 2013 2:54 PM
  • Mike, I have tried with @i += a, exactly as it is documented there in the list of operators.

    The result is the same as with @i = @i + a.

    I believe your explanation regarding the importance of the ORDER BY place in the execution plan is the right one and this is what the SQL Server does in its depth.

    If you accept my best feelings for clearly state this explanation, then kindly please let me express them. This is the kind of answer expected from a professional.

    However, despite the accuracy of the answer, we can see that in one case the "last row" is the result of a computing, while in the other case the "last row" is taken without any calculation.

    Meantime I have made more tests and if no ORDER BY at all then 6 is returned. My feeling is it must be something more here. Is there any chance that the reason why we see different results is the fact that ORDER BY occurs only after all the computings are done? Which means also that ofter ORDER BY no computing could be expected to occur?

    I am simply trying to find a reason to accept this behavior as "intended", or "normal" or "that was supposed to be".

    What is "normal" for operators like "+="? From the link I have indicated above, which is the connection between "+= Add and assign" and "the variable is assigned the last value that is returned"?

    Wouldn't it be normal to always compute 0 + 3 which is the current value of the variable plus the value in the last row? Then why the value is 6 sometimes?

    You are, again, right. Rude or offending response should not occur. If my post was rude (I have used "kindly please" to ask for a clear stated favor) why if a professional is posting an answer which is vaguely related to the problem is not even more rude and offending? I would greatly appreciate your answer as English is not my native language and it seems I am missing some point here.

    Thursday, November 21, 2013 3:54 PM
  • Other tests, different results, only ORDER BY is changed:

    DECLARE @i INT = 0
    DECLARE @j INT = 0

    SELECT @i += 1,
           @j += a
      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) x
      ORDER BY 1

    GO

    SELECT @i, @j

    DECLARE @i INT = 0
    DECLARE @j INT = 0

    SELECT @i += 1,
           @j += a
      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) x
      ORDER BY 2

    SELECT @i, @j

    GO
    • Edited by Cristi Boboc Thursday, November 21, 2013 4:14 PM
    Thursday, November 21, 2013 4:14 PM
  • Thank you Cristoph!

    Thursday, November 21, 2013 4:26 PM
  • I must agree with you, I must find out an explanation of why this behaviour is happening and why is the SQL engine treating Batch 1 differently than Batch 2.
    • Edited by Zgondea Paul Thursday, November 21, 2013 4:30 PM
    Thursday, November 21, 2013 4:29 PM
  • Just to be clear - the expression is always fully evaluated.  When looking at the query plan and therefore the order of execution, if the variable is assigned its value after the ORDER BY then the expression is only evaluated once.  As the variable was initialised to 0, then the result in this case is 0 + <last value>.  You can prove this by not initialising the value (therefore it will be NULL).  You'll see that the end result is also NULL.

    In theory, if you know the execution plan then you can predict the outcome, however for unsupported features such as this it's generally better to simply avoid using them and find alternative solutions.  You never know, the next service pack or update may change the way the functionality works, leading to unwanted results.

    With regards to your comment, it sounds as though your intentions were perhaps not meant to be as rude as they appeared.  However as a rule, avoid comments that belittle other contributions and contributors.  If someone has posted something you feel is incorrect, then by all means correct them with supporting evidence.  Comments that you feel don't fully answer the question are not considered offensive, so please don't treat them as such.

    Thursday, November 21, 2013 4:31 PM
  • bobocc & all: POLITENESS is the key to successful forum threads beside the know-how of the participants.

    I get especially concerned when an alias is attacking forum contributors who are using their real names.

    Uri is an outstanding contributor to the MSDN SS forums.

    I am puzzled by the version with newid():

    DECLARE @i INT  = 0
    SELECT @i =  @i + a
       FROM (SELECT 1 a UNION  ALL SELECT 2 UNION ALL SELECT 3) x   ORDER BY newid()
    SELECT @i
    -- 3
    -- 2
    -- 3
    -- 1
    -- ...

    I recall a KB article (reference by Erland) which states that the ORDER BY is not reliable with SELECT @I=@I+a.... type of constructs.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012




    Thursday, November 21, 2013 8:28 PM
    Moderator
  • I don't know if it is wise to put a lot of effort in finding out why it behaves the way it does, because the undefined behavior that you are building your query on may change under different circumstances or after patches or upgrades.

    The BOL article to which bobocc referred mentions this:
    expression

    Is any valid expression. This includes a scalar subquery.

    I have emphasized the word "scalar," because that is where it goes wrong here.

    In your case, the derived table is not a scalar (it returns 3 values), it is not an expression. The behavior of trying to add a set to a local variable is undefined.

    If you actually want the values added together, I suggest you select the SUM(a) and add that to the local variable.


    Gert-Jan

    Thursday, November 21, 2013 8:44 PM
  • Hi Gert-Jan,

    So how do you explain the newid() behavior?  It looks peculiar to me. Thanks.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012



    Thursday, November 21, 2013 8:52 PM
    Moderator
  • What are you trying to prove with your example?

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Thursday, November 21, 2013 8:55 PM
    Moderator
  • The ORDER by newid() version eliminates the different execution plans theory which stood on shaky grounds anyhow.

    With reference to the original statements,  SSMS gives a red squiggly for the ORDER BY 1 as noted by Cristoph. BOL: "Specifies a column or expression on which to sort the query result set. A sort column can be specified as a name or column alias, or a nonnegative integer representing the position of the column in the select list."  LINK: http://technet.microsoft.com/en-us/library/ms188385.aspx

    There is not SELECT list in the outer query, therefore the ORDER BY 1 is not valid.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012






    Thursday, November 21, 2013 9:01 PM
    Moderator
  • So how do you explain the newid() behavior?  It looks peculiar to me.

    Let's start with establishing that the ORDER BY clause it not part of the derived table, but part of the main query. That makes the ORDER BY pointless, since the main query return no rows; it only assigns a value to a variable.

    The behavior of "SELECT @i = a FROM (SELECT 1 a UNION SELECT 2) x" is undefined, because "SELECT 1 a UNION SELECT 2" is not a scalar.

    So the query has to force a set into a scalar. Any value out of the set can be argued to be correct (1, 2 or 3).

    To make matters worse, "SELECT @i = @i + (...)" can be interpreted in even more ways. One way is to force the set into a scalar. See above. The other way to to process each row of the subquery in the outer query. In that case, the value 6 can be argued to be a correct result as well.


    Gert-Jan

    Thursday, November 21, 2013 9:06 PM
  • Thanks Gert-Jan. Can you "enlighten" us, why the summation not occurring with the newid(), a valid clause?

    DECLARE @i INT  = 0;
    WITH x AS (SELECT ShiftID AS a from AdventureWorks2012.HumanResources.Shift )
    SELECT @i =  @i + a FROM  x   ORDER BY a;
    SELECT @i;
    GO
    -- 6
    
    DECLARE @i INT  = 0;
    WITH x AS (SELECT ShiftID AS a from AdventureWorks2012.HumanResources.Shift )
    SELECT @i =  @i + a FROM  x   ORDER BY NEWID();
    SELECT @i;
    GO
    -- 2 2 3 1 3 1 2 3....


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012



    Thursday, November 21, 2013 10:28 PM
    Moderator
  • Thanks Gert-Jan. Can you "enlighten" us, why the summation not occurring with the newid()

    Saeid Hasani was the first to explain this. It is caused by a different query plan.

    The point is, that both query plans are correct interpretations of the (ambiguous) query.

    a valid clause?

    Valid but pointless. Without a TOP clause, the ORDER BY clause in this query has no meaning, since the query has no output.


    Gert-Jan

    Thursday, November 21, 2013 11:18 PM
  • Hello Mr. Kalman Toth.

    I have a question about what you have said above: "There is not SELECT list in the outer query, therefore the ORDER BY 1 is not valid."

    In this case shouldn't throw an error?

    And then again: SQL language is not enough by itself to tell the results that would be obtained by a statement? As far as I 'm concerned there is a RFC stating that "the order in which the results are returned is not guaranteed" and the ORDER BY was defined exactly for that purpose.

    Kindly please look at my two batches example above having "ORDER BY 1, 2" and conversely. Is there any RFC, any official documentation, or any "common sense" reason as it comes out of the SQL standard used by Microsoft that could be invoked to explain why there are different results just because of the ORDER BY which is not supposed to change the returned values but only their order?

    Am I wrong while I assume ORDER BY is not supposed to change the result value?

    If not, then this is definitely an ugly bug. (The word "ugly" here is solely my perception and, therefore, it has no scientific value).

    About using an alias, I have the alias "bobocc" since 20 years ago and I never hide my identity when I have used it. I cannot say if other people have used the same alias but if used with my yahoo valid email: bobocc@yahoo.com it should be me.

    But I have changed it to my real name since I have seen people feels better that way.

    Friday, November 22, 2013 6:16 AM
  • May be few more variations(from my testing...):

    DECLARE @i INT = 0
    SELECT @i = @i + a
      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) x
      ORDER BY a
    SELECT @i
    
    GO
    
    -- Batch 2:
    DECLARE @i INT = 0
    SELECT @i = @i + a
      FROM (Select Top 100 percent  * From (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3) A ORDER BY 1 ) x
    SELECT @i
    
    Go 
    --Batch 3:
    DECLARE @i INT = 0
    SELECT @i = SUM(a)
      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3 ) x
     ORDER BY 1
    SELECT @i


    Friday, November 22, 2013 6:42 AM
  • Mr. Kalman Toth,

    Your argument "Uri is an outstanding contributor to the MSDN SS forums." is not fair.

    His behavior by posting what he has posted was rude over any limit, not his qualities as specialist.

    In need to emphasize the words "behavior" and "specialist". To invoke his skills in order to explain his behavior - which I cannot accept - is simply not fair!

    He has no excuse for posts like that! I mean posts that are sending us to somewhere that in the best case may be vaguely related to the posted problem. If he was not that qualified it could have been a mistake. From his level of qualification it was a dirty insult addressed with rude intention to us, the people interested in Paul's post.


    Do I need to go up to raise an explicit complaint to the moderators for the mistreat addressed to us by one of the specialists as they are signed? Or we can end here considering enough the fact that I have expressed my feelings about the insult that was thrown against us?

    Friday, November 22, 2013 6:58 AM
  • Hi Cristi,

    You could mark a post abusive if that is how you feel about it. But in this case just ignore it. Not all posts are excellent.

    TO ALL: If the answer is a LINK which exactly on the topic then fine. If not exactly on the topic, then additional explanation should be given to augment the link.

    About the queries in question: ORDER BY is supposed to be performed last in a SELECT query when every other processing is done. Therefore, it is strange that the ORDER BY changes the result of the query. Granted though that local variable self-assignment is not an ANSI SQL SELECT feature.

    SSMS 2012 shows red squiggly for the ORDER BY 1 clause. Apparently the SS engine team did not feel that it should be an error.


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012






    Friday, November 22, 2013 11:27 AM
    Moderator
  • Thank you Mr. Kalman Toth four your kind answer.

    The sum of the posts here was and still is very valuable to me and I think it is the time to express here all my best consideration and gratitude for people who have expressed constructive ideas and have contributed with all they could to clarify the problem.

    I wish also to express my gratitude for people that will post from now on for their interest and willingness to help!

    I was interested by this problem much beyond than the technical point of view because it addressed one of the particular features of SQL Server SELECT statement and I am fond of "special features".


    Friday, November 22, 2013 12:12 PM
  • Friday, November 22, 2013 10:01 PM
    Moderator
  • Mr. Kalman Toth,

    Thank you for the last post! After reading the article (and those linked tot it) I have no other comment or doubt about this problem.

    I still have to re-think how I have used this feature, but it is solely my problem now.

    From my perspective, you have marked the final and most important point (in fact the single important point) of this discussion.

    Monday, November 25, 2013 10:09 AM
  • The problem is why, when using ORDER BY 1, it gives other result than the sum?

    The addition of integers is not subject to any particular order, it should be the same, isn't so

    Hi Cristi Boboc,

    Here what I am asking is which is the property running in the backend which makes its as a "SUM".


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Tuesday, November 26, 2013 9:33 AM