Advantages of Aggregates with OVER Clause?

Unanswered Advantages of Aggregates with OVER Clause?

All Replies

  • Sunday, January 27, 2013 6:09 PM
     
     
    As the PARTITON BY specifies the same column, the result of this operation can be reused. This results in a lazy spool operation in the execution plan. The sub-queries do not reuse the intermediate results of the WHERE clause, they are handled separately. Thus you can see five execution paths in the plan.
  • Sunday, January 27, 2013 6:21 PM
    Moderator
     
     

    The reason it is compared to subqueries is because GROUP BY can *not* achieve the same result.

    With Group by you must group by all the columns you don't aggregate over in the SELECT clause. Since the above SELECT returns (SalesOrderID, ProductID, OrderQty) without aggregating over them, it would have to also GROUP BY those same columns. But each over clause only uses SalesOrderID.

    Grouping over only SalesOrderID is not the same as grouping over (SalesOrderID, ProductID, OrderQty).


    Tibor Karaszi, SQL Server MVP | web | blog

  • Sunday, January 27, 2013 6:22 PM
    Moderator
     
     

    Hi Stefan,

    Why subqueries at all?  My first choice would be a GROUP BY alternative.


    Kalman Toth SQL 2008 GRAND SLAM
    Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

  • Sunday, January 27, 2013 6:38 PM
     
      Has Code
    My instinct is GROUP BY as alternative way of getting same results.

    This query wants ProductID, and OrderQty in the result.  If you did this query with GROUP BY SalesOrderID, then you can't have ProductID and OrderQty in the result.  So you would have to do something like a cte to get the SalesOrderID and the aggregate values for each SalesOrderID and then join back to the table to get the ProductID and OrderQty columns.  So something like

    ;With cte As (SELECT SalesOrderID
        ,SUM(OrderQty)  AS Total
        ,AVG(OrderQty)  AS "Avg"
        ,COUNT(OrderQty)  AS "Count"
        ,MIN(OrderQty)  AS "Min"
        ,MAX(OrderQty)  AS "Max"
    FROM Sales.SalesOrderDetail
    WHERE SalesOrderID IN(43659,43664)
    GROUP BY SalesOrderID)
    Select s.SalesOrderID, ProductID, OrderQty, Total, Avg, Count, Min, Max
    From Sales.SalesOrderDetail s
    Inner Join cte c On s.SalesOrderID = c.SalesOrderID;

    But since this makes SQL go back to the original table, it's more expensive than the OVER(PARTITION BY method.

    Tom

  • Sunday, January 27, 2013 7:06 PM
    Moderator
     
      Has Code

    If you remove the WHERE clause, the GROUP BY versions doing better; relative costs: 38%, 31%, 31%.

    I only know one thing about optimization which is 100% true: never make a blanket statement because it comes back and bytes (?!) you! Use "maybe" frequently! Also "probably"! The all time favorite: "it depends".

    SELECT SalesOrderID, ProductID, OrderQty
        ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total
        ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Avg"
        ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Count"
        ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Min"
        ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Max"
    FROM Sales.SalesOrderDetail 
    --WHERE SalesOrderID IN(43659,43664);
    GO
    
    ;With cte As (SELECT SalesOrderID
        ,SUM(OrderQty)  AS Total
        ,AVG(OrderQty)  AS "Avg"
        ,COUNT(OrderQty)  AS "Count"
        ,MIN(OrderQty)  AS "Min"
        ,MAX(OrderQty)  AS "Max"
    FROM Sales.SalesOrderDetail
    --WHERE SalesOrderID IN(43659,43664)
    GROUP BY SalesOrderID)
    Select s.SalesOrderID, ProductID, OrderQty, Total, Avg, Count, Min, Max
    From Sales.SalesOrderDetail s
    Inner Join cte c On s.SalesOrderID = c.SalesOrderID;
    GO
    
    Select s.SalesOrderID, ProductID, OrderQty, Total, Avg, Count, Min, Max
    From Sales.SalesOrderDetail s
    Inner Join (SELECT SalesOrderID
        ,SUM(OrderQty)  AS Total
        ,AVG(OrderQty)  AS "Avg"
        ,COUNT(OrderQty)  AS "Count"
        ,MIN(OrderQty)  AS "Min"
        ,MAX(OrderQty)  AS "Max"
    FROM Sales.SalesOrderDetail
    --WHERE SalesOrderID IN(43659,43664)
    GROUP BY SalesOrderID) c On s.SalesOrderID = c.SalesOrderID;


    Kalman Toth SQL 2008 GRAND SLAM
    Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

  • Monday, January 28, 2013 3:50 AM
     
     
    I only know one thing about optimization which is 100% true: never make a blanket statement because it comes back and bytes (?!) you! Use "maybe" frequently! Also "probably"! The all time favorite: "it depends".

    I certainly <grin> agree.

    Tom

  • Monday, January 28, 2013 6:36 AM
     
      Has Code

    May be without any where clause i.e for all data , there is increased use of tempdb in case of LAZY spool so it is showing a high cost as compared to plain GROUP BY...Also I think using is aggregates with OVER Clause are programmed to use the plan which involves the spools as I tried adding hint but it didn't changed the query plan...For the below query still is it using the spools or did I missed anything else

    SELECT SalesOrderID, ProductID, OrderQty
        ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total
        ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Avg"
        ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Count"
        ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Min"
        ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Max"
    FROM Sales.SalesOrderDetail 
    OPTION (MERGE JOIN, RECOMPILE)
    --WHERE SalesOrderID IN(43659,43664);
    GO


    Thanks and regards, Rishabh K