Advantages of Aggregates with OVER Clause?
-
Sunday, January 27, 2013 5:16 PMModerator
BOL: In this example, using the OVERclause is more efficient than using subqueries to derive the aggregate values.
SELECT SalesOrderID, ProductID, OrderQty
,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total
,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Avg"
,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Count"
,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Min"
,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Max"
WHERE SalesOrderID IN(43659,43664);
http://msdn.microsoft.com/en-us/library/ms189461
Why is it compared to subqueries?
My instinct is GROUP BY as alternative way of getting same results.
Thanks.
Kalman Toth SQL 2008 GRAND SLAM
Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012
All Replies
-
Sunday, January 27, 2013 6:09 PMAs the PARTITON BY specifies the same column, the result of this operation can be reused. This results in a lazy spool operation in the execution plan. The sub-queries do not reuse the intermediate results of the WHERE clause, they are handled separately. Thus you can see five execution paths in the plan.
-
Sunday, January 27, 2013 6:21 PMModerator
The reason it is compared to subqueries is because GROUP BY can *not* achieve the same result.
With Group by you must group by all the columns you don't aggregate over in the SELECT clause. Since the above SELECT returns (SalesOrderID, ProductID, OrderQty) without aggregating over them, it would have to also GROUP BY those same columns. But each over clause only uses SalesOrderID.
Grouping over only SalesOrderID is not the same as grouping over (SalesOrderID, ProductID, OrderQty).
-
Sunday, January 27, 2013 6:22 PMModerator
Hi Stefan,
Why subqueries at all? My first choice would be a GROUP BY alternative.
Kalman Toth SQL 2008 GRAND SLAM
Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012 -
Sunday, January 27, 2013 6:38 PM
My instinct is GROUP BY as alternative way of getting same results.
This query wants ProductID, and OrderQty in the result. If you did this query with GROUP BY SalesOrderID, then you can't have ProductID and OrderQty in the result. So you would have to do something like a cte to get the SalesOrderID and the aggregate values for each SalesOrderID and then join back to the table to get the ProductID and OrderQty columns. So something like
;With cte As (SELECT SalesOrderID ,SUM(OrderQty) AS Total ,AVG(OrderQty) AS "Avg" ,COUNT(OrderQty) AS "Count" ,MIN(OrderQty) AS "Min" ,MAX(OrderQty) AS "Max" FROM Sales.SalesOrderDetail WHERE SalesOrderID IN(43659,43664) GROUP BY SalesOrderID) Select s.SalesOrderID, ProductID, OrderQty, Total, Avg, Count, Min, Max From Sales.SalesOrderDetail s Inner Join cte c On s.SalesOrderID = c.SalesOrderID;But since this makes SQL go back to the original table, it's more expensive than the OVER(PARTITION BY method.
Tom
-
Sunday, January 27, 2013 7:06 PMModerator
If you remove the WHERE clause, the GROUP BY versions doing better; relative costs: 38%, 31%, 31%.
I only know one thing about optimization which is 100% true: never make a blanket statement because it comes back and bytes (?!) you! Use "maybe" frequently! Also "probably"! The all time favorite: "it depends".
SELECT SalesOrderID, ProductID, OrderQty ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Avg" ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Count" ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Min" ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Max" FROM Sales.SalesOrderDetail --WHERE SalesOrderID IN(43659,43664); GO ;With cte As (SELECT SalesOrderID ,SUM(OrderQty) AS Total ,AVG(OrderQty) AS "Avg" ,COUNT(OrderQty) AS "Count" ,MIN(OrderQty) AS "Min" ,MAX(OrderQty) AS "Max" FROM Sales.SalesOrderDetail --WHERE SalesOrderID IN(43659,43664) GROUP BY SalesOrderID) Select s.SalesOrderID, ProductID, OrderQty, Total, Avg, Count, Min, Max From Sales.SalesOrderDetail s Inner Join cte c On s.SalesOrderID = c.SalesOrderID; GO Select s.SalesOrderID, ProductID, OrderQty, Total, Avg, Count, Min, Max From Sales.SalesOrderDetail s Inner Join (SELECT SalesOrderID ,SUM(OrderQty) AS Total ,AVG(OrderQty) AS "Avg" ,COUNT(OrderQty) AS "Count" ,MIN(OrderQty) AS "Min" ,MAX(OrderQty) AS "Max" FROM Sales.SalesOrderDetail --WHERE SalesOrderID IN(43659,43664) GROUP BY SalesOrderID) c On s.SalesOrderID = c.SalesOrderID;Kalman Toth SQL 2008 GRAND SLAM
Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012 -
Monday, January 28, 2013 3:50 AM
I only know one thing about optimization which is 100% true: never make a blanket statement because it comes back and bytes (?!) you! Use "maybe" frequently! Also "probably"! The all time favorite: "it depends".
I certainly <grin> agree.
Tom
-
Monday, January 28, 2013 6:36 AM
May be without any where clause i.e for all data , there is increased use of tempdb in case of LAZY spool so it is showing a high cost as compared to plain GROUP BY...Also I think using is aggregates with OVER Clause are programmed to use the plan which involves the spools as I tried adding hint but it didn't changed the query plan...For the below query still is it using the spools or did I missed anything else
SELECT SalesOrderID, ProductID, OrderQty ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS Total ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Avg" ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Count" ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Min" ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS "Max" FROM Sales.SalesOrderDetail OPTION (MERGE JOIN, RECOMPILE) --WHERE SalesOrderID IN(43659,43664); GO
Thanks and regards, Rishabh K

