ROW_NUMBER OVER PARTITION BY Puzzle

# ROW_NUMBER OVER PARTITION BY Puzzle

• Monday, February 18, 2013 2:33 PM
Moderator

Can the following be done in one CTE (I want to keep the outer query clean without RN predicate)?  Thanks.

The nested CTE-s are selecting the recent salary rate for each employee of AdventureWorks Cycles fictitious company, a manufacturer of mountain, road and touring bikes.

```WITH CTE AS (SELECT EmpID=BusinessEntityID, RN=ROW_NUMBER()
ORDER BY RateChangeDate DESC), Rate
FROM HumanResources.EmployeePayHistory),
CTEfinal AS (SELECT EmpID, Rate FROM CTE WHERE RN = 1)

SELECT * FROM CTEfinal ORDER BY EmpID;```

EmpID Rate

1 125.50

2 63.4615

3 43.2692

4 29.8462

5 32.6923

6 32.6923

....

Kalman Toth Database & OLAP Architect
Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

### All Replies

• Monday, February 18, 2013 3:01 PM

WITH CTE AS (SELECT EmpID=BusinessEntityID, RN=ROW_NUMBER()
ORDER BY RateChangeDate DESC), Rate
FROM HumanResources.EmployeePayHistory)
SELECT * FROM CTEfinal WHERE RN= 1 ORDER BY EmpID;

Or am I missing something?

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
• Monday, February 18, 2013 3:10 PM

Trivially yes, but really no.  Window functions can only be used in SELECT and ORDER BY.

You can always switch between CTEs and Subqueries, but a CTE "pipeline" is much nicer than a bunch of lexically nested subqueries.

```WITH CTEfinal AS
(
SELECT EmpID, Rate
from
(
ORDER BY RateChangeDate DESC), Rate
FROM HumanResources.EmployeePayHistory
) CTE
where RN = 1
)
SELECT * FROM CTEfinal ORDER BY EmpID;```
David

David http://blogs.msdn.com/b/dbrowne/

• Monday, February 18, 2013 4:00 PM

Guessing you don't want the old-fashioned way of doing this?  You could of course wrap this in a CTE but no need for this simple example:

```SELECT eph.BusinessEntityID AS EmpID, eph.Rate
FROM HumanResources.EmployeePayHistory eph
INNER JOIN
(
FROM HumanResources.EmployeePayHistory
AND eph.RateChangeDate = max_ph.RateChangeDate```

• Monday, February 18, 2013 4:25 PM
Moderator

Is GROUP BY better than OVER PARTITION BY? Thanks.

Kalman Toth Database & OLAP Architect
Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

• Monday, February 18, 2013 4:43 PM

Well I had to scale up to 2.5 million rows to see any difference, but the GROUP BY with subquery does appear slightly quicker, similar reads, and better CPU ( eg ~10k versus 2k ).  To be honest the difference is fairly trivial, although in terms of CPU the GROUP BY query makes a slightly better neighbour on a server.  See what you think.

My simple rig:

Warning!! Do not run this on a production server!

```USE AdventureWorks2012
GO

IF OBJECT_ID('HumanResources.EmployeePayHistory2') IS NOT NULL DROP TABLE HumanResources.EmployeePayHistory2
GO

SELECT *
INTO HumanResources.EmployeePayHistory2
FROM HumanResources.EmployeePayHistory
GO

-- OK we haven't done any indexes ....

INSERT INTO HumanResources.EmployeePayHistory2
SELECT BusinessEntityID, DATEADD( day, 1, RateChangeDate ), Rate + (Rate * 0.1), PayFrequency, GETDATE()
FROM HumanResources.EmployeePayHistory2
GO 13

SELECT FORMAT( COUNT(*), '#,#') FROM HumanResources.EmployeePayHistory2
GO

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO

;WITH CTE AS
(
OVER ( PARTITION BY BusinessEntityID ORDER BY RateChangeDate DESC), Rate
FROM HumanResources.EmployeePayHistory2)
, CTEfinal AS (SELECT * FROM CTE WHERE RN = 1)
SELECT * FROM CTEfinal ORDER BY EmpID;
GO

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO

FROM HumanResources.EmployeePayHistory2 eph
INNER JOIN
(
FROM HumanResources.EmployeePayHistory2
AND eph.RateChangeDate = max_ph.RateChangeDate
ORDER BY EmpID
GO

-- Cleanup
IF OBJECT_ID('HumanResources.EmployeePayHistory2') IS NOT NULL DROP TABLE HumanResources.EmployeePayHistory2
GO```

Profiler result:

• Monday, February 18, 2013 8:48 PM
Moderator

I repeated your test. The GROUP BY version beats OVER PARTITION BY for the sample data.

Kalman Toth Database & OLAP Architect
Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

• Monday, February 18, 2013 9:52 PM

Once you have some index structure, cross apply sometimes gets really good (and parallelizable) plans.

```create  clustered index ix_eph2

from HumanResources.Employee emp
cross apply
(select top 1 Rate from HumanResources.EmployeePayHistory2
order by RateChangeDate desc) Rates```

David

David http://blogs.msdn.com/b/dbrowne/

• Monday, February 18, 2013 10:49 PM

One word of caution.  The GROUP BY method really is equivalent to the RANK() (or DENSE_RANK) function, not the ROW_NUMBER() function.  It works in this case since since for each BusinessEntityID, there is only one row in EmployeePayHistory2 with the maximum value in RateChangeDate.  If for a particular BusinessEntityID there were multiple rows with RateChangeDate equal to the maximum value of RateChangeDate for that BusinessEntityID, all of those rows would be returned.  That is what RANK or DENSE_RANK would do, ROW_NUMBER will return only one of the rows.

Tom

• Tuesday, February 19, 2013 1:05 AM
Moderator

Including CROSS APPLY method in the test.

The first "select" is the CROSS APPLY method, a sure winner (see "Reads", "CPU") for this sample data.

```DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
from HumanResources.Employee emp
cross apply
(select top 1 Rate from HumanResources.EmployeePayHistory2
order by RateChangeDate desc) Rates
ORDER BY EmpID;
GO
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO

;WITH CTE AS
(
OVER ( PARTITION BY BusinessEntityID ORDER BY RateChangeDate DESC), Rate
FROM HumanResources.EmployeePayHistory2)
, CTEfinal AS (SELECT * FROM CTE WHERE RN = 1)
SELECT * FROM CTEfinal ORDER BY EmpID;
GO

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO

FROM HumanResources.EmployeePayHistory2 eph
INNER JOIN
(
FROM HumanResources.EmployeePayHistory2
AND eph.RateChangeDate = max_ph.RateChangeDate
ORDER BY EmpID
GO```

Kalman Toth Database & OLAP Architect
Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

• Tuesday, February 19, 2013 1:28 AM

This was my winner (with original index):

```ALTER TABLE HumanResources.EmployeePayHistory2 ADD CONSTRAINT PK PRIMARY KEY ( BusinessEntityID, RateChangeDate )
GO

(
select top 1 Rate from HumanResources.EmployeePayHistory2
order by RateChangeDate desc ) Rate
from HumanResources.Employee emp
order by 1```

Seems counter-intuitive that a correlated subquery with a sort should win.  Still, goes to show you should always try more than one method with your data.

• Tuesday, February 19, 2013 1:37 AM

Good point, however the original primary key on HumanResource.EmployeePayHistory is on ( BusinessEntity, RateChangeDate ) would have enforced this.  So for this particular example they would be equivalent.
• Tuesday, February 19, 2013 3:10 AM

While they are the same here since we only want one column from group, the OUTER APPLY can return any number of columns from the first row in the group.  So it's semantically equivalent to the ROW_NUMBER() query.

David

David http://blogs.msdn.com/b/dbrowne/