Find consecutive days from overlapping date ranges to keep from double counting them in a DATEDIFF

# Find consecutive days from overlapping date ranges to keep from double counting them in a DATEDIFF

• Wednesday, January 09, 2013 1:55 PM

Consider a table this script to that produces a table

create table test
(personID int,in_Date date, out_Date date

constraint pk_person_ID_in_Date primary key (personID,in_Date))
insert into test
values (1,'2009-10-06','2009-11-05')
,(1,'2009-11-03','2009-12-03')--this row's in_Date is within the range of the previous rows
,(1,'2007-12-26','2007-12-31')
,(1,'2007-12-27','2008-01-05')
,(2,'2008-08-28','2008-09-27')
,(2,'2007-09-04','2007-12-03')
,(2,'2007-03-08','2007-04-08')
,(2,'2007-05-21','2007-08-19')

The first row has a span of dates that encompasses the next row's.  I'm looking for the number of days that these records' dates overlap.

In the first row this would only be four days (12-27 through 12-31).  I've seen a solution something similar (on this site and others) to

select
person_ID
,MIN(in_date)
,MAX(out_date)
,DATEDIFF(day,min(in_date),MAX(out_Date))
from
(
select
t.person_ID
,t.in_Date
,t.out_Date
,DATEADD(day,-row_number() over(partition by person_id order by in_Date),out_date) as groupDate
from test as t
) as x
group by person_id,groupDate
order by person_ID

this does NOT return the correct answer, but it's meandering its way toward it. Could someone help me a.) find a solution, and b.) help me understand what's going on when you group by a date range

***EDIT the question has been edited to correctly express my intent.

I would like to find the DATEDIFF(day,in_Date,out_Date) of every row, but NOT include the overlapping date ranges. to keep from double counting them.

### All Replies

• Wednesday, January 09, 2013 2:08 PM

Hi,

this may work for you:

```;WITH CTE (rownumber, person_id, in_date, out_date)
AS
(
SELECT	ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY in_date),
personid,
in_Date,
out_date
FROM	dbo.Test
)
SELECT f.rownumber,
f.person_id,
f.in_date,
f.out_date,
CASE WHEN s.in_date BETWEEN f.in_date AND f.out_date
THEN DATEDIFF (dd, s.in_date, f.out_date)
ELSE 0
END
FROM CTE f LEFT JOIN CTE s
ON	(
f.rownumber = s.rownumber - 1 AND
f.person_id = s.person_id
);```

If you have SQL 2012 have a look to LEAD and LAG which may cover the problem in a much easier way than CTE!

Uwe Ricken

MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development

db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)

• Wednesday, January 09, 2013 2:10 PM

@ Uwe Could you please explain a little bit of your methodology here?
• Edited by Wednesday, January 09, 2013 2:11 PM clarity
•
• Wednesday, January 09, 2013 2:13 PM

@Uwe, I don't think I made myself clear enough in the question title.  What I meant to convey is I need the DATEDIFF of the in_date and out_date, but do not double count the overlapping days.
• Wednesday, January 09, 2013 2:20 PM

Hallo sqlservernoob,

i'm using a CTE (common table expression) which groups the results by person_id and add a rownumber for each group. So result is 1, 2, 3, ... for each person_id. Furthermore the resultset for each person_id will be sorted by the in_date (which is essential for the numbering.

The seond part of the cte "uses" the result of cte by using a "self join" of the cte. The join is using the rownumber as JOIN and the person_id. The special thing with the rownumber is that i join the rownumbers not x = x but x = x - 1.

The alias f is for "first" and the alias s is for "second". (e.g. 1 = 2 - 1)

The last is to compare the in_date of the second row with the in_date and the out_date of the first row. If the in_date is in between i calculate the differences. If not it is 0.

Get more details concerning CTE, JOIN, DATEDIFF here:

Uwe Ricken

MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development

db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)

• Wednesday, January 09, 2013 2:41 PM

@Uwe, I don't think I made myself clear enough in the question title.  What I meant to convey is I need the DATEDIFF of the in_date and out_date, but do not double count the overlapping days.

But that's what my solution will show. e.g. you have an overlap of 4 days

1: (1,'2007-12-26','2007-12-31')
2: (1,'2007-12-27','2008-01-05')

If you check my query you'll see a diff of 4 days because record 2 overlaps record 1 for 4 days.

Uwe Ricken

MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development

db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)

• Wednesday, January 09, 2013 2:56 PM

Consider this query

select person_id,SUM(datedifference) as TotalDaysEventsAreConcurrent
from
(
select t.*,DATEDIFF(day,in_date,out_date) as DateDifference
from test as t
) x
group by person_id

That returns --

id  TotalDaysEventsAreConcurrent

1  74

2  288

74 is not correct because there are 6 days that the records overlap.  Your solution will work,but I would have to do a little trickery.  Is there a way to subtract the overlapping dates AND keep the datediff(day,in_date,out_date) if that do not overlap?

• Wednesday, January 09, 2013 3:03 PM

Maybe this is what you mean:

```;WITH CTE (rownumber, person_id, in_date, out_date)
AS
(
SELECT	ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY in_date),
personid,
in_Date,
out_date
FROM	dbo.Test
)
SELECT f.rownumber,
f.person_id,
f.in_date,
f.out_date,
DATEDIFF(dd, f.in_date, f.out_date) -
CASE WHEN s.in_date BETWEEN f.in_date AND f.out_date
THEN DATEDIFF (dd, s.in_date, f.out_date)
ELSE 0
END
FROM CTE f LEFT JOIN CTE s
ON	(
f.rownumber = s.rownumber - 1 AND
f.person_id = s.person_id
);```

Uwe Ricken

MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development

db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)

• Marked As Answer by Wednesday, January 09, 2013 8:17 PM
•
• Wednesday, January 09, 2013 8:18 PM

This looks like exactly what I needed, and I can understand where it's coming from.  I didn't know you could use a case statement like you did in this example. Learn something new everyday, thanks.
• Thursday, January 10, 2013 1:00 AM

Your DDL is incomplete; I was able to use your narrative to do what you should have done for us. Remember that 80 to 95% of the work in SQL is in the DDL.

CREATE TABLE Test
(person_id INTEGER NOT NULL,
in_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (person_id, in_date),
out_date DATE NOT NULL,
CHECK (in_date <= out_date));

INSERT INTO Test
VALUES (1, '2009-10-06', '2009-11-05'),
(1, '2009-11-03', '2009-12-03'), --this row's in_date is within the range of the previous rows
(1, '2007-12-26', '2007-12-31'),
(1, '2007-12-27', '2008-01-05'),
(2, '2008-08-28', '2008-09-27'),
(2, '2007-09-04', '2007-12-03'),
(2, '2007-03-08', '2007-04-08'),
(2, '2007-05-21', '2007-08-19');

The first row has a span of dates that encompasses the next row's.  I'm looking for the number of days that these records [rows are not records] overlap.

SELECT T.person_id, C.cal_date
FROM Test AS T, Calendar AS C
WHERE C.cal_date BETWEEN in_date AND out_date
GROUP BY T.person_id, C.cal_date
HAVING COUNT(*) > 1; -- overlap count

1 2007-12-27
1 2007-12-28
1 2007-12-29
1 2007-12-30
1 2007-12-31
1 2009-11-03
1 2009-11-04
1 2009-11-05

If your calendar has a julian sequence numer, you can use that to do the groupings with ROW_NUMBER() .

--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

• Thursday, January 10, 2013 1:09 AM

WITH Overlaps (person_id, cal_date)
AS
(SELECT T.person_id, C1.cal_date
FROM Something_Tests AS T, Calendar AS C1
WHERE C1.cal_date BETWEEN in_date AND out_date
GROUP BY T.person_id, C1.cal_date
HAVING COUNT(*) > 1),-- overlap count

Groupings (person_id, cal_date, grp_nbr)
AS
(SELECT O.person_id, O.cal_date,
C2.julian_seq -
ROW_NUMBER() OVER (PARTITION BY O.person_id ORDER BY O.cal_date)
FROM Overlaps AS O, Calendar AS C2
WHERE C2.cal_date = O.cal_date)

SELECT person_id, MIN(cal_date), MAX(cal_date)
FROM Groupings
GROUP BY person_id, grp_nbr;

--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL