Find consecutive days from overlapping date ranges to keep from double counting them in a DATEDIFF
-
Wednesday, January 09, 2013 1:55 PM
Consider a table this script to that produces a table
create table test
(personID int,in_Date date, out_Date date
constraint pk_person_ID_in_Date primary key (personID,in_Date))
insert into test
values (1,'2009-10-06','2009-11-05')
,(1,'2009-11-03','2009-12-03')--this row's in_Date is within the range of the previous rows
,(1,'2007-12-26','2007-12-31')
,(1,'2007-12-27','2008-01-05')
,(2,'2008-08-28','2008-09-27')
,(2,'2007-09-04','2007-12-03')
,(2,'2007-03-08','2007-04-08')
,(2,'2007-05-21','2007-08-19')The first row has a span of dates that encompasses the next row's. I'm looking for the number of days that these records' dates overlap.
In the first row this would only be four days (12-27 through 12-31). I've seen a solution something similar (on this site and others) to
select
person_ID
,MIN(in_date)
,MAX(out_date)
,DATEDIFF(day,min(in_date),MAX(out_Date))
from
(
select
t.person_ID
,t.in_Date
,t.out_Date
,DATEADD(day,-row_number() over(partition by person_id order by in_Date),out_date) as groupDate
from test as t
) as x
group by person_id,groupDate
order by person_IDthis does NOT return the correct answer, but it's meandering its way toward it. Could someone help me a.) find a solution, and b.) help me understand what's going on when you group by a date range
***EDIT the question has been edited to correctly express my intent.
I would like to find the DATEDIFF(day,in_Date,out_Date) of every row, but NOT include the overlapping date ranges. to keep from double counting them.
- Edited by sqlservernoob Wednesday, January 09, 2013 2:18 PM
- Edited by sqlservernoob Wednesday, January 09, 2013 2:20 PM
All Replies
-
Wednesday, January 09, 2013 2:08 PM
Hi,
this may work for you:
;WITH CTE (rownumber, person_id, in_date, out_date) AS ( SELECT ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY in_date), personid, in_Date, out_date FROM dbo.Test ) SELECT f.rownumber, f.person_id, f.in_date, f.out_date, CASE WHEN s.in_date BETWEEN f.in_date AND f.out_date THEN DATEDIFF (dd, s.in_date, f.out_date) ELSE 0 END FROM CTE f LEFT JOIN CTE s ON ( f.rownumber = s.rownumber - 1 AND f.person_id = s.person_id );
If you have SQL 2012 have a look to LEAD and LAG which may cover the problem in a much easier way than CTE!
Uwe Ricken
MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development
db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)
- Edited by Uwe RickenMicrosoft Community Contributor Wednesday, January 09, 2013 2:10 PM
- Edited by Uwe RickenMicrosoft Community Contributor Wednesday, January 09, 2013 2:34 PM
-
Wednesday, January 09, 2013 2:10 PM@ Uwe Could you please explain a little bit of your methodology here?
- Edited by sqlservernoob Wednesday, January 09, 2013 2:11 PM clarity
-
Wednesday, January 09, 2013 2:13 PM@Uwe, I don't think I made myself clear enough in the question title. What I meant to convey is I need the DATEDIFF of the in_date and out_date, but do not double count the overlapping days.
-
Wednesday, January 09, 2013 2:20 PM
Hallo sqlservernoob,
i'm using a CTE (common table expression) which groups the results by person_id and add a rownumber for each group. So result is 1, 2, 3, ... for each person_id. Furthermore the resultset for each person_id will be sorted by the in_date (which is essential for the numbering.
The seond part of the cte "uses" the result of cte by using a "self join" of the cte. The join is using the rownumber as JOIN and the person_id. The special thing with the rownumber is that i join the rownumbers not x = x but x = x - 1.
The alias f is for "first" and the alias s is for "second". (e.g. 1 = 2 - 1)
The last is to compare the in_date of the second row with the in_date and the out_date of the first row. If the in_date is in between i calculate the differences. If not it is 0.
Get more details concerning CTE, JOIN, DATEDIFF here:
CTE: http://msdn.microsoft.com/en-us/library/ms175972.aspx
JOIN: http://msdn.microsoft.com/en-us/library/ms191517(v=sql.105).aspx
DATEDIFF: http://msdn.microsoft.com/en-us/library/ms189794.aspxUwe Ricken
MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development
db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)- Proposed As Answer by Satheesh Variath Wednesday, January 09, 2013 2:59 PM
-
Wednesday, January 09, 2013 2:41 PM
@Uwe, I don't think I made myself clear enough in the question title. What I meant to convey is I need the DATEDIFF of the in_date and out_date, but do not double count the overlapping days.
But that's what my solution will show. e.g. you have an overlap of 4 days
1: (1,'2007-12-26','2007-12-31')
2: (1,'2007-12-27','2008-01-05')If you check my query you'll see a diff of 4 days because record 2 overlaps record 1 for 4 days.
Uwe Ricken
MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development
db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only) -
Wednesday, January 09, 2013 2:56 PM
Consider this query
select person_id,SUM(datedifference) as TotalDaysEventsAreConcurrent
from
(
select t.*,DATEDIFF(day,in_date,out_date) as DateDifference
from test as t
) x
group by person_idThat returns --
id TotalDaysEventsAreConcurrent
1 74
2 288
74 is not correct because there are 6 days that the records overlap. Your solution will work,but I would have to do a little trickery. Is there a way to subtract the overlapping dates AND keep the datediff(day,in_date,out_date) if that do not overlap?
-
Wednesday, January 09, 2013 3:03 PM
Maybe this is what you mean:
;WITH CTE (rownumber, person_id, in_date, out_date) AS ( SELECT ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY in_date), personid, in_Date, out_date FROM dbo.Test ) SELECT f.rownumber, f.person_id, f.in_date, f.out_date, DATEDIFF(dd, f.in_date, f.out_date) - CASE WHEN s.in_date BETWEEN f.in_date AND f.out_date THEN DATEDIFF (dd, s.in_date, f.out_date) ELSE 0 END FROM CTE f LEFT JOIN CTE s ON ( f.rownumber = s.rownumber - 1 AND f.person_id = s.person_id );
Uwe Ricken
MCITP Database Administrator 2005
MCITP Database Administrator 2008
MCITP Microsoft SQL Server 2008, Database Development
db Berater GmbH
http://www-db-berater.de
SQL Server Blog (german only)- Marked As Answer by sqlservernoob Wednesday, January 09, 2013 8:17 PM
-
Wednesday, January 09, 2013 8:18 PMThis looks like exactly what I needed, and I can understand where it's coming from. I didn't know you could use a case statement like you did in this example. Learn something new everyday, thanks.
-
Thursday, January 10, 2013 1:00 AM
Your DDL is incomplete; I was able to use your narrative to do what you should have done for us. Remember that 80 to 95% of the work in SQL is in the DDL.
CREATE TABLE Test
(person_id INTEGER NOT NULL,
in_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (person_id, in_date),
out_date DATE NOT NULL,
CHECK (in_date <= out_date));
INSERT INTO Test
VALUES (1, '2009-10-06', '2009-11-05'),
(1, '2009-11-03', '2009-12-03'), --this row's in_date is within the range of the previous rows
(1, '2007-12-26', '2007-12-31'),
(1, '2007-12-27', '2008-01-05'),
(2, '2008-08-28', '2008-09-27'),
(2, '2007-09-04', '2007-12-03'),
(2, '2007-03-08', '2007-04-08'),
(2, '2007-05-21', '2007-08-19');
The first row has a span of dates that encompasses the next row's. I'm looking for the number of days that these records [rows are not records] overlap.
Google “Calendar Tables”
SELECT T.person_id, C.cal_date
FROM Test AS T, Calendar AS C
WHERE C.cal_date BETWEEN in_date AND out_date
GROUP BY T.person_id, C.cal_date
HAVING COUNT(*) > 1; -- overlap count
1 2007-12-27
1 2007-12-28
1 2007-12-29
1 2007-12-30
1 2007-12-31
1 2009-11-03
1 2009-11-04
1 2009-11-05If your calendar has a julian sequence numer, you can use that to do the groupings with ROW_NUMBER() .
--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
-
Thursday, January 10, 2013 1:09 AMWITH Overlaps (person_id, cal_date)
AS
(SELECT T.person_id, C1.cal_date
FROM Something_Tests AS T, Calendar AS C1
WHERE C1.cal_date BETWEEN in_date AND out_date
GROUP BY T.person_id, C1.cal_date
HAVING COUNT(*) > 1),-- overlap count
Groupings (person_id, cal_date, grp_nbr)
AS
(SELECT O.person_id, O.cal_date,
C2.julian_seq -
ROW_NUMBER() OVER (PARTITION BY O.person_id ORDER BY O.cal_date)
FROM Overlaps AS O, Calendar AS C2
WHERE C2.cal_date = O.cal_date)
SELECT person_id, MIN(cal_date), MAX(cal_date)
FROM Groupings
GROUP BY person_id, grp_nbr;--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

