Find consecutive days from overlapping date ranges to keep from double counting them in a DATEDIFF

Answered Find consecutive days from overlapping date ranges to keep from double counting them in a DATEDIFF

  • Wednesday, January 09, 2013 1:55 PM
     
     

    Consider a table this script to that produces a table

    create table test
    (personID int,in_Date date, out_Date date

    constraint pk_person_ID_in_Date primary key (personID,in_Date))
    insert into test
    values (1,'2009-10-06','2009-11-05')
          ,(1,'2009-11-03','2009-12-03')--this row's in_Date is within the range of the previous rows
          ,(1,'2007-12-26','2007-12-31')
          ,(1,'2007-12-27','2008-01-05')
          ,(2,'2008-08-28','2008-09-27')
          ,(2,'2007-09-04','2007-12-03')
          ,(2,'2007-03-08','2007-04-08')
          ,(2,'2007-05-21','2007-08-19')

    The first row has a span of dates that encompasses the next row's.  I'm looking for the number of days that these records' dates overlap. 

    In the first row this would only be four days (12-27 through 12-31).  I've seen a solution something similar (on this site and others) to 


    select 
     person_ID
    ,MIN(in_date)
    ,MAX(out_date)
    ,DATEDIFF(day,min(in_date),MAX(out_Date))
    from
    (
    select 
     t.person_ID
    ,t.in_Date
    ,t.out_Date
    ,DATEADD(day,-row_number() over(partition by person_id order by in_Date),out_date) as groupDate
    from test as t
    ) as x
    group by person_id,groupDate
    order by person_ID

    this does NOT return the correct answer, but it's meandering its way toward it. Could someone help me a.) find a solution, and b.) help me understand what's going on when you group by a date range

    ***EDIT the question has been edited to correctly express my intent.

    I would like to find the DATEDIFF(day,in_Date,out_Date) of every row, but NOT include the overlapping date ranges. to keep from double counting them.



All Replies

  • Wednesday, January 09, 2013 2:08 PM
     
      Has Code

    Hi,

    this may work for you:

    ;WITH CTE (rownumber, person_id, in_date, out_date)
    AS
    (
    SELECT	ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY in_date),
    	personid,
    	in_Date,
    	out_date
    FROM	dbo.Test
    )
    SELECT f.rownumber,
    	f.person_id,
    	f.in_date,
    	f.out_date,
    	CASE WHEN s.in_date BETWEEN f.in_date AND f.out_date
    		 THEN DATEDIFF (dd, s.in_date, f.out_date)
    		 ELSE 0
    	END
    FROM CTE f LEFT JOIN CTE s
    ON	(
    		f.rownumber = s.rownumber - 1 AND
    		f.person_id = s.person_id
    	);

    If you have SQL 2012 have a look to LEAD and LAG which may cover the problem in a much easier way than CTE!


    Uwe Ricken

    MCITP Database Administrator 2005
    MCITP Database Administrator 2008
    MCITP Microsoft SQL Server 2008, Database Development

    db Berater GmbH
    http://www-db-berater.de
    SQL Server Blog (german only)



  • Wednesday, January 09, 2013 2:10 PM
     
     
    @ Uwe Could you please explain a little bit of your methodology here?
    • Edited by sqlservernoob Wednesday, January 09, 2013 2:11 PM clarity
    •  
  • Wednesday, January 09, 2013 2:13 PM
     
     
    @Uwe, I don't think I made myself clear enough in the question title.  What I meant to convey is I need the DATEDIFF of the in_date and out_date, but do not double count the overlapping days.  
  • Wednesday, January 09, 2013 2:20 PM
     
     Proposed Answer

    Hallo sqlservernoob,

    i'm using a CTE (common table expression) which groups the results by person_id and add a rownumber for each group. So result is 1, 2, 3, ... for each person_id. Furthermore the resultset for each person_id will be sorted by the in_date (which is essential for the numbering.

    The seond part of the cte "uses" the result of cte by using a "self join" of the cte. The join is using the rownumber as JOIN and the person_id. The special thing with the rownumber is that i join the rownumbers not x = x but x = x - 1.

    The alias f is for "first" and the alias s is for "second". (e.g. 1 = 2 - 1)

    The last is to compare the in_date of the second row with the in_date and the out_date of the first row. If the in_date is in between i calculate the differences. If not it is 0.

    Get more details concerning CTE, JOIN, DATEDIFF here:

    CTE: http://msdn.microsoft.com/en-us/library/ms175972.aspx
    JOIN: http://msdn.microsoft.com/en-us/library/ms191517(v=sql.105).aspx
    DATEDIFF: http://msdn.microsoft.com/en-us/library/ms189794.aspx


    Uwe Ricken

    MCITP Database Administrator 2005
    MCITP Database Administrator 2008
    MCITP Microsoft SQL Server 2008, Database Development

    db Berater GmbH
    http://www-db-berater.de
    SQL Server Blog (german only)

  • Wednesday, January 09, 2013 2:41 PM
     
     

    @Uwe, I don't think I made myself clear enough in the question title.  What I meant to convey is I need the DATEDIFF of the in_date and out_date, but do not double count the overlapping days.  

    But that's what my solution will show. e.g. you have an overlap of 4 days

    1: (1,'2007-12-26','2007-12-31')
    2: (1,'2007-12-27','2008-01-05')

    If you check my query you'll see a diff of 4 days because record 2 overlaps record 1 for 4 days.


    Uwe Ricken

    MCITP Database Administrator 2005
    MCITP Database Administrator 2008
    MCITP Microsoft SQL Server 2008, Database Development

    db Berater GmbH
    http://www-db-berater.de
    SQL Server Blog (german only)

  • Wednesday, January 09, 2013 2:56 PM
     
     

    Consider this query

    select person_id,SUM(datedifference) as TotalDaysEventsAreConcurrent
    from
    (
    select t.*,DATEDIFF(day,in_date,out_date) as DateDifference
    from test as t
    ) x
    group by person_id

    That returns --

    id  TotalDaysEventsAreConcurrent

    1  74

    2  288

    74 is not correct because there are 6 days that the records overlap.  Your solution will work,but I would have to do a little trickery.  Is there a way to subtract the overlapping dates AND keep the datediff(day,in_date,out_date) if that do not overlap?

  • Wednesday, January 09, 2013 3:03 PM
     
     Answered Has Code

    Maybe this is what you mean:

    ;WITH CTE (rownumber, person_id, in_date, out_date)
    AS
    (
    SELECT	ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY in_date),
    	personid,
    	in_Date,
    	out_date
    FROM	dbo.Test
    )
    SELECT f.rownumber,
    	f.person_id,
    	f.in_date,
    	f.out_date,
    	DATEDIFF(dd, f.in_date, f.out_date) - 
    	CASE WHEN s.in_date BETWEEN f.in_date AND f.out_date
    		 THEN DATEDIFF (dd, s.in_date, f.out_date)
    		 ELSE 0
    	END
    FROM CTE f LEFT JOIN CTE s
    ON	(
    		f.rownumber = s.rownumber - 1 AND
    		f.person_id = s.person_id
    	);


    Uwe Ricken

    MCITP Database Administrator 2005
    MCITP Database Administrator 2008
    MCITP Microsoft SQL Server 2008, Database Development

    db Berater GmbH
    http://www-db-berater.de
    SQL Server Blog (german only)

    • Marked As Answer by sqlservernoob Wednesday, January 09, 2013 8:17 PM
    •  
  • Wednesday, January 09, 2013 8:18 PM
     
     
    This looks like exactly what I needed, and I can understand where it's coming from.  I didn't know you could use a case statement like you did in this example. Learn something new everyday, thanks.
  • Thursday, January 10, 2013 1:00 AM
     
     

    Your DDL is incomplete; I was able to use your narrative to do what you should have done for us. Remember that 80 to 95% of the work in SQL is in the DDL. 

    CREATE TABLE Test
    (person_id INTEGER NOT NULL, 
     in_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL, 
     PRIMARY KEY (person_id, in_date),
     out_date DATE NOT NULL, 
     CHECK (in_date <= out_date));


    INSERT INTO Test
    VALUES (1, '2009-10-06', '2009-11-05'), 
          (1, '2009-11-03', '2009-12-03'), --this row's in_date is within the range of the previous rows
          (1, '2007-12-26', '2007-12-31'), 
          (1, '2007-12-27', '2008-01-05'), 
          (2, '2008-08-28', '2008-09-27'), 
          (2, '2007-09-04', '2007-12-03'), 
          (2, '2007-03-08', '2007-04-08'), 
          (2, '2007-05-21', '2007-08-19');

    The first row has a span of dates that encompasses the next row's.  I'm looking for the number of days that these records [rows are not records] overlap. 

    Google “Calendar Tables”

    SELECT T.person_id, C.cal_date               
      FROM Test AS T, Calendar AS C
     WHERE C.cal_date BETWEEN in_date AND out_date 
     GROUP BY T.person_id, C.cal_date
    HAVING COUNT(*) > 1; -- overlap count

    1 2007-12-27
    1 2007-12-28
    1 2007-12-29
    1 2007-12-30
    1 2007-12-31
    1 2009-11-03
    1 2009-11-04
    1 2009-11-05

    If your calendar has a julian sequence numer, you can use that to do the groupings with ROW_NUMBER() .


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

  • Thursday, January 10, 2013 1:09 AM
     
     
    WITH Overlaps (person_id, cal_date)
    AS
    (SELECT T.person_id, C1.cal_date
       FROM Something_Tests AS T, Calendar AS C1
     WHERE C1.cal_date BETWEEN in_date AND out_date 
     GROUP BY T.person_id, C1.cal_date
    HAVING COUNT(*) > 1),-- overlap count

    Groupings (person_id, cal_date, grp_nbr)
    AS
    (SELECT O.person_id, O.cal_date,
           C2.julian_seq -
           ROW_NUMBER() OVER (PARTITION BY O.person_id ORDER BY O.cal_date)
      FROM Overlaps AS O, Calendar AS C2
     WHERE C2.cal_date = O.cal_date)

     SELECT person_id, MIN(cal_date), MAX(cal_date)
       FROM Groupings
      GROUP BY person_id, grp_nbr;

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL