This article will consider a simple classical Gaps & Islands problem asked recently in Transact-SQL Forum at MSDN with non original title "Query Help".


Problem Definition


The thread originator was kind enough to provide DDL of the table and some data to describe the task:

Create table T1
(Id int identity primary key,
VoucherNo varchar(4),
TransNo varchar(10)
)
 
Insert into T1 values ('V100','Trns1'),('V101','Trns1'),('V102','Trns1'),('V103','Trns1'),('V104','Trns1'),('V106','Trns1')

And he also provided the desired output:

TransNo  FirstVoucher  LastVoucher  Quantity 
 Trns1 V100 V104 5
 Trns1 V106 V106 1

The problem is to find consecutive vouchers (100-104, then 106).

Solution


As mentioned, this is a common problem in Transact SQL and it was described by Itzik Ben Gan here or by Plamen Ratchev in this easy to understand blog post
Refactoring Ranges. Knowing main idea of the solution it is easy to provide it assuming that all voucher numbers come in the following format (letter V following by the 3 digit number):

;WITH cte
AS (
    SELECT *
        ,CAST(SUBSTRING(VoucherNo, 2, 3) AS INT) - ROW_NUMBER() OVER (
            ORDER BY VoucherNo
            ) AS Grp
    FROM T1
    )
SELECT TransNo
    ,min(VoucherNo) AS FirstVoucherNo
    ,max(VoucherNo) AS LastVoucherNo
    ,count(*) AS Quantity
FROM cte
GROUP BY TransNo
    ,Grp

So, the idea of this solution is to group consecutive ranges first using ROW_NUMBER() function and then apply aggregate functions based on that group identifier.

Note, it is easy to modify this query to work with different formats of the voucher number (say, some combinations of letters following by any length number). This article is concentrating on the problem posted by the thread originator and is solving it for the particular voucher number format. You may want to see some modifications of my solution suggested by Ronen Ariely in the original thread. 

Another Example

I am here adding another example of Gaps and Islands problem that was asked in the T-SQL Forum "find the missing date from output data"
Here is the DDL and sample data.

CREATE TABLE Person
    (PersonID INT IDENTITY(1,1) PRIMARY KEY,
     PersonName varchar(3),
     SigninDate date
     );
  
INSERT INTO Person VALUES
('AAA','20130301'),
('AAA','20130302'),
('AAA','20130305'),
('AAA','20130306'),
('AAA','20130307'),
('BBB','20130301'),
('BBB','20130302'),
('BBB','20130303'),
('BBB','20130305'),
('BBB','20130307');

The task was to find the missing dates in the given data. So here is the expected result.

PersonName Date
AAA  2013-03-03
AAA 2013-03-04 
BBB  2013-03-04 
BBB  2013-03-06

So the first task was to group the data on the 'PersonName', assign a row number to identify each element and then find the gaps in each these groups.

/*
Solution for a *Pre* SQL 2012 version
The first CTE groups the PersonName and assign a group number and
a row identifier for each group
 */
WITH CTE_GROUPS AS
(SELECT *,
        DENSE_RANK() OVER (ORDER BY PersonName) AS GroupID,
        ROW_NUMBER() OVER(PARTITION BY PersonName ORDER BY SigninDate) RowNum
    FROM Person
),
/*
The second CTE will find the border values for the GAPs
*/
CTE_GAPS AS
(SELECT C2.PersonName,
       C2.SigninDate GapStart,
       C1.SigninDate GapEnd
FROM CTE_GROUPS C1 INNER JOIN CTE_GROUPS C2 ON
    C1.GroupID=C2.GroupID and C1.RowNum=C2.RowNum+1
    AND ISNULL(DATEDIFF(DAY,C2.Signindate,C1.SigninDate),0)>1
)
SELECT * FROM CTE_GAPS

Here is how you could do the same in SQL 2012 using LAG function

WITH CTE_GAPS AS
(SELECT Personname,
        LAG(P.SigninDate) Over(Partition by P.PersonName Order By P.PersonName,p.Signindate) GapStart ,
        SigninDate GapEnd,
        (DATEDIFF(DAY,LAG(P.SigninDate) Over(Partition by P.PersonName Order By P.PersonName,p.Signindate),P.SigninDate)-1) GapDays
FROM Person P)
 
SELECT  P.PersonName,
        C.CalendarDate
FROM MASTER..Calendar C CROSS JOIN CTE_GAPS P
WHERE
GapDays>0 AND
    C.CalendarDate
        BETWEEN DATEADD(DAY,1,P.GapStart ) AND DATEADD(DAY,-1,P.GapEnd)


See Also




This entry participated in the Technology Guru TechNet WiKi for July contest and won the gold prize.