none
consecutive rows sql server 2008

    Question

  • Hi,

    I need to find idno that have policies both with collector 1 followed by collector 2. I need to find the first occurence of each.

    If a certain IDNO has 2 occurrences of 1,2,1,2 i need to return only the first 1,2. If idno has only 1 or 2 then it's not relevant.

    I would appreciate the help please. Thanks :-)

    CREATE TABLE #Payments(
    	[ID] [int] IDENTITY NOT NULL,
    	[Policy] [int] NOT NULL,
    	[IdNo] [int] NOT NULL,
    	[DateCreated] [datetime] NULL)--2009-12-30 00:00:00.000
    	INSERT INTO #Payments VALUES(5667088,30043256,'2012-08-07') 
    	 
    	INSERT INTO #Payments VALUES (5634342,30043256,'2012-08-10') 
    	 
    	INSERT INTO #Payments VALUES (7854544,30043256,'2012-09-07') 
    	 
    	INSERT INTO #Payments VALUES (4345477,30043256,'2012-10-08') 
    	 
    	INSERT INTO #Payments VALUES (651754,27569106,'2012-09-07') 
    	 
    	INSERT INTO #Payments VALUES (651723,27569106,'2012-09-08') 
    	 
    	INSERT INTO #Payments VALUES (181754,0234199,'2009-12-27') 
    	 
    	INSERT INTO #Payments VALUES (655349,0234199,'2009-12-30') 
    	 
    	INSERT INTO #Payments VALUES (181074,2905645,'2012-09-07') 
    	 
    	INSERT INTO #Payments VALUES (123978,2905645,'2012-09-08') 
    CREATE TABLE #POLICIES(
    	[Policy] [int] NOT NULL,
    	[IdNo] [int] NOT NULL,
    	[Collector] [int] NOT NULL)
    	INSERT INTO #POLICIES VALUES (5667088,30043256,1) 
    	 
    	INSERT INTO #POLICIES VALUES (5634342,30043256,2) 
    	 
    	INSERT INTO #POLICIES VALUES (7854544,30043256,1) 
    	 
    	INSERT INTO #POLICIES VALUES (4345477,30043256,2) 
    	 
    	INSERT INTO #POLICIES VALUES (651754,27569106,1) 
    	 
    	INSERT INTO #POLICIES VALUES (651723,27569106,2) 
    	 
    	INSERT INTO #POLICIES VALUES (181754,0234199,1) 
    	 
    	INSERT INTO #POLICIES VALUES (655349,0234199,1) 
    	 
    	INSERT INTO #POLICIES VALUES (181074,2905645,2) 
    	 
    	INSERT INTO #POLICIES VALUES (123978,2905645,2) 

    output:

    IDNO

    Policy

    COLLECTOR

    DateCreated

    30043256

    5667088

    1

    2012-08-07

    30043256

    5634342

    2

    2012-08-10

    27569106

    651754

    1

    2012-09-07

    27569106

    651723

    2

    2012-09-08

    Monday, August 19, 2013 6:41 AM

Answers

  • Thanks for posting DDL and sample data.

    I wonder if we can have a mix of collectors per [IdNo] like (1, 2, 2)?

    In other words, we can have pairs (1, 2) or just groups of one unique collector (1, 1) or (2, 2). The question is important for the approach to find the pairs.

    If it is the case that we can't have partial groups then you can use Relational Division to identify the [IdNo] having both collectors.

    SELECT
    	IdNo
    FROM
    	#POLICIES
    WHERE
    	Collector IN (1, 2)
    GROUP BY
    	IdNo
    HAVING
    	MIN(Collector) <> MAX(Collector);

    Then you can use this result to pull rows for each one of these [IdNo], enumerate then to find the pairs and select the first group for each [IdNo].

    WITH T AS (
    SELECT
    	A.IdNo,
    	A.Policy,
    	B.Collector,
    	A.DateCreated,
    	((ROW_NUMBER() OVER(PARTITION BY A.IdNo ORDER BY A.DateCreated) - 1) / 2) + 1 AS grp
    FROM
    	(
    	SELECT
    		IdNo
    	FROM
    		#POLICIES
    	WHERE
    		Collector IN (1, 2)
    	GROUP BY
    		IdNo
    	HAVING
    		MIN(Collector) <> MAX(Collector)
    	) AS R
    	INNER JOIN
    	#Payments AS A
    	ON R.IdNo = A.IdNo
    	INNER JOIN
    	#POLICIES AS B
    	ON A.IdNo = B.IdNo
    	AND A.Policy = B.Policy
    )
    SELECT
        IdNo,
        Policy,
        Collector,
        DateCreated
    FROM
    	T
    WHERE
    	grp = 1
    ORDER BY
    	IdNo,
    	DateCreated;
    GO

    If the matching have to be enforced for rows with same [IdNo] and [DateCreated] is the immediate one after the one being processed, then it will get a little bit more complex because we will have to UNPIVOT the rows after the matching.

    WITH R AS (
    SELECT
    	A.IdNo,
    	A.Policy,
    	B.Collector,
    	A.DateCreated,
    	ROW_NUMBER() OVER(PARTITION BY A.IdNo ORDER BY A.DateCreated) AS rn
    FROM
    	#Payments AS A
    	INNER JOIN
    	#POLICIES AS B
    	ON A.IdNo = B.IdNo
    	AND A.Policy = B.Policy
    ),
    T AS (
    SELECT
        A.IdNo,
        A.Policy,
        A.Collector,
        A.DateCreated,
        B.Policy AS B_Policy,
        B.Collector AS B_Collector,
        B.DateCreated AS B_DateCreated,
    	ROW_NUMBER() OVER(PARTITION BY A.IdNo ORDER BY A.DateCreated) AS grp
    FROM
    	R AS A
    	INNER JOIN
    	R AS B
    	ON A.IdNo = B.IdNo
    	AND A.rn = B.rn - 1
    	AND A.Collector = 1 
    	AND B.Collector = 2	
    )
    SELECT
    	T.IdNo,
    	CASE WHEN Q.Collector = 1 THEN T.Policy ELSE T.B_Policy END AS Policy,
    	Q.Collector,
    	CASE WHEN Q.Collector = 1 THEN T.DateCreated ELSE B_DateCreated END AS DateCreated
    FROM
    	T
    	CROSS APPLY
    	(VALUES (1), (2)) AS Q(Collector)
    WHERE
    	grp = 1;
    GO

    As you can see, we reference the CTE twice in the last solution, so you could improve the performance materializing this result into a table and adding proper indexes to support future joins.


    AMB

    Some guidelines for posting questions...

    Monday, August 19, 2013 6:32 PM
    Moderator

All replies

  • Sorry , cannot test it right now, but is it possible to have ROW_NUMBER function partition  by IDNO and ORDER BY datecreated column (alias row_no) so then you can filter  for row_no<=2.

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Monday, August 19, 2013 6:53 AM
    Answerer
  • Might not be the best way to write this, but it will give you the answer you want.  It uses the row_number function as suggested by Uri...

    select a.IdNo, a.Policy, a.Collector, a.DateCreated
    from
    (
    	select row_number() over(partition by p.idNO order by dateCreated) id,
    			p.IdNo, p.Policy, p.Collector, y.DateCreated
    	from #POLICIES p
    	join #Payments y
    	on p.Policy = y.Policy
    	and p.IdNo = y.IdNo
    	where p.idNO in
    	(
    		select IdNo
    		from #POLICIES
    		where Collector = 2
    	)
    	and p.IdNo in
    	(
    		select IdNo
    		from #POLICIES
    		where Collector = 1
    	)
    ) a
    where id < 3

    Thanks

    Monday, August 19, 2013 7:04 AM
  • Thanks. I need to check that collector=1 DateCreated is before the DateCreated value where collector=2

    Monday, August 19, 2013 8:55 AM
  • >> I need to find idno that have policies both with collector 1 followed by collector 2. I need to find the first occurrence of each.<<

    You need to read a book on basic data modeling. Why do you think that “id_no” is a clear precise data element name? Your “policy_<something>” is useless; the policy amount, a type code? the number?. We do no use IDENTITY; why should the number of physical insertion attempt to one disk be part of a logical model? Why do you have no keys? How do you have a table of payments without any money amounts in it? Where is the payment date? Did you know we have a DATE datga type? 

    I will guess your generic “policy” is the “policy_nbr”, but it would never be an integer. You do not do math on identifiers. 

    If a certain IDNO has 2 occurrences of 1, 2, 1, 2 I need to return only the first 1, 2. If idno has only 1 or 2 then it's not relevant.

    >> I would appreciate the help please. <<

    My guess at a schema would have been: 

    CREATE TABLE Payments
    (policy_nbr CHAR(8) NOT NULL,
     REFERENCES Policies(policy_nbr), 
     payment_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
     PRIMARY KEY (policy_nbr, payment_date),
     payment_amt DECIMAL (12,2) NOT NULL
     CHECK (payment_amt > 0.00));

    The Payments tables references the policies and shows when and how much. 

    CREATE TABLE Policies
    (policy_nbr CHAR(8) NOT NULL PRIMARY KEY
     collector_emp_id CHAR(10) NOT NULL
     REFERENCES Personnel (emp_id));

    I guess that your “collector” is a person assigned to a policy. Instead of trying for a kludge, why not do it right?



    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Monday, August 19, 2013 3:06 PM
  •  The tables are in use at a large organisation and have been used for many years before I came along a few weeks ago. They contain important information so of course I am not changing anything and have to work with what I have. The primary key is policy on table Policies. 

    Why can't a primary identity key be an integer? I don't get your point. Collector is not a person.


    Monday, August 19, 2013 3:54 PM
  • Thanks for posting DDL and sample data.

    I wonder if we can have a mix of collectors per [IdNo] like (1, 2, 2)?

    In other words, we can have pairs (1, 2) or just groups of one unique collector (1, 1) or (2, 2). The question is important for the approach to find the pairs.

    If it is the case that we can't have partial groups then you can use Relational Division to identify the [IdNo] having both collectors.

    SELECT
    	IdNo
    FROM
    	#POLICIES
    WHERE
    	Collector IN (1, 2)
    GROUP BY
    	IdNo
    HAVING
    	MIN(Collector) <> MAX(Collector);

    Then you can use this result to pull rows for each one of these [IdNo], enumerate then to find the pairs and select the first group for each [IdNo].

    WITH T AS (
    SELECT
    	A.IdNo,
    	A.Policy,
    	B.Collector,
    	A.DateCreated,
    	((ROW_NUMBER() OVER(PARTITION BY A.IdNo ORDER BY A.DateCreated) - 1) / 2) + 1 AS grp
    FROM
    	(
    	SELECT
    		IdNo
    	FROM
    		#POLICIES
    	WHERE
    		Collector IN (1, 2)
    	GROUP BY
    		IdNo
    	HAVING
    		MIN(Collector) <> MAX(Collector)
    	) AS R
    	INNER JOIN
    	#Payments AS A
    	ON R.IdNo = A.IdNo
    	INNER JOIN
    	#POLICIES AS B
    	ON A.IdNo = B.IdNo
    	AND A.Policy = B.Policy
    )
    SELECT
        IdNo,
        Policy,
        Collector,
        DateCreated
    FROM
    	T
    WHERE
    	grp = 1
    ORDER BY
    	IdNo,
    	DateCreated;
    GO

    If the matching have to be enforced for rows with same [IdNo] and [DateCreated] is the immediate one after the one being processed, then it will get a little bit more complex because we will have to UNPIVOT the rows after the matching.

    WITH R AS (
    SELECT
    	A.IdNo,
    	A.Policy,
    	B.Collector,
    	A.DateCreated,
    	ROW_NUMBER() OVER(PARTITION BY A.IdNo ORDER BY A.DateCreated) AS rn
    FROM
    	#Payments AS A
    	INNER JOIN
    	#POLICIES AS B
    	ON A.IdNo = B.IdNo
    	AND A.Policy = B.Policy
    ),
    T AS (
    SELECT
        A.IdNo,
        A.Policy,
        A.Collector,
        A.DateCreated,
        B.Policy AS B_Policy,
        B.Collector AS B_Collector,
        B.DateCreated AS B_DateCreated,
    	ROW_NUMBER() OVER(PARTITION BY A.IdNo ORDER BY A.DateCreated) AS grp
    FROM
    	R AS A
    	INNER JOIN
    	R AS B
    	ON A.IdNo = B.IdNo
    	AND A.rn = B.rn - 1
    	AND A.Collector = 1 
    	AND B.Collector = 2	
    )
    SELECT
    	T.IdNo,
    	CASE WHEN Q.Collector = 1 THEN T.Policy ELSE T.B_Policy END AS Policy,
    	Q.Collector,
    	CASE WHEN Q.Collector = 1 THEN T.DateCreated ELSE B_DateCreated END AS DateCreated
    FROM
    	T
    	CROSS APPLY
    	(VALUES (1), (2)) AS Q(Collector)
    WHERE
    	grp = 1;
    GO

    As you can see, we reference the CTE twice in the last solution, so you could improve the performance materializing this result into a table and adding proper indexes to support future joins.


    AMB

    Some guidelines for posting questions...

    Monday, August 19, 2013 6:32 PM
    Moderator
  • Thanks Hunchback. Yea, I need only IDNO that have at least one occurrence of 1 and 2. From the sample data  it seems to be doing just what I need. When i go back to work in a few days I will check it against the real data and make sure I understand the query :-)

    If possible can you please explain why you used CROSS APPLY :-)

    Thanks 

    Monday, August 19, 2013 7:50 PM
  • > If possible can you please explain why you used CROSS APPLY :-)

    To do the UNPIVOT or split the matching pair. Same could be accomplished by doing a CROSS JOIN.

    if two rows matched our joining condition then we get:

    A.idno, A.policy, A.collector, A.datecreated, B.policy, B.collector, B.datecreated

    in order to split it, I cross apply this to a two rows set and now we can have two rows in the result:

    A.idno, A.policy, A.collector, A.datecreated
    A.idno, B.policy, B.collector, B.datecreated

    Got it?


    AMB

    Some guidelines for posting questions...

    Monday, August 19, 2013 8:56 PM
    Moderator
  • >> Why can't a primary IDENTITY key [sic] be an integer? << 

    The term PRIMARY KEY makes sense and has a definition. But what is a “primary IDENTITY key”?  That invented term makes no sense in RDBMS. 

    IDENTITY is a table property. It is not a column; if it was a column, then I could have more than one of that type. It is not an integer – you cannot do math on it. It is totally non-relational and local to hardware.

    IDENTITY is a count of the physical insertion attempts (not even successes) to one particular physical table on one particular physical disk using one particular SQL product. 

    Incompetent non-SQL programmers will use IDENTITY as key to mimic pointer chains and record numbers. But we know that a key is a subset of the attributes of the entities in a valid data model. 

    >> Collector is not a person. <<

    Yes, it is what ISO-11179 calls an attribute. It cannot be used by itself and it requires an attribute property. We need it to be “collector_<something>” to be valid. The “_<something>” can be things like “_code”, “_id”, “_location”, etc. to make it precise. 

    It sounds like you will be dealing with "family curse" and not "legacy code" that is in bad shape. And you do not have the RDBMS background for this kind of repair work.  Good luck! 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Tuesday, August 20, 2013 2:04 AM