none
Create group based on two columns value, order & closest match using T-SQL

    Question

  • I have requirement to identify top intersections.

    The table is similar to the below one:


    Declare @Location Table (R1Address Varchar(50),
                                            R2Address Varchar(50),
                                            IncidentID  int)


    Insert Into @Location Values('13 ST N','5 AVE N',341),
                                ('6 AVE','43 ST N',387),
                                ('5 AVE','13 ST N',324),
                                ('SOUTH RAILWAY ST SE','2 ST SE',675),
                                ('2 ST','SOUTH RAILWAY ST',534),
                                ('COLUMBIA BLVD','13 ST N',332)


    Select *
    From @Location

    R1Address R2Address IncidentID
    13 ST N 5 AVE N 341
    6 AVE 43 ST N 387
    5 AVE 13 ST N 324
    SOUTH RAILWAY ST SE 2 ST SE 675
    2 ST SOUTH RAILWAY ST 534
    COLUMBIA BLVD 13 ST N 332


    Now the requirements are:


    1. Create address based on alphabetic value of  R1, R2

         Example: For incident 341, the address will be 5 AVE N & 13 ST N

    2. Group rows based on new address and the closest match

         Example:  

      


    13 ST N 5 AVE N 341
    5 AVE 13 ST N 324


    Address                 IncidentCount

    5 AVE & 13 ST N    2 

    since 5 AVE N and 5 AVE are same street, so the closest address is 5 AVE & 13 ST N and the grouping will be

    based on this.


    The expected output is like below:

    Address                                IncidentCount
    5 AVE & 13 ST                       2                
    6 AVE & 43 ST N                    1
    2 ST & SOUTH RAILWAY S      2
    13 ST N & COLUMBIA BLVD    1 


    I understand this can be done via creating various staging tables, but before doing so thought it would be helpful to get ideas from many experts. Please also note that I just provided a small portion of the table which contains millions of rows.

    Also I am not much concerned about the query time as it will be a nightly job to process the data.


    Thanks in advance!





    • Edited by Taherul673 Wednesday, September 25, 2013 4:31 PM
    Wednesday, September 25, 2013 4:30 PM

Answers

  • I guess the following should be good.
    Declare @Location Table (R1Address Varchar(50),
                                             R2Address Varchar(50),
                                             IncidentID  int)
     
    
    Insert Into @Location Values('13 ST N','5 AVE N',341),
                                 ('6 AVE','43 ST N',387),
                                 ('5 AVE','13 ST N',324),
                                 ('SOUTH RAILWAY ST SE','2 ST SE',675),
                                 ('2 ST','SOUTH RAILWAY ST',534),
                                 ('COLUMBIA BLVD','13 ST N',332) 
    
    ;WITH CTE1
    AS
    (
       SELECT *
       FROM @Location A
       UNPIVOT(Addresses FOR Value IN(R1Address, R2Address)) UP
    ), CTE2 
    AS(
          SELECT A. *
                ,CASE
                      WHEN A.Addresses LIKE '% N' THEN LTRIM(RTRIM(REPLACE(A.Addresses, ' N','')))
                      WHEN A.Addresses LIKE '% SE' THEN LTRIM(RTRIM(REPLACE(A.Addresses, ' SE','')))
                   ELSE A.Addresses 
                END AS Trimmed
          FROM CTE1 A
          JOIN @Location B
          ON A.IncidentID = B.IncidentID
    ), CTE3 
    AS
    (
    
          SELECT *
                , ROW_NUMBER() OVER(PARTITION BY IncidentID ORDER BY Trimmed) AS rn
          FROM CTE2 
    ), CTE4
    AS
    (
       SELECT A.Trimmed + ' & ' + B.Trimmed AS Conc
       FROM    CTE3 A
       FULL OUTER JOIN CTE3 B
       ON    A.rn = B.rn + 1
       INNER JOIN @Location C
       ON A.IncidentID = C.IncidentID
       AND B.IncidentID = C.IncidentID
       WHERE A.Trimmed IS NOT NULL
       AND B.Trimmed IS NOT NULL
    )
    
    SELECT Conc, COUNT(*) AS AddressCount FROM CTE4
    GROUP BY Conc
    
    
    
    

    • Marked as answer by Taherul673 Wednesday, September 25, 2013 7:03 PM
    Wednesday, September 25, 2013 6:09 PM

All replies

  • I guess the following should be good.
    Declare @Location Table (R1Address Varchar(50),
                                             R2Address Varchar(50),
                                             IncidentID  int)
     
    
    Insert Into @Location Values('13 ST N','5 AVE N',341),
                                 ('6 AVE','43 ST N',387),
                                 ('5 AVE','13 ST N',324),
                                 ('SOUTH RAILWAY ST SE','2 ST SE',675),
                                 ('2 ST','SOUTH RAILWAY ST',534),
                                 ('COLUMBIA BLVD','13 ST N',332) 
    
    ;WITH CTE1
    AS
    (
       SELECT *
       FROM @Location A
       UNPIVOT(Addresses FOR Value IN(R1Address, R2Address)) UP
    ), CTE2 
    AS(
          SELECT A. *
                ,CASE
                      WHEN A.Addresses LIKE '% N' THEN LTRIM(RTRIM(REPLACE(A.Addresses, ' N','')))
                      WHEN A.Addresses LIKE '% SE' THEN LTRIM(RTRIM(REPLACE(A.Addresses, ' SE','')))
                   ELSE A.Addresses 
                END AS Trimmed
          FROM CTE1 A
          JOIN @Location B
          ON A.IncidentID = B.IncidentID
    ), CTE3 
    AS
    (
    
          SELECT *
                , ROW_NUMBER() OVER(PARTITION BY IncidentID ORDER BY Trimmed) AS rn
          FROM CTE2 
    ), CTE4
    AS
    (
       SELECT A.Trimmed + ' & ' + B.Trimmed AS Conc
       FROM    CTE3 A
       FULL OUTER JOIN CTE3 B
       ON    A.rn = B.rn + 1
       INNER JOIN @Location C
       ON A.IncidentID = C.IncidentID
       AND B.IncidentID = C.IncidentID
       WHERE A.Trimmed IS NOT NULL
       AND B.Trimmed IS NOT NULL
    )
    
    SELECT Conc, COUNT(*) AS AddressCount FROM CTE4
    GROUP BY Conc
    
    
    
    

    • Marked as answer by Taherul673 Wednesday, September 25, 2013 7:03 PM
    Wednesday, September 25, 2013 6:09 PM
  • This technique works for me. 

    Thanks Harsha!

    Wednesday, September 25, 2013 7:05 PM