none
How can I use a union to query the same table utilizing two queries with different matching criteria?

    Question

  • Hey Everyone,

    I'm trying to pull data from a table matching on two different sets of criteria.

    Criteria 1: Match on FirstName, LastName, and ServiceAreaID

    Criteria 2: Match on ServiceAreaID and PrimaryEmail

    How would I combine the results of both sets of criteria while eliminating duplicates? I am trying to retrieve as many ContactIDs as possible

    My query right now is as follows:

    Select ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    From Spreadsheet S
    Left Outer Join Contacts C
    on C.FirstName=S.FirstName
        and C.LastName=S.LastName
        and C.ServiceAreaID=S.ServiceAreaID
    
    Union
    
    Select ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    From Spreadsheet S
    Left Outer Join Contacts C
        on C.PrimaryEmail=S.PrimaryEmail
        and C.ServiceAreaID=S.ServiceAreaID
    My union query is problematic as I get duplicate rows due to 1 criteria possibly retrieving a NULL value while the other criteria may or may not retrieve a non-NULL value. As a result, both rows are possibly returned. The only Nulls I need are those that both criteria 1 and criteria 2 were unable to match. Any help would be greatly appreciated. Would it be better doing this through the use of a CTE? Thanks everyone

    Tuesday, February 26, 2013 1:12 AM

Answers

  • If all you care about is retrieving the distinct ContactID, then you can just do the Union on only the ContactID:

    Select ContactID
    From Spreadsheet S
    Left Outer Join Contacts C
    on C.FirstName=S.FirstName
       
    and C.LastName=S.LastName
       
    and C.ServiceAreaID=S.ServiceAreaID

    Union

    Select ContactID
    From Spreadsheet S
    Left Outer Join Contacts C on C.PrimaryEmail=S.PrimaryEmail and C.ServiceAreaID=S.ServiceAreaID

    However, it seems that what you are trying to do is perhaps merge those two data sets together such that if it's Null in one but not the other, then take the one that is not Null.  If they are both Null, then leave it as Null.  If this is the case, try something as follow:

    Select ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    INTO #T1
    From Spreadsheet S
    Left Outer Join Contacts C
    on C.FirstName=S.FirstName
       
    and C.LastName=S.LastName
       
    and C.ServiceAreaID=S.ServiceAreaID

    Select ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    INTO #T2
    From
    Spreadsheet S
    Left Outer Join Contacts C
    on C.PrimaryEmail=S.PrimaryEmail
    and C.ServiceAreaID=S.ServiceAreaID

    SELECT ContactID, COALESCE(T1.FirstName, T2.FirstName), COALESCE(T1.LastName, T2.LastName), COALESCE(T1.PrimaryEmail, T2.PrimaryEmail), COALESCE(T1.ServiceAreaID, T2.ServiceAreaID)
    FROM #T1 AS T1
    LEFT JOIN #T2 AS T2 ON T1.ContactID = T2.ContactID

    That is, assuming that ContactID is the primary key and you want T1 to always take precedence (otherwise, COALESCE T2 first so that T2 takes precedence).

    Hope this helps!

    • Marked as answer by Migince Tuesday, February 26, 2013 2:47 PM
    Tuesday, February 26, 2013 1:37 AM

All replies

  • If all you care about is retrieving the distinct ContactID, then you can just do the Union on only the ContactID:

    Select ContactID
    From Spreadsheet S
    Left Outer Join Contacts C
    on C.FirstName=S.FirstName
       
    and C.LastName=S.LastName
       
    and C.ServiceAreaID=S.ServiceAreaID

    Union

    Select ContactID
    From Spreadsheet S
    Left Outer Join Contacts C on C.PrimaryEmail=S.PrimaryEmail and C.ServiceAreaID=S.ServiceAreaID

    However, it seems that what you are trying to do is perhaps merge those two data sets together such that if it's Null in one but not the other, then take the one that is not Null.  If they are both Null, then leave it as Null.  If this is the case, try something as follow:

    Select ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    INTO #T1
    From Spreadsheet S
    Left Outer Join Contacts C
    on C.FirstName=S.FirstName
       
    and C.LastName=S.LastName
       
    and C.ServiceAreaID=S.ServiceAreaID

    Select ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    INTO #T2
    From
    Spreadsheet S
    Left Outer Join Contacts C
    on C.PrimaryEmail=S.PrimaryEmail
    and C.ServiceAreaID=S.ServiceAreaID

    SELECT ContactID, COALESCE(T1.FirstName, T2.FirstName), COALESCE(T1.LastName, T2.LastName), COALESCE(T1.PrimaryEmail, T2.PrimaryEmail), COALESCE(T1.ServiceAreaID, T2.ServiceAreaID)
    FROM #T1 AS T1
    LEFT JOIN #T2 AS T2 ON T1.ContactID = T2.ContactID

    That is, assuming that ContactID is the primary key and you want T1 to always take precedence (otherwise, COALESCE T2 first so that T2 takes precedence).

    Hope this helps!

    • Marked as answer by Migince Tuesday, February 26, 2013 2:47 PM
    Tuesday, February 26, 2013 1:37 AM
  • ;with mycte as
    (SELECT ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
    ,row_number() OVER (partition BY ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID
     ORDER BY ContactID,FirstName, LastName  ) AS rn
    FROM Spreadsheet S
    LEFT OUTER JOIN Contacts C
    ON ((C.FirstName=S.FirstName  AND C.LastName=S.LastName) OR C.PrimaryEmail=S.PrimaryEmail)
        AND C.ServiceAreaID=S.ServiceAreaID
    )
    
     SELECT ContactID, FirstName, LastName, PrimaryEmail, ServiceAreaID  FROM mycte 
    	WHERE rn=1

    Tuesday, February 26, 2013 1:42 AM
  • That did the trick. I had to add Coalesece(T1.ContactID, T2.ContactID) as I was getting an 'ambiguous column name' message. Other than that it worked great. Thank for your help.

    Tuesday, February 26, 2013 2:49 PM