Merge outputting more records than what's in the first table RRS feed

  • Question

  • I have a table A with say 100 transaction records with Client ID being one of the columns. In table B, I have list of Client IDs and the Business Unit and some other fields. 

    There is a possibility that table B may have the same Client ID repeated twice. 

    If I had done a vlookup in excel, it would have just picked the first one it found. However, if I use Merge in Power Query , it ends up giving me 2 records resulting in more than 100 records in the final output.

    I can't do an inner join as there is a possibility that a Client ID in table A is not in Table B.  

    Is there a way I prevent that from happening?

    • Edited by VivDev Thursday, October 1, 2015 11:34 PM
    Thursday, October 1, 2015 11:33 PM


All replies

  • In the "Table B" Query, I would use the "Group By" command. e.g. you could Group By just "Client ID", then use the Max Operation to get the other columns you need.

    Friday, October 2, 2015 2:38 AM
  • You could also remove duplicates in Table B before joining.

    Imke Feldmann

    Friday, October 2, 2015 5:10 AM
  • I considered Remove Duplicates, but I think its a bit dangerous in this scenario.  It might work with today's data, but as soon as any row in table B has a different value for any column (with the same Client ID), you will get duplicates again.
    Friday, October 2, 2015 6:27 AM
  • Don't know if I understood you right, but what I meant was to perform the remove duplicates step as the last step on MergeTableB. So that it will always be performed before the join takes place.

    In effect, you don't join with table B in it's original form, but in a modified form where dups will be removed first.

    Imke Feldmann

    Friday, October 2, 2015 6:38 AM
  • I'm imagining a scenario where Client ID #1 has 2 rows in Table B.  One row has Business Unit = ABC and the 2nd row has Business Unit = DEF. 

    Assuming you Expand to add the Business Unit column after the Merge, then Remove Duplicates will leave you with duplicated rows for Client ID #1 - the OPs original issue.

    Friday, October 2, 2015 8:31 AM
  • Thanks. I ended up removing the columns I didn't need and used remove duplicates. Group By is a good alternative too but for my data scenario removing duplicates was good enough. 
    Friday, October 2, 2015 5:25 PM