locked
Combining duplicate entries RRS feed

  • Question

  • I have a .csv file that lists customer email addresses, and their associated customer ID.  Some of the customers have ended up with multiple Customer ID's related to their email address though, which are currently listed on separate lines as follows.

        customer1@email.com      ID001
        customer2@email.com      ID002
        customer2@email.com      ID003
        customer3@email.com      ID004

    I want to import this info into powershell, then combine all the IDs from duplicated email addresses onto one line, so they look like the following.  Preferrably, I would like the output to have the IDs in separate cells once opened in Excel, as opposed to all in the same cell.

        customer1@email.com      ID001
        customer2@email.com      ID002, ID003
        customer3@email.com      ID004

    Obviously the first step will be to import the csv to a variable, but from there I'm not sure how to go about this.  Specifically, I'm not sure how to have Powershell look for a duplicate for each entry, and then combine the ID field.  There's also over 300k records in this csv file, so there's really no way to do this manually.

    I'm not asking for this to be written for me, but if someone could point me in the right direction (cmdlets to look at, etc), that should get me started.

    Thanks,

    Andrew


    • Edited by musik779 Tuesday, February 13, 2018 7:05 PM Formatting
    Tuesday, February 13, 2018 7:04 PM

Answers

  • wrong tool,

    use excel und pivot

    or

    first you must import data in powershell

    get-content or import-csv

    next you should sort your data 

    then you need a foreach

    remember you email 

    if It same as last add ID to a variable


    Chris



    • Edited by -- Chris -- Tuesday, February 13, 2018 7:48 PM
    • Marked as answer by musik779 Wednesday, February 14, 2018 7:00 PM
    Tuesday, February 13, 2018 7:37 PM

All replies

  • wrong tool,

    use excel und pivot

    or

    first you must import data in powershell

    get-content or import-csv

    next you should sort your data 

    then you need a foreach

    remember you email 

    if It same as last add ID to a variable


    Chris



    • Edited by -- Chris -- Tuesday, February 13, 2018 7:48 PM
    • Marked as answer by musik779 Wednesday, February 14, 2018 7:00 PM
    Tuesday, February 13, 2018 7:37 PM
  • Maybe there's some way with group-object?

    Tuesday, February 13, 2018 10:52 PM
  • Import-CSv <your csv> | group email | where{$_.Count -gt 1}


    \_(ツ)_/

    Wednesday, February 14, 2018 1:47 AM
  • Maybe like this.


    import-csv group.csv | group email | select name,@{n='id';e={$_.group.id}}

    Name id
    ---- --
    customer1@email.com ID001
    customer2@email.com {ID002, ID003}
    customer3@email.com ID004



    Wednesday, February 14, 2018 3:53 AM
  • JS2010

    cool solution. If I test it I became a wrong output

    email;id
    one;a1
    two;b1
    two;b2
    two;b3
    three;c1
    three;c1
    four;d1

    what is wrong? your column name is "Name" and you Group by "email"?

    PS H:\> import-csv c:\temp\group.csv | group email | select name,@{n='id';e={$_.group.id}}

    Name id                            
    ---- --                            
         {$null, $null, $null, $null...}



    Chris


    • Edited by -- Chris -- Thursday, February 15, 2018 8:14 AM
    Thursday, February 15, 2018 8:13 AM
  • The default delimiter in a csv is a comma, not a semicolon.


    • Edited by JS2010 Thursday, February 15, 2018 12:41 PM
    • Proposed as answer by -- Chris -- Thursday, February 15, 2018 1:36 PM
    Thursday, February 15, 2018 12:40 PM
  • import-csv c:\temp\group.csv -delim ';' | group email | select name,@{n='id';e={$_.group.id}}

    \_(ツ)_/

    • Proposed as answer by -- Chris -- Thursday, February 15, 2018 1:36 PM
    Thursday, February 15, 2018 12:46 PM
  • thxs all

    newbee question,

    the property Name for the first colume is "email" but the select name is "Name"? why


    Chris

    Thursday, February 15, 2018 1:36 PM
  • The group (group-object) cmdlet outputs count, name and group properties.  If you run each cmdlet seperately, you can see what they do.


    PS C:\Users\me> import-csv group.csv -delim ';' | group email

    Count Name Group
    ----- ---- -----
    1 one {@{email=one; id=a1}}
    3 two {@{email=two; id=b1}, @{email=two; id=b2}, @{email=two; id=b3}}
    2 three {@{email=three; id=c1}, @{email=three; id=c1}}
    1 four {@{email=four; id=d1}}






    • Edited by JS2010 Thursday, February 15, 2018 4:04 PM
    Thursday, February 15, 2018 1:49 PM