none
Reading Two CSV Files and Compare RRS feed

  • Question

  • Hi all -

     

    I have a bit of a problem that I'm lookign for a simple, if not elegant solution.

    I will be running a DB2 SQL script to putput a list of custmers to a .CSV file.   This CSV file will have a CustomerID and CustomerName column. 

    I will then be running another DB2 SQL script on another server against a different database to generate a different .CSV file that name details that references the CustomerID among other things.

    The issue today is that the CustomerID looks like a GUID.   As a result, I have to scan through the details CSV file, look at the Customer ID GUID, then compare against the customer file and find the same GUID to know the customer name.  This is important to know while I do some conversion verification.

    I was thinking it would be a simple task to loop thru the details CSV file and for each row, loop thru the customers CSV file and compare for a match on CustomerID.  If there is a match, I will print out a line including fields form both the details and customer CSV files as a single row (to be quickly examined after the script runs.)   If there is no match or no matching customerid, then the row would be skipped and the next record read.

    Can this easily be done with Powershell?   Does anyone have an example they can share?

    Any help or direction would be appreciated.

    Wednesday, June 22, 2011 5:31 PM

Answers

  • You're using the same GUID for each customer.

    No, look closely at the first char of each of GUID’s last group in the first record set (I’ve highlighted it), they are different.

    …I'm getting errors because of null

    In this version I added Angie to the first record set so we would get a Null when matching it to the second record set. To handle (ignore) this error just add the SilentlyContinue Error Action to the Add-Member statement (mind word wrapping).

    @'

    "CustomerID","CustomerName"

    "29f87f04-3022-4e80-87f2-8d2af6946685","Angie"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Jane"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Zack"

    '@ | Set-Content .\dummy1.csv

     

    @'

    "CustomerID","City","OrderID"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","5342"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","7654"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","3456"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","5807"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","0977"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","0977"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","1538"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","5438"

    '@ | Set-Content .\dummy2.csv

     

    $csv = Import-Csv .\dummy1.csv

    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable

    $csv | ForEach-Object {

     $custID = $_.CustomerID

     $ht.$custID |

      Add-Member NoteProperty CustomerName $_.CustomerName -PassThru -ErrorAction SilentlyContinue

    } | Select-Object CustomerName, CustomerID, City, OrderID

     


      Robert Robelo  
    • Marked as answer by dm32813281 Thursday, June 23, 2011 2:10 AM
    Wednesday, June 22, 2011 9:49 PM
  • To avoid errors let’s check first if the record from the first set exists in the second, then we add the members.

    You can use the Format-Table Cmdlet to force a table display.

    Give this a shot:

    @'

    "CustomerID","CustomerName","DOB"

    "29f87f04-3022-4e80-87f2-8d2af6946685","Angie",06/27/1968"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob","01/02/2000"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Jane","04/02/1975"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Zack","03/20/1978"

    '@ | Set-Content .\dummy1.csv

     

    @'

    "CustomerID","City","OrderID"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","5342"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","7654"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","3456"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","5807"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","0977"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","0977"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","1538"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","5438"

    '@ | Set-Content .\dummy2.csv

     

    $csv = Import-Csv .\dummy1.csv

    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable

    $csv | ForEach-Object {

     # append members if record from 1st record set exists in 2nd record set

     if ($ht.ContainsKey($_.CustomerID)) {

      $ht.($_.CustomerID) |

       Add-Member NoteProperty CustomerName $_.CustomerName -PassThru |

       Add-Member NoteProperty DOB $_.DOB -PassThru

     }

    } | Format-Table -AutoSize

    # or if you want the fields in a certain order

    # } | Format-Table CustomerName, CustomerID, DOB, City, OrderID -AutoSize

     


      Robert Robelo  
    • Marked as answer by dm32813281 Thursday, June 23, 2011 3:05 PM
    Thursday, June 23, 2011 12:03 PM

All replies

  • Are these small files?  If they are both very large, it would not be good to scan one of the files for every record of the other file.  In that case, it might be best to load the data into SQL Server and do a join.  Or perhaps the data from one file will fit into memory so that it can be scanned quickly many times. 

    The first step is probably to determine regex patterns to get the GUID from both files.  Then perhaps an outer loop to process each record of one file (Select-String -File).  An inner loop (Select-String -File -Pattern) to get matching records from the other file. 


    Randy in Marin
    Wednesday, June 22, 2011 6:00 PM
  • @'

    "CustomerID","CustomerName"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Jane"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Zack"

    '@ | Set-Content .\dummy1.csv

     

    @'

    "CustomerID","City","OrderID"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","5342"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","7654"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","3456"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","5807"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","0977"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","0977"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","1538"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","5438"

    '@ | Set-Content .\dummy2.csv

     

    $csv = Import-Csv .\dummy1.csv

    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable

    $csv | ForEach-Object {

     $custID = $_.CustomerID

     $ht.$custID |

      Add-Member NoteProperty CustomerName $_.CustomerName -PassThru

    } | Select-Object CustomerName, CustomerID, City, OrderID


      Robert Robelo  
    Wednesday, June 22, 2011 6:20 PM
  • Load that customer guid / name csv into a hash table .

     

    $customer_ht = @{}
    import-csv customers.csv |
     foreach-object {
        $customer_ht["$_.CustomerID"] = $_.CustomerName
        }
       
    Then finding the customer name is just a matter of:

    $customer_ht["$_.CustomerID"]


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "
    Wednesday, June 22, 2011 6:23 PM
  • Hi --

     

    Thanks for this example.  I like it.

    I am having a couple issues.

    You're using the same GUID for each customer.   I changed the last digit and also changed some of the GUIDs in other table to match.

    However, I'm getting errors because of null

     

    "CustomerID","CustomerName"
    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob"
    "29f87f04-3022-4e80-87f2-ad2af6946686","Jane"
    "29f87f04-3022-4e80-87f2-bd2af6946687","Zack"
    '@ | Set-Content .\dummy1.csv
     
    @'
    "CustomerID","City","OrderID"
    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"
    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"
    "29f87f04-3022-4e80-87f2-9d2af6946686","Phoenix","5342"
    "29f87f04-3022-4e80-87f2-ad2af6946686","Kansas","7654"
    "29f87f04-3022-4e80-87f2-bd2af6946687","Moscow","3456"
    "29f87f04-3022-4e80-87f2-7d2af6946687","Jerusalem","5807"
    "29f87f04-3022-4e80-87f2-9d2af6946688","Phoenix","0977"
    "29f87f04-3022-4e80-87f2-4d2af6946688","London","0977"
    "29f87f04-3022-4e80-87f2-ad2af6946689","Kansas","1538"
    "29f87f04-3022-4e80-87f2-bd2af6946689","Moscow","5438"
    '@ | Set-Content .\dummy2.csv
     
    $csv = Import-Csv .\dummy1.csv
    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable
    $csv | ForEach-Object {
     $custID = $_.CustomerID
     $ht.$custID |
      Add-Member NoteProperty CustomerName $_.CustomerName -PassThru
    } | Select-Object CustomerName, CustomerID, City, OrderID

    Add-Member : Cannot bind argument to parameter 'InputObject' because it is null.
    At line:27 char:13
    +   Add-Member <<<<  NoteProperty CustomerName $_.CustomerName -PassThru
        + CategoryInfo          : InvalidData: (:) [Add-Member], ParameterBindingValidationException
        + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.AddMemberCommand

    CustomerName                                          CustomerID                                            City                                                  OrderID                                             
    ------------                                          ----------                                            ----                                                  -------                                             
    Jane                                                  29f87f04-3022-4e80-87f2-ad2af6946686                  Kansas                                                7654                                                
    Zack                                                  29f87f04-3022-4e80-87f2-bd2af6946687                  Moscow                                                3456   

     

     

     

    Wednesday, June 22, 2011 7:27 PM
  • Yes, relatively small.  Less than 75 rows.

     

    Speed isn't import for me... as long as I can get everything on one report nicely.

     

    Wednesday, June 22, 2011 8:09 PM
  • You're using the same GUID for each customer.

    No, look closely at the first char of each of GUID’s last group in the first record set (I’ve highlighted it), they are different.

    …I'm getting errors because of null

    In this version I added Angie to the first record set so we would get a Null when matching it to the second record set. To handle (ignore) this error just add the SilentlyContinue Error Action to the Add-Member statement (mind word wrapping).

    @'

    "CustomerID","CustomerName"

    "29f87f04-3022-4e80-87f2-8d2af6946685","Angie"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Jane"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Zack"

    '@ | Set-Content .\dummy1.csv

     

    @'

    "CustomerID","City","OrderID"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","5342"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","7654"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","3456"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","5807"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","0977"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","0977"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","1538"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","5438"

    '@ | Set-Content .\dummy2.csv

     

    $csv = Import-Csv .\dummy1.csv

    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable

    $csv | ForEach-Object {

     $custID = $_.CustomerID

     $ht.$custID |

      Add-Member NoteProperty CustomerName $_.CustomerName -PassThru -ErrorAction SilentlyContinue

    } | Select-Object CustomerName, CustomerID, City, OrderID

     


      Robert Robelo  
    • Marked as answer by dm32813281 Thursday, June 23, 2011 2:10 AM
    Wednesday, June 22, 2011 9:49 PM
  • Thanks for your help -- it works great.

    Now I need to go figure out how to execute DB2 sql commands remotely and create a .CSV based on query.   Fun Fun Fun.

    Thursday, June 23, 2011 2:09 AM
  • One more question --

     

    If my .CSV file doesn't have field headings, what would need to be changed in the script to make it still work?   I assume I can use som ordinal values like [0] or [1]?

     

    Thursday, June 23, 2011 2:12 AM
  • You’re welcome, I’m glad to help.

    PowerShell v2’s Import-Csv Cmdlet has a Header parameter, pass the missing headers to both of these Cmdlet statements, e.g.

    # change these lines:

    $csv = Import-Csv .\dummy1.csv

    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable

     

    # to...

    $csv = Import-Csv .\dummy1.csv -Header CustomerID, CustomerName

    $ht = Import-Csv .\dummy2.csv -Header CustomerID, City, OrderID |

     Group-Object CustomerID -AsHashTable

     

     


      Robert Robelo  
    Thursday, June 23, 2011 2:25 AM
  • I figured out I can create a .CSV file in DB2 by using an EXPORT TO 'filename.csv' of del 'SELECT * FROM mytable'.  I've already tested with your code and it works good. 

    I do have another question, though.   I would like to add another column from the dummy1 file (for instance) to the output.   I thought I could simply do another Add-Member with the field name from dummy1 but I got errors and Powershell displayed the output in list instead of column format (without displaying my other field) and popped up a InputObject box for me to supply a value for Add-Member. 

    You have any idea what I may be doing wrong?

    Modified code is below with DOB added.   Output also changed which I dont like..

    @'
    "CustomerID","CustomerName","DOB"
    "29f87f04-3022-4e80-87f2-8d2af6946685","Angie",06/27/1968"
    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob","01/02/2000"
    "29f87f04-3022-4e80-87f2-ad2af6946685","Jane","04/02/1975"
    "29f87f04-3022-4e80-87f2-bd2af6946685","Zack","03/20/1978"
    '@ | Set-Content .\dummy1.csv
     
    @'
    "CustomerID","City","OrderID"
    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"
    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"
    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","5342"
    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","7654"
    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","3456"
    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","5807"
    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","0977"
    "29f87f04-3022-4e80-87f2-4d2af6946685","London","0977"
    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","1538"
    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","5438"
    '@ | Set-Content .\dummy2.csv
     
    $csv = Import-Csv .\dummy1.csv
    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable
    $csv | ForEach-Object {
     $custID = $_.CustomerID
     $ht.$custID |
      Add-Member NoteProperty CustomerName $_.CustomerName -PassThru -ErrorAction SilentlyContinue |
      Add-Member NoteProperty DOB $_.DOB -PassThru -ErrorAction SilentlyContinue
    } | Select-Object CustomerName, CustomerID, City, OrderID, DOB


    CustomerName : Bob
    CustomerID   : 29f87f04-3022-4e80-87f2-9d2af6946685
    City         : Phoenix
    OrderID      : 5342
    DOB          : 01/02/2000

    CustomerName : Bob
    CustomerID   : 29f87f04-3022-4e80-87f2-9d2af6946685
    City         : Phoenix
    OrderID      : 0977
    DOB          : 01/02/2000

    CustomerName : Jane
    CustomerID   : 29f87f04-3022-4e80-87f2-ad2af6946685
    City         : Kansas
    OrderID      : 7654
    DOB          : 04/02/1975

    CustomerName : Jane
    CustomerID   : 29f87f04-3022-4e80-87f2-ad2af6946685
    City         : Kansas
    OrderID      : 1538
    DOB          : 04/02/1975

    CustomerName : Zack
    CustomerID   : 29f87f04-3022-4e80-87f2-bd2af6946685
    City         : Moscow
    OrderID      : 3456
    DOB          : 03/20/1978

    CustomerName : Zack
    CustomerID   : 29f87f04-3022-4e80-87f2-bd2af6946685
    City         : Moscow
    OrderID      : 5438
    DOB          : 03/20/1978

    Thursday, June 23, 2011 4:58 AM
  • To avoid errors let’s check first if the record from the first set exists in the second, then we add the members.

    You can use the Format-Table Cmdlet to force a table display.

    Give this a shot:

    @'

    "CustomerID","CustomerName","DOB"

    "29f87f04-3022-4e80-87f2-8d2af6946685","Angie",06/27/1968"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Bob","01/02/2000"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Jane","04/02/1975"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Zack","03/20/1978"

    '@ | Set-Content .\dummy1.csv

     

    @'

    "CustomerID","City","OrderID"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","7765"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","8722"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","5342"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","7654"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","3456"

    "29f87f04-3022-4e80-87f2-7d2af6946685","Jerusalem","5807"

    "29f87f04-3022-4e80-87f2-9d2af6946685","Phoenix","0977"

    "29f87f04-3022-4e80-87f2-4d2af6946685","London","0977"

    "29f87f04-3022-4e80-87f2-ad2af6946685","Kansas","1538"

    "29f87f04-3022-4e80-87f2-bd2af6946685","Moscow","5438"

    '@ | Set-Content .\dummy2.csv

     

    $csv = Import-Csv .\dummy1.csv

    $ht = Import-Csv .\dummy2.csv | Group-Object CustomerID -AsHashTable

    $csv | ForEach-Object {

     # append members if record from 1st record set exists in 2nd record set

     if ($ht.ContainsKey($_.CustomerID)) {

      $ht.($_.CustomerID) |

       Add-Member NoteProperty CustomerName $_.CustomerName -PassThru |

       Add-Member NoteProperty DOB $_.DOB -PassThru

     }

    } | Format-Table -AutoSize

    # or if you want the fields in a certain order

    # } | Format-Table CustomerName, CustomerID, DOB, City, OrderID -AutoSize

     


      Robert Robelo  
    • Marked as answer by dm32813281 Thursday, June 23, 2011 3:05 PM
    Thursday, June 23, 2011 12:03 PM
  • Thanks, Robert.

    You are a Powershell guru.

    I have a couple Powershell books, but none of them give me the kind of useful techniques that you've shown....

     

    Awesome!

     

    Thanks again!

     

    Thursday, June 23, 2011 3:05 PM
  • You're welcome, I'm glad to help.
      Robert Robelo  
    Thursday, June 23, 2011 3:08 PM