none
Script to merge multiple CSV files together with no duplicate records. RRS feed

  • Question

  • I like a Script to merge multiple CSV files together with no duplicate records.

    None of the files have any headers and column A got a unique ID. What would be the best way to accomplish that?  

    Monday, March 2, 2015 3:24 PM

Answers

  • This worked for me.

    Get-ChildItem D:\test\AllDailySales*.csv | foreach { Get-Content $_ } | sort -Unique | Out-File D:\Reports\AllDailySalesComplete.csv -Encoding ascii 

    I run this a couple times a day and run in to a problem with the records of the current day,

    The records of the current day change thru out the day. So it is starts duplicating.

    I need to find first column and if there are duplicate records and if there are. Then compare column E of the records and drop it if it is 0 and or keep the record with the gather value in column E.

    Does that make sense?

    Saturday, March 28, 2015 12:26 AM

All replies

  • What does your script look like so far?

    -- Bill Stewart [Bill_Stewart]

    Monday, March 2, 2015 3:32 PM
    Moderator
  • OK here is my answer :

    2 files in a directory with no headers.

    first column is the unique ID, second colomun you put whatever u want

    The headers are added when using the import-csv cmdlet

    first file contains :

    1;a
    2;SAMEID-FIRSTFILE
    3;c
    4;d
    5;e

    second file contains :

    6;a
    2;SAMEID-SECONDFILE
    7;c
    8;d
    9;e

    the second file contains the line : 2;b wich is the same in the first file

    the code :

    $i = 0
    Foreach($file in (get-childitem d:\yourpath)){
        if($i -eq 0){
            $ref = import-csv $file.fullname -Header id,value -Delimiter ";"
        }else{
            $temp = import-csv $file.fullname -Header id,value -Delimiter ";"
            foreach($line in $temp){
                if(!($ref.id.contains($line.id))){
                    $objet = new-object Psobject
                    Add-Member -InputObject $objet -MemberType NoteProperty -Name id -value $line.id
                    Add-Member -InputObject $objet -MemberType NoteProperty -Name value -value $line.value
                    $ref += $objet
                }
            }
        }
        $i++
    }
    
    $ref

    $ref should return:

    id                                                          value
    --                                                          -----
    1                                                           a
    2                                                           SAMEID-FIRSTFILE
    3                                                           c
    4                                                           d
    5                                                           e
    6                                                           a
    7                                                           c
    8                                                           d
    9                                                           e

    (get-childitem d:\yourpath) -> yourpath containing the 2 csv file



    • Edited by TonQ Monday, March 2, 2015 4:02 PM
    • Proposed as answer by TonQ Tuesday, March 10, 2015 4:28 PM
    Monday, March 2, 2015 3:57 PM
  • Hmm, when the CSV files have no headers, I think you should not use Import-Csv.

    What about this:

    Get-ChildItem C:\temp\file*.csv | foreach { Get-Content $_ } | sort -Unique | Out-File C:\temp\fileX.csv -Encoding ascii
    With the Get-Content you collect the contents of all the files and put them into the pipeline. Sort -Unique eliminates any duplicate lines.

    Monday, March 2, 2015 4:03 PM
  • i'm adding the header by using the delimiter, it's pretty straight forward :

     $ref = import-csv $file.fullname -Header id,value -Delimiter ";"

    -header id,value

    Monday, March 2, 2015 4:14 PM
  • this works grate. except this first character I cant figure out why its there. any ideas? 

    Wednesday, March 11, 2015 7:02 PM
  • What solution works ? walterfmb or mine ?
    Thursday, March 12, 2015 1:30 PM
  • This worked for me.

    Get-ChildItem D:\test\AllDailySales*.csv | foreach { Get-Content $_ } | sort -Unique | Out-File D:\Reports\AllDailySalesComplete.csv -Encoding ascii 

    I run this a couple times a day and run in to a problem with the records of the current day,

    The records of the current day change thru out the day. So it is starts duplicating.

    I need to find first column and if there are duplicate records and if there are. Then compare column E of the records and drop it if it is 0 and or keep the record with the gather value in column E.

    Does that make sense?

    Saturday, March 28, 2015 12:26 AM
  • how would I run this.  In a .bat file?  or CMD?

    You didn't mention how to run this script.  Thanks.

    Friday, March 22, 2019 8:06 AM
  • Powershell.

    Please do not add additional questions to other peoples threads. Create a new one for yourself and reference this one if needed.


    Live long and prosper!

    (79,108,97,102|%{[char]$_})-join''

    Friday, March 22, 2019 8:09 AM