locked
Find duplicates in 2 csv files and copy extra data. RRS feed

  • Question

  • I have this issue, I have 2 csv files. the first contains several thousand  rows of data broken down by column. I have a second similar csv broken down the same way but different column names. Also, the two may be slightly different. The first is my primary and I wish to find the matching data in the second worksheet based on an 'identifier' column, then for the matching rows pull the cell data from column 'x' (which ever that is) and move it to the primary csv and place it in a new column 'x' of the matching row. 

    Can someone help?

    Thursday, November 29, 2018 1:26 AM

Answers

  • Working with CSV files, and not worksheets, this should help. You'll have to change the SQL query (in the variable $sql) to match the columns in your CSV files. You can also remove the line of code that displays the variable $dt on the console.

    $csv1 = "c:\temp\File1.txt"          # the file that's missing the data
    $csv2 = "c:\temp\File2.txt"          # the file with the data
    
    $OutputCSV = "C:\temp\File3.csv"     # <=== This file can be in any directory
    
    # both input CSV files (i.e. "tables") used in the query must be in the same directory (i.e. "database")
    $path1 = split-path $csv1
    $path2 = split-path $csv2
    if ($path1 -ne $path2)
    {
        write-host "ERROR: Both CSV files must be in the directory" -ForegroundColor Red
        return
    }
    
    # just pick the 1st provider (there are two on my system, one for MS Office 12 and one for 15)
    $provider = (New-Object System.Data.OleDb.OleDbEnumerator).GetElements() | Where-Object { $_.SOURCES_NAME -like "Microsoft.ACE.OLEDB.*" } 
    if ($provider -is [system.array]) { $provider = $provider[0].SOURCES_NAME } else {  $provider = $provider.SOURCES_NAME }
    
    $connstring = "Provider=$provider;Data Source=$(Split-Path $csv1);Extended Properties='text;HDR=YES';"
    
    $tablename1 = (Split-Path $csv1 -leaf).Replace(".","#")
    $tablename2 = (Split-Path $csv2 -leaf).Replace(".","#")
    
    # table "a" is the first file (with the empty 'ColumnD')
    # table "b" is the second file (with the data to copy)
    $sql = "SELECT a.identifier, a.ColumnB, a.ColumnC, b.ColumnN as ColumnD, a.ColumnE FROM $tablename1 as a LEFT OUTER JOIN $tablename2 as b ON a.identifier = b.ColumnD"
    
    # Setup connection and command
    $conn = New-Object System.Data.OleDb.OleDbconnection
    $conn.ConnectionString = $connstring
    $conn.Open()
    
    $cmd = New-Object System.Data.OleDB.OleDBCommand
    $cmd.Connection = $conn
    $cmd.CommandText = $sql
    
    # Load into datatable
    $dt = New-Object System.Data.DataTable
    $dt.Load($cmd.ExecuteReader("CloseConnection"))
    # Clean up
    $cmd.dispose | Out-Null
    $conn.dispose | Out-Null
    
    # Output results to console
    $dt | Format-Table -AutoSize
    # Output results to a new CSV
    $dt | Export-Csv -Path $OutputCSV -NoTypeInformation
    
    ###################################################################################
    ###################################################################################
    # Contents of test file1.txt and file2.txt files
    # if you want to create them
    $file1 = @"
    identifier,ColumnB,ColumnC,ColumnD,ColumnE
    123456,joe,blow,,red
    234567,mary,contrary,,yellow
    345678,mother,goose,,blue
    456789,paul,bunyan,,green
    "@
    
    $file2 = @"
    ColumnD,ColumnN
    123456,abc
    345678,ghi
    456789,jkl
    "@
    
    # the output CSV should look like this
    # Note that the row with identifier 234567 in file1 has no matching record in file2
    # so ColumnD is empty in the resulting table
    $file3 =@"
    "identifier","ColumnB","ColumnC","ColumnD","ColumnE"
    "123456","joe","blow","abc","red"
    "234567","mary","contrary","","yellow"
    "345678","mother","goose","ghi","blue"
    "456789","paul","bunyan","jkl","green"
    "@


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)


    Tuesday, December 4, 2018 2:35 AM

All replies

  • Here is the easiest and fastest way to join two CSVs and produce a merged result.

    https://1drv.ms/u/s!AjiiPtIUqzK_gs0UobI_Zrj9tslXKw

    This uses the ACE drivers to join two CSV files using a SQL select statement.  Just design the SQL for your operation and call the function.


    \_(ツ)_/

    Thursday, November 29, 2018 1:55 AM
  • jrv,

    that's interesting, I didn't know (should have guessed though) that you could put sql statements in powershell. That's handy. However, I'm not wanting to merge both sheets. 

    Let me explain this way, both csv's are similar and contain 98% same information but probably in different a different order and sheet 2 has an extra column. I'm trying to get the rows that match in the first sheet to have the data in the extra column in the second sheet copied over to the first sheet.

    Once I get that figured out, then I want to get powershell to create a few extra columns and pull keywords out of a description and copy them to their own cell in the same row. but that's later. :-)

    Thursday, November 29, 2018 2:29 AM
  • That is what SQL is for.  You just have to write the statement and run it.

    This forum is not for script requests.  You need to ask a single question and not expect someone to design a solution.

    Please carefully review the following links to set your expectation for posting in  technical forums.

    This Forum is for Scripting Questions Rather than script requests


    \_(ツ)_/

    Thursday, November 29, 2018 2:31 AM
  • I'm not asking for a script, I am trying to figure out how to copy cell data from a matching row and I'm not sure how to do it. Normally I would try a for-each, but I am not sure how to specify 1. when a row matches, then 2. copy the data from cell (x,y) to cell (y,z) or what ever the corresponding cell is. 

    the search for words and populate separate columns, I can do once I understand the first part.

    Thursday, November 29, 2018 2:53 AM
  • Use ForEach-Object and Where-Object.

    Use one file to look up the row in the second file.

    If you do not know PowerShell and have no programming experience then you should just hire a consultant to help you. Without experience you will not be able to do this.

    If you think you ca then post your script and ask a specific question about scripting.


    \_(ツ)_/

    Thursday, November 29, 2018 3:08 AM
  • Hello,

    All of this can be completed in Excel. It has very robust functions built it.

    I use the VLookup function all the time for comparing resulting reports and datasets.

    https://www.youtube.com/watch?v=XhJi2iDZa8Q

    Thursday, November 29, 2018 5:10 PM
  • Hello,

    All of this can be completed in Excel. It has very robust functions built it.

    I use the VLookup function all the time for comparing resulting reports and datasets.

    https://www.youtube.com/watch?v=XhJi2iDZa8Q

    This won't work since files have different headers.


    \_(ツ)_/

    Thursday, November 29, 2018 5:34 PM
  • thanks, I knew I would need the for-each, but forgot about utilizing the where-object. 

    so, (my network was down a while yesterday). Here is what I am trying to work with. I have done a bunch of simple text file and file/folder scripts. but nothing that deals with csv contents.

    so here is the crude version.

    $csv1 = import-csv -path 'c:\pathtocsv\csv1.csv #pull in data

    $csv2 = import-csv -path 'c:\pathtocsv\csv2.csv 3 pull in data2

    $compare = compare-object $csv1 $csv2 -includeequal | ? {$_.sideindicator -eq "=="} #compare for equality

    if ($compare) #start copy process

       {foreach $row in $compare.inputojbect) #not sure what to do here, probably put the where-object here

            {copy $csv2.column $csv1.column}} #not sure what to do here, but the intent should be clear

       else

            {write-host "nothing found"}

    Saturday, December 1, 2018 2:12 AM
  • I just watched the video and that may work to do the compare part but I also need to automate it and copy contents from another cell in the second sheet to the first to a particular column, but only for matching rows.


    Saturday, December 1, 2018 2:14 AM
  • My first reaction to this is that you are stating the problem wrong or, at least, in a misleading way.

    Do you want to find duplicate?  Do you want to match records?  What is the final purpose of this exercise?

    If you can clearly answer those questions and especially the last one then you will be 90% of the way to a solution.

    To match records in a Scv you need to enumerate one Cv and look up the match in the second Csv.

    $csv1 | 
        ForEach-Object{
              if($_.fieldx -in $csv2.Fieldx){
                   #do something
             }
        }


    \_(ツ)_/

    Saturday, December 1, 2018 2:22 AM
  • it's a match records and add.

    csv1 contains all good records.

    csv2 contains 1 column of needed records.

    match rows and copy  csv2 cell data to csv1 cell. 

    I guess I just don't understand how to tell powershell, I want the data of "this" cell copied to the other csv and pasted into "this" cell of the matching row. 

    I found another script that appears to have most everything I need, but like most scripts I find, I need to debug and fix (usually an outdated script or typo by someone, sometimes me. :-)

    the script below is the other one I found and I think I modified it for my needs, but I'm getting 

    "index operation failed; the array index evaluated to null". I'm guessing I have either pointed something wrong or it's not importing for some reason. 

    thanks for the help so far. 

    $csv1 = Import-Csv "filename.csv"
    $csv2 = Import-CSV "filename.csv"
    
    $indexKey = 'UserID' 
    $index1 = @{}; foreach($row in $csv1){$index1[$row.$indexKey] = $row.'department'}
    
    $copyfield = 'department'
    foreach($row in $csv2){
        if ($matched = $index1[$row.'User ID']){
            Add-Member @{$copyField = $matched.$copyfield} -InputObject $row -Force
         }
    }
    
    export-csv 'filepath.csv' -NoTypeInformation -Encoding UTF8 -InputObject $csv2 -Force

    Saturday, December 1, 2018 3:49 AM
  • You still haven't stated the problem in any understandable way.  You statement is like a recipe with not result.  It is just a bunch of disconnected statement.  There is no way anyone can understand wht you want without a clear statement of what you are trying to accomplish.

    Match what with what?  What gets moved/copied. 

    You are just repeatedly making very bad guesses because your understanding of programming and your ability to state the problem are both incomplete.

    You keep saying "match rows".  What is that. There is no concept in programming called "match rows". 

    You need to think about what you are asking from the perspective of someone who does not have your system and cannot see your files.  We also caanot get in your head to figure out what you are trying to ask


    \_(ツ)_/

    Saturday, December 1, 2018 4:16 AM
  • quick clarification.

    1. this is scripting not programming. (programming requires compiling :-P) 

    2. I have already stated 2 csv files (think excel spreadsheets, but you know that.)

    3. an example, sheet 1 row 1, column A "identifier" = 123456, sheet 2 row 3001, column D = 123456 (same identifier). match. task is copy data from row 3001 column N to sheet 1 row 1, column D and paste.

    that's all I'm trying to do, but usually I can't quite get the logic to work in my head. I'm a fixer of problems (20 year sys admin) not a scripter/programmer. I enjoy learning scripting (as rarely as I use it) but never got the knack of programming. My best bud can program all day long, and I respect that, but I myself can't do that, I tend to find answers to problems that others can't solve. I just changed jobs and had to train 5 people just to get them started with on all I did (at least on the surface). VMware, Arcserve, powershell, Windows, DNS, DHCP, Group Policy......etc.

    Monday, December 3, 2018 4:50 AM
  • No.  CSV files are NOT Excel.  They are industry standard text data files.  See: https://en.wikipedia.org/wiki/Comma-separated_values

    Your request is vague.  What do you want to do with the files?  If you really want to edit them in Excel then you are going to have to learn how to program Excel with PowerShell. 

    If you want to just deal with PowerShell imported CSV files then you will need to understand how objects and object collections work

    $csv1 = Import-Csv filename.csv
    $row = 3001
    $csv1[$row]

    TO get columsn we either use the header names or import the CSV and specify a header on import.

    $csv1 = Import-Csv filename.csv -Header Date, Time, Count …

    To learn how to use PowerShell start with the following:

    1. Microsoft Virtual Academy - Getting Started with Microsoft PowerShell
    2. PowerShell Documentation
    3. PowerShell Style Guidelines


    \_(ツ)_/

    Monday, December 3, 2018 4:59 AM
  • quick clarification.

    1. this is scripting not programming. (programming requires compiling :-P) 

    Scripting IS programming. It is exactly the same methodology and follows all of the same rules of program design and building. Both are also called coding.  Believe me. I have been programming for more than 40 years.

     

    \_(ツ)_/

    Monday, December 3, 2018 5:02 AM
  • read carefully, I didn't say csv's are excel. I said "think excel" as that is a common program used for csv files. Yes, I won't argue that scripting and programming follow the same methodology, however, every time I have said "programming in powershell" (or something similar) the dev team gets bent out of shape and declares that scripting is not programming (based on compiling). I however believe that any type of instruction typed via keyboard for a computer to use to perform ANY function is programming. 

    now that being said, I gave an example, you apparently didn't read it. I don't want to use excel because it "appears" to use VBscript for the macro function. I don't want to start learning another language. I only want to use powershell, because I am familiar with that but not an expert, thus my questions. I DID like assembler, I don't like basic, basic a, pascal, fortran, or any of the many useful languages I had to learn back in high school. I disliked punch cards the most. 

    I just want to understand, how to get the results I need so I can move to the next step. 

    so first, you said,

    If you want to just deal with PowerShell imported CSV files then you will need to understand how objects and object collections work

    that is correct, I do want to understand, that's my question, I don't recall the "object collection", I assume that means "array"?

    AND something helpful,

    TO get columsn we either use the header names or import the CSV and specify a header on import.

    OK, so I CAN specify a header on import (I wasn't sure on this, I can read, but my big problem is I tend to interpret things in several different ways and have to guess at what the writer is trying to say), great, now I'm making progress. 

    so, in powershell, how do I say,"powershell, compare these files. When you find rows that match copy the data from this column on this sheet on this row to the matching row on the other sheet and put it in this cell?"

    that's all I'm trying to do, but, it's difficult to figure out the logic that goes with it. 

    Of course I have to do a foreach to iterate through all the lines of the file probably with a where clause, but I am unsure of the statement structure. 

    Monday, December 3, 2018 5:34 AM
  • Rows match?  Her ewe are again.  What matches?  Do you mean equal or matches in part?

    Here is how to find all matching rows in two CSV files.

    $csv1 |
        ForEach-Object{
            $item = $_.Column1
            if($csv2 | Where-Object{ $_.Column1 -eq $item }) {
                Write-Host 'Rows match' $item
            } else {
                Write-Host 'Rows don''t match' $item
            }
        }


    \_(ツ)_/

    Monday, December 3, 2018 5:52 AM
  • Working with CSV files, and not worksheets, this should help. You'll have to change the SQL query (in the variable $sql) to match the columns in your CSV files. You can also remove the line of code that displays the variable $dt on the console.

    $csv1 = "c:\temp\File1.txt"          # the file that's missing the data
    $csv2 = "c:\temp\File2.txt"          # the file with the data
    
    $OutputCSV = "C:\temp\File3.csv"     # <=== This file can be in any directory
    
    # both input CSV files (i.e. "tables") used in the query must be in the same directory (i.e. "database")
    $path1 = split-path $csv1
    $path2 = split-path $csv2
    if ($path1 -ne $path2)
    {
        write-host "ERROR: Both CSV files must be in the directory" -ForegroundColor Red
        return
    }
    
    # just pick the 1st provider (there are two on my system, one for MS Office 12 and one for 15)
    $provider = (New-Object System.Data.OleDb.OleDbEnumerator).GetElements() | Where-Object { $_.SOURCES_NAME -like "Microsoft.ACE.OLEDB.*" } 
    if ($provider -is [system.array]) { $provider = $provider[0].SOURCES_NAME } else {  $provider = $provider.SOURCES_NAME }
    
    $connstring = "Provider=$provider;Data Source=$(Split-Path $csv1);Extended Properties='text;HDR=YES';"
    
    $tablename1 = (Split-Path $csv1 -leaf).Replace(".","#")
    $tablename2 = (Split-Path $csv2 -leaf).Replace(".","#")
    
    # table "a" is the first file (with the empty 'ColumnD')
    # table "b" is the second file (with the data to copy)
    $sql = "SELECT a.identifier, a.ColumnB, a.ColumnC, b.ColumnN as ColumnD, a.ColumnE FROM $tablename1 as a LEFT OUTER JOIN $tablename2 as b ON a.identifier = b.ColumnD"
    
    # Setup connection and command
    $conn = New-Object System.Data.OleDb.OleDbconnection
    $conn.ConnectionString = $connstring
    $conn.Open()
    
    $cmd = New-Object System.Data.OleDB.OleDBCommand
    $cmd.Connection = $conn
    $cmd.CommandText = $sql
    
    # Load into datatable
    $dt = New-Object System.Data.DataTable
    $dt.Load($cmd.ExecuteReader("CloseConnection"))
    # Clean up
    $cmd.dispose | Out-Null
    $conn.dispose | Out-Null
    
    # Output results to console
    $dt | Format-Table -AutoSize
    # Output results to a new CSV
    $dt | Export-Csv -Path $OutputCSV -NoTypeInformation
    
    ###################################################################################
    ###################################################################################
    # Contents of test file1.txt and file2.txt files
    # if you want to create them
    $file1 = @"
    identifier,ColumnB,ColumnC,ColumnD,ColumnE
    123456,joe,blow,,red
    234567,mary,contrary,,yellow
    345678,mother,goose,,blue
    456789,paul,bunyan,,green
    "@
    
    $file2 = @"
    ColumnD,ColumnN
    123456,abc
    345678,ghi
    456789,jkl
    "@
    
    # the output CSV should look like this
    # Note that the row with identifier 234567 in file1 has no matching record in file2
    # so ColumnD is empty in the resulting table
    $file3 =@"
    "identifier","ColumnB","ColumnC","ColumnD","ColumnE"
    "123456","joe","blow","abc","red"
    "234567","mary","contrary","","yellow"
    "345678","mother","goose","ghi","blue"
    "456789","paul","bunyan","jkl","green"
    "@


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)


    Tuesday, December 4, 2018 2:35 AM
  • Rich,

    thanks, I know enough powershell to get simple things to work, but this is more complicated than I thought it would be. I will have to read through it several times and lookup a couple things like split-path and the sql stuff. I would never be able to figure that out on my own, even though that is my preferred learning path. :-)

    Tom

    Tuesday, December 4, 2018 3:01 AM
  • Rich,

    just an fyi, I tried to just set the path and file name (to see what happens) and I got an error, (not asking for a fix, just letting you know). In the ISE powershell it's showing an error for "Exception setting "ConnectionString":,"An OLE DB Provider was not specified in the ConnectionString. An example would be, 'Provider='SQLOLEDB;'." 

    ....ps1:30 char:1

    I haven't looked it up yet, but it sounds interesting. :-)

    tom 

    Tuesday, December 4, 2018 3:32 AM
  • Rich,

    just an fyi, I tried to just set the path and file name (to see what happens) and I got an error, (not asking for a fix, just letting you know). In the ISE powershell it's showing an error for "Exception setting "ConnectionString":,"An OLE DB Provider was not specified in the ConnectionString. An example would be, 'Provider='SQLOLEDB;'." 

    ....ps1:30 char:1

    I haven't looked it up yet, but it sounds interesting. :-)

    tom 


    Without your code it is impossible to know what caused this.

    \_(ツ)_/

    Tuesday, December 4, 2018 3:41 AM
  • Most likely is that you don't have the correct software installed.

    Run these two lines in a Powershell window:

    $provider = (New-Object System.Data.OleDb.OleDbEnumerator).GetElements() | Where-Object { $_.SOURCES_NAME -like "Microsoft.ACE.OLEDB.*" } 
    $provider

    You should see something like this if you have multiple providers on the machine:

    SOURCES_NAME        : Microsoft.ACE.OLEDB.12.0
    SOURCES_PARSENAME   : {3BE786A0-0366-4F5C-9434-25CF162E475E}
    SOURCES_DESCRIPTION : Microsoft Office 12.0 Access Database Engine OLE DB Provider
    SOURCES_TYPE        : 1
    SOURCES_ISPARENT    : False
    SOURCES_CLSID       : {3BE786A0-0366-4F5C-9434-25CF162E475E}
    
    SOURCES_NAME        : Microsoft.ACE.OLEDB.15.0
    SOURCES_PARSENAME   : {3BE786A1-0366-4F5C-9434-25CF162E475E}
    SOURCES_DESCRIPTION : Microsoft Office 15.0 Access Database Engine OLE DB Provider
    SOURCES_TYPE        : 1
    SOURCES_ISPARENT    : False
    SOURCES_CLSID       : {3BE786A1-0366-4F5C-9434-25CF162E475E}

    Or something like this is you have only one provider:

    Microsoft.ACE.OLEDB.12.0

    If you don't have the ACE driver, see this link:

    https://www.microsoft.com/en-us/download/details.aspx?id=54920

    Or search for "install microsoft ace driver" on Bing/Google/etc.


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Tuesday, December 4, 2018 4:22 PM
  • It looks more complicated than it really is. Perhaps MS will produce a set of cmdlets one day!

    Broken down, the code:

    1. Verifies the CSVs are in the right place
    2. Selects a provider
    3. Creates a connection string
    4. Makes a connection to the database engine
    5. Creates a command (using the connection from #4, and the query in the variable "$sql")
    6. Loads a Data Table with the results of the query
    7. Exports the results

    If you were using a MS Access database you could use COM to reduce the clutter in the code, but not with CSVs.


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Tuesday, December 4, 2018 4:34 PM
  • If you use this function and design a SQL statement that joins, filters and groups as you want then that is all you need to do.

    Here is the easiest and fastest way to join two CSVs and produce a merged result.

    https://1drv.ms/u/s!AjiiPtIUqzK_gs0UobI_Zrj9tslXKw

    This uses the ACE drivers to join two CSV files using a SQL select statement.  Just design the SQL for your operation and call the function.


    \_(ツ)_/

    Tuesday, December 4, 2018 5:22 PM
  • It may not be obvious from looking at the function, but the CSV files would all need to be in one directory (thus there being only a single $csvPath parameter).

    FYI, the provider string, I think, no longer uses the "HDR=Yes", and "Format=Delimited". For CSV files with no headers, or use a delimiter other than a comma, it's necessary to create a schema.ini file in the same directory as the CSV files.

    Conveniently, it's also possible to use that schema.ini file to specify the use of fixed-width columns (a question that was asked in this forum a while back).

    http://cdn.cdata.com/help/RVB/cd/pg_schemainicsv.htm


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)


    Tuesday, December 4, 2018 9:34 PM
  • Can be modified to use any path you want. 


    \_(ツ)_/

    Tuesday, December 4, 2018 9:51 PM
  • Thusly:

    $sql = @'
    	SELECT 
    		F1.ID as ID,
    		F1.Name as Name, 
    		F1.Status as Status1, 
    		F2.Status as Status2 
    	FROM 
    		[c:\test\test1.csv] as F1,
    		[d:\other\test2.csv] As F2 
    	WHERE 
    		F1.ID = F2.ID
    '@


    \_(ツ)_/

    Tuesday, December 4, 2018 9:53 PM
  • Rich,

    Loaded ACE and got past that error, now I have a different error that i am researching. For some reason the second line is generating an error. I back tracked to make sure I didn't have any typos since the $dt is coming back blank. Each of the other variables has data. 

    # Load into datatable $dt = New-Object System.Data.DataTable $dt.Load($cmd.ExecuteReader("CloseConnection"))

    Exception calling "ExecuteReader" with "1" argument(s): "No value given for one or more required parameters."

    At c:\path to file\....


    Wednesday, December 5, 2018 4:28 PM
  • With various versions of the provider you will not be able to use that construct.

    $dt = New-Object System.Data.DataTable
    $rdr = $cmd.ExecuteReader()
    $dt.Load($rdr)
    $rdr.Close()
    

    I also find the SQL suspicious.


    \_(ツ)_/

    Wednesday, December 5, 2018 6:07 PM
  • $sql = @'
    SELECT 
        a.identifier, a.ColumnB, a.ColumnC, b.ColumnN as ColumnD, a.ColumnE 
    FROM {0} 
        LEFT OUTER JOIN {1} as b ON a.identifier = b.ColumnD
    '@ -f $csv1,$csv2


    \_(ツ)_/

    Wednesday, December 5, 2018 6:11 PM
  • Best guess? The SQL query makes a reference to a column name that doesn't exist.

    What do the column headers look like for each of the CSV files, and what's your SQL query?


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Wednesday, December 5, 2018 7:05 PM
  • Firstly - my function works perfectly as written.

    $sql = "
    SELECT a.identifier, a.ColumnB, a.ColumnC, b.ColumnD as ColumnD, a.ColumnE
    FROM 
        [csv1.csv] as a, 
        [csv2.csv] as b
    WHERE 
        a.identifier = b.ColumnD"
    Get-CsvReader -csvPath $csvPath -CommandText $sql
    It joins as expected. 


    \_(ツ)_/

    Wednesday, December 5, 2018 7:06 PM
  • A LFET OUTER works like this:

    $sql = @'
    SELECT a.identifier, a.ColumnB, a.ColumnC, b.ColumnD as ColumnD, a.ColumnE
    FROM [csv1.csv] as A
    LEFT OUTER JOIN [csv2.csv] as B
    ON A.identifier = B.ColumnD
    '@ 
    Get-CsvReader -csvPath $csvPath -CommandText $sql

    The full function is here:

    Function Get-CsvReader {
    	Param (
    		$CsvPath=$pwd,
    		[Parameter(Mandatory)]
            $CommandText,
            $Provider = 'Microsoft.ACE.OLEDB.16.0'
            
    	)
    	
    	$tmpl = 'Provider={0};Data Source={1};Extended Properties="Text;HDR=Yes;FORMAT=Delimited"'
    	$connStr = $tmpl -f $Provider, $CsvPath
    	Write-Verbose $connStr	
    	
    	$csvConnection = New-Object System.Data.OleDb.OleDbConnection($connStr)
    	$csvConnection.Open()
    	
    	$cmd = $csvConnection.CreateCommand()
    	$cmd.CommandText = $CommandText
    	
    	$dt = New-Object System.Data.DataTable
    	$rdr = $cmd.ExecuteReader()
    	$dt.Load($rdr)
    	$rdr.Close()
    	$dt
    }

    Unfortunately the CSV provider does not correctly support the LEFT OUTER JOIN syntax so it returns blank records which can be easily filtered.

     Get-CsvReader -csvPath $csvPath -CommandText $sql | where{$_.identifier.ToString().Trim()}


    \_(ツ)_/


    • Edited by jrv Wednesday, December 5, 2018 7:23 PM
    Wednesday, December 5, 2018 7:11 PM
  • Note that files must be ANSI and not Unicode.

    Here is the result.


    \_(ツ)_/

    Wednesday, December 5, 2018 7:27 PM
  • The SQL in the example works with the test data, so it isn't the syntax.

    What do you think is wrong with the syntax in the example code I posted?

    $sql = "SELECT a.identifier, a.ColumnB, a.ColumnC, b.ColumnN as ColumnD, a.ColumnE FROM $tablename1 as a LEFT OUTER JOIN $tablename2 as b ON a.identifier = b.ColumnD"


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Wednesday, December 5, 2018 7:41 PM
  • Rich,

    I think you are right, I forgot to set my columns. I'm working that now. but...

    for my sql lesson, I (probably poorly) interpret the select statement as this?

    select (my identifier column) and sheet1 columnB and sheet1 columnC and sheet2 columnN rename to columnD and sheet1 columnE FROM array named tablename1 and join it to array named tablename2 as b (?) based on (my identifier column) being = sheet2 columnD

    the columns I am actually using are sheet1 A (identifier) and P (empty destination) and sheet2 M (matching identifier) and K (data to copy)

    Thursday, December 6, 2018 3:08 AM
  • There are no "sheets" in a CSV. 

    Rich's code does too many unnecessary steps.  Just run the basic code and fix your files.  CSVs must be ANSI with headers.  The last SQL I posted does exactly what you asked.  I have run it and tested it with many variations of the SQL.


    \_(ツ)_/


    • Edited by jrv Thursday, December 6, 2018 3:22 AM
    Thursday, December 6, 2018 3:19 AM
  • jrv,

    I am sure you are correct about the steps, most of my powershell is the same. For me it makes it easier to understand till I get better at it. 

    So, I tried your code and found I had a couple syntax issues, and fixed those. Now I am getting an error on this line. 

    [Parameter(Mandatory)]

    the error says "property 'mandatory' cannot be found......." when I was typing this in, I raised my

    eyebrows as I hadn't seen that before. So, now that i see an error, I am guessing that i'm supposed

    to put something there, but your code is so compact, I'm not quite getting it.

    I would assume it's looking for a path, but you have $pwd listed, but I don't see it defined anywhere.

    would you mind adding some explanations? I'm really feeling newbish now.

    tom

    Friday, December 7, 2018 4:21 AM
  • You are using PS 2.   Why?  You should be using PS 5.1.  PS 2 is not longer supported on Windows 7 and later due to security issues.


    \_(ツ)_/

    Friday, December 7, 2018 4:28 AM
  • jrv,

    No, I'm on Windows 10 with version 5.1.17134.407

    so, as a test, I pulled out the mandatory and (not sure if it's expected as a result) it gave me an error at 

    "cannot convert the "select....." value of type "system.string" to type "system.management.automation.parameterattribute"

    sorry, I figured this would be a simple task, apparently not. 

    tom

    Monday, December 10, 2018 3:28 AM
  • That syntax works fine in all later versions of PowerShell so you have a system issue.

    Can you run this with no error?

    function test {
        param (
            [Parameter(Mandatory)]
            $msg
        )
        Write-Host $msg
    }
    test hello
    
     


    \_(ツ)_/

    Monday, December 10, 2018 3:38 AM
  • jrv,

    that seems to work, it echoed the script and finished with "hello" (ran it in ISE)

    tom

    Monday, December 10, 2018 3:56 AM
  • So now what script are you using that causes the issue?


    \_(ツ)_/

    Monday, December 10, 2018 4:08 AM
  • jrv,

    just going through the code again and found a typo. that's on me. fixed that, put the "mandatory" back in and now I am getting a few errors, look like the "snowball effect"

    so the error I am getting now, (haven't researched it yet) line 19 char:5

    $csvconnection.open()

    Looks like it's not making the connection, so tracing backwards, it looks like some of the variables are blank. 

    but $csvpath is set to c:\users\admin, so I know that is probably  not right unless that is the working dir. 

    However, I suspect that is where it is looking for the csv files, so I copied them to that location and re-ran the script and got past the errors and now it's is moving forward a little. 

    now it's complaining about the word text in this line. the error is unexpected token 'text' in expression or statement.

    $tmpl = 'provider={0}; data source={1}; extended properties="text;HDR=yes;format=delimited"'

    I'm looking that up now. 

    tom

    Monday, December 10, 2018 4:21 AM
  • Check your code again. "Text" is within a protected string and shouldn't be seen as anything but part of a string.

    Mismatched double or single quotes are what I'd be looking for.


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Monday, December 10, 2018 4:29 PM