locked
Select intermittent range of cells while Excel to CSV conversion using Powershell RRS feed

  • Question

  • My requirement is to read an excel file, copy it to a different location and convert it to CSV format for further use by Azure Data Factory - Copy activity. The issue is that data in excel is present in intermittent cells (i.e. from A1:V:1800 and then AB1:AQ1800). I have created below PowerShell script it is not reading the data in intermittent cells but reading whole excel. Please help in correcting it to select only specific range of values. Script is given below :

        $OriginFile = "\\aaaa\share3\Test\MasterCalendar DASH.xlsx"
        $SourceFile = "C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.xlsx"
        $sCSVFile = "C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.csv"
        $OrigFileExists = Test-Path $OriginFile
        $CSVFileExists = Test-Path $sCSVFile
    
        If ($OrigFileExists -eq $True)
        {      
            # Copy latest excel file to Source Location
            Copy-item path $OriginFile destination $SourceFile -Force
    
            if ($CSVFileExists -eq $True)
            {
                Remove-item -path $sCSVFile
            }
    
            # Convert excel file to CSV format
            # Get COM Object
            $Excel = New-Object -ComObject "Excel.Application"
            $ExcelWkbook = $Excel.Workbooks.Open($SourceFile)
            $ExcelWksheet = $ExcelWkbook.Worksheets.item(1)
            $DataColmns = $ExcelWksheet.Range("A1:V1793")
            $XLcsv = 6
            $DataColmns.worksheet.SaveAs($sCSVFile ,$XLcsv)
            $ExcelWkbook.Close($True)
            $Excel.Quit()
        } 
        Else
        {
            Exit
        } 

    Also, right now the excel has data till row no 1800 which will increase in future, how can I make the row number dynamic in above script so that all rows that have data are picked up.

    Thursday, April 30, 2020 10:47 PM

All replies

  • Why restrict the conversion.  Excel will only convert the used cells.  Just open the file and save it in the new format.

    This is how to convert a CSV to an XLSL file.

    $originFile = '\\aaaa\share3\Test\MasterCalendar DASH.xlsx'
    $sourceFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.xlsx'
    $csvFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.csv'
    
    $xl = New-Object -ComObject Excel.Application
    
    if(Test-Path $originFile){      
    
        Move-item –path $originFile –destination $sourceFile -Force
    
        $wb = $xl.Workbooks.Open($sourceFile)
        $wb.worksheet[1].SaveAs($csvFile,[Microsoft.Office.Interop.Excel.xlFileFormat]::xlWorkbookDefault))
        $wb.Close($True)
        
    }else{
        Write-Warning 'Original file not found'
    }
    
    $xl.Quit()
    exit
    

    Here are some resources to help you get started learning PowersHell.

    Read both and you will become a PS guru overnight.


    \_(ツ)_/

    Thursday, April 30, 2020 11:30 PM
  • The problem is that source excel has few hidden columns in between actual columns of data (say there are 7 hidden columns), these columns do not have a header row, unlike the columns with actual data that have a header.

    These hidden columns get migrated into CSV file as well and result in DataStore throwing error in ADF (Copy activity) due to mismatch in number of columns. ADF Copy activuty works perfectly fine if these hidden columns are deleted from CSV.

    I modified the script a bit to read only selected columns from CSV and convert it into another CSV file with continuous columns of data. But this one is also giving error. Modified script is :

        
    $sourceFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.xlsx'
    $csvFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.csv'
    $csvFile1 = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH_SC.csv'
    
        $xl = New-Object -ComObject Excel.Application
    
        $wb = $xl.Workbooks.Open($sourceFile)
        $ExcelWksheet = $wb.Worksheets.item(1)
        $ExcelWksheet.SaveAs($csvFile,[Microsoft.Office.Interop.Excel.xlFileFormat]::xlWorkbookDefault)
        $wb.Close($True)
    
       $xl.Quit()
    
       Import-Csv $csvFile | select 1469,2217,1314 | Export-Csv -Path $csvFile1 -NoTypeInformation
    
       exit

    Wherein 1469, 2217 and 1314 are column names to read. But this one is throwing below error :

    =================================================================
    select : Cannot convert System.Int32 to one of the following types {System.String, System.Management.Automation.ScriptBlock}.
    At line:23 char:23
    + Import-Csv $csvFile | select 1469,2217,1314 | Export-Csv -Path $csvFi ...
    +                       ~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidArgument: (:) [Select-Object], NotSupportedException
        + FullyQualifiedErrorId : DictionaryKeyUnknownType,Microsoft.PowerShell.Commands.SelectObjectCommand

    ==============================================================

    Can some one help is resolving this error so that my script reads only selected columns from CSV file and creates another CSV.



    • Edited by LataC Friday, May 1, 2020 1:09 AM
    Friday, May 1, 2020 12:56 AM
  • Sorry - it appears that you are trying to convert an XLSX to a CSV. Is that what you are trying to do?

    To do that just open the file and convert it on output.  If there are hidden cells then just unhide to whole sheet.  No need to think in ranges.  Excel will manage all of that for you.

    $wb.Worksheets[1].Columns | ForEach-Object{ $_.EntireColumn.Hidden = $false }

    $originFile = '\\aaaa\share3\Test\MasterCalendar DASH.xlsx'
    $sourceFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.xlsx'
    $csvFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.csv'
    
    $xl = New-Object -ComObject Excel.Application
    
    if(Test-Path $originFile){      
    
        Move-item –path $originFile –destination $sourceFile -Force
    
        $wb = $xl.Workbooks.Open($sourceFile)
        $wb.Worksheets[1].Columns | 
            ForEach-Object{$_.EntireColumn.Hidden = $false}
        $wb.worksheet[1].SaveAs($csvFile,[Microsoft.Office.Interop.Excel.xlFileFormat]::xlCsv)
        $wb.Close($True)
        
    }else{
        Write-Warning 'Original file not found'
    }
    
    $xl.Quit()
    
    Import-Csv $csvFile
    


    \_(ツ)_/

    Friday, May 1, 2020 1:13 AM
  • Sorry I am new to Scripting and not sure if I am able to articulate my problem correctly.

    The scenario is that script needs to read an excel file, copy it to another location and convert it to CSV. This CSV would be used as a source in ADF pipeline - Copy activity.

    The problem I am facing is that there are few hidden columns / extra columns without header in the excel file, these columns make their way into CSV file, which causes the ADF copy activity to fail.

    What I need is to remove those hidden / extra columns from CSV file so that ADF pipeline works fine.

    This is what the script needs to achieve.

    Please let me know if any other information is required on the issue, would you please suggest how this functionality can be achieved in the PowerShell script.

    Friday, May 1, 2020 1:27 AM
  • Are they "hidden" or just collapsed?

    If they are truly hidden then they should not be copied to the CSV.

    So your real question is "how to not copy columns that are either hidden or collapsed"?  Is that correct?


    \_(ツ)_/

    Friday, May 1, 2020 1:32 AM
  • Just rechecked the excel, these columns are neither hidden nor collapsed, but present in between without any data in header and other rows, so should not be copied in the CSV file.

    Apologies for the confusion.

    Friday, May 1, 2020 1:38 AM
  • So now the plot thickens. You will have to find the columns with no header and delete the column

    $wb.worksheet[1]..UsedRange.Columns|
         Where-Object{-not $_.Cells[1].Value2}|
         ForEach-Object{$_.Delete()}


    \_(ツ)_/


    • Edited by jrv Friday, May 1, 2020 2:05 AM
    Friday, May 1, 2020 2:04 AM
  • $sourceFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.xlsx'
    $csvFile = 'C:\Users\vermarachna\Documents\Rachna\17.Load Radiotherapy\25Apr2020\Development\MasterCalendar DASH.csv'
    
    $xl = New-Object -ComObject Excel.Application
    
    $wb = $xl.Workbooks.Open($sourceFile)
    
    $ExcelWksheet = $wb.Worksheets.item(1)
    
    $ExcelWksheet.UsedRange.Columns| Where-Object{-not $_.Cells[1].Value2}| ForEach-Object{$_.Delete()} 
    
    $wb.worksheet[1].SaveAs($csvFile,[Microsoft.Office.Interop.Excel.xlFileFormat]::xlCsv)
    
    $wb.Close($True)$
    $xl.Quit()
    
    exit

     I updated the script as above and executed. Getting below error :

    =============================================================

    Cannot index into a null array.
    At line:18 char:5
    +     $wb.worksheet[1].SaveAs($csvFile,[Microsoft.Office.Interop.Excel. ...
    +     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
        + FullyQualifiedErrorId : NullArray

    Friday, May 1, 2020 4:04 AM
  • Why?  My code was correct and yours is a very bad rearrangement for no reason I can see.

    Start small.  Open a workbook and save it.  Learn how to do that.  Use my example to understand how to do it.  Your changes are just arbitrary and add nothing but show that you have little PowerShell experience and no Excel experience.  I did my code using the cor4ect methods for managing a worksheet.

    When learning g something new in technology try each step until you understand what it does and why.  The online documentation will give you the rules and an understanding of what each thing intends to do.  You will only lack the PowerShell version.

    #open
    $wb = $xl.Workbooks.Open($sourceFile)
    #save
    $wb.worksheet[1].SaveAs($csvFile,[Microsoft.Office.Interop.Excel.xlFileFormat]::xlCsv)
    #close
    $wb.Close($True)
    

    There is no need to rearrange any of that.


    \_(ツ)_/

    Friday, May 1, 2020 4:23 AM