locked
Powershell script to read coulumn 2 data from .xlsx file RRS feed

  • Question

  • I'm trying to write a PowerShell script to get of 2 column of a .xlsx file. Can anyone help me here.

    My .xlsx file looks like below

    ProductName ProductCode Java 7 Update 75 {26374892-xxxx-xxxx-xxxx-123456789012} Java 8 Update 25 {26374892-xxxx-xxxx-xxxx-902341562789}

    I want output which will write only product codes line by line
    {26374892-xxxx-xxxx-xxxx-123456789012}
    {26374892-xxxx-xxxx-xxxx-902341562789}




    Tuesday, July 3, 2018 2:06 PM

Answers

  • The following will work in all current versions of PowerShell and Excel.

    # load file
    $FilePath = 'd:\test\myfile.xlsx'
    $xl = New-Object -ComObject Excel.Application
    $xl.Visible = $true
    $wb = $xl.Workbooks.Open($filepath)
    
    # get data from column 2
    $data = $wb.Worksheets['Sheet1'].UsedRange.Rows.Columns[2].Value2
    
    # cleanup
    $wb.close()
    $xl.Quit()
    While([System.Runtime.Interopservices.Marshal]::ReleaseComObject($wb) -ge 0){}
    while([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xl) -ge 0){}
    Remove-Variable xl,wb # this is optional # display results $data | select -skip 1 # remove header


    \_(ツ)_/




    • Edited by jrv Wednesday, July 4, 2018 4:34 AM
    • Proposed as answer by jrv Wednesday, July 4, 2018 4:34 AM
    • Marked as answer by jrv Monday, July 15, 2019 6:54 PM
    Tuesday, July 3, 2018 7:25 PM

All replies

  • Check out the ImportExcel module for PowerShell. It can read the xlsx file and put it into a variable.

    Then you're set to do whatever with the data.

    Tuesday, July 3, 2018 2:33 PM
  • looks like you gave up on the CSV dream; huh?
    Tuesday, July 3, 2018 2:36 PM
  • looks like you gave up on the CSV dream; huh?
    I'm unable to save the above format data in .csv file, so trying for .xlsx file
    Tuesday, July 3, 2018 2:38 PM
  • $xl = New-Object -COM "Excel.Application"
    $xl.Visible = $false
    $wb = $xl.Workbooks.Open("C:\neato.xlsx")
    $ws = $wb.Sheets.Item(1)
    
    "this is where you do your magic"
    
    $wb.Close()
    $xl.Quit()
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($xl)

    Tuesday, July 3, 2018 3:23 PM
  • Tuesday, July 3, 2018 3:27 PM
  • This one Requires Powershell Version 5.

    So check your version using $PSVersionTable, and if it's not 5, update it.

    #require version 5
    class Javaupdate{
        [string]$ProductName
        [string]$ProductCode
    }
    
    $global:Rows=1000
    
    #Absolute File path location of the excel file
    $Global:FilePath = "D:\Libs\Desktop\ExcelFilter\Book1.xlsx"
    
    $outputArray=@()
    function ReadDataInSpreadSheets{
        [CmdletBinding()]
        param(
            [Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]$WorkSheet
        )
        begin{
            $output=@()
            $WSname= $WorkSheet.Name
            $WorkSheet = $WorkBook.sheets.item($WorkSheet)
            #Activate the Wanted sheet
    		$WorkSheet.Activate() | Out-Null
        }
        process{
            #read each row and if it's empty stop the process
            for($row=2; $row -le $global:Rows; $row++){
                $complete = $row/$global:Rows * 100;
                Write-Progress -Activity "Loading Data in $WSname" -Id 1 -PercentComplete $complete
    
                #if the value of action is empty, assume that there's no more data and break the collection of the information from file
    			if([string]::IsNullOrEmpty($WorkSheet.Cells.Item($row,1).Text)){
                    Write-Progress -Activity "Loading Data in $WSname" -Id 1 -PercentComplete 100 -Completed
                    break;
    			}
    
                $newobject = New-Object -TypeName Javaupdate
                #column 2 (A2 Producname)
    		    $newobject.ProductName = $WorkSheet.Cells.Item($row,1).Text;
    		    #column 3 (B2 ProductCode)
                $newobject.ProductCode = $WorkSheet.Cells.Item($row,2).Text;
                $output+=$newobject
            }
        }
        end{
            return $output
        }
    
    }
    
    # Create an Object Excel.Application using Com interface
    $readExcel = New-Object -ComObject Excel.Application
    # Disable the 'visible' property so the document won't open in excel
    $readExcel.Visible = $true
    
    # Open the Excel file and save it in $WorkBook
    $WorkBook = $readExcel.Workbooks.Open($Global:FilePath)
    
    #Get All WorkSheets in the Book
    $WorkSheetsName=@()
    write-host -ForegroundColor Cyan "Loading Worksheets names"
    foreach($item in $workBook.Worksheets){
        write-host -ForegroundColor Gray -BackgroundColor DarkBlue "Working with $($item.Name)"
    	$WorkSheetsName+=$item.Name
    }
    
    #Read data in SpreadSheets
    foreach($wb in $WorkSheetsName){
        $outputArray+= ReadDataInSpreadSheets $wb
    }
    $outputArray | select -ExpandProperty ProductCode
    
    Write-Host -BackgroundColor DarkGray -ForegroundColor Black "Use `$outputArray variable, to query the whole object"
    $WorkBook.close()
    $readExcel.Quit()
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($readExcel) | Out-Null
    
    Remove-Variable -Scope Global Rows,FilePath
    Remove-Variable ReadExcel,workbook,WorksheetsName

    My excel file have 2 columns: 'ProductName' and 'Productcode' in just 1 worksheet, but the script have the ability to do the same job in all spreadsheets so, all spreadsheets on the excel file MUST have the same notation. 

    Also, it will look for 1000 as maximum rows, if you want to add more you can until the excel limits 1048576 (for 2016). (Can be changed in Line 7).

    And in line 10, you can set the absolute path of your XLSX file.

    This is the output:




    • Edited by j0rt3g4 Tuesday, July 3, 2018 7:07 PM
    • Proposed as answer by j0rt3g4 Tuesday, July 3, 2018 7:07 PM
    • Unproposed as answer by jrv Wednesday, July 4, 2018 4:35 AM
    • Proposed as answer by jrv Wednesday, July 4, 2018 4:35 AM
    Tuesday, July 3, 2018 7:05 PM
  • The following will work in all current versions of PowerShell and Excel.

    # load file
    $FilePath = 'd:\test\myfile.xlsx'
    $xl = New-Object -ComObject Excel.Application
    $xl.Visible = $true
    $wb = $xl.Workbooks.Open($filepath)
    
    # get data from column 2
    $data = $wb.Worksheets['Sheet1'].UsedRange.Rows.Columns[2].Value2
    
    # cleanup
    $wb.close()
    $xl.Quit()
    While([System.Runtime.Interopservices.Marshal]::ReleaseComObject($wb) -ge 0){}
    while([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xl) -ge 0){}
    Remove-Variable xl,wb # this is optional # display results $data | select -skip 1 # remove header


    \_(ツ)_/




    • Edited by jrv Wednesday, July 4, 2018 4:34 AM
    • Proposed as answer by jrv Wednesday, July 4, 2018 4:34 AM
    • Marked as answer by jrv Monday, July 15, 2019 6:54 PM
    Tuesday, July 3, 2018 7:25 PM
  • Most of the code is explaining what are the lines for.

    I can do it in one line :) the problem is that the people doesn't know the details.

    and I know you're an expert answering and treating people badly, but I'm not like that, once again, My code works fine for PSv5 because it uses a class template to save all. I can use a regular object to make it work in all versions, but the best is to teach people how to use the actual technologies instead of the old ones.

    My code illustrates how to use classes, how to create a class, how to define classes and fill out its properties. What does yours teach?, not even a comment to say that the Worksheet MUST be called "Sheet1" if it changes your code will break. Mine opens every single worksheet inside of a workbook. I'm sorry but your code is shorter but doesn't teach a thing.

    People are here to get answers and learn, not just to get the shorter answer's possible.

    That's my point of view.


    • Edited by j0rt3g4 Tuesday, July 3, 2018 9:50 PM
    Tuesday, July 3, 2018 9:50 PM
  • Most of the code is explaining what are the lines for.

    I can do it in one line :) the problem is that the people doesn't know the details.

    People are here to get answers and learn, not just to get the shorter answer's possible.

    That's my point of view.


    Non-programmers and new programmers who have no formal programming training tend to over-comment and to add many lines of unnecessary code. Another great weakness of non/new-programmers is the unnecessary use of functions.  All of this makes code even harder to understand and produces too many places where errors creep in.

    The request asked for one simple thing. Get the values in column 2.  This requires one line of native Excel Interop code:

    $data =$wb.Worksheets['Sheet1'].UsedRange.Rows.Columns[2].Value2

    All other lines are standard for opening a file and exiting Excel.  The pattern is simple and can be reused repeatedly with only a need to modify the one action line.

    The code I posted is how we would do this in any Net language.

    The reason I posted was not to criticize but to show how taking the simple path makes code easier to understand and more reliable.

    After you have about 5 years of technical programming this will become obvious. For now just absorb the lesson and try to understand why use formally trained systems people pu sh for a clean and simple approach to solving technical problems.  When you understand these things you will see why programming has evolved from the days of Dartmouth Basic to systems of object programming.

    There are many good new books on technical programming but the old ones are still valid although the techniques and language support for the formal rules have evolved.

    Current languages are now implementing full support for the following concepts and PowerShell is right in the mix.

    https://en.wikipedia.org/wiki/Functional_programming

    One of the first rules of "mathematical" programming is "don't create variables or functions when you don't need to.

    The action line of code needs nothing because it is a direct way to extract a property from the object model.  It is how Excel is designed to work. Old thinking about assigning everything to variables is just that - old thinking.  It was actually never a requirement or even useful in any era of programming once we had object support in our languages.  All modern languages support object syntaxes and rules.

    Spend some of your free time researching programming in a formal sense.  Don't rely on examples form others who are not technical programmers.  Their examples may work but you will not evolve as a coder by only learning from other non-technical coders.

    Good luck and have fun coding.  "Functional programming" adds loads of fun to coding and will help you think in a more efficient way.  Consider it the "Zen" of programming.


    \_(ツ)_/

    Tuesday, July 3, 2018 10:15 PM
  • But you are not the only denier of knowledge on the planet. 

    Your choice.  Live in the last century if it makes you happy.


    \_(ツ)_/

    Wednesday, July 4, 2018 12:07 AM
  • "Commented code" is an actual coding strategy. So I think you are the one that kept in the last century.

    And Hardcoding is pretty actual in 1990

    • Edited by j0rt3g4 Wednesday, July 4, 2018 2:15 AM
    Wednesday, July 4, 2018 1:58 AM
  • "Commented code" is an actual coding strategy. So I think you are the one that kept in the last century :)

    Yes but that is not what I am referring to.  Commenting has been a debated issue for many decades.  The current "best practices" for commenting specify limited commenting and correct use of nomenclatures for variable and function naming.  This is why PS strongly states that CmdLets and major functions should follow the Verb=Noun requirement.

    Commenting every line or commenting the obvious should always be avoided. If you are writing a blog or creating an example then the commenting rules are not in play.

    My suggestions are to avoid unnecessary use of interim variables and single use functions  even in examples.  Function rules can be relaxed for complex blocks of code when they make the flow of control or flow steps easier to understand.  Unfortunately this is not well implemented by inexperienced programmers.

    In most professional IDEs we have the ability to do automatic refactoring of code which is generally extremely useful and yet, it al most, never refactors to a function unless the code structure is so redundant that the factoring tools see it as an improvement and simplification of the code.

    Note that I was not criticizing your code.  I was simply showing how a simpler approach would be more understandable and, in fact, show how easy it is to do basic things in Excel.


    \_(ツ)_/

    Wednesday, July 4, 2018 2:09 AM
  • What is obvious to you, can just be not obvious to anybody.

    So you come here "god mode" and try to think about other people like learning people.

    So I'm sick and tired of talking to a dinosaur. You have your ways I have mine, and I commented every line because I want to be clearer, if it was better your code or mine, we shouldn't decide that. So respect that. That's it. 

    In notepad++ or ISE can be checked the parts
    The structure is always the same if you think is too complex, well it's something you need to fix


    • Edited by j0rt3g4 Wednesday, July 4, 2018 4:24 AM
    Wednesday, July 4, 2018 4:19 AM
  • What is obvious to you, can just be not obvious to anybody.

    So you come here "god mode" and try to think about other people like learning people.

    So I'm sick and tired of talking to a dinosaur. You have your ways I have mine, and I commented every line because I want to be clearer, if it was better your code or mine, we shouldn't decide that. So respect that. That's it. 

    In notepad++ or ISE can be checked the parts
    The structure is always the same if you think is too complex, well it's something you need to fix


    Why is it necessary for you to get nasty.  I was not criticizing you.  You can do whatever you want including ignoring any good advise that makes you uncomfortable. Your  choice.  Just don't blame your anger on others.  It is not very friendly or professional.

    I hope you can enjoy coding and all of its details some day.  Good luck.


    \_(ツ)_/

    Wednesday, July 4, 2018 4:27 AM
  • Most of the code is explaining what are the lines for.

    I can do it in one line :) the problem is that the people doesn't know the details.

    Hello.  Not to resurrect an old discussion, but as a person who is learning how to use PowerShell for this purpose (as in I could not just create my own code due to lack of current knowledge), the "excessive" commenting is appreciated.

    Now, I would like to be clear that I am not taking sides.  I have always felt colleges teach you to "excessively" comment code, but that is for the purpose of showing what you think it will do.  We used to be taught to do it for the next coder so they don't waste time deciphering.  That said, I felt if you CAN CODE, this is a complete waste of space, time, and energy.

    As the OP originally didn't know how to do what they were trying, I feel that explaining the code, espcially coding unique to the OP's in this instance is valid and helpful.  The OP can always utilize the code and leave out the commenting on the actual project, but it is helpful to see an explaination of what everything is doing.

    Once again, I am not saying anyone is right or wrong.  I think the communication was lost between the decision of how the commenting was used:  teaching vs. real world application.

    Thank you to each of you for your inputs in this matter.

    Monday, July 15, 2019 6:05 PM
  • Thanks for posting.  I wound up using this because it was shortest.  I have to mod it some for what I need, but it will get the job done.  I would use the above one if my needs were more complicated.

    Monday, July 15, 2019 6:33 PM