PowerShell - Extract contents between sections in Word doc

  • Question

  • We have successfully extracted content from a Word document with PowerShell by searching for keywords.

    Now we need to extract one complete section from a Word document. In the sample below, all of section 30.5 must be extracted, including points (a) and (b) and their sub-points (i) and (ii).

    30.5    Team Structure for BI Project
    (a) 2 teams will be created for the project:
    (i) Database Professionals will be responsible for backend;
    (ii) Quality assurance professionals will be responsible for quality;
    
    (b) Following environments will be provided
    (i) Virtual Machine will be provided to all developers
    (ii) Separate server will be created for Quality assurance team
    
    30.6    Process of Deployment
    

    We have used the following code in the past, but how do we extract a complete section?

    Get-Content $SourceFileName | Select-String -Pattern $keyword  
    
    $F = Select-String -Path "seach.doc" -Pattern "Team Structure for BI Project" -Context 0, 10
    The issue with the above code is that I don't know how much data a section will contain. Here I have to extract everything in section 30.5.
    Friday, August 3, 2018 3:25 PM

All replies

  • Word documents are binary and cannot be searched with string methods.

    You can use the Open XML SDK (for .docx files) or the Word COM object to find and extract elements of Word documents.
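
    As a rough illustration of the COM route, the sketch below opens a document read-only and returns the text of each paragraph as a string. It assumes Microsoft Word is installed on the machine; the function name Get-WordParagraphText and its -Path parameter are made up for this example and are not part of any module.

```powershell
# Hypothetical helper: returns one string per paragraph of a Word document.
# Assumes Microsoft Word is installed, since it drives Word through COM.
function Get-WordParagraphText {
    param(
        [Parameter(Mandatory)][string]$Path
    )

    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        # Documents.Open(FileName, ConfirmConversions, ReadOnly)
        $fullPath = (Resolve-Path -LiteralPath $Path).Path
        $doc = $word.Documents.Open($fullPath, $false, $true)
        try {
            foreach ($paragraph in $doc.Paragraphs) {
                # Range.Text ends with a paragraph mark; TrimEnd() drops it
                $paragraph.Range.Text.TrimEnd()
            }
        }
        finally {
            $doc.Close()
        }
    }
    finally {
        # Always shut Word down, even if opening or reading failed
        $word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
    }
}
```

    Once the paragraphs are plain strings, the usual string tools (-match, Select-String) work on them, which they do not on the raw .doc file.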


    \_(ツ)_/

    Friday, August 3, 2018 3:32 PM
  • I'm struggling to work out how to find sections using this:

    $word = New-Object -comobject Word.Application

    Friday, August 3, 2018 3:33 PM
  • This is what a Word .doc file looks like in a text editor:


    \_(ツ)_/

    Friday, August 3, 2018 3:37 PM
  • You have to learn how to use the Word object model.

    Here is an example in PowerShell: http://powershelltutorial.net/technology/script/Powershell-With-Word
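
    For the original question (everything in section 30.5 up to, but not including, 30.6), one possible approach is to read the paragraph text out of Word and then slice it between two headings. The sketch below covers only the slicing step and works on plain strings. Get-SectionText and its parameters are hypothetical names, and the default heading pattern assumes headings look like "30.6    Process of Deployment" (number, dot, number, whitespace, title), so a body paragraph that happens to start the same way would end the section early.

```powershell
# Hypothetical helper: returns the lines from the first line matching
# -StartPattern up to (not including) the next line that looks like a heading.
function Get-SectionText {
    param(
        [string[]]$Lines,
        [Parameter(Mandatory)][string]$StartPattern,
        # Assumed heading shape: "30.6    Title"
        [string]$NextHeadingPattern = '^\s*\d+\.\d+\s+\S'
    )

    $inSection = $false
    foreach ($line in $Lines) {
        if (-not $inSection) {
            if ($line -match $StartPattern) {
                $inSection = $true
                $line              # emit the section heading itself
            }
            continue
        }
        if ($line -match $NextHeadingPattern) { break }   # next section starts
        $line
    }
}

# Usage with Word (assumes $doc is a document already opened through COM):
# $lines = foreach ($p in $doc.Paragraphs) { $p.Range.Text.TrimEnd() }
# Get-SectionText -Lines $lines -StartPattern '^\s*30\.5\s'
```

    Because the section ends at the next heading rather than after a fixed number of lines, this avoids guessing a value for -Context.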


    \_(ツ)_/

    Friday, August 3, 2018 3:40 PM