Selecting from string

    Question

  • I need to select all characters from a given pattern to the end of a string. How can this be done?

    yaro

    Tuesday, May 15, 2018 9:02 AM

Answers

  • You have certainly omitted a lot in your question.

    Either way, after reading it, here's the answer.

    # Text file with your information; -Raw reads it as one string with line breaks intact
    $file = Get-Content .\text.txt -Raw
    
    # Pattern to match a given word (SERVEPTP1 in this case) and everything after it.
    # (?s) comes first so that every '.' in the pattern also matches line breaks.
    $pattern = '(?s)^.*(?<=(SERVEPTP1))(.*$)'
    # Regex object
    [System.Text.RegularExpressions.Regex]$regex = New-Object System.Text.RegularExpressions.Regex -ArgumentList ($pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
    # Do the matches
    [System.Text.RegularExpressions.MatchCollection]$collection = $regex.Matches($file)
    # Define an empty string for the output
    [string]$result = [System.String]::Empty
    # Group 1 is the word and group 2 is the rest of the text
    if ($collection.Count -gt 0) {
        $result = "$($collection[0].Groups[1].Value)$($collection[0].Groups[2].Value)"
    }
    
    # Print it
    $result
    
    # Remove all variables but result
    Remove-Variable file,pattern,regex,collection



    • Edited by j0rt3g4 Thursday, May 17, 2018 12:17 AM
    • Proposed as answer by j0rt3g4 Thursday, May 17, 2018 12:17 AM
    • Marked as answer by yaro137 Monday, May 21, 2018 8:21 AM
    Thursday, May 17, 2018 12:16 AM

All replies

  • 'this is an abc string'  -match '(abc.*)'
    $matches[1]


    \_(ツ)_/

    • Proposed as answer by BOfH_666 Tuesday, May 15, 2018 10:00 AM
    Tuesday, May 15, 2018 9:08 AM
    Moderator
  • Sorry, I forgot to mention it's a multiline string I'm talking about.

    yaro

    Tuesday, May 15, 2018 11:05 AM
  • $lines | 
         ForEach-Object{
              'this is an abc string'  -match '(abc.*)'
              $matches[1]
    
         }
    


    \_(ツ)_/

    Tuesday, May 15, 2018 11:20 AM
    Moderator
  • Sorry jrv, but I'm not sure we're on the same page. Let's assume my string is a whole page from a book, and what I'd like to do is find a combination of characters in the text that I know occurs only once on that page. When the word is found, I'd like to select all the rest of the page, including the word, and put it in a new variable. I was hoping there is some easy way to define the end of the string, including new lines, so the code goes right to the last dot.

    yaro

    Wednesday, May 16, 2018 9:06 AM
  • You are making a lot of assumptions about a page of text that just don't happen to be true.  A page does not have a "last dot" that is at the end of the page.  A file of text has no pages.  Formatted text files like Word are not text.  Word and other structured documents may have pages that are defined by the program that creates the document.

    You can learn Regular Expressions and use them to extract text in almost any way you want with some effort.

    The code I posted will select a string of characters to the end of the line or the end of the page, depending on the options you set in the RegEx.


    \_(ツ)_/


    Wednesday, May 16, 2018 9:10 AM
    Moderator
  • It has nothing to do with formatted text. In this case it's a config file I need to parse. OK, so in your example $lines is a string consisting of lines of text. In that case, what's 'this is an abc string'? Shouldn't it just be $_?

    yaro

    Wednesday, May 16, 2018 9:21 AM
  • Parsing lines in a config file is done based on the format of the config file.

    Post an example of the config file so I can see the format.


    \_(ツ)_/

    Wednesday, May 16, 2018 9:26 AM
    Moderator
  • Oh, it's simple multiline text, nothing special about it.

    E.g. here I would need to select everything following "SERVERPTP1 (", but the number of lines following this pattern may vary, which is why I'm looking to somehow tell PS to go to the end of the string rather than to select, say, $lines[8-12].

    1 SOURCE0 ( ) { PTPDOMAIN=0; PTPCLIENTVERSION=2; IFACE=eth0; }
    2 SOURCE1 ( ) { NTPSERVER=10.5.3.45; }
    3 SERVEPTP0 ( ) {
    4 PTPSERVERVERSION=2
    5 PTPSERVERDOMAIN=0
    6 PTPSERVERSYNCRATE=0.9
    7 IFACE=eth1
    8 }
    9 SERVEPTP1 ( ) {
    10 PTPSERVERVERSION=2
    11 PTPSERVERDOMAIN=0
    12 PTPSERVERSYNCRATE=0.9
    13 IFACE=eth2
    14 }


    yaro

    Wednesday, May 16, 2018 9:57 AM
  • Just read in a loop until SERVERPTP1 is detected, then output all lines to the end.  This has nothing to do with documents and pages.

    I don't think the file has line numbers.

    Loop until SERVERPTP1, then output that line plus all subsequent lines.
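
    The loop described above could look roughly like the sketch below. The sample lines are a shortened stand-in for the posted excerpt (without line numbers, and using the SERVEPTP1 spelling from the excerpt), and the names $lines, $found and $tail are chosen only for illustration.

```powershell
# Sample lines standing in for the config file (an assumption based on the posted excerpt).
$lines = @(
    'SERVEPTP0 ( ) {'
    'PTPSERVERDOMAIN=0'
    '}'
    'SERVEPTP1 ( ) {'
    'PTPSERVERVERSION=2'
    'IFACE=eth2'
    '}'
)

$found = $false
$tail = foreach ($line in $lines) {
    # Flip the switch on the first line that contains the marker.
    if (-not $found -and $line -match 'SERVEPTP1') { $found = $true }
    # Once the marker has been seen, emit this line and every later one.
    if ($found) { $line }
}

$tail
```

    With a real file, $lines would come from Get-Content without -Raw, which returns one string per line. Note that -match is case-insensitive by default.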


    \_(ツ)_/

    Wednesday, May 16, 2018 10:02 AM
    Moderator
  • If you use the -Raw parameter of Get-Content:
    Get-Content $File -Raw
    the text is not stored in an array but in a single string, and line breaks are stored as characters. You can then use multiline regex patterns to select content. If you still want to use single-line patterns, you can replace the line breaks with a placeholder (one that does not otherwise occur in the text, e.g. "##newline##"). After finding your text, you can replace the placeholder with line breaks again.
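
    As a minimal sketch of the -Raw approach: the string below stands in for what Get-Content $File -Raw would return, and its content is an assumption modelled on the posted excerpt. The variable names $text and $rest are illustrative only.

```powershell
# Stand-in for: $text = Get-Content $File -Raw   (one string, line breaks included)
$text = "SERVEPTP0 ( ) {`r`nPTPSERVERDOMAIN=0`r`n}`r`nSERVEPTP1 ( ) {`r`nIFACE=eth2`r`n}"

# (?s) makes '.' match line breaks too, so '.*' runs to the end of the string.
if ($text -match '(?s)SERVEPTP1.*') {
    $rest = $Matches[0]
}

$rest
```

    For a plain literal marker, $text.Substring($text.IndexOf('SERVEPTP1')) should give the same result without a regex, but it is case-sensitive and throws if the marker is missing.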
    Wednesday, May 16, 2018 10:35 AM
  • j0rt3g4, could you please explain the symbols in $pattern in plain English :)? Would ^.* be everything from the start and .*$ everything to the end? Then would ?<=(SERVEPTP1) be looking for an occurrence of SERVEPTP1, including that occurrence? I'm not sure about ?s.

    BTW the stuff in square brackets always puts me off, as it doesn't look very powershelly or legible ;) I know it's just my poor knowledge of PS, but am I right that in this case it's a way of compressing something that would otherwise require much more code?


    yaro

    Thursday, May 17, 2018 9:21 AM
  • You can get an explanation on the web: paste the regular expression into regex101.com.
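
    For reference, here is a hedged plain-English breakdown of the pieces of $pattern, written as a commented pattern. Two things are assumptions made for this sketch: (?s) is placed at the front so that it applies to the whole pattern, and the x option is added only to allow the comments. The sample string is made up.

```powershell
$pattern = @'
(?sx)              # s: '.' also matches line breaks; x: allow whitespace and comments
^                  # start of the string
.*                 # any characters, as many as possible (backtracks as needed)
(?<=(SERVEPTP1))   # lookbehind: the text just before this point must be SERVEPTP1 (group 1)
(.*$)              # group 2: everything from here to the end of the string
'@

# Made-up sample text with the marker on the second line.
$sample = "AAA`nSERVEPTP1 ( ) {`nIFACE=eth2`n}"
$m = [regex]::Match($sample, $pattern)

# Group 1 is the marker itself, group 2 is everything after it.
$m.Groups[1].Value + $m.Groups[2].Value
```

    Because the lookbehind does not consume any characters, the marker is only available through group 1, which is why the answer joins groups 1 and 2 to include it in the result.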




    • Edited by j0rt3g4 Friday, May 18, 2018 5:23 AM
    Friday, May 18, 2018 5:19 AM
  • How about this?

    The variable "$s" will contain "`r`n" line terminators.

    The "(?sm)" regex modifier enables single-line mode (so "." also matches line breaks) and multiline mode (so "^" and "$" also match at line boundaries).

    # returns all characters FOLLOWING the last 'hello'
    $p = "(?sm)^.+hello(.+)$"
    # returns the last 'hello' and all characters FOLLOWING the last 'hello'
    $p1 = "(?sm)^.+(hello)(.+)$"
    
    $s = Get-Content c:\temp\Lines.txt -raw
    
    $r = $s -replace "$p", '$1'
    $r1 = $s -replace "$p1", '$1$2'


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Sunday, May 20, 2018 3:46 AM