Finding all text between quotes on a line of text using Powershell regex

  • Question

  • Hi, I'm looking to filter some information from the source code of a website. I need to be able to output any text between quote marks. I have tried very hard to get it to work but cannot. The information I need is all on one of the many lines of the page, the one which begins with the text: new Module_Members(71933, {"group_by_titles"

    Here is an excerpt; I'm mainly interested in name, post_count, datejoined and location:

    {"team_id":"15212","post_count":"29","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 14","user_id":"776664","group_name":null,"name":"David","displayname":"David","location":"GB"} {"team_id":"15112","post_count":"33","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 13","user_id":"776624","group_name":null,"name":"Peter","displayname":"Peter","location":"US"}

    The best I could come up with is this:

    $file = gc C:\Users\desktop\sr.htm | select-string -pattern "group_by_titles"
    $matches_found = @()
    $file | % {
        if ($_ -match '(?<=")[^"]*(?=")') { $matches_found += $matches[1] }
    }
    $matches

    output is:

    Name                           Value
    ----                           -----
    0                              group_by_titles

    • Edited by Quarkspace Monday, February 23, 2015 3:49 PM too much at bottom
    Monday, February 23, 2015 3:48 PM

Answers

  • JSON cannot be parsed reliably by regular expressions because it's not a regular language (objects and arrays can nest to any depth), see http://en.wikipedia.org/wiki/Regular_language

    If ConvertFrom-Json doesn't work it's likely because of one or a few small syntax errors in the JSON itself. In your OP you have the block

    {"team_id":"15212","post_count":"29","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 14","user_id":"776664","group_name":null,"name":"David","displayname":"David","location":"GB"} {"team_id":"15112","post_count":"33","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 13","user_id":"776624","group_name":null,"name":"Peter","displayname":"Peter","location":"US"}

    which throws an error in PowerShell because it is actually two objects side by side, not a single JSON document.

    If you wrap them in an array like

    [{"team_id":"15212","post_count":"29","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 14","user_id":"776664","group_name":null,"name":"David","displayname":"David","location":"GB"}, {"team_id":"15112","post_count":"33","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 13","user_id":"776624","group_name":null,"name":"Peter","displayname":"Peter","location":"US"}]
    

    it works correctly.
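
    As a rough sketch of that fix, assuming the objects are separated only by whitespace and that no string value itself contains "} {", the text can be patched into an array before conversion. The shortened sample string and the variable names are illustrative, not taken from the page:

```powershell
# Two objects back to back, as in the excerpt (shortened for illustration)
$text = '{"name":"David","post_count":"29","datejoined":"May 8, 14","location":"GB"} {"name":"Peter","post_count":"33","datejoined":"May 8, 13","location":"US"}'

# Put a comma between adjacent objects, then wrap everything in [ ]
$json = '[' + ($text -replace '\}\s*\{', '},{') + ']'

$members = $json | ConvertFrom-Json
$members | Select-Object name, post_count, datejoined, location
```

    The -replace step is itself a regular expression, so it is only safe under the assumption stated above.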

    It'll be much easier to fix your JSON data than to somewhat haphazardly parse all the special cases of quotes-within-quotes in regular expressions.
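
    To illustrate the point, here is a small sketch that runs the lookaround pattern from the question over a shortened, made-up line using [regex]::Matches, which returns every match (the -match operator stops at the first one, and with no capture group the result is in $matches[0], not $matches[1]). The pattern knows nothing about keys, values or escaped quotes, so it also returns the separators and cuts the HTML value at each \":

```powershell
# Shortened, illustrative line; only the field names come from the thread
$line = '{"name":"David","lastseen":"<div class=\"online\">Online Now<\/div>"}'
$pattern = '(?<=")[^"]*(?=")'

# Collect the text of every match, not just the first
$found = [regex]::Matches($line, $pattern) | ForEach-Object { $_.Value }
$found
```

    Among the results are the ":" and "," that sit between quoted strings, and the lastseen value arrives in three pieces.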

    • Marked as answer by Quarkspace Tuesday, February 24, 2015 1:40 PM
    Tuesday, February 24, 2015 11:27 AM
  • It's Key:Value pairs, so if you can get them split out and in Key=Value format, you can use ConvertFrom-StringData:

    $text = '{"team_id":"15212","post_count":"29","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 14","user_id":"776664","group_name":null,"name":"David","displayname":"David","location":"GB"} {"team_id":"15112","post_count":"33","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 13","user_id":"776624","group_name":null,"name":"Peter","displayname":"Peter","location":"US"}'
    
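    # Split the text into one chunk per object
    $text.split('}') |
        foreach {
            # Split between the pairs, strip HTML tags and the leading {", turn ":" into "=", drop leftover quotes
            $StringData =
                $_ -split '",|,"' -replace '<.+?>|{"' -replace '":"?','=' -replace '"' -join "`n"
            New-Object -TypeName PSCustomObject -Property (ConvertFrom-StringData $StringData)
        }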
    
    
    displayname : David
    team_id     : 15212
    datejoined  : May 8, 14
    group_name  : null
    user_id     : 776664
    chatstatus  : online
    location    : GB
    lastseen    : Online Now
    post_count  : 29
    name        : David
    
    displayname : Peter
    team_id     : 15112
    datejoined  : May 8, 13
    group_name  : null
    user_id     : 776624
    chatstatus  : online
    location    : US
    lastseen    : Online Now
    post_count  : 33
    name        : Peter
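
    For anyone stepping through the pipeline above, this is a minimal sketch of the same idea on a single shortened record. The record, the '[{}]' replacement and the variable names are assumptions made for illustration; ConvertFrom-StringData expects one Key=Value pair per line and returns a hashtable:

```powershell
# Shortened, illustrative record (not the full line from the page)
$record = '{"name":"David","post_count":"29","group_name":null,"location":"GB"}'

# Drop the braces, split between pairs, turn ":" into "=", remove leftover quotes
$lines = $record -replace '[{}]' -split '",|,"' -replace '":"?', '=' -replace '"'

# Join the pairs with newlines and let ConvertFrom-StringData build a hashtable
$table = ConvertFrom-StringData ($lines -join "`n")
$member = New-Object -TypeName PSCustomObject -Property $table
$member
```

    Note that every value comes back as a string with this approach, including the JSON null, which becomes the text "null" as in the output above.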



    • Marked as answer by Quarkspace Tuesday, February 24, 2015 1:40 PM
    Tuesday, February 24, 2015 11:54 AM
    Moderator

All replies

  • It looks like JSON formatted data, so you can use ConvertFrom-Json to do all the hard work. Here's an example:

    Get-Content "test.txt" | ForEach-Object {
        ConvertFrom-Json $_ | Select-Object name,post_count,datejoined,location
    }
    


    Monday, February 23, 2015 4:11 PM
  • A JSON file will look like this:

    [
       {
          "team_id":"15212",
          "post_count":"29",
          "chatstatus":"online",
          "lastseen":"<div class=\"online\">Online Now<\/div>",
          "datejoined":"May 8, 14",
          "user_id":"776664",
          "group_name":null,
          "name":"David",
          "displayname":"David",
          "location":"GB"
       },
       {
          "team_id":"15112",
          "post_count":"33",
          "chatstatus":"online",
          "lastseen":"<div class=\"online\">Online Now<\/div>",
          "datejoined":"May 8, 13",
          "user_id":"776624",
          "group_name":null,
          "name":"Peter",
          "displayname":"Peter",
          "location":"US"
       }
    ]
    

    Of course it can be unformatted and it can be more complex.

    To convert a true JSON file, just pipe its contents to ConvertFrom-Json.

    Get-Content <jsonfile> | ConvertFrom-Json

    You will get back objects.
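
    For instance, once the text is valid JSON, the returned objects can be filtered and projected like any other PowerShell objects. A minimal sketch, using an inline string in place of a file and shortened, illustrative records (the threshold of 30 is arbitrary):

```powershell
# Inline stand-in for the contents of a JSON file
$json = '[{"name":"David","post_count":"29","location":"GB"},{"name":"Peter","post_count":"33","location":"US"}]'

$members = $json | ConvertFrom-Json

# post_count is a string in this data, so cast it before comparing numerically
$busy = $members |
    Where-Object { [int]$_.post_count -gt 30 } |
    Select-Object name, location
$busy
```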




    • Edited by jrv Monday, February 23, 2015 4:23 PM
    Monday, February 23, 2015 4:21 PM
  • Thanks. It does look like JSON, but PowerShell's ConvertFrom-Json doesn't seem to like it. I pointed it at the sample text, then at an htm file containing the page source, and finally at the website URL:

    PS U:\> $file = gc C:\Users\sbbh4\desktop\sr.txt | convertfrom-json

    convertfrom-json : Invalid JSON primitive: "team_id":"15112","post_count":"33","chatstatus":"online","lastseen":"<div class=\"online\">Online Now<\/div>","datejoined":"May 8, 13","user_id":"776624","group_name":null,"name":"Peter","displayname":"Peter","location":"US"}.
    At line:1 char:46
    + $file = gc C:\Users\sbbh\desktop\sr.txt | convertfrom-json
    +                                              ~~~~~~~~~~~~~~~~
        + CategoryInfo          : NotSpecified: (:) [ConvertFrom-Json], ArgumentException
        + FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.ConvertFromJsonCommand

    $file = gc C:\Users\sbb\desktop\sr.htm | select-string -pattern "group_by_titles"

    convertfrom-json : Invalid JSON primitive: new.
    At line:2 char:9
    + $file | convertfrom-json
    +         ~~~~~~~~~~~~~~~~
        + CategoryInfo          : NotSpecified: (:) [ConvertFrom-Json], ArgumentException
        + FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.ConvertFromJsonCommand

    PS U:\> $j = Invoke-WebRequest -Uri icantpostlinks| ConvertFrom-Json

    ConvertFrom-Json : Invalid JSON primitive: .
    At line:1 char:72
    + $j = Invoke-WebRequest -Uri icantpostlinks | ConvertFr ...
    +                                                                        ~~~~~~~~~
        + CategoryInfo          : NotSpecified: (:) [ConvertFrom-Json], ArgumentException
        + FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.ConvertFromJsonCommand
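
    The "Invalid JSON primitive: new." message is consistent with the whole JavaScript line, starting at "new Module_Members(", being handed to ConvertFrom-Json. One possible workaround, sketched here under loud assumptions, is to cut out the text from the first "{" to the last "}" before converting. Only the prefix of $line comes from the thread; everything after "group_by_titles", including the "members" property, is invented for illustration, and the approach assumes the line contains exactly one top-level object. When the line comes from Select-String, its text is in the .Line property of the result.

```powershell
# Hypothetical line: only the prefix is from the thread, the rest is made up
$line = 'new Module_Members(71933, {"group_by_titles":false,"members":[{"name":"David","location":"GB"},{"name":"Peter","location":"US"}]});'

# Keep everything from the first { to the last }
$start = $line.IndexOf('{')
$end = $line.LastIndexOf('}')
$json = $line.Substring($start, $end - $start + 1)

$data = $json | ConvertFrom-Json
$names = $data.members | ForEach-Object { $_.name }
$names
```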

    Tuesday, February 24, 2015 9:44 AM
  • Both answers work thanks!  Will get my head around mjolinor's answer at some point in the future :)
    Tuesday, February 24, 2015 1:40 PM