PowerShell - simple regex to find a date only

    Question

  • Hello 

    I'm trying to write a very simple regex to match the following:

    1. A date consists of 1 or 2 digits for the day and for the month, and 4 digits for the year.

    Examples:
    ‎3/‎7/‎2018
    03/07/2018
    3/07/2018
    03/7/2018

    I tested it using the following string:
    The previous system shutdown at 6:19:26 AM on ‎3/‎7/‎2018 was unexpected.
    It is a string that I pulled from the Windows System log (to be more precise, event ID 6008).

    I've been trying the following regexes. (Note: at this point I don't care about the quality of the regex, such as value boundaries for
    the d/m/y.)

    1. \d{1,2}\/\d{1,2}\/\d{4}
    2. [0-9]{0,2}\/[0-9]{0,2}\/\d{4}

    All I get is False.

    I don't want a very long and accurate regex at this point.

    By the way, these regexes work fine in online testers, but testing them in the PowerShell ISE returns only False, using the following lines:

    $str = "The previous system shutdown at 6:19:26 AM on ‎3/‎7/‎2018 was unexpected."
    $str -match '\d{1,2}\/\d{1,2}\/\d{4}'
    $str -match '[0-9]{0,2}\/[0-9]{0,2}\/\d{4}'

    Anyone, please: what am I doing wrong?

    Thanks a lot.

    Wednesday, May 16, 2018 5:13 PM

Answers

  • Here's what I did:

    https://1drv.ms/u/s!AsDC94k7vMVfkqZQCsxaJ87OPBfcMg

    It's a OneDrive link. Since the encoding matters, I saved the script as Unicode :)

    This should work on your system, since it worked on mine (using the output of the Get-EventLog cmdlet as the text source).

    I personally don't use "gwmi": it's a much more complicated way to do things, and it requires more privileges and configuration (for remote computers).

    The content of the file is as follows (important: the encoding must be Unicode):

    $path=$PSScriptRoot#$MyInvocation.MyCommand.Path
    #$global:tmpFile="$path\file.txt"
    $textArray = Get-EventLog -LogName System -EntryType Error | where{ $_.EventID -eq 6006 -or $_.EventID -eq 6008 -or $_.EventID -eq 1074} 
    $pattern="(0|‎)\d{1,2}\/(0|‎)\d{1,2}\/(0|‎)\d{1,4}"
    
    
    #Test1
    Write-Host -ForegroundColor Cyan "Test1"
    foreach($text in $textArray.Message){
       # $replacedText = removeSpecials $text
        $text -match $pattern
    }
    
    
    #test2
    [System.Text.RegularExpressions.Regex]$regex = New-Object System.Text.RegularExpressions.Regex($pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
    [System.Text.RegularExpressions.MatchCollection]$m = $regex.Matches($textArray.Message)
    Write-Host -ForegroundColor Cyan "Test2"
    foreach($match in $m){
        if($match.Success){
            $match.Value
        }
    }



    • Edited by j0rt3g4 Wednesday, May 16, 2018 8:57 PM
    • Marked as answer by Wavestone Thursday, May 17, 2018 4:19 PM
    Wednesday, May 16, 2018 8:56 PM

All replies

  • It's an encoding problem in your text.

    Open your script with Notepad++ and make sure the "Encoding" menu is set to UTF-8.

    After that, open your file again and you will see weird characters in the "date" part of the test string.

    Remove those characters and voilà:


    #Test1
    Write-Host -ForegroundColor Cyan "Test1"
    $text = "The previous system shutdown at 6:19:26 AM on 3/7/2018 was unexpected."
    $pattern="\d{1,2}\/\d{1,2}\/\d{1,4}"
    $text -match $pattern
    
    #test2
    [System.Text.RegularExpressions.Regex]$regex = New-Object System.Text.RegularExpressions.Regex($pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
    [System.Text.RegularExpressions.Match]$m = $regex.Match($text)
    if($m.Success){
        Write-Host -ForegroundColor Cyan "Test2"
        $m.Value
    }
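
    To see the stray characters without relying on an editor, one option is to list the code point of every non-ASCII character in the string. This is only a minimal sketch: the sample string is rebuilt with [char]0x200E, which is an assumption about what the event message contains, and $lrm and $hidden are names made up for the illustration.

```powershell
# Assumption: the hidden character is U+200E (LEFT-TO-RIGHT MARK).
# It is built explicitly here so that it is visible in the script.
$lrm  = [char]0x200E
$text = "The previous system shutdown at 6:19:26 AM on ${lrm}3/${lrm}7/${lrm}2018 was unexpected."

# Report every character outside the ASCII range as a U+XXXX code point.
$hidden = @($text.ToCharArray() |
    Where-Object { [int]$_ -gt 127 } |
    ForEach-Object { 'U+{0:X4}' -f [int]$_ })

$hidden
```

    Run against a real event message instead of the sample, this should show which characters are sitting in front of the digits.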
    
    
    


    • Proposed as answer by j0rt3g4 Wednesday, May 16, 2018 5:52 PM
    • Edited by j0rt3g4 Wednesday, May 16, 2018 5:53 PM
    Wednesday, May 16, 2018 5:50 PM
  • Thanks a lot, 

    I had a feeling that encoding was the issue in one way or another.
    The thing is, the string comes from an object returned by PowerShell's Get-EventLog cmdlet.

    The source isn't a file, and it wasn't typed manually by me or anyone else.

    Is there a way to clean unwanted characters from the string?

    Thanks

    Wednesday, May 16, 2018 6:08 PM
  • It's probably easier to use WMI to do this. The date you want is the 2nd item in the returned item's InsertionStrings:

    gwmi win32_NTLogEvent -filter "Logfile = 'System'" | where-object {$_.EventCode -eq "6008"} | foreach {$_.InsertionStrings[1] }
    
    

    There's a 0x3F character in front of each date element in the Get-WinEvent object's "Message" property. You can use a regex like the one below to extract the three elements of the date and then rebuild it as a string if you'd like, but I'd use WMI -- it's less work.

    # extract date
    $event.message -match "on .(\d{1,2})/.(\d{1,2})/.(\d{4}) was"
    
    # reconstruct date
    $matches[1..3]-join "/"


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Wednesday, May 16, 2018 7:37 PM
  • Thx Rich, 

    I will definitely try that. The output comes from multiple jobs (as PSObject data stored in an object[]); I will make those changes tomorrow morning and report back.

    Even if it turns out to be a good workaround, it makes me wonder: how can I deal with encoding issues like this in cases where no workaround or alternative exists? I see that it's possible to convert strings to another encoding; is that a way? If so, I'm not sure how to do it.

    Thanks a lot for helping,
    I appreciate it

    Wednesday, May 16, 2018 8:02 PM
  • OK . . . I'm pleading ignorance. :-)  Can you explain the "(*0|)" in the regex, please? Does it signify either a zero, or an empty character, or something else?

    Unfortunately, the "$matches[0]" array element in your example contains the "?". Add "$matches[0] | Format-Hex" to see that.

    $matches[0] | format-hex (or $match.value | format-hex)

               00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    00000000   3F 35 2F 3F 31 35 2F 3F 32 30 31 38              ?5/?15/?2018

    This regex (using a question mark, or 0x3f) works: $pattern="\?\d{1,2}/\?\d{1,2}/\?\d{1,4}". It doesn't include the "?" characters in $matches[0] or $match.value.
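
    One caveat about the hex dump: if Format-Hex converts the string to ASCII before dumping it (I believe that is the default for string input in Windows PowerShell 5.1, but treat that as an assumption), any non-ASCII character will show up as 3F. A sketch of the same conversion done with .NET directly, using U+200E purely as an example of a non-ASCII character:

```powershell
# Assumption: U+200E stands in for whatever non-ASCII character is in the message.
$sample = [string][char]0x200E + '5'

# ASCII cannot represent U+200E, so the encoder substitutes '?' (0x3F).
$asciiBytes   = [System.Text.Encoding]::ASCII.GetBytes($sample)
# UTF-16 little-endian (what .NET calls "Unicode") keeps the real code unit.
$unicodeBytes = [System.Text.Encoding]::Unicode.GetBytes($sample)

($asciiBytes   | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # 3F 35
($unicodeBytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # 0E 20 35 00
```

    So a 3F in the dump does not necessarily mean the string contains a literal question mark.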


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Wednesday, May 16, 2018 9:53 PM
  • Google^H^H^H^H^H^HBing "powershell normalize string" and you should find examples of this.
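
    Normalization by itself may not drop an invisible mark, so a regex replace on the Unicode "format" category is another thing to try. A sketch, assuming the unwanted characters are invisible format characters such as U+200E; \p{Cf} is the .NET regex class for that category, and $dirty and $clean are made-up names:

```powershell
# Assumption: the message contains U+200E before each date element.
$lrm   = [char]0x200E
$dirty = "on ${lrm}3/${lrm}7/${lrm}2018 was unexpected."

# Remove every character in the Unicode "Format" (Cf) category.
$clean = $dirty -replace '\p{Cf}', ''

# The plain date regex now matches the cleaned string.
$clean -match '\d{1,2}/\d{1,2}/\d{4}'
$matches[0]
```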

    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Wednesday, May 16, 2018 10:03 PM
  • Hello j0rt3g4

    The $pattern="(0|‎)\d{1,2}\/(0|‎)\d{1,2}\/(0|‎)\d{1,4}" did it.
    Adding the (0|‎) to the expression solved the issue. Could you please explain why it works and what it
    does in the background?

    Thx a lot


    Thursday, May 17, 2018 4:23 PM
  • Yes, I can explain. It just brings the "special character" that PowerShell puts in the message into the picture; we don't see it, but it's there.

    (0|‎<there's a special char here>): it isn't visible in the script, but it's there, just as it is in the message that PowerShell returns.

    So (0|Char) is a group that matches either a 0 or that character. What I did was capture the special character that was preventing the match and use it in the pattern. The whole point is that it isn't visible there, but if you copy and paste it into Notepad you will see it. Try it.
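
    The same idea can be written so that the character is visible in the script. This is only a sketch: it assumes the special character is U+200E and builds both the sample text and the pattern from [char]0x200E; $lrm, $plain and $withChar are made-up names.

```powershell
# Assumption: the invisible character in the message is U+200E.
$lrm  = [char]0x200E
$text = "The previous system shutdown at 6:19:26 AM on ${lrm}3/${lrm}7/${lrm}2018 was unexpected."

# Without the character in the pattern, the digits and slashes are never adjacent.
$plain = $text -match '\d{1,2}/\d{1,2}/\d{4}'

# With a group that accepts either a 0 or the character, the date matches.
$withChar = $text -match "(0|$lrm)\d{1,2}/(0|$lrm)\d{1,2}/(0|$lrm)\d{1,4}"

$plain      # False
$withChar   # True
```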


    • Edited by j0rt3g4 Thursday, May 17, 2018 11:09 PM
    Thursday, May 17, 2018 11:08 PM
  • If you download the script from the OneDrive link you will get the match.

    The (0|<char>) group contains a special character that PowerShell adds.

    And no problem with the ignorance, we all are ignorant! We just don't ignore the same things lol

    Thursday, May 17, 2018 11:11 PM
  • Here's a much simpler way to get that date if you use the ReplacementStrings array in the log event instead of the Message:

    $LogEvents = Get-EventLog -LogName System -EntryType Error | where{ $_.EventID -eq 6008} | Foreach {
        $TheDate = $_.ReplacementStrings[1] -replace "[^\d/]", ""
        # not part of the conversion, just to prove the Unicode LEFT-TO-RIGHT MARK (U+200E, decimal 8206)
        # is gone and what remains is a usable date in string format
        Write-Host $TheDate
        Write-Host ($TheDate|format-hex)
        Write-Host (($TheDate | Get-Date).ToShortDateString())
        Write-Host ($TheDate | Get-Date -Format "yyyyMMdd")
    }
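
    Since Get-Date parses the string according to the current culture, an explicit format may be safer when the script runs on machines with different regional settings. A sketch, assuming the cleaned string is month/day/year (that order is an assumption; the log writes dates in the local format) and using a hard-coded sample value:

```powershell
# Assumption: the cleaned date string is month/day/year, as on a US-English system.
$TheDate = '3/7/2018'

# 'M' and 'd' accept one or two digits; InvariantCulture fixes '/' as the separator.
$parsed = [datetime]::ParseExact($TheDate, 'M/d/yyyy', [cultureinfo]::InvariantCulture)

$parsed.ToString('yyyyMMdd', [cultureinfo]::InvariantCulture)
```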

    The problem is the Unicode LEFT-TO-RIGHT MARK (U+200E, decimal 8206) before each element of the date. Trying to remove it without resorting to an "invisible character" is a PITA, and USING an invisible character in the regex makes the regex inscrutable!

    The simpler (in this case) way to deal with it is just to remove everything that isn't a digit or the character "/".
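
    For the case where the match has to run against the Message text itself, .NET regexes also accept \uXXXX escapes, so the mark can be spelled out visibly as \u200E. A sketch, assuming U+200E is the only stray character; the sample string is built by hand here rather than read from the log:

```powershell
# Assumption: each date element in the message is preceded by U+200E.
$lrm  = [char]0x200E
$text = "The previous system shutdown at 6:19:26 AM on ${lrm}3/${lrm}7/${lrm}2018 was unexpected."

# \u200E? makes the mark optional, so the pattern also matches a clean date.
$pattern = '\u200E?(\d{1,2})/\u200E?(\d{1,2})/\u200E?(\d{4})'

if ($text -match $pattern) {
    # Rebuild the date from the three captured groups, without the marks.
    $date = $matches[1..3] -join '/'
    $date
}
```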


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    • Proposed as answer by jrvModerator Saturday, May 19, 2018 3:03 AM
    Saturday, May 19, 2018 2:58 AM
  • Thanks all for the helpful posts,

    I appreciate it. I'm glad the issue was eventually solved, thanks to all of you.

    Monday, May 21, 2018 8:39 AM