Problems with Regex and Identifying PII

  • Question

  • Hello,

    So, I am at a loss with a piece of code that runs but does not capture all the data I am trying to capture. I borrowed some code from a blog (I can't recall which one) and modified it to search for Social Security number-like patterns. The code is as follows:

    # The -filter argument was lost from the post; -File (PS 3.0+) scans all files.
    foreach ($file in Get-ChildItem -Recurse -File |
        Select-String -Pattern '[0-9]{3}-[0-9]{2}-[0-9]{4}' -AllMatches |
        Select-Object -Property Path -Unique) {
        $file.Path
    }

    The code works and correctly identifies Social Security number-like patterns; however, it only does so for .txt documents. It returns no matches from .docx files or from any other type of document. Any idea why that would be the case? Thank you for any help you can provide.


    bufzech3

    Monday, April 3, 2017 2:20 PM

Answers

  • Docx and other document formats use their own storage formats rather than plain text (a .docx, for example, is a compressed ZIP package of XML files). You cannot use plain-text string-searching methods on such files. In general, they need to be opened and read by the corresponding application, such as Word. Open one of the files in Notepad to see what I mean.

    Richard Mueller - MVP Enterprise Mobility (Identity and Access)

    Monday, April 3, 2017 2:51 PM
    Moderator
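    For .docx files specifically, the answer above can be made concrete: the body text is stored, deflate-compressed, in the ZIP entry word/document.xml, so Select-String never sees the digits. The following is a minimal sketch of a workaround, not something posted in this thread. It assumes Windows PowerShell 5.1 with the .NET System.IO.Compression assemblies, and Find-SsnInDocx is a hypothetical helper name, not a built-in cmdlet. It only reads the main document part; headers, footers, footnotes, and comments live in other XML parts and are skipped. Legacy .doc, .xlsx, .pdf and other formats are not handled, and a file that is not a valid ZIP package (for example, a password-protected document) makes OpenRead throw.

```powershell
# Sketch only: Find-SsnInDocx is a hypothetical helper, not a built-in cmdlet.
# Assumes Windows PowerShell 5.1 and .NET's System.IO.Compression assemblies.
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Find-SsnInDocx {
    param([Parameter(Mandatory)][string]$Path)

    # A .docx is a ZIP package; the main body text lives in word/document.xml.
    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        $entry = $zip.GetEntry('word/document.xml')
        if ($null -eq $entry) { return }   # not a Word body part; nothing to scan
        $reader = [System.IO.StreamReader]::new($entry.Open())
        try { $xml = $reader.ReadToEnd() } finally { $reader.Dispose() }
    }
    finally {
        $zip.Dispose()
    }

    # End each paragraph with a newline, then drop every remaining tag so that
    # a number Word split across several <w:t> runs is joined back together.
    $text = $xml -replace '</w:p>', "`n" -replace '<[^>]+>', ''

    # \b stops the pattern from matching inside a longer run of digits.
    [regex]::Matches($text, '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b') |
        ForEach-Object { $_.Value }
}
```

    For example, `Get-ChildItem -Recurse -File -Filter *.docx | Where-Object { Find-SsnInDocx -Path $_.FullName }` would list the .docx files that contain at least one match. Stripping tags before matching is the key design choice here: searching each XML run on its own would miss numbers that Word happened to split across runs.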

All replies

  • Ugh... that was my intuition. I was hoping it was wrong. Thank you for the help!

    V/r,


    bufzech3

    Monday, April 3, 2017 4:26 PM