Use filtered content as variable

  • Question

  • I need to filter the content of a text file. The file contains numerous columns, and all I want is the email addresses. I need to store the filtered content in a variable.

    So right now I have:

    Get-Content <file path> | Where-Object { $_ -match "[a-z]@email.domain" }
    I need to have the results of this set as a variable, $email.

    How can I do this? I can find ways to put the entire contents or specific lines into a variable, but not just the email addresses matching the filter.
    • Edited by GADavies Monday, April 27, 2020 4:14 PM remove duplicate text
    Monday, April 27, 2020 3:37 PM

All replies

  • Probably multiple ways to do this. Here's one example.

    Given this file:

    line 1
    line 2   dave@email.domain     another-column 
    line 3
    Line 4    bob@email.domain    xxxxxxxxxx    tom@Email.Domain
     

    This script will get you the last email address. 

    (get-content <file path>).split(' ') | select-string -pattern ".+.*(@email\.domain)" | foreach {$email = $_.Matches[0].value}
    $email
    

    This script will get you all email addresses in the file.

    $emails = @()
    (get-content <file path>).split(' ') | select-string -pattern ".+.*(@email\.domain)" | foreach {$emails += $_.Matches[0].value}
    $emails
    

    Monday, April 27, 2020 5:21 PM
  • Thanks, that helps, though I need to filter emails for a particular domain. The file contains a mix of personal and work email addresses, and the goal is to pull just the work addresses. How can I filter so that only @domain.com addresses, and not @gmail.com or @outlook.com, are used as the values for the variable?
    Monday, April 27, 2020 5:26 PM
  • $emails = @()
    (get-content <file path>).split(' ') | select-string -pattern ".+.*(@domain\.com" | foreach {$emails += $_.Matches[0].value}
    $emails

    • Proposed as answer by Vector BCO Friday, May 1, 2020 10:22 AM
    Monday, April 27, 2020 5:51 PM
  • # Code for generating test file
    <# foreach ($i in 1..100){
        $date = (get-date).AddDays($(Get-Random -Maximum 100)).AddMinutes(-$(Get-Random -Maximum 100)).AddHours($(Get-Random -Maximum 10)).AddYears(-$(Get-Random -Maximum 30)).AddSeconds($(Get-Random -Maximum 50))
        $domains = @('test.domail.com','best.com', 'domain.local', 'gmail.com', 'yahoo.com')
        $names = @('john','mat','rob','jess','tom')
        $lastnames = @('smit', 'wilkinson', 'atkinson')
        $msgSet = @('some', 'test', '-', 'message', 'rubbish', ':', ' ', 'with', 'qwerty')
    
        $tmpName = Get-Random -InputObject $names
        $tmpLastName = Get-Random -InputObject $lastnames
        $tmpDomain = Get-Random -InputObject $domains
        $email = "$tmpName.$tmpLastName@$tmpDomain"
    
        $msgLength = Get-Random -Maximum 30 -Minimum 10
        $msg = Get-Random -Count $msgLength -InputObject $msgSet
    
        "$date : $msg * $email * $msg" | out-file C:\TMP\emailparse.txt -Append
    } #>
    
    # here you can add domains that you want to find
    $InterestingDomains = @('test.domail.com', 'domain.local')
    
    # Creating regex based on provided domains
    $regex = '([^@ ]+@('
    $firstDomain = $true
    foreach ($domain in $InterestingDomains){
        $regex += "$(if(! $firstDomain){'|'})$([regex]::Escape($domain))"
        if($firstDomain){$firstDomain = $false}
    }
    $regex += "))"
    
    # Get-Content is not needed because Select-String can read the file itself
    Select-String -Path C:\TMP\emailparse.txt -Pattern $regex | foreach {$_.Matches.Value}
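
    As a rough sketch, the loop that builds the alternation could also be written with -join, and the pipeline output can be assigned to a variable, which is what the original question asks for. The sample lines and the names $alternation, $sample and $workEmails are made up for illustration; they only imitate the shape of the generated test file.

```powershell
# Same domain list as in the script above
$InterestingDomains = @('test.domail.com', 'domain.local')

# Escape each domain, then join the pieces with "|" to form the alternation
$alternation = ($InterestingDomains | ForEach-Object { [regex]::Escape($_) }) -join '|'
$regex = '([^@ ]+@(' + $alternation + '))'

# Hypothetical sample lines (assumption: same layout as the generated test file)
$sample = @(
    '04/27/2020 15:37:00 : some test * john.smit@test.domail.com * with qwerty',
    '04/27/2020 15:38:00 : rubbish - * rob.atkinson@gmail.com * message',
    '04/27/2020 15:39:00 : qwerty : * jess.wilkinson@domain.local * some'
)

# Assign the pipeline output to a variable instead of only printing it
$workEmails = $sample | Select-String -Pattern $regex | ForEach-Object { $_.Matches.Value }
$workEmails
```

    With a real file, -Path C:\TMP\emailparse.txt would take the place of piping $sample into Select-String.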

    MotoX80,

    You have a mistake in the regex: '.+.*' means the same as '.+', and ".+" can match something like "@some@trash=_.", so in this particular case you would get a false positive such as "@some@trash=_.@domain.com". You also have a typo: the "(" should be removed, or a ")" should be added:

    ".+.*(@domain\.com"

    A corrected regex (though not the optimal one):

    ".+@domain\.com"
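
    For illustration only, here is a minimal sketch of what a tighter pattern might look like. It assumes an address is a whitespace-delimited token with a single "@" before the domain; the sample lines and variable names are made up. -AllMatches is used so that a line holding several addresses yields all of them.

```powershell
# Hypothetical sample lines standing in for the file contents
$lines = @(
    'line 1',
    'line 2   dave@domain.com     another-column',
    'Line 4    bob@gmail.com    xxxxxxxxxx    tom@Domain.com',
    'junk @some@trash=_.@domain.com'
)

# (?<!\S)  : the address must start at the beginning of a token
# [^@\s]+  : one or more characters that are neither "@" nor whitespace
# (?!\S)   : the domain must be followed by whitespace or the end of the line
$pattern = '(?<!\S)[^@\s]+@domain\.com(?!\S)'

# Select-String is case-insensitive by default; -AllMatches returns every hit per line
$emails = $lines |
    Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches.Value }

$emails
```

    On this sample the junk token and the gmail.com address are skipped, while tom@Domain.com is still picked up because the match is case-insensitive.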


    The opinion expressed by me is not an official position of Microsoft

    • Proposed as answer by Vector BCO Friday, May 1, 2020 10:31 AM
    Friday, May 1, 2020 10:31 AM