Reading from a text file

  • Question

  • Hi Guys,

    I have a log file which goes something like this:

    "

    The logs are as mentioned below

    col1 col2 col3
    --------------------
    a1 a2 a3
    b1 b2 b3
    c1 c2 c3

    Here are further details

    col1 col2 col3
    --------------------
    a1 a2 a3
    b1 b2 b3
    c1 c2 c3

    These where the head of the chain

    col1
    -----
    a1
    b1 

    "

    The content of the log file is similar to what appears above in the double quotes.

    What I want to do is use PowerShell to read the file and find this particular section in the log file:

    "These where the head of the chain"

    and then check whether the line below it contains "col1". The lines below "col1" hold some values, here "a1" and "b1".

    If there are values like this, I want to copy them to a different text file. These values may or may not exist in the log file. When I copy them, the format should stay the same, i.e.

    "

    These where the head of the chain

    col1
    -----
    a1
    b1 

    "

    Basically, I want to take this section out of the log file and put it in a different text file.

    Thanks in advance

    Sachin

     

    Thursday, October 21, 2010 7:01 AM

Answers

  • This should do exactly what you asked for.  I created a file called t.txt that had the contents of your quotes.  The key is to make sure that you read in the whole text file as one chunk rather than read line by line. 

    The regex does all of the work, following the logic exactly as you described it.  \s+ matches one or more whitespace characters, including line breaks.  (?m) turns on multiline mode so that ^ matches at the beginning of each line.  (^[a-zA-Z]\d+(?! )\s+)+ matches a line that starts with one letter (upper or lower case) followed by at least one digit.  The (?! ) lookahead requires that the character right after the digits is not a literal space, so the \s+ that follows can only begin at the line break; this way you only get rows that have a single column.  The outer + repeats the group until it has matched all of the a1, b1, etc.

    If you want to include the "These where the head of the chain" text, use

    $_.groups[0].value instead of $_.groups[1].value.

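    $filenumber = 1
    # (?m) lets ^ match at the start of every line; group 1 is the table without the title line
    $reg = [regex]'(?m)^These where the head of the chain\s+(^col1\s+^\-+\s+(^[a-zA-Z]\d+(?! )\s+)+)'
    # Build one string from the file so the regex can match across line breaks
    $text = ""
    Get-Content .\t.txt |foreach{$text += $_ + "`r`n"}
    # $found is used so that PowerShell's automatic $matches variable is not overwritten
    $found = $reg.Matches($text)
    $found |foreach {
        # Write each matched section to its own numbered file
        $_.groups[1].value |Out-File "newfile$filenumber.txt"
        $filenumber++
    }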
    I hope this helps.
    write-host ((0..56)|%{if (($_+1)%3 -eq 0){[char][int]("116111101110117102102064103109097105108046099111109"[($_-2)..$_] -join "")}}) -separator ""
    • Proposed as answer by Marco Shaw (Moderator) Saturday, October 23, 2010 1:51 AM
    • Unproposed as answer by SachinNair Monday, October 25, 2010 6:16 AM
    • Marked as answer by SachinNair Monday, October 25, 2010 10:38 AM
    Thursday, October 21, 2010 5:18 PM
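
To see what the two groups contain, here is a minimal sketch that runs the regex from the answer against a shortened copy of the sample log held in memory. The names $sample, $m, $whole, and $section are made up for this illustration and do not come from the original answer.

```powershell
# Shortened sample log from the question, joined with CRLF like the answer does.
$sample = @(
    'Here are further details'
    ''
    'col1 col2 col3'
    '--------------------'
    'a1 a2 a3'
    ''
    'These where the head of the chain'
    ''
    'col1'
    '-----'
    'a1'
    'b1'
    ''
) -join "`r`n"

# Same pattern as in the answer above.
$reg = [regex]'(?m)^These where the head of the chain\s+(^col1\s+^\-+\s+(^[a-zA-Z]\d+(?! )\s+)+)'
$m = $reg.Match($sample)

$whole   = $m.Groups[0].Value   # title line plus the single-column table
$section = $m.Groups[1].Value   # table only, starting at col1

$whole
$section
```

The three-column table earlier in the sample is not matched, because the pattern requires the title line to come directly before col1.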

All replies

  • Hi Tome,

    I tried what you suggested, but I get a "system out of memory" exception.

    The log file I have is actually huge, about 50 MB. It consists of a few tables and other content like I mentioned in the earlier post, and it is very difficult to go through each and every line to find the things I want.

     

    I tried it with a smaller file and it worked.

    Thanks a lot :)

    Monday, October 25, 2010 9:39 AM
  • Here's an alternate method using the pipeline that may be more resource-friendly:

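    $cap_tag = "These where the head of the chain"
    $include = "col1","-----"
    $cap = $false
    $temp = @()

    # Stream the file line by line so the whole log never has to be joined into one string.
    gc file.txt|%{
    if ($_ -eq $cap_tag){
        # Section title found: start capturing.
        $temp += $_
        $cap = $true
        }
     # While capturing, keep blank lines, the column header, the dashes, and data rows such as a1.
     elseif ($cap -and ($_ -match '^\s*$' -or $include -contains $_ -or $_ -match '^\w\d+\s*$')){
        $temp += $_
        }
     # Any other line ends the section.
     else {$cap = $false}
     }
     
     $temp | out-file test2.txt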


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "
    Monday, October 25, 2010 1:10 PM
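
Another line-by-line option, not used in the thread, is the switch statement with its -Regex and -File parameters, which reads the file one line at a time. The sketch below is only an illustration under assumptions: the temp file name gr_switch_demo.txt and the variables $capturing and $section are hypothetical, and the sample lines are copied from the question.

```powershell
# Write a small sample log to a temp file (hypothetical file name).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gr_switch_demo.txt'
Set-Content -Path $path -Value @(
    'col1 col2 col3', '--------------------', 'a1 a2 a3', '',
    'These where the head of the chain', '', 'col1', '-----', 'a1', 'b1', '',
    'Here are further details'
)

$capturing = $false
# switch -File processes one line at a time; emitted lines are collected in $section.
$section = switch -Regex -File $path {
    # Title line: start capturing and emit it.
    '^These where the head of the chain$' { $capturing = $true; $_; continue }
    # Blank line, col1, dashes, or a single-column row: emit only while capturing.
    '^(\s*|col1|-+|[a-zA-Z]\d+\s*)$'      { if ($capturing) { $_ }; continue }
    # Anything else ends the section.
    default                               { $capturing = $false }
}

Remove-Item $path
$section
```

Piping $section to Out-File would then write the captured lines to a separate file, as in the reply above.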
  • Well, that's a horse of a different color, I say!

    To handle that you need to do some line-by-line logic.  It's not as pretty, but it works the same.

    So here's the script I used to generate a 50 MB file that has your text:

    $file = "test.txt"
    $size = 50MB
    $writechunks = 10MB
    $string = @"
    The logs are as mentioned below
    
    col1 col2 col3
    --------------------
    a1 a2 a3
    b1 b2 b3
    c1 c2 c3
    
    Here are further details
    
    col1 col2 col3
    --------------------
    a1 a2 a3
    b1 b2 b3
    c1 c2 c3
    
    These where the head of the chain
    
    col1
    -----
    a1
    b1
    
    "@
    "" |Out-File $file
    
    while ((Get-ChildItem $file).length -lt $size) {
     $string*($writechunks/16/$string.length) |Out-File $file -Append
    }

    And here's the script that does the same thing as the original answer, but it will not give you memory errors.  It's a bit longer than the original solution, but I think it's pretty easy to follow the logic:

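    $inheader = $false   # title line seen, waiting for the col1 line
    $incolumn = $false   # col1 line seen, waiting for the dashes
    $inrow = $false      # dashes seen, collecting data rows
    $filenumber=1
    
    $output = @()
    # -ReadCount 0 returns all lines of the file as one array of strings
    $text = get-content .\test.txt -ReadCount 0
    $text |foreach { 
     if ($inrow) {
      if ($_ -match '^([a-zA-Z]\d+)\s*$') {
       # Single-column data row: keep it
       $output += $_
      }
      else {
       # First non-data line ends the section: echo it, save it, and reset
       $output  
       $inrow = $false
       $output |Out-File "newfile$filenumber.txt"
       $filenumber++
       $output = @()
      }
     }
     else {
      if ($incolumn) {
       # Expect the dashed line right after col1
       $inrow = $_ -match '^-+'
       if ($inrow) {
        $output += $_
        $incolumn = $false
       }
      }
      else {
       if ($inheader) {
        # Keep every line after the title until the col1 line shows up
        $output += $_
        $incolumn = $_ -match "^col1"
        if ($incolumn) {
         $inheader = $false
        }
       }
       else {
        # Look for the title line that starts a section
        $inheader = $_ -match "^These where the head of the chain"
        if ($inheader) {
         $output += $_
        }
       }
      }
     } 
    }
    # The file ended while still inside a section: save what was collected
    if ($inrow) {
     $output |Out-File "newfile$filenumber.txt"
    }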
    

     


     

    http://twitter.com/toenuff


    Monday, October 25, 2010 2:41 PM
  • Hi Guys,

     

    Here is what I used:

     

    $reg = [regex]'(?m)^SPIDs at the head of blocking chains\s+(^spid\s+^\-+\s+[\d\s]+)'
    $text = [System.IO.File]::ReadAllText("C:\Users\snair1\Desktop\Test.txt")
    $mymatches = $reg.Matches($text)
    $mymatches | foreach {  $_.groups[1].value | Out-File "C:\ForumTest\Sample.txt" -Append }

     

    The regex needs a single string as its input.

    The Get-Content cmdlet returns an array of lines rather than one string, so we used “Get-Content .\t.txt |foreach{$text += $_ + "`r`n"}” to build a string.  Because the log file is so big, appending each line to $text took a long time and ended in a "system out of memory" exception.

    I used [System.IO.File]::ReadAllText instead because it loaded the huge log file in only a few seconds and returned a single string.
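
The difference between the two ways of reading can be checked on a small file. This is a minimal sketch under assumptions: the temp file name gr_sample_log.txt and its three lines are invented for the illustration.

```powershell
# Create a small three-line file in the temp folder (hypothetical name and content).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gr_sample_log.txt'
Set-Content -Path $path -Value 'line one', 'line two', 'line three'

$lines = Get-Content -Path $path                 # array with one string per line
$text  = [System.IO.File]::ReadAllText($path)    # one string, line breaks included

$lines.GetType().Name   # Object[]
$lines.Count            # 3
$text.GetType().Name    # String

Remove-Item $path
```

Note that ReadAllText resolves relative paths against the process working directory, which is not always the current PowerShell location, so passing a full path as in the reply above is the safer choice.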

     

    I am really sorry, guys, that I can’t share the actual log file, but the relevant section looks something like this:

    SPIDs at the head of blocking chains

    spid 

    ------

        72

        74

    Sections like this occur many times in the log file, with different spids.

     

    When I use the above code, I get output like this, for example:

    spid 

    ------

        72

        74

        75

     

    spid 

    ------

        72

        74

        80

     

    All I have to do now is arrange the output in the following format:

     

    SPIDs     Occurrences
    72        2
    74        2
    75        1
    80        1
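
One possible way to get from the captured text to that table is Group-Object. The following is a rough sketch, not something posted in the thread: $captured stands in for the content written to Sample.txt, using the values shown above, and the names $captured and $counts are hypothetical.

```powershell
# Stand-in for the captured output shown above (values copied from the post).
$captured = @(
    'spid', '------', '    72', '    74', '    75', '',
    'spid', '------', '    72', '    74', '    80', ''
) -join "`r`n"

$counts = $captured -split '\r?\n' |
    ForEach-Object { $_.Trim() } |              # drop the leading spaces
    Where-Object { $_ -match '^\d+$' } |        # keep only the numeric spid lines
    Group-Object |                              # one group per distinct spid
    Sort-Object { [int]$_.Name } |              # numeric order rather than text order
    Select-Object @{ Name = 'SPIDs'; Expression = { $_.Name } },
                  @{ Name = 'Occurrences'; Expression = { $_.Count } }

# 72 and 74 appear twice, 75 and 80 once.
$counts | Format-Table -AutoSize
```

With the real file, the same pipeline could start from Get-Content on the output file instead of the $captured string.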

     

     

    Thanks a lot for your help and support. I really appreciate it.

     

    Sachin

     

    Tuesday, October 26, 2010 5:49 AM