Updating lines in txt file by numbers

  • Question

  • Hello, I have a text file that I occasionally need to update by renumbering the end of specific lines.

    I will receive a txt file that has multiple segments. Each segment needs to be uniquely numbered, and the numbers on the start and end lines of each segment need to match. The number of segments to renumber varies, usually about 7 to 9.

    For example, a segment will start with:

         "  ST*835*00001 "
    and end with
         "  SE*123*00001 " 

    The start line is always the same, but the end line will vary in the number right after the "SE*" (the xxx in SE*xxx), and the file will have any number of lines between the start and end of the segment.

    Any suggestions on how to find each line? I'm having trouble finding the strings, I think because of the "*". Once I can find the lines, what would be a good way to renumber them?

    I am currently attempting to use the following, which I found on another page somewhere.

    $FileContent = Get-Content "c:\temp\filename.txt"
    $Matches = Select-String -inputObject $FileContent -pattern "ST*835*00001" -AllMatches
    $Matches.Matches.Count

    If I can find the lines by the "ST" and the number at the end, that is fine, but how?

    Tuesday, December 18, 2018 6:03 PM
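
The search above most likely finds nothing because Select-String treats -Pattern as a regular expression, where * is a quantifier rather than a literal asterisk. A minimal sketch of two ways around that, assuming a few made-up sample lines: escape the pattern with [regex]::Escape, or use -SimpleMatch for a plain-text search. The variable names here are hypothetical. The sketch also pipes the lines in instead of using -InputObject, since a collection passed to -InputObject is, as far as the documentation describes it, searched as one combined string rather than line by line.

```powershell
# Hypothetical sample lines standing in for the file contents.
$lines = @(
  'ST*835*00001'
  'MO*RE*JUNK'
  'SE*123*00001'
)

# Option 1: escape regex metacharacters so each * is matched literally.
$escaped = [regex]::Escape('ST*835*')
$regexHits = @($lines | Select-String -Pattern $escaped)

# Option 2: -SimpleMatch treats the pattern as plain text, not as a regex.
$simpleHits = @($lines | Select-String -Pattern 'ST*835*' -SimpleMatch)

# Number of matching lines found by each approach.
$regexHits.Count
$simpleHits.Count
```

Both approaches should report the same lines; the escaped form is the one to build on if the number after the last asterisk also needs to be captured with a regex group.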

Answers

  • OK, I took a stab at it. See if this works:


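    # Read all lines of the input file.
    $data = Get-Content "inp.txt"
    # The asterisks are escaped because * is a regex quantifier.
    $startPattern = '^ST\*835\*(\d+)~'
    $stopPattern = '^SE\*(\d+)\*\d+~'
    $sequenceNum = 1
    foreach ( $line in $data ) {
      # Pass through every line that is neither a start nor a stop line.
      if ( ($line -notmatch $startPattern) -and ($line -notmatch $stopPattern) ) {
        $line
        continue
      }
      # Start line: output it with the next sequence number.
      $match = [Regex]::Match($line, $startPattern)
      if ( $match.Success ) {
        "ST*835*{0:D9}~" -f $sequenceNum
        $sequenceNum++
        continue
      }
      # Stop line: keep its line count and reuse the number of the matching start line.
      $match = [Regex]::Match($line, $stopPattern)
      if ( $match.Success ) {
        $lineCount = $match.Groups[1].Value
        "SE*{0}*{1:D9}~" -f $lineCount,($sequenceNum - 1)
        continue
      }
    }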
    

    The input data is assumed to be in the file inp.txt.


    -- Bill Stewart [Bill_Stewart]



    Tuesday, December 18, 2018 9:09 PM
    Moderator

All replies

  • We might be able to come up with something.

    Post a very short example containing lines from the input file (with sufficient context such that respondents can look at a short data example), and then also explain precisely how you want to extract the data.


    -- Bill Stewart [Bill_Stewart]

    Tuesday, December 18, 2018 7:14 PM
    Moderator
  • Will do, Bill, thanks for the assist. Here is a short sample I made up. It has only 3 segments, which should be enough. Note that the number of lines in each segment varies, usually a couple hundred lines, so this is VERY simplified!

    The original will look similar to this.

    *********

    file*header*00                 *something

    ST*835*000000001~

    MO*RE*JUNK

    MO*RE*JUNK

    SE*2*000000001~

    ST*835*000000001~

    MO*RE*JUNK*here*as*well

    MO*RE*JUNK

    SE*12*000000001~

    ST*835*000000001~

    MO*RE*JUNK

    MO*RE*JUNKand more lines to ignore here

    SE*122*000000001~

    **************

    What I need to do is make it look like this (the numbering scheme is optional; this is just how I do it manually at this time).

    ***************

    file*header*00                 *something

    ST*835*000000001~

    MO*RE*JUNK

    MO*RE*JUNK

    SE*2*000000001~

    ST*835*000000002~

    MO*RE*JUNK*here*as*well

    MO*RE*JUNK

    SE*12*000000002~

    ST*835*000000003~

    MO*RE*JUNK

    MO*RE*JUNKand more lines to ignore here

    SE*122*000000003~

    *************

    Again, the segments are now unique from each other, but the ST and SE numbers need to match. The middle number on the SE line is a line count for the segment and can be safely ignored (the numbers in the sample are for show); only the 0000001 at the end needs to be worried about.

    Tuesday, December 18, 2018 8:08 PM
  • Why does the number need to be incremented?

    -- Bill Stewart [Bill_Stewart]

    Tuesday, December 18, 2018 8:12 PM
    Moderator
  • It's a payment file. Each "segment" is a unique set of info, and the file will be submitted to another system that reads each segment. However, if each segment is not uniquely numbered, the receiving system will error out.

    The segment numbers don't need to be incremental, just unique from any other segment, so they could be 1, 1002, 432, and 6654 if we wanted, as long as no two segments are the same. I just went incremental because it was easiest for me at that time.

    Tuesday, December 18, 2018 8:21 PM
  • OK. Why is the input always numbered 1?

    -- Bill Stewart [Bill_Stewart]

    Tuesday, December 18, 2018 8:26 PM
    Moderator
  • Also - can you clarify the difference between the bold number below and the underlined number below?

    ST*835*000000001~

    ...

    SE*2*000000001~

    Does the bold number have to match in each "chunk", or just the underlined number?


    -- Bill Stewart [Bill_Stewart]


    Tuesday, December 18, 2018 8:31 PM
    Moderator
  • HA, I've been arguing with the vendor who produces this for a while. It always comes in numbered 000001. If it ever changes I'll be surprised as heck.

    for the other numbers you asked about

      ST*835*xxx 
    and
       SE*###*xxx

    The ST will always be followed by the *835.

    In SE*123, the 123 is actually the number of lines in the segment, so it will vary.

    The constants on each of these will be the
    ST*835*0000001
    and
    SE*###*0000001

    Tuesday, December 18, 2018 8:37 PM
  • Works mostly, amazing stuff Bill. Thanks. One issue I'm trying to figure out now: how do I get it to write it all back to the file, or output to another file, and keep the changes? I can see them scrolling by, but the file is unchanged.

    I tried a simple  $data > test1.txt  at the end of the script, but that didn't work.

    Wednesday, December 19, 2018 4:10 PM
  • Put the code in a script (.ps1) file, and pipe to Out-File. For example, if the script is in process.ps1 in the current directory, you would write:


    .\process.ps1 | Out-File newdata.txt

    This will write newdata.txt with the new data as a Unicode (UTF-16) text file, which is the Out-File default in Windows PowerShell. If you need a different text output format (e.g., ASCII), then use the -Encoding parameter of Out-File (e.g., Out-File newdata.txt -Encoding ASCII).


    -- Bill Stewart [Bill_Stewart]

    Wednesday, December 19, 2018 4:42 PM
    Moderator
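
One reason `$data > test1.txt` writes unchanged content is that `$data` still holds the lines read by Get-Content; the renumbered lines are emitted by the loop as output and are never stored back into `$data`. As an alternative to piping the script to Out-File, a minimal sketch that collects the loop output in a variable and writes it with Set-Content could look like this. The file names, the `$newData` variable, and the sample lines are assumptions made for illustration, and the loop is a condensed variant of the renumbering logic, not the original script.

```powershell
# Hypothetical file names in the temp folder, used only for this sketch.
$tempDir = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $tempDir 'inp-sketch.txt'
$outFile = Join-Path $tempDir 'out-sketch.txt'

# Made-up sample input: two segments that are both numbered 1.
$sample = @(
  'ST*835*000000001~'
  'MO*RE*JUNK'
  'SE*2*000000001~'
  'ST*835*000000001~'
  'MO*RE*JUNK'
  'SE*12*000000001~'
)
Set-Content -Path $inFile -Value $sample -Encoding ASCII

$sequenceNum = 0
# Assigning the loop to a variable collects everything the loop emits.
$newData = foreach ( $line in Get-Content $inFile ) {
  if ( $line -match '^ST\*835\*\d+~' ) {
    # Start line: move to the next segment number.
    $sequenceNum++
    'ST*835*{0:D9}~' -f $sequenceNum
  }
  elseif ( $line -match '^SE\*(\d+)\*\d+~' ) {
    # Stop line: keep the line count, reuse the current segment number.
    'SE*{0}*{1:D9}~' -f $Matches[1], $sequenceNum
  }
  else {
    # Any other line passes through unchanged.
    $line
  }
}

# Write the renumbered lines; ASCII keeps one byte per character.
$newData | Set-Content -Path $outFile -Encoding ASCII
Get-Content $outFile
```

Writing to a separate output file rather than over the input keeps the original available for a comparison afterwards.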
  • Perfect, thanks Bill. I was trying to run it in the ISE, not as a standalone script.

    FYI, the new file comes in at about double the size in bytes for some reason, but I don't think it's anything to worry about. I did a difference verification (diff (cat test.txt) (cat test123.txt)) and all it found was the ST and SE lines. Perfect!

    Again, thank you for the help!

    Wednesday, December 19, 2018 5:30 PM
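
In Windows PowerShell, diff and cat are aliases for Compare-Object and Get-Content, so the check above compares the two files as collections of lines. As a complementary check, here is a hedged sketch that compares the files position by position and confirms that every changed line is an ST or SE line. This is a different technique from Compare-Object, swapped in because it makes the position of each change explicit. The two arrays are made-up stand-ins for the file contents, and the variable names are hypothetical.

```powershell
# Hypothetical before/after contents standing in for the two files.
$before = @(
  'ST*835*000000001~'
  'MO*RE*JUNK'
  'SE*2*000000001~'
  'ST*835*000000001~'
  'MO*RE*JUNK'
  'SE*12*000000001~'
)
$after = @(
  'ST*835*000000001~'
  'MO*RE*JUNK'
  'SE*2*000000001~'
  'ST*835*000000002~'
  'MO*RE*JUNK'
  'SE*12*000000002~'
)

# Collect the new version of every line that differs at the same position.
$changed = @(
  for ( $i = 0; $i -lt $before.Count; $i++ ) {
    if ( $before[$i] -cne $after[$i] ) { $after[$i] }
  }
)

# Any changed line that is not an ST or SE line would be unexpected.
$unexpected = @($changed | Where-Object { $_ -notmatch '^(ST|SE)\*' })

"Changed lines: $($changed.Count), unexpected changes: $($unexpected.Count)"
```

With real files, the two arrays would come from Get-Content on the original and the renumbered file; the sketch assumes both files have the same number of lines.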
  • FYI, the new file comes in at about double the size in bytes for some reason.

    I thought that might come up. This is why I mentioned Unicode (double-byte) encoding. If you want ASCII encoding, use -Encoding ASCII with the Out-File cmdlet.


    -- Bill Stewart [Bill_Stewart]

    Wednesday, December 19, 2018 7:12 PM
    Moderator
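
The size difference can be reproduced with a small sketch that writes the same lines twice, once with -Encoding Unicode (UTF-16, two bytes per character) and once with -Encoding ASCII (one byte per character), and then compares the file lengths. The file names and sample lines are assumptions made for illustration.

```powershell
# Hypothetical file names in the temp folder, used only for this sketch.
$tempDir = [System.IO.Path]::GetTempPath()
$uniFile = Join-Path $tempDir 'enc-unicode.txt'
$ascFile = Join-Path $tempDir 'enc-ascii.txt'

# Made-up sample lines.
$sample = @(
  'ST*835*000000001~'
  'SE*2*000000001~'
)

# Write identical content with two different encodings.
$sample | Out-File $uniFile -Encoding Unicode
$sample | Out-File $ascFile -Encoding ASCII

# Compare the resulting file sizes in bytes.
$uniSize = (Get-Item $uniFile).Length
$ascSize = (Get-Item $ascFile).Length
"Unicode: $uniSize bytes, ASCII: $ascSize bytes"
```

The Unicode file should come out at roughly twice the size of the ASCII one, which matches the doubling reported above.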