Need to delete text files based on the contents of a text file?

  • Question

  • I am pretty new to digging this deep into my computer at work, but I would really like to automate this process to help relieve some of the strain on my coworker. Every morning a text file is generated that lists the files our machines put out to track what they have done. There are usually quite a few of these, and they are spread out between about 17 different folders. Here is an example of the text report:

    m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518
    m:\trumpf3\ncdone\rueck092.txt:DA,'CP_TAPESHOT',1
    m:\trumpf3\ncdone\rueck093.txt:DA,'CP_TAPESHOT',1
    m:\trumpf3\ncdone\rueck094.txt:DA,'CP_TAPESHOT',1

    m:\trumpf5\ncdone\rueck499.txt:DA,'2178','217804','217800400',1,1,450,''
    m:\trumpf5\ncdone\rueck500.txt:DA,'1647','164714','164701400',1,1,2806,''
    m:\trumpf5\ncdone\rueck501.txt:DA,'1593','159301','159300100',1,1,435,''
    m:\trumpf5\ncdone\rueck502.txt:DA,'CP_TRIM_OFF',1
    m:\trumpf5\ncdone\rueck503.txt:DA,'1672','167201','167200100',3,1,907,''

    Each line is a different file on the server, but some of them we do not need, and if we leave them in, the system errors out when we scan the work into the tracking system. So we have to manually find each file and delete it. Basically, if it doesn't follow the format of the first line, we need to remove it. I have tried a couple of things to make this happen, but I can't figure out how to make them use this text file. Instead I am running it through Excel to filter out the good ones. Is there an easier way? I'm not going to be the one actually running this, so it would be great if I could reduce the likelihood of error.

    Thank you for your help

    Tuesday, December 16, 2014 12:06 PM


All replies

  • Your question is vague and incomplete.  Post your script with a clear description of what you are trying to do.

    ¯\_(ツ)_/¯


    • Edited by jrv Tuesday, December 16, 2014 12:56 PM
    Tuesday, December 16, 2014 12:56 PM
  • Do you need to match the exact format, such as the length of each number string? Regex will be your best bet. Get a decent pattern and the rest is easy:

    "[a-z]{1}:\\[a-z]+[0-9]+\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]+,'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"

    The above would probably do it. It's not pretty and could be improved considerably(!) but I'm only just getting round to learning regex.

    Tuesday, December 16, 2014 1:06 PM
  • I don't have any script. I couldn't get it to work so I didn't keep it. I will try to expand on my explanation.

    m:\trumpf3\ncdone\rueck093.txt

    That is the path and filename of the file that I need to test.

    DA,'1599','159935','159903500',1,1,3518

    That is what I am trying to test for to see if the file is good. A good file will always match this format.

    DA,'CP_TAPESHOT',1

    For comparison, this file is bad, because the string does not match a good file.

    I need to test that part of the file, and if it is bad, I would like for it to be deleted. If it is good I would like to keep it.

    Tuesday, December 16, 2014 1:27 PM
  • Are you trying to match a file name or the contents of a file? You are not being clear about what you are trying to accomplish.

    This: DA,'1599','159935','159903500',1,1,3518 does not look like a file name. Perhaps you are just trying to extract the correct contents from the file?

    What scripting languages have you tried?

    If you want contents, then use Select-String in PowerShell.
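A minimal sketch of the `Select-String` suggestion. It assumes a good-line pattern like the one posted earlier in the thread, adjusted so that two-digit folders such as `TRUMPF10` (one of the folders read by the batch script later in the thread) also match (`[0-9]+` after the folder name) and with the dot before `txt` escaped. `-NotMatch` makes `Select-String` return only the lines that do not fit the pattern, i.e. the candidates for deletion. This block only lists them and deletes nothing; the three sample lines are copied from the question, and the comment shows the assumed way to read the real report instead.

```powershell
# Pattern for a "good" report line (letters then digits for the folder, escaped dot)
$pattern = "[a-z]:\\[a-z]+[0-9]+\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]+,'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"

# Sample lines copied from the question. Against the real report you would use:
#   Select-String -Path C:\Rueck_Info.txt -Pattern $pattern -NotMatch
$report = @(
    "m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518"
    "m:\trumpf3\ncdone\rueck092.txt:DA,'CP_TAPESHOT',1"
    "m:\trumpf5\ncdone\rueck502.txt:DA,'CP_TRIM_OFF',1"
)

# -NotMatch keeps only the lines that do NOT fit the pattern; .Line is the line text
$badLines = $report | Select-String -Pattern $pattern -NotMatch | ForEach-Object { $_.Line }
$badLines
```

`Select-String` is case-insensitive by default, so `M:\TRUMPF3\NCDONE` and `m:\trumpf3\ncdone` are treated the same.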


    ¯\_(ツ)_/¯


    • Edited by jrv Tuesday, December 16, 2014 1:33 PM
    Tuesday, December 16, 2014 1:32 PM
  • No, "m:\trumpf3\ncdone\rueck093.txt" is the path and file name. In the text file I am working with, each line shows the file name and a small selection of that file's contents.

            File Name                                        Contents of the File.

    (m:\trumpf3\ncdone\rueck091.txt)(:DA,'1599','159935','159903500',1,1,3518)

    There is a large report that gets generated that lists all of these files in a text document. It shows the File Name and the Contents of the File, exactly as above but without the parentheses. I just copied and pasted the list in my original post out of the generated file. I want to check the Contents of the File, and if it matches the above format, keep it. If the Contents of the File do not match, I want to delete the File at the Path indicated at the beginning of the line.
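To make the two halves of each report line concrete, here is a minimal sketch that splits a line into the file path and the copied contents. It assumes every line has the form `<path>.txt:<contents>`, as in the samples in this thread; `Split-ReportLine` is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: split one report line into the file path and the copied contents.
# The path must end in ".txt" right before the ":", so the "m:" drive colon is not
# mistaken for the separator; the lazy .+? stops at the first ".txt:".
function Split-ReportLine {
    param([string]$Line)

    if ($Line -match '^(?<Path>.+?\.txt):(?<Contents>.*)$') {
        [pscustomobject]@{
            Path     = $Matches['Path']
            Contents = $Matches['Contents']
        }
    }
    # Lines that do not fit the form (for example blank lines) produce no output
}

$parts = Split-ReportLine "m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518"
$parts.Path      # m:\trumpf3\ncdone\rueck091.txt
$parts.Contents  # DA,'1599','159935','159903500',1,1,3518
```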

    So far I have tried to create a batch file to make this happen, but I could not figure out how to make it check whether the format matches. Once that failed, I tried bringing the list into Excel, but Excel also gave me problems checking whether the Contents of the File were in the correct format when I tried making an IF statement to test it. VBA gave me the same trouble.

    The Contents of the File should always be ":DA,'####','######','#########',#,#," followed by a last set of numbers that varies. Instead, some of them say ":DA,'CP_TAPESHOT',1" or a random jumble of numbers and letters that cannot be scanned.
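If the first three quoted numbers really are always 4, 6 and 9 digits long, as the format above suggests (every good sample in this thread fits that), the check can be made stricter than "some digits". This is a sketch under that assumption; loosen the counts if real files differ. `Test-ReportLine` is a hypothetical name.

```powershell
# Hypothetical helper: $true only for a line whose contents have the exact shape
# described above. ^ anchors the match at the start of the line; the three quoted
# numbers must have exactly 4, 6 and 9 digits; the next three fields are one or
# more digits each, and anything after them (such as ,'') is allowed.
function Test-ReportLine {
    param([string]$Line)
    $strictPattern = "^[a-z]:\\.+?\.txt:DA,'\d{4}','\d{6}','\d{9}',\d+,\d+,\d+"
    return $Line -match $strictPattern
}

Test-ReportLine "m:\trumpf5\ncdone\rueck499.txt:DA,'2178','217804','217800400',1,1,450,''"   # True
Test-ReportLine "m:\trumpf3\ncdone\rueck094.txt:DA,'CP_TAPESHOT',1"                            # False
```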

    Tuesday, December 16, 2014 2:44 PM
  • Regex?

    And that is exactly what I am trying to do. But I don't know what Regex is.

    Tuesday, December 16, 2014 2:46 PM
  • Are you saying that these files have only one line in them?

    When you say "contents", that is plural and can indicate multiple lines. Are you asking how to test every line in the target file for that pattern?

    I know this makes sense to you, but we do not know your system and cannot see what you are looking at.

    You need to start by writing a set of logical steps that describe how to match one file to the other files. What is being matched?

    The above post shows how to use RegEx to match a pattern. You need to work from that end until you can make what you are asking clear enough to boil it down to a question.

    Remember, this forum is to help technicians solve issues with administrative scripting or to learn how to write scripts. It is not a free script design forum for end users.


    ¯\_(ツ)_/¯

    Tuesday, December 16, 2014 2:54 PM
  • Here's a simple script that may help you to understand how regular expressions work:

    # Gc is short for Get-Content and % for ForEach-Object; $_ is the current line
    $pattern = "[a-z]{1}:\\[a-z]+[0-9]+\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]+,'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"
    
    Gc C:\Yourfile.txt | % {
        if ($_ -match $pattern) {
            write-host "$_ is good"
        }
        else {
            write-host "$_ is bad"
        }
    }

    Tuesday, December 16, 2014 3:04 PM
  • Yes, the files I'm trying to remove have multiple lines of content. The report that we print looks exactly like what I copied and pasted in my original post. That report, as in the sample, is what I want to check, not each individual file. Here are the steps we are using now:

    1. Generate "Rueck_Info.txt". This pulls the path/file name for each of the files as well as one line of the content in that file and puts them all in one text file. The code we use to do this is here:

    %SystemRoot%\explorer.exe "m:\TRUMPF3\NCDONE"
    %SystemRoot%\explorer.exe "N:\Gas_House_Data"
    %SystemRoot%\explorer.exe "G:\Airport Laser\NCell"
    del "g:\airport laser\ncell\*Data_Collection.csv"
    copy n:\Gas_House_Util\RueckInfo_Init.txt n:\Gas_House_Data\RueckInfo.txt
    
    qgrep -B DA,' m:\TRUMPF1\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF2\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF3\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF4\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF5\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF6\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF7\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF8\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF9\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF10\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    qgrep -B DA,' m:\TRUMPF11\NCDONE\rueck*.txt >>N:\Gas_House_Data\RueckInfo.txt
    
    
    n:\accufab\bin\gawk-w32.exe -f n:\gas_house_util\Rueck-Repl-Comma.awk N:\gas_house_data\RueckInfo.txt > N:\gas_house_data\RueckInfo-2-Final.txt
    
    move n:\gas_house_data\RueckInfo.txt "n:\Data Backups\RueckInfo\RueckInfo_%date:~10,4%%date:~4,2%%date:~7,2%_%time:~0,2%_%time:~3,2%_%time:~6,2%.txt"
    
    

    2. Open "Rueck_Info.txt" and print it. We print it so that we can look at each one and manually see which ones are "bad", or do not meet the required format.

    3. Follow the path of the "bad" files, then delete that file.
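Steps 2 and 3 could be replaced by a short dry run. The sketch below assumes report lines of the form `<path>.txt:<contents>` as shown above, and treats a line as good when its contents start with `DA,` followed by three quoted numbers and three unquoted numbers. `Get-BadReportPath` is a hypothetical helper name, not a built-in cmdlet. It returns only the paths to delete, so the list to print or review is much shorter than the full report.

```powershell
# Hypothetical helper: for each "<path>.txt:<contents>" report line whose contents
# do not look like a good record, output the path of the file to delete.
function Get-BadReportPath {
    param([string[]]$Lines)

    foreach ($line in $Lines) {
        # Skip blank lines and anything else without the "<path>.txt:" prefix
        if ($line -match '^(?<Path>.+?\.txt):(?<Contents>.*)$') {
            # Copy the captures before the next -match can overwrite $Matches
            $path     = $Matches['Path']
            $contents = $Matches['Contents']

            # Good record: DA, three quoted numbers, then three unquoted numbers
            if ($contents -notmatch "^DA,'\d+','\d+','\d+',\d+,\d+,\d+") {
                $path
            }
        }
    }
}

# Sample lines copied from the thread, including a blank line as in the report
$sample = @(
    "m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518"
    "m:\trumpf3\ncdone\rueck094.txt:DA,'CP_TAPESHOT',1"
    ""
    "m:\trumpf5\ncdone\rueck502.txt:DA,'CP_TRIM_OFF',1"
)
Get-BadReportPath $sample   # m:\trumpf3\ncdone\rueck094.txt and m:\trumpf5\ncdone\rueck502.txt
```

Assuming the report has been saved as `C:\Rueck_Info.txt` (the path used in the answer in this thread), `Get-BadReportPath (Get-Content C:\Rueck_Info.txt) | Remove-Item -WhatIf` would print which files would be removed without removing anything; replacing `-WhatIf` with `-Confirm` asks before each actual deletion.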

    I'm not sure how else to describe the report that is generated other than each line in the report represents one file. Each line looks like this:

    m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518

    After the path and file name, there is a ":" followed by the text I want to compare. The text I want to compare should look like the one above, but the numbers will vary. The files that I want to delete will have the path and file name, but instead of a set of numbers they will have asterisks, letters, or gibberish. For example:

    m:\trumpf3\ncdone\rueck094.txt:DA,'CP_TAPESHOT',1

    m:\trumpf5\ncdone\rueck502.txt:DA,'CP_TRIM_OFF',1

    Both of the above are bad. For the first one, I would like to delete this file:

    "m:\trumpf3\ncdone\rueck094.txt"

    I did not realize this was the wrong forum for this question. I was referred here by someone and didn't realize it was geared more towards IT professionals. If I need to, I can try to find an answer elsewhere.

    Tuesday, December 16, 2014 3:46 PM
  • This is a scripting (programming) forum where we try to answer specific scripting questions.

    In general, it's not a forum where you post a request for someone to write code for you to specific specifications.

    If this is critical to your business and you need someone to write code for you, then you probably should look at hiring a consultant for the task.


    -- Bill Stewart [Bill_Stewart]

    Tuesday, December 16, 2014 3:52 PM
    Moderator
  • It isn't critical. I was just having trouble with the coding, and the person who referred me here led me to think this was a general help forum. Sorry about the confusion. I will look elsewhere.
    Tuesday, December 16, 2014 3:56 PM
  • The expression above will match the lines correctly; you just need to tell it to delete the files.

    $pattern = "[a-z]{1}:\\[a-z]+[0-9]+\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]{2},'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"
    
    Gc C:\Rueck_Info.txt | % {
        if ($_ -match $pattern) {
            write-host "$_ is good"
        }
        elseif ($_ -match '^(.+?\.txt):') {
            # The file path is everything up to and including the first ".txt"
            $file = $matches[1]
            write-host "Deleting $file"
            # -Confirm asks before each deletion
            remove-item -Path $file -Confirm
        }
        # Blank lines and lines without a "path.txt:" prefix are skipped, not deleted
    }
    

    • Edited by Braham20 Tuesday, December 16, 2014 4:11 PM
    • Marked as answer by The First Axle Tuesday, December 16, 2014 6:25 PM
    Tuesday, December 16, 2014 4:07 PM
  • Awesome, thank you very much. This is exactly what I need.
    Tuesday, December 16, 2014 4:30 PM
  • The problem here appears to be that you don't care what is in the target file. You just want to delete it if the line in the log file does not match the pattern. Apparently Braham20 is better at this guesswork than I am, as that is what he has provided for you.

    Sometimes we get way too complicated with our explanations. That can make communicating quite difficult.

    I am glad someone figured out your puzzle.


    ¯\_(ツ)_/¯

    Tuesday, December 16, 2014 5:31 PM
  • I wish I had thought to say it that way. I didn't mean to take up so much of your time. Thank you all very much
    Tuesday, December 16, 2014 6:25 PM
  • I wish I had thought to say it that way. I didn't mean to take up so much of your time. Thank you all very much

    Not an issue; just think a bit more next time. Try to find a simple statement. I know it can be hard to sort out when you are not a programmer or technician, but finding a simple statement can be even more helpful for you to see the solution for yourself.

    Consider that the solution to most problems starts with finding how to ask the right question.  Science is about questions as much or more than it is about answers.


    ¯\_(ツ)_/¯

    Tuesday, December 16, 2014 7:08 PM
  • As I said, the regex could be improved and simplified. It may currently find matches that aren't desirable, depending on how strict your criteria are. The deletion could also be shortened to one line. It's a working example only; you could treat it as a learning experience and tweak it :)
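As an illustration of shortening the deletion, the path can be cut out of a report line with a single `-replace`: everything from the first `.txt:` to the end of the line is replaced by `.txt`. A sketch only; `Get-ReportPath` is a hypothetical name. A line without `.txt:` (such as a blank line) comes back unchanged, so such lines should still be skipped before anything is deleted.

```powershell
# Hypothetical helper: turn a report line into the path of the file it describes
# by replacing everything from the first ".txt:" onward with ".txt".
function Get-ReportPath {
    param([string]$Line)
    $Line -replace '\.txt:.*$', '.txt'
}

Get-ReportPath "m:\trumpf3\ncdone\rueck094.txt:DA,'CP_TAPESHOT',1"   # m:\trumpf3\ncdone\rueck094.txt

# Inside the else branch of the answer, the deletion could then be one line:
#   remove-item -Path ($_ -replace '\.txt:.*$', '.txt') -Confirm
```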

    • Edited by Braham20 Tuesday, December 16, 2014 10:00 PM
    Tuesday, December 16, 2014 9:34 PM
  • The "guessed" pattern may be the best available, since there is not enough test data to be certain. The pattern has some unpredictable-looking items, and the user's request does not address them. Only running this over a lot of data will show when and where your pattern breaks.

    As for efficiency: in this situation predictability is more important. Once a set of rules is locked down, we could look at optimizing. Rules first. Performance last.
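One way to follow "rules first" is to keep a small set of report lines whose verdict is already known and re-check them every time the pattern changes. The sketch below uses sample lines from this thread plus one HYPOTHETICAL line for a two-digit folder such as `TRUMPF10` (the batch script earlier in the thread reads `m:\TRUMPF10\NCDONE`). It compares a pattern that allows only one digit after the folder name (`[a-z]+[0-9]\\`) with one that allows several (`[a-z]+[0-9]+\\`): the one-digit version rejects the `TRUMPF10` line, which would cause a good file to be deleted. `Get-Mismatch` is a hypothetical helper name.

```powershell
# Expected verdicts ($true = keep, $false = delete). All lines except the TRUMPF10
# one are copied from this thread; the TRUMPF10 line is HYPOTHETICAL.
$cases = [ordered]@{
    "m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518"   = $true
    "m:\trumpf5\ncdone\rueck499.txt:DA,'2178','217804','217800400',1,1,450,''" = $true
    "m:\trumpf3\ncdone\rueck092.txt:DA,'CP_TAPESHOT',1"                        = $false
    "m:\trumpf5\ncdone\rueck502.txt:DA,'CP_TRIM_OFF',1"                        = $false
    "m:\TRUMPF10\NCDONE\rueck001.txt:DA,'1599','159935','159903500',1,1,3518"  = $true
}

# The two patterns differ only in [0-9] versus [0-9]+ after the folder name
$oneDigitPattern   = "[a-z]:\\[a-z]+[0-9]\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]{2},'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"
$multiDigitPattern = "[a-z]:\\[a-z]+[0-9]+\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]{2},'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"

# Hypothetical helper: output every line the pattern classifies differently
# from its expected verdict
function Get-Mismatch {
    param([System.Collections.IDictionary]$Cases, [string]$Pattern)
    foreach ($line in $Cases.Keys) {
        if (($line -match $Pattern) -ne $Cases[$line]) { $line }
    }
}

Get-Mismatch $cases $oneDigitPattern     # lists only the TRUMPF10 line
Get-Mismatch $cases $multiDigitPattern   # lists nothing
```

Adding a line to `$cases` each time a new kind of good or bad record turns up keeps the rules written down in one place.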


    ¯\_(ツ)_/¯

    Tuesday, December 16, 2014 11:53 PM
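  • This can be substituted without changing the result:

    [0-9]+ is equivalent to \d+ here: one or more digits in a row with no other character in between. (Strictly, the .NET \d that PowerShell uses also matches non-ASCII Unicode digits, which makes no difference for this data.)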
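To check that substitution on real lines, the sketch below writes a version of the pattern twice (with `[0-9]+` after the folder name and an escaped dot), once using `[0-9]+` and once using `\d+`, and shows both verdicts for sample lines copied from the question. The two columns should agree on every row.

```powershell
$bracketPattern = "[a-z]:\\[a-z]+[0-9]+\\[a-z]+\\[a-z]+[0-9]+\.[a-z]{3}:[a-z]{2},'[0-9]+','[0-9]+','[0-9]+',[0-9]+,[0-9]+,[0-9]+"
$digitPattern   = "[a-z]:\\[a-z]+\d+\\[a-z]+\\[a-z]+\d+\.[a-z]{3}:[a-z]{2},'\d+','\d+','\d+',\d+,\d+,\d+"

# Sample report lines copied from the question
$lines = @(
    "m:\trumpf3\ncdone\rueck091.txt:DA,'1599','159935','159903500',1,1,3518"
    "m:\trumpf3\ncdone\rueck092.txt:DA,'CP_TAPESHOT',1"
    "m:\trumpf5\ncdone\rueck500.txt:DA,'1647','164714','164701400',1,1,2806,''"
    "m:\trumpf5\ncdone\rueck502.txt:DA,'CP_TRIM_OFF',1"
)

foreach ($line in $lines) {
    # One row per line with the verdict of each pattern
    [pscustomobject]@{
        Line    = $line
        Bracket = $line -match $bracketPattern
        Digit   = $line -match $digitPattern
    }
}
```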


    ¯\_(ツ)_/¯

    Tuesday, December 16, 2014 11:55 PM