Remove texts prior to a particular xml tag

  • Question

  • Hi All

    I need to remove all the text that precedes a particular XML tag.

    Sample XML file (sample.xml):

    <?xml version="1.0" encoding="iso-8859-1"?>

    <!DOCTYPE ichicsr SYSTEM "http://eudravigilance.ema.europa.eu/dtd/icsr21XML.dtd">

    <ichicsr lang="en">
    <tag1><tag1/>
    <tag2><tag2/>
    <tag3><tag3/>
    </ichicsr>

    Starting from sample.xml, I want to create a new XML file in a different path, with all the text before the tag <ichicsr lang="en"> deleted. My target XML would look like this:

    <ichicsr lang="en">
    <tag1><tag1/>
    <tag2><tag2/>
    <tag3><tag3/>
    </ichicsr>

    In other words, I want my target XML file to contain everything from the opening tag <ichicsr lang="en"> through the closing tag </ichicsr>.

    Below is the code I am using, but it is not working. Apparently the .Where() method and its 'SkipUntil' mode are not available in PowerShell version 2.0. I would appreciate any workaround. Please note that we cannot rely on line numbers, because the line numbering varies from file to file.

    #set your directory
    $file_temp = "C:\DTD_R2_RAW"

    #grab your files
    $xml_files = Get-ChildItem $file_temp *.XML -Recurse

    #designate your keyword
    $keyword = "my keyword"

    #create your new 'keep' folder
    New-Item -ItemType Directory C:\DTD_R2_RAW\Keep

    #if there are files, do something...
    if ($xml_files) {

        #for each file, skip all lines until you find the keyword, then output everything from that point
        ForEach ($x in $xml_files) {

            $file = Get-Content -Path ($file_temp + '\' + $x.Name)
            
            $keep = $file.Where({$_ -match $keyword}, 'SkipUntil') | Out-File C:\DTD_R2_RAW\keep\$($x.name)  

        }
    }

    Tuesday, November 14, 2017 4:48 AM

All replies

  • That is not a valid XML file.

    (66,65,83,65,84,73|%{[char]$_})-join''

    Tuesday, November 14, 2017 4:56 AM
  • Your XML is not well-formed and won't work as posted. I fixed it so we can do this the easy way.

    [xml]$xml = @'
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE ichicsr SYSTEM "http://eudravigilance.ema.europa.eu/dtd/icsr21XML.dtd">
    <ichicsr lang="en">
    <tag1></tag1>
    <tag2></tag2>
    <tag3></tag3>
    </ichicsr> 
    '@
    $xml.ichicsr.OuterXml | Out-File newfile.xml
    
    


    \_(ツ)_/

    • Proposed as answer by BASATI Tuesday, November 14, 2017 5:08 AM
    Tuesday, November 14, 2017 5:04 AM
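
  • For files that cannot be parsed as XML, or where the original line layout should be kept, the 'SkipUntil' behaviour can probably be reproduced on PowerShell 2.0 with a plain foreach loop. This is only a minimal sketch, not a tested solution for the real files. It assumes the opening tag starts with <ichicsr, that the Keep folder sits under the source folder as in the question, and that UTF-8 output is acceptable; the variable names are made up for illustration. Note that it keeps the whole line containing the tag, so any text before the tag on that same line is retained.

    ```powershell
    # Assumed paths, taken from the question
    $source  = 'C:\DTD_R2_RAW'
    $target  = Join-Path $source 'Keep'
    # Regex for the first line to keep (assumption: the root tag is <ichicsr ...>)
    $pattern = '<ichicsr\b'

    # Create the output folder only if it does not exist yet
    if (-not (Test-Path $target)) {
        New-Item -ItemType Directory -Path $target | Out-Null
    }

    # Collect the files first; skip folders and anything already in Keep
    $xmlFiles = Get-ChildItem -Path $source -Filter *.xml -Recurse |
        Where-Object { -not $_.PSIsContainer -and $_.DirectoryName -ne $target }

    foreach ($file in $xmlFiles) {
        $keep = $false
        $kept = @()

        # Drop lines until the first match, then keep every line after it
        foreach ($line in (Get-Content -Path $file.FullName)) {
            if (-not $keep -and $line -match $pattern) { $keep = $true }
            if ($keep) { $kept += $line }
        }

        # Write a file only when the tag was actually found
        if ($kept.Count -gt 0) {
            $kept | Out-File -FilePath (Join-Path $target $file.Name) -Encoding UTF8
        }
    }
    ```

    The loop uses $file.FullName rather than rebuilding the path from the folder and the file name, so files found in subfolders by -Recurse are read from the right place. Files with the same name in different subfolders would overwrite each other in Keep.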