How can I replace text in xml node with Powershell?

    Question

  • Hello.

    I have a script that searches for entries in the value nodes of XML .resx files using PowerShell:

    get-childitem "D:\work\"* -include *.resx -recurse -force | select-xml -xpath "//data/value" |
        where { $_.node.InnerXML | select-string -pattern '[\s\S]*?text[\s\S]*?' }

    Now I need to modify this script to replace the found text in the XML nodes with other text.

    How can I implement this?

    I tried something like this:

    get-childitem "D:\work\"* -include *.resx -recurse -force | select-xml -xpath "//data/value" |
        foreach-object { $_.node.innerXML -replace "the the", "the" } | Set-Content Path

    But it looks like Set-Content doesn't work in this case...

    Thanks in advance.

    Wednesday, April 25, 2012 9:21 AM

Answers

  • The following will get you only the nodes of interest.

     select-xml -xpath '//data/value[contains(.,"the the")]'|%{$_.Node.'#text'}

    This will return only the nodes that match the pattern in 'contains()'.  The dot references the current node in the enumeration, so the test is applied to each 'value' node in turn.

    Replacement can be done on the returned items.


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 10:35 AM
  • To see how many entries would be replaced...

    Every collection has a 'count' property.

    $nodes=$xml.selectNodes('//data/value')

    $nodes.count


    ¯\_(ツ)_/¯

    Friday, April 27, 2012 12:20 PM

All replies

  • That is correct.  You have to update the node's value.  'innerXML' is all of the XML, including child nodes.  In an XML object the XML representation is read-only.


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 10:10 AM
  • Try it this way.

    $xml=[xml](cat 'F:\projects\Visual Studio 2008\IVCentral.NET\IVCentral.NET\frmDisplayRecord.resX')
    $xml.selectNodes('//data/value[text()]')|%{$_.'#text' = ($_.'#text' -replace 'the the','the')}

    This shows how to select the nodes and replace the text contents of each node, which is the part marked (#text) below.

    <tagname attr1='attrvalue'> this is the text node (#text)

          <childnode> text </childnode>

    </tagname>

    Your method would try to replace text anywhere below the selected node.  The method I posted here isolates only the text of that specific node.  Luckily, the rules of XML do not allow innerXML to be changed in place.  You can copy and fix the text, but the copy then has to be either reassigned or replaced as a fragment.  Not fun things to do.


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 10:21 AM
  • Thank you.

    I have tried it like this:

    get-childitem "D:\temp\"* -include *.resx -recurse -force | select-xml -xpath "//data/value[text()]" |
        foreach-object { $_.node.'#text' = ($_.node.'#text' -replace 'text','foo') }

    and this code replaced 'text' with 'foo', but how can I save these changes?

    Select-Xml gives me a Path member that indicates which file each node came from.  Can I use this value with Set-Content?

    I have tried this, but without success:

    get-childitem "D:\temp\"* -include *.resx -recurse -force | select-xml -xpath "//data/value[text()]" |
        foreach-object { $_.node.'#text' = ($_.node.'#text' -replace 'text','foo'), $_.Path } |
        set-content Path

    Thanks in advance.

    Wednesday, April 25, 2012 10:45 AM
  • A modified and faster version

    $xml.selectNodes('//data/value[contains(.,"the the")]')|
        ForEach-Object{ $_.'#text' = $_.'#text' -replace 'the the','the' }

    Of course you have to use the XML type's Save method to replace the file contents.

    $xml.Save(<filename>)


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 10:49 AM
    I was trying to find out why you took your approach, and I see now what you thought you could do.  Remember that the output of the Select-String command is a collection of MatchInfo objects.  You are then passing that to the Set-Content cmdlet.  Set-Content takes an object and puts it into a file, so it needs a file name to proceed.  The input objects are neither files nor text.  In the pipeline you have lost the original file name, and we cannot easily go back and get it.  You are also going to have issues with trying to alter the innerXML.  What is missing is a Replace-Xml cmdlet that operates at the file level and takes an XPath selector and a select/replace argument.

    This is actually very easy to do with XSL, but the setup is harder.

    Set-Content appears to be able to set content on any object, but the rules of PowerShell say that we need a provider to support the type we want to use the Set-Content semantics on.  There is no XML provider at this time.  The examples all use files to demonstrate the cmdlet.

    Notice that the file example shows that you need to supply the file name:

    (get-content Notice.txt) | foreach-object {$_ -replace "Warning", "Caution"} | set-content Notice.txt


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 11:18 AM
  • But the output of:

    get-childitem "D:\temp\"* -include *.resx -recurse -force | select-xml -xpath "//data/value[text()]"

    contains the path to the file:

    Node     Path                         Pattern
    ----     ----                         -------
    value    D:\temp\ControlNames.resx    //data/value[text()]
    value    D:\temp\ControlNames.resx    //data/value[text()]
    value    D:\temp\ControlNames.resx    //data/value[text()]

    Can I preserve these values in the pipeline after the ForEach-Object block?

    Something like this:

     ... | foreach-object { $_.node.'#text', $_.Path } | SomeCommand Path

    I mean saving this Path value for each object and using it in 'SomeCommand' afterwards...

    Wednesday, April 25, 2012 11:37 AM
  • No - not without writing a lot of troublesome code.  If you are an advanced PowerShell scripter you can create a function or script block that will do this.  It isn't worth it.  Just use the code I posted.

    $files=Get-ChildItem

    $files|%{
        # get contents
        # fix nodes
        # save file
    }

    That is it.  I leave it up to you to write the three simple lines of code.


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 11:44 AM
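  • # $path is the folder to search
    $files=Get-ChildItem $path -include *.resx -recurse 
    $files|
         ForEach-Object{
              $xml=[xml](Get-Content $_.FullName)
              # assign the result back, otherwise the node is left unchanged
              $xml.selectNodes('//data/value[contains(.,"the the")]')|
                   ForEach-Object{ $_.'#text' = $_.'#text' -replace 'the the','the' }
              $xml.Save($_.FullName)
         }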

    Here - this is pretty much the whole template for an XML search and replace in files.


    ¯\_(ツ)_/¯


    • Edited by jrv Wednesday, April 25, 2012 11:55 AM
    Wednesday, April 25, 2012 11:54 AM
  • You will not be able to use these nodes as they are part of an info structure, and once you do your replace you lose the file name.

    Like I posted before, you can figure out a method if you can figure out how to do the update.

    My code does the same thing but uses the XML class to do the work.  I find this to be more flexible.  If you really want to try it your way, go ahead.  I think it will be a bit difficult but a good learning experience.

    A Replace-Xml cmdlet would be a help.


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 12:08 PM
  • Thank you very much.

    This code works:

    $files=Get-ChildItem "D:\temp\"* -include *.resx -recurse 
    $files|
         ForEach-Object{
              $xml=[xml](Get-Content $_)
              $xml.selectNodes('//data/value[contains(.,"text")]')|
                   ForEach-Object{$_.'#text' = ($_.'#text' -replace 'text','foo1') }               
         $xml.Save($_)
    }

    But it only matches when the case of the text is the same.

    Is there any way to make "[contains]" and "-replace" case-insensitive?

    Wednesday, April 25, 2012 12:58 PM
  • No - MS XPath is XPath 1.0 and does not support case-insensitive matching.

    -replace is case-insensitive by default; -ireplace is the explicit case-insensitive form.

    You can use 'or' in the XPath predicate to test for each casing.
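
    Another XPath 1.0 option, offered here only as a sketch, is to lower-case the node text with translate() before calling contains().  The sample XML and the variable names $upper, $lower and $xpath are made up for illustration, and the mapping covers only the ASCII letters A-Z.

```powershell
# Sample document standing in for a .resx file (hypothetical content).
$xml = [xml]@'
<root>
  <data name="a"><value>Some Text here</value></data>
  <data name="b"><value>some text here</value></data>
  <data name="c"><value>nothing to see</value></data>
</root>
'@

# XPath 1.0 has no lower-case() function, so translate() maps A-Z to a-z
# before contains() runs. The search term must be given in lower case.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$lower = 'abcdefghijklmnopqrstuvwxyz'
$xpath = "//data/value[contains(translate(., '$upper', '$lower'), 'text')]"

$nodes = $xml.SelectNodes($xpath)
$nodes.Count   # counts both 'Some Text here' and 'some text here'
```

    The replacement itself can then stay with -replace, which already ignores case.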


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 1:27 PM
  • Thanks.

    I think I can omit [contains].

    But "-replace" and "-ireplace" both replace text case-insensitively :-/

    Can I change this behavior if I need to?

    And where can I read about "-replace"? I can't find information about it on technet.microsoft.com (http://technet.microsoft.com/en-us/library/dd347608.aspx).

    Wednesday, April 25, 2012 1:55 PM
  • The other option is -creplace.

    All of the string operators are structured like that.

    help about_Operators
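
    A quick way to see the difference between the three operators is a small comparison like the one below.  The sample string and variable names are made up for illustration.

```powershell
$s = 'Text and text'

$default     = $s -replace  'text', 'foo'   # case-insensitive by default
$insensitive = $s -ireplace 'text', 'foo'   # explicitly case-insensitive
$sensitive   = $s -creplace 'text', 'foo'   # case-sensitive: leaves 'Text' alone

$default, $insensitive, $sensitive
```

    The first two give 'foo and foo'; the last gives 'Text and foo'.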


    ¯\_(ツ)_/¯


    • Edited by jrv Wednesday, April 25, 2012 3:39 PM
    Wednesday, April 25, 2012 3:39 PM
  • One more question.

    After calling  $xml.Save($_)  the resx file lost its formatting: some indents, tabs and line breaks.

    Is there any option to preserve the whitespace in this resx XML format, with its indents and line breaks?

    Thank you.

    Wednesday, April 25, 2012 4:50 PM
  • You have to create a custom XmlWriter and a stream reader.  This can be configured to reformat the XML according to a number of standard 'prettyprint' styles.

    By default and by definition, XML has no display format.  Space, tab and line-break characters between elements are not significant to XML or to its poor cousin XHTML.  Support for formatting is always optional.  Normally you don't need to look at this.  If you open the file in an XML viewer like IE it will be formatted for display.

    Later on I will look for the code example I have that shows how to use the XML writer in PowerShell.


    ¯\_(ツ)_/¯

    Wednesday, April 25, 2012 5:10 PM
  • get-childitem "D:\work\"* -include *.resx -recurse -force | foreach-object {
        [xml]$XML = Get-Content $_.FullName
        $XML.data.value = $XML.data.value -replace "the the", "the"
        $XML.Save($_.FullName)
        }
    This should do the trick.  No need to worry about nodes or other complicated XML syntax.  You might need to update the $XML.Data.Value path for your files so that it captures all of the instances, but you get the gist.

    Rich Prescott | Infrastructure Architect, Windows Engineer and PowerShell blogger | MCITP, MCTS, MCP

    Engineering Efficiency
    @Rich_Prescott
    Windows System Administration tool
    AD User Creation tool

    Wednesday, April 25, 2012 10:38 PM
    Moderator
  • Rich - that is not working on resx files due to the schema.  The value nodes can exist at multiple levels.  You are only getting one level.

    We have it working and are now looking to preserve the format of the file.  The [xml] Save method saves as unformatted text, basically as a single line.  You need to use the XmlWriter class to 'prettyprint' the output.

    Stick around and I will show you how to do that.  It is pretty neat and it gives us nicely formatted XML files.

    I just got back, so give me a few.


    ¯\_(ツ)_/¯


    • Edited by jrv Wednesday, April 25, 2012 10:46 PM
    Wednesday, April 25, 2012 10:45 PM
  • I was able to read in my .XML files, modify them and save them with the 'prettyprint' using the method posted above.  

    Rich Prescott | Infrastructure Architect, Windows Engineer and PowerShell blogger | MCITP, MCTS, MCP


    Wednesday, April 25, 2012 11:05 PM
    Moderator
  • How did you view them?


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 12:14 AM
  • Rich - you are correct.  I went back and checked and the format was maintained.  It wasn't in .NET Framework 1.0 and might not have been in PowerShell 1.0.

    I have never checked the output like this, but I have had some Unicode files that wouldn't retain their format.

    Perhaps the OP clobbered the files earlier by saving the InnerXml as text, which will not retain the format.

    Try:

    $xml.InnerXml | out-file test.xml

    So the issue of formatting is solved.

    If you make your own XML and save it, it will not be formatted.  I suspect that we can get at the XmlWriter in the XML type and alter its default settings.


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 12:34 AM
  • Now about the selection methods.  I go after the value nodes in resx files and come up with three different results.

    Rich - your method is this:
    $xml.selectNodes('/data/value')

    That is 'data/value' only off of the root.

    This is the second set, which is closer:

    $xml.selectNodes('//data/value')

    This is the third:

    $xml.selectNodes('//value')

    The third one gets all 'value' nodes in the hierarchy.  For most replacements it is the third one we want to inspect.  Controls can be embedded to any depth; mostly we want //data/value, but sometimes we want all value nodes.  It is the value nodes that are active in the schema.  Everything else is navigational.

    In this case I am not sure.  In the past we wanted to inspect ALL value nodes.

    I used this because it gets closest to the text to test and replace.

    $xml.selectNodes('//data/value[text()]') | %{ $_.'#text' = $_.'#text' -replace 'the the','the' }


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 12:53 AM
  • Looking more closely at what you are doing, I see that you ran your test on a non-resx file, because a resx file does not have the structure that you referenced.

    Most have $xml.root.data.value at a minimum.  Some have very different structures.  Using the XPath // to inspect every node eliminates the need to know which schema we are working on.

    $xml.schema is another common one.

    The last time I did this was when a programmer got smart with his 'grep' tool and did a wildcard replace on some text across all of the resx files in a large project.  Using grep to fix this just caused more headaches.  We used XPath and did the fixes in phases.  Within a couple of hours we were able to compile almost all of the files.

    Don't even think of asking where the backup was.


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 1:07 AM
  • I tried this:

    get-childitem "D:\temp\"* -include *.resx -recurse -force | foreach-object {
        [xml]$XML = Get-Content $_.FullName
        $XML.root.data | foreach-object { $_.value = $_.value -replace "text", "foo3" }
        $XML.Save($_.FullName)
        }

    and this code also doesn't preserve the formatting :(

    The code

    get-childitem "D:\work\"* -include *.resx -recurse -force | foreach-object {
        [xml]$XML = Get-Content $_.FullName
        $XML.data.value = $XML.data.value -replace "text", "foo3"
        $XML.Save($_.FullName)
        }

    or

    get-childitem "D:\work\"* -include *.resx -recurse -force | foreach-object {
        [xml]$XML = Get-Content $_.FullName
        $XML.root.data.value = $XML.root.data.value -replace "text", "foo3"
        $XML.Save($_.FullName)
        }

    doesn't work, maybe because root is always present and $XML.root.data returns a list of <data> nodes...

    Thursday, April 26, 2012 10:23 AM
  • nikolasha - Richard either doesn't have the same files or he doesn't understand the problem.  His method can never work with this type of schema.

    Either your files have been altered or you don't have the correct version of XML on your system.

    What version of PowerShell and which OS are you running?

    I have run through the thousands of resx files I have on my local system.  I have a very large projects folder with about 60 medium-sized projects in C# and VB.NET.  All of my resx files are formatted.  I have made copies of many, run them through the code I posted for you, and they retain their formatting.

    When the latest version of the XML class saves a file that was opened with formatting, its default behavior is to save the file with the formatting intact.  If you save the file using either Set-Content or Out-File, or if you edit it using the MSXML4 COM object, it will not retain the formatting.

    Look at some resx files that you have not touched, or create a new small project and check the files.  Are they properly formatted?  If so, then make some easy change and save with my code.  Does the file retain its formatting?  If not, then you are probably missing a service pack or there is a bug on a different platform.  I have only tested with PowerShell 2 on XP SP3 with all .NET Framework patches.

    If there is an issue with saving on your system we can use the XmlWriter to re-format the files.
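
    As a rough sketch of the XmlWriter route, the indentation can be controlled with System.Xml.XmlWriterSettings.  The four-space indent, the sample XML and the scratch file name are assumptions chosen for illustration, not settings taken from this thread.

```powershell
# Hypothetical sample standing in for a loaded .resx document.
$xml = [xml]'<root><data name="a"><value>some text</value></data></root>'

# Writer settings: indent each level by four spaces, use Windows line breaks.
$settings = New-Object System.Xml.XmlWriterSettings
$settings.Indent = $true
$settings.IndentChars = '    '
$settings.NewLineChars = "`r`n"

# Scratch output path (hypothetical) so the sketch does not touch real files.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample-pretty.resx'

$writer = [System.Xml.XmlWriter]::Create($path, $settings)
try     { $xml.Save($writer) }
finally { $writer.Close() }
```

    Note that this re-indents the whole file to one style; it does not restore whatever layout the file had before.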


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 10:40 AM
  • It's very strange.

    I am using Windows 7 Home Premium, PowerShell

    Major  Minor  Build  Revision
    -----  -----  -----  --------
    2      0      -1     -1   

    I opened the resx file in Notepad:

    and after executing this script in PowerShell:

    $files=Get-ChildItem "D:\temp\"* -include *.resx -recurse 
    $files|
         ForEach-Object{
              $xml=[xml](Get-Content $_)
              $xml.selectNodes('//data/value')|
                   ForEach-Object{$_.'#text' = ($_.'#text' -replace 'text','foo1') }               
         $xml.Save($_)
    }

    I opened the file in Notepad again and the formatting had changed :-?

    Thursday, April 26, 2012 11:16 AM
  • It is still formatted; it just uses different defaults.

    There are about four standard methods of formatting XML.  The default is set to break after every closing tag and to indent for each level.  The original uses a different default that breaks inside a tag if there is a value.

    The formatting is still there.  It is readable.  Why would you care?  Any tool you open this with will display it differently no matter how it looks in a text editor.  The XML will still work exactly the same.

    Personally I prefer this format.

    If you open this file in IE or XMLNotepad or PrimalXML or almost any other tool, each one will format the display according to its own preferences.  The formatting in the file usually has no effect because that is outside of the XML specification.  XML allows whitespace but does not maintain it.  A save is up to the platform stream writer.  On Unix it would convert completely differently.
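
    If keeping the file's existing layout does matter, one thing that may be worth trying is the PreserveWhitespace property of System.Xml.XmlDocument, on the assumption that the layout changes because whitespace-only nodes are dropped when the file is loaded.  This is an untested-in-this-thread sketch; the sample content and the scratch file name are hypothetical.

```powershell
# Hypothetical scratch file with eight-space indentation on the value element.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample-preserve.resx'
$original = @'
<root>
    <data name="a">
        <value>some text</value>
    </data>
</root>
'@
[IO.File]::WriteAllText($path, $original)

$xml = New-Object System.Xml.XmlDocument
$xml.PreserveWhitespace = $true   # has to be set before Load to keep whitespace nodes
$xml.Load($path)

foreach ($node in $xml.SelectNodes('//data/value[text()]')) {
    # InnerText is used here instead of '#text'; it assumes the node holds only text.
    $node.InnerText = $node.InnerText -replace 'text', 'foo'
}

$xml.Save($path)
$saved = [IO.File]::ReadAllText($path)
```

    With PreserveWhitespace set, Save writes the whitespace nodes back instead of applying its own indentation.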


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 11:39 AM
  • Here is an excellent Pretty printer for XML.  It can be altered to give you the format you are looking for.

    http://rkeithhill.wordpress.com/2006/08/10/cmdlet-style-xml-pretty-print-format-xml/


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 1:20 PM
  • I looked at the diff in svn and I see that after $xml.Save() the carriage return/new line is replaced with another non-printable symbol:

    Thursday, April 26, 2012 4:20 PM
  • That is a carriage return.  It is a special symbol used to denote a carriage return when viewing text in a browser.

    What does it look like in Notepad?

    Only IE with the XML viewer installed can view XML hierarchically.  By default the browser will strip all whitespace and tags from an XML file, preserving only carriage returns.


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 4:42 PM
  • In Windows Notepad it looks like the pictures in the post above.

    I suppose that Notepad expects \r\n (carriage return and new line), and the Save method replaces \r\n with just \r (in fact not all of them; some persist), so Notepad displays it as above.

    One more question: can I save the XML file only if it has been modified?  For now it saves all XML files even if nothing in them was modified.

    Thursday, April 26, 2012 5:06 PM
  • If you are getting only returns, your culture setting may specify that, or you may have some other issue.  Use the XmlWriter cmdlet if you want to change it.

    I believe there is a flag on the XML that determines if it has been modified, or you can add a flag.  That is why we use the 'contains' call.  It allows us to skip.
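
    The "add a flag" idea could look roughly like the sketch below: track whether any node actually changed and call Save only then.  The $changed variable, the sample content and the scratch file name are made up for illustration.

```powershell
# Hypothetical scratch file so the sketch is self-contained.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample-save-if-changed.resx'
[IO.File]::WriteAllText($path, '<root><data name="a"><value>some text</value></data></root>')

$xml = New-Object System.Xml.XmlDocument
$xml.Load($path)

$changed = $false
foreach ($node in $xml.SelectNodes('//data/value[text()]')) {
    $old = $node.InnerText
    $new = $old -replace 'text', 'foo'
    # -cne compares case-sensitively, so a change in letter case also counts.
    if ($new -cne $old) {
        $node.InnerText = $new
        $changed = $true
    }
}

# Write the file back only when at least one node was modified.
if ($changed) { $xml.Save($path) }
```

    Files with no matching text are then left untouched on disk, including their timestamps.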


    ¯\_(ツ)_/¯

    Thursday, April 26, 2012 7:33 PM
  • Thank you.

    And how can I get a count of all the entries before replacing them?

    Friday, April 27, 2012 10:44 AM
  • Why would you want to count anything?


    ¯\_(ツ)_/¯


    • Edited by jrv Friday, April 27, 2012 10:54 AM
    Friday, April 27, 2012 10:53 AM
  • To see how many entries would be replaced...
    Friday, April 27, 2012 12:15 PM
  • Every collection has a 'count' property.

    $nodes=$xml.selectNodes('//data/value')

    $nodes.count


    ¯\_(ツ)_/¯

    Friday, April 27, 2012 12:20 PM
  • But I need to know the exact count of pattern occurrences to replace.

    $nodes=$xml.selectNodes('//data/value') or $nodes=$xml.selectNodes('//data/value[contains(.,"text")]')

    $nodes.count

    gives me only the count of all //data/value nodes, or the count of //data/value nodes that contain one or more "text", not the count of "text" entries (and as you said, [contains] can't be case-insensitive in MSXML...)

    Maybe I can do something like this?

    $counter = 0
    get-childitem "D:\temp\"* -include *.resx -recurse -force |
        foreach-object {
            $xml=[xml](get-content $_.FullName)
            $xml.selectNodes('//data/value[text()]') |
                foreach-object {
                    # note: $matches only describes the first match in each node
                    if ($_.'#text' -match 'text') { $counter += $matches.count }
                }
        }
    $counter

    or something like this:

    $counter = 0
    get-childitem "D:\temp\"* -include *.resx -recurse -force |
        select-xml -xpath "//data/value" |
        foreach-object {
            if ($_.node.InnerXML -match '[\s\S]*?text[\s\S]*?') { $counter += $matches.count }
        }
    $counter

    Friday, April 27, 2012 2:04 PM
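
    For the counting question above, one possible approach is to count every occurrence with the .NET [regex]::Matches method rather than -match, since -match stops at the first hit in each string.  This is only a sketch: the sample XML is made up, and reading the node through InnerText assumes the value nodes hold plain text.

```powershell
# Sample document standing in for a .resx file (hypothetical content).
$xml = [xml]@'
<root>
  <data name="a"><value>text, Text and more text</value></data>
  <data name="b"><value>no match here</value></data>
  <data name="c"><value>TEXT</value></data>
</root>
'@

$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

$counter = 0
foreach ($node in $xml.SelectNodes('//data/value')) {
    # Matches returns every occurrence in the string, ignoring case here.
    $found = [regex]::Matches($node.InnerText, 'text', $options)
    $counter += $found.Count
}
$counter   # 3 occurrences in the first node plus 1 in the third
```

    Wrapping the same loop in the Get-ChildItem pipeline from the earlier posts would give a total across all files.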