saving XML file changes text case of UTF-8 to lowercase

  • Question

  • I noticed that when I need to change the encoding to UTF-8 with no BOM, the "UTF-8" in the XML declaration is saved in lower case.  

    I found a workaround: take $xml.OuterXml and save that string instead. But it seems odd that the case would be overwritten when the encoding is already there. I could understand it if it weren't there.

    Below is a small code snippet that shows the issue.

    $xml = [xml]@"
    <?xml version="1.0" encoding="UTF-8"?>
    <test>
        <example>required</example>.
    </test>
    "@
    # $xml.Save('c:\temp\lower.xml')   # plain Save() keeps "UTF-8" exactly as written

    # UTF-8 without a byte order mark
    $utf8WithoutBom = New-Object System.Text.UTF8Encoding($false)

    $settings = New-Object System.Xml.XmlWriterSettings
    $settings.Encoding = $utf8WithoutBom
    $settings.CheckCharacters = $false

    # Saving through an XmlWriter writes encoding="utf-8" (lower case) to the file
    $writer = [System.Xml.XmlWriter]::Create('c:\temp\lower.xml', $settings)
    try {
        $xml.Save($writer)
    }
    finally {
        $writer.Close()
    }

    # Uncommenting the line below saves with the expected upper case from the $xml variable
    #[System.IO.File]::WriteAllLines('c:\temp\lower.xml', $xml.OuterXml, $utf8WithoutBom)

    I was hoping there was a better answer as to why, and whether there is a setting I could set so I wouldn't need to use WriteAllLines.

    Thanks in advance.


    Joe--

    Wednesday, June 3, 2020 10:48 PM

Answers

  • The default is set by Windows, which is always utf-8 NOBOM. utf-8 is defined to have no BOM, and use of a BOM is heavily discouraged.

    XML only supports utf-8 (no BOM) and UTF-16.

    Old Windows defaulted to ANSI/ASCII.

    With the [xml] type, the encoding is taken from the header if specified. You cannot override this behavior.


    \_(ツ)_/


    • Edited by jrv Thursday, June 4, 2020 3:03 AM
    • Marked as answer by rainmakers Thursday, June 4, 2020 9:07 PM
    Thursday, June 4, 2020 3:02 AM

All replies

  • XML is case-sensitive. You cannot change the case of any elements.

    The "Save" method will not and does not change the case, no matter which options you use.

    \_(ツ)_/



    • Edited by jrv Thursday, June 4, 2020 12:22 AM
    Thursday, June 4, 2020 12:20 AM
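The case-sensitivity point can be seen with a small sketch that uses only the [xml] accelerator and XmlNode.SelectSingleNode: an XPath lookup only matches when the element name's case matches exactly, and a start tag closed by a differently cased end tag is rejected as not well-formed. The variable names are illustrative only.

```powershell
# Element names are case-sensitive: <test> and <Test> are different names.
$doc = [xml]'<test><example>required</example></test>'

# XPath lookups only match when the case is exactly the same.
$lower = $doc.SelectSingleNode('/test/example')   # the <example> element
$upper = $doc.SelectSingleNode('/Test/Example')   # $null, no such element

# A start tag closed by an end tag of different case is not well-formed,
# so the [xml] conversion throws and we land in the catch block.
try {
    [xml]'<test></Test>' | Out-Null
    $mismatchParsed = $true
}
catch {
    $mismatchParsed = $false
}

"lower-case path matched: $($null -ne $lower)"
"upper-case path matched: $($null -ne $upper)"
"<test></Test> parsed:    $mismatchParsed"
```

This concerns element and attribute names; the value inside encoding="..." is a separate question.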
  • Please try out the sample code I provided; it changes the case of the UTF-8 to utf-8.

    Joe--

    Thursday, June 4, 2020 12:48 AM
  • That part is normally lowercase and should be lowercase, although the XML declaration line is not case-sensitive by design. There is no point in worrying about it.

    Also, I cannot reproduce your issue. When I save XML containing uppercase text, it is saved as is.

    The default in Windows 10 and later is utf-8 with BOM.


    \_(ツ)_/

    Thursday, June 4, 2020 1:20 AM
  • When you use "settings" (an XmlWriter), the XML header is always rewritten according to XML standards. XML standards require the encoding to be stated in lower case. When you just use $xml.Save(file), the save does not rewrite the header; it just saves what it has.

    \_(ツ)_/

    Thursday, June 4, 2020 1:27 AM
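The header rewrite can be observed directly. Going by the .NET source for XmlDocument.Save(XmlWriter), the method does not copy the document's own declaration node; it calls WriteStartDocument() on the writer, and the writer builds a fresh declaration from Encoding.WebName, which is the lowercase "utf-8" for UTF8Encoding. OuterXml, by contrast, returns the declaration as it was parsed. This is a minimal sketch assuming the same document as in the question; it writes to a MemoryStream only so that nothing touches the disk.

```powershell
# Same document as in the question, declaration in upper case.
$xml = [xml]@'
<?xml version="1.0" encoding="UTF-8"?>
<test><example>required</example></test>
'@

$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
$settings = New-Object System.Xml.XmlWriterSettings
$settings.Encoding = $utf8NoBom

# Save through an XmlWriter, into memory instead of a file.
$stream = New-Object System.IO.MemoryStream
$writer = [System.Xml.XmlWriter]::Create($stream, $settings)
try {
    $xml.Save($writer)    # the writer emits its own declaration
}
finally {
    $writer.Close()       # flushes everything into $stream
}

$viaWriter   = $utf8NoBom.GetString($stream.ToArray())
$viaOuterXml = $xml.OuterXml   # declaration node as parsed

"WebName:   $($utf8NoBom.WebName)"
"XmlWriter: $viaWriter"
"OuterXml:  $viaOuterXml"
```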
  • If you had PowerShell 7 (pwsh), the default is UTF-8 with no BOM:

    (Get-Content file.xml) | Set-Content file.xml

    • Edited by JS2010 Thursday, June 4, 2020 2:53 AM
    Thursday, June 4, 2020 2:53 AM
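Whichever route is used, it is easy to check what actually landed on disk: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The helper name Test-Utf8Bom and the temp file names below are made up for illustration. File.WriteAllText writes the string unchanged, so the declaration keeps its original case, and the encoding object alone decides whether a BOM is written.

```powershell
# Hypothetical helper: $true when a file starts with the UTF-8 BOM bytes EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

$text       = '<?xml version="1.0" encoding="UTF-8"?><test><example>required</example></test>'
$tempDir    = [System.IO.Path]::GetTempPath()
$withBom    = Join-Path $tempDir 'with-bom.xml'
$withoutBom = Join-Path $tempDir 'without-bom.xml'

# The string is written as-is; UTF8Encoding($true) adds a BOM, UTF8Encoding($false) does not.
[System.IO.File]::WriteAllText($withBom, $text, [System.Text.UTF8Encoding]::new($true))
[System.IO.File]::WriteAllText($withoutBom, $text, [System.Text.UTF8Encoding]::new($false))

# PowerShell 7+ equivalent for the no-BOM file:
#   $text | Set-Content -Path $withoutBom -Encoding utf8NoBOM -NoNewline

"BOM in with-bom.xml:    $(Test-Utf8Bom -Path $withBom)"
"BOM in without-bom.xml: $(Test-Utf8Bom -Path $withoutBom)"
```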
  • Thanks for letting me know this behavior can't be changed. I'm at least glad I have the workaround of writing the file with the .NET approach.


    Joe--

    Thursday, June 4, 2020 9:08 PM
  • Yes, but using uppercase is just wrong. Read the XML spec. It is required to be lowercase, which is why rewriting it puts it in as lowercase.

    Only the body of an XML document - the "Document" element - is case preserving.  All metatags and commands are required to be in the case specified which is nearly always lowercase.  Forcing a required element to be in the wrong case can cause failures in many systems.


    \_(ツ)_/

    Thursday, June 4, 2020 9:38 PM
  • True enough, but my source XML file is delivered by the third-party vendor this way. For some reason their product wants it. 

    Joe--

    Thursday, June 4, 2020 10:03 PM
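If the vendor's product really does need encoding="UTF-8" and the file must still go through an XmlWriter, one workaround that is often suggested for exactly this symptom is an Encoding subclass that reports its WebName in upper case, since that property is what the writer copies into the declaration. The type name UpperCaseUtf8Encoding is hypothetical and is compiled inline with Add-Type; treat this as a sketch to verify against your .NET version, not as a documented XmlWriterSettings option.

```powershell
# Hypothetical helper type: UTF-8 without a BOM whose WebName reads "UTF-8".
# Only the name changes; encoding and decoding are inherited from UTF8Encoding.
Add-Type -TypeDefinition @'
public class UpperCaseUtf8Encoding : System.Text.UTF8Encoding
{
    // false = do not emit a byte order mark
    public UpperCaseUtf8Encoding() : base(false) { }

    public override string WebName
    {
        get { return base.WebName.ToUpperInvariant(); }
    }
}
'@

$xml = [xml]@'
<?xml version="1.0" encoding="UTF-8"?>
<test><example>required</example></test>
'@

$settings = New-Object System.Xml.XmlWriterSettings
$settings.Encoding = New-Object UpperCaseUtf8Encoding

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'upper.xml'
$writer = [System.Xml.XmlWriter]::Create($path, $settings)
try {
    $xml.Save($writer)
}
finally {
    $writer.Close()
}

# Read back what was written to the file.
$saved = [System.IO.File]::ReadAllText($path)
$saved
```

Since this only changes the label the writer prints, it is worth confirming that the vendor's product accepts the resulting file.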
  • How do you know their product wants this? I doubt that you are right. Requiring it would break their product in a way that would make it useless as an XML product. I think you are just making an assumption with no proof.

    Of course, if you are talking about some small hand-built exe written by an independent programmer who has no training in XML, then this might be possible, although I do not see any reason that any programmer would enforce such a rule.

    An inexperienced and untrained programmer might choose to parse XML as text, and could then have made the mistake of parsing the text in a case-sensitive way. If this were the case, I would contact the programmer and have them fix the code. The fix would be trivial.


    \_(ツ)_/

    Friday, June 5, 2020 12:22 AM
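On the point about parsing: the XML 1.0 specification (section 4.3.3) says XML processors SHOULD match character encoding names in a case-insensitive way, and .NET's Encoding.GetEncoding resolves "UTF-8" and "utf-8" to the same encoding. The sketch below loads the same document with both spellings through XmlDocument.Load from raw bytes, so a parser-based consumer sees identical content, whereas a plain text comparison of the header would not.

```powershell
# .NET resolves encoding names case-insensitively: both give code page 65001.
$sameEncoding = [System.Text.Encoding]::GetEncoding('UTF-8').CodePage -eq [System.Text.Encoding]::GetEncoding('utf-8').CodePage

$results = foreach ($name in 'UTF-8', 'utf-8') {
    # Build the same document with the encoding name spelled two ways.
    $text  = '<?xml version="1.0" encoding="{0}"?><test><example>required</example></test>' -f $name
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($text)   # GetBytes never adds a BOM

    # Load from bytes so the parser has to read the declared encoding.
    $doc = New-Object System.Xml.XmlDocument
    $doc.Load([System.IO.MemoryStream]::new($bytes))

    [pscustomobject]@{
        Declared = $doc.FirstChild.Encoding   # spelling is kept as written
        Example  = $doc.test.example          # parsed content
    }
}

"Same code page: $sameEncoding"
$results | Format-Table
```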