locked
PowerShell xml parsing script change encoding RRS feed

  • Question

  • Hi guys

    I have an xml file with Shift-JIS encoding inside, there are Japanese and English characters, and after succesfuly parsing this file I see Japan characters broken. Can anyone help me to understand the cause of this issue?

    Thanks in advance!

    Best regards


    • Edited by aleksjal Friday, August 17, 2018 6:56 AM
    Friday, August 17, 2018 6:55 AM

Answers

  • Hi!

    Ok finally this one  Get-Content file.xml -Encoding Default | Set-Content -Encoding UTF8 newfile1.xml converts encoding to correct UTF8 and made the trick. Hope this will be useful for someone!

    Thanks for the help!

    • Marked as answer by aleksjal Friday, August 24, 2018 11:28 AM
    Friday, August 24, 2018 11:15 AM

All replies

  • Hi,

    Thanks for your question.

    How do you parse the xml file? If you use powershell to parse it, please  paste your script.

    Best regards,

    Lee


    Just do it.

    Friday, August 17, 2018 7:54 AM
  • Unicode characters that you cannot display.  You do not have support for that language installed.  YOU may also be using plain text parsing methods when you need to use the XML class to correctly read XML files.


    \_(ツ)_/


    • Edited by jrv Friday, August 17, 2018 9:44 AM
    Friday, August 17, 2018 9:42 AM
  • Hi Lee

    Thank for the response, here is the script

    I  have found it and modified a bit, it makes its function good but encoding the issue.

    Even if I comment Export-Csv the encoding is not correct so do not pay attention to that block.

    # Path to downloaded file
    $dir_in = 'd:\uploads\'
    # Path to SalesForce Directory
    $dir_out = 'D:\CSV\\K01' + (Get-Date -Format MMddyyyyhhmm) + '.xml'
    # Check if file exists
    if (Test-Path  ($i.fullname + "d:\uploads\*.xml"))
    {
    
    # process the XML files
    Get-ChildItem -Path $dir_in  -Filter K0BIZ*.xml | ForEach-Object {
    
      # retrieve relevant values
      $Encoding = [System.Text.Encoding]::GetEncoding("Shift_Jis")
      $businessname = Select-Xml -XPath '//Businessname' -Path $_.FullName -ErrorAction SilentlyContinue
      $orderno = Select-Xml -XPath '//Orderno' -Path $_.FullName -ErrorAction SilentlyContinue
      $businessreceptnumber = Select-Xml -XPath '//BusinessReceptNumber' -Path $_.FullName -ErrorAction SilentlyContinue
      $servicename = Select-Xml -XPath '//Servicename' -Path $_.FullName -ErrorAction SilentlyContinue
      $serviceitem = Select-Xml -XPath '//Serviceitem' -Path $_.FullName -ErrorAction SilentlyContinue
      $accesskey = Select-Xml -XPath '//Accesskey' -Path $_.FullName -ErrorAction SilentlyContinue
      $appointdate = Select-Xml -XPath '//Appointdate' -Path $_.FullName -ErrorAction SilentlyContinue
      $appointampm = Select-Xml -XPath '//AppointAmPm' -Path $_.FullName -ErrorAction SilentlyContinue
      $dispatchdiv = Select-Xml -XPath '//DispatchDiv' -Path $_.FullName -ErrorAction SilentlyContinue
      $so = Select-Xml -XPath '//So' -Path $_.FullName -ErrorAction SilentlyContinue
      $sostat = Select-Xml -XPath '//Sostat' -Path $_.FullName -ErrorAction SilentlyContinue
    
      # if values were retrieved succesfully
      if ($businessname -and $orderno -and $businessreceptnumber -and $servicename -and $serviceitem -and $accesskey-and $appointdate -and $appointampm -and $dispatchdiv -and $so -and $sostat) {
    
        # create a custom PSObject and set the values to corresponding properties
        New-Object PSObject -Property @{
          Businessname = $businessname.Node.InnerText
          Orderno = $orderno.Node.InnerText
          BusinessReceptNumber = $businessreceptnumber.Node.InnerText
          Servicename = $servicename.Node.InnerText
          Serviceitem = $serviceitem.Node.InnerText
          Accesskey = $accesskey.Node.InnerText
          Appointdate = $appointdate.Node.InnerText
          AppointAmPm = $appointampm.Node.InnerText
          DispatchDiv = $dispatchdiv.Node.InnerText
          So = $so.Node.InnerText
          Sostat = $sostat.Node.InnerText
    
        }
      }
    
      # clear values
      Clear-Variable businessname, orderno, businessreceptnumber, servicename, serviceitem, accesskey, appointdate, appointampm, dispatchdiv, so, sostat
    
        # export custom object to the CSV dir_out
    } | Export-Csv -Path $dir_out -NoTypeInformation -Encoding Default -Verbose  
    # Remove empty files
    #if ((Get-Content "D:\CSV\*.csv") -eq $Null){
    #Remove-Item D:\CSV\*.csv
    #}
    }


     Thanks, 

    Aleks

          
    • Edited by aleksjal Friday, August 17, 2018 11:04 AM
    Friday, August 17, 2018 10:52 AM
  • Hi, jrv

    The Windows 10 is Japan language version. 

    Friday, August 17, 2018 10:53 AM
  • First - you should not be using "-ErrorAction SilentlyContinue"

    You fail to say what is an issue. 

    Write the minimal amount of code needed to produce the problem.  Post the example XML as  XML.

    Post your code correctly with the code posting tool provided.  What you have posted is mostly unreadable.


    \_(ツ)_/

    Friday, August 17, 2018 10:57 AM
  • The following line serves absolutely no purpose.

    $Encoding = [System.Text.Encoding]::GetEncoding("Shift_Jis")

    The following line also makes no sense.  "$I" is never defined and adding the fullname of a file to the beginning of a path is also wrong.

    if (Test-Path  ($i.fullname + "d:\uploads\*.xml")){


    \_(ツ)_/

    Friday, August 17, 2018 11:09 AM
  • If you format and post your code correctly many of the mistakes will become obvious.

    Here is a quick reformat and removal of some useless lines.  Notice the code structure is more visible.

    $dir_in = 'd:\uploads\*' #downloaded file path
    $dir_out = 'D:\CSV\\K01' + (Get-Date -Format MMddyyyyhhmm) + '.xml'
    
    Get-ChildItem -Path $dir_in  -Filter K0BIZ*.xml | 
        ForEach-Object {
    
            # retrieve relevant values
            $businessname = Select-Xml -XPath '//Businessname' -Path $_.FullName
            $orderno = Select-Xml -XPath '//Orderno' -Path $_.FullName
            $businessreceptnumber = Select-Xml -XPath '//BusinessReceptNumber' -Path $_.FullName
            $servicename = Select-Xml -XPath '//Servicename' -Path $_.FullName
            $serviceitem = Select-Xml -XPath '//Serviceitem' -Path $_.FullName
            $accesskey = Select-Xml -XPath '//Accesskey' -Path $_.FullName
            $appointdate = Select-Xml -XPath '//Appointdate' -Path $_.FullName
            $appointampm = Select-Xml -XPath '//AppointAmPm' -Path $_.FullName
            $dispatchdiv = Select-Xml -XPath '//DispatchDiv' -Path $_.FullName
            $so = Select-Xml -XPath '//So' -Path $_.FullName
            $sostat = Select-Xml -XPath '//Sostat' -Path $_.FullName
    
            # if values were retrieved succesfully
            if ($businessname -and $orderno -and $businessreceptnumber -and $servicename -and $serviceitem -and $accesskey-and $appointdate -and $appointampm -and $dispatchdiv -and $so -and $sostat) {
                [pscustomobject]@{
                    Businessname = $businessname.Node.InnerText
                    Orderno = $orderno.Node.InnerText
                    BusinessReceptNumber = $businessreceptnumber.Node.InnerText
                    Servicename = $servicename.Node.InnerText
                    Serviceitem = $serviceitem.Node.InnerText
                    Accesskey = $accesskey.Node.InnerText
                    Appointdate = $appointdate.Node.InnerText
                    AppointAmPm = $appointampm.Node.InnerText
                    DispatchDiv = $dispatchdiv.Node.InnerText
                    So = $so.Node.InnerText
                    Sostat = $sostat.Node.InnerText
                }
            }
        } | 
        Export-Csv -Path $dir_out -NoTypeInformation -Encoding Default -Verbose  


    \_(ツ)_/


    • Edited by jrv Friday, August 17, 2018 11:14 AM
    Friday, August 17, 2018 11:13 AM
  • First - you should not be using "-ErrorAction SilentlyContinue"

    1. What is the reason not using it?

    You fail to say what is an issue. 

    2. Sure I did! "and after succesfuly parsing this file I see Japan characters broken"

    Write the minimal amount of code needed to produce the problem.  Post the example XML as  XML.

    3. Like this and it is ok with parsing, the problem I have is broken Japanese characters after like "???"

    <Root>
      <DataPart>
        <Businessname>55</Businessname>
    	<Sostat>SO発出済み</Sostat>

    Post your code correctly with the code posting tool provided.  What you have posted is mostly unreadable.

    4. Sure! Corrected

    Thanks!


    \_(ツ)_/


    Friday, August 17, 2018 11:16 AM
  • Rereading the xml file repeatedly is also a possible issue.  This is how to do this efficiently and without the CmdLet converting the file.

            [xml]$xml = Get-Content $_.FullName
            $businessname = $xml.SelectSingleNode('//Businessname').InnerText
            $orderno = $xml.SelectSingleNode('//Orderno').InnerText
            $businessreceptnumber = $xml.SelectSingleNode('//BusinessReceptNumber').InnerText
            $servicename = $xml.SelectSingleNode('//Servicename').InnerText
            $serviceitem = $xml.SelectSingleNode('//Serviceitem').InnerText
            $accesskey = $xml.SelectSingleNode('//Accesskey').InnerText
            $appointdate = $xml.SelectSingleNode('//Appointdate').InnerText
            $appointampm = $xml.SelectSingleNode('//AppointAmPm').InnerText
            $dispatchdiv = $xml.SelectSingleNode('//DispatchDiv').InnerText
            $so = $xml.SelectSingleNode('//So').InnerText
            $sostat = $xml.SelectSingleNode('//Sostat').InnerText
    Note that the "InnerText" of a node will contain the text of ALL child nodes.


    \_(ツ)_/



    • Edited by jrv Friday, August 17, 2018 11:27 AM
    Friday, August 17, 2018 11:22 AM
  • First - you should not be using "-ErrorAction SilentlyContinue"

    1. What is the reason not using it?

    You fail to say what is an issue. 

    2. Sure I did! "and after succesfuly parsing this file I see Japan characters broken"

    Write the minimal amount of code needed to produce the problem.  Post the example XML as  XML.

    3. Like this and it is ok with parsing, the problem I have is broken Japanese characters after like "???"

    <Root>
      <DataPart>
        <Businessname>55</Businessname>
    	<Sostat>SO発出済み</Sostat>

    Post your code correctly with the code posting tool provided.  What you have posted is mostly unreadable.

    4. Sure! Corrected

    Thanks!


    \_(ツ)_/


    The PowerShell console cannot display many character sets and encodings.  The correct characters will be exported if the output file encoding is set to Unicode.  There are also some encoding bugs in PowerShell and Net that were supposed to be patched in a recent release.  You might want to contact MS support to see if you are missing a patch.

    Without a complete XML file it will be hard to know how this is intended to be handled.  Upload a sample file to a sharing drive.


    \_(ツ)_/

    Friday, August 17, 2018 11:25 AM
  • Oh yes I know that. Just my experiments!
    Friday, August 17, 2018 11:28 AM
  • Oh yes I know that. Just my experiments!

    ??? Know what?


    \_(ツ)_/

    Friday, August 17, 2018 11:57 AM
  • First - you should not be using "-ErrorAction SilentlyContinue"

    1. What is the reason not using it?

    You fail to say what is an issue. 

    2. Sure I did! "and after succesfuly parsing this file I see Japan characters broken"

    Write the minimal amount of code needed to produce the problem.  Post the example XML as  XML.

    3. Like this and it is ok with parsing, the problem I have is broken Japanese characters after like "???"

    <Root>
      <DataPart>
        <Businessname>55</Businessname>
    	<Sostat>SO発出済み</Sostat>

    Post your code correctly with the code posting tool provided.  What you have posted is mostly unreadable.

    4. Sure! Corrected

    Thanks!


    \_(ツ)_/


    The PowerShell console cannot display many character sets and encodings.  The correct characters will be exported if the output file encoding is set to Unicode.  There are also some encoding bugs in PowerShell and Net that were supposed to be patched in a recent release.  You might want to contact MS support to see if you are missing a patch.

    Without a complete XML file it will be hard to know how this is intended to be handled.  Upload a sample file to a sharing drive.


    \_(ツ)_/

    How can I privately share with you sample XML file?
    Friday, August 17, 2018 11:57 AM
  • Put it "zipped" on a public file sharing site.  Replace any private data first.

    \_(ツ)_/


    • Edited by jrv Friday, August 17, 2018 12:13 PM
    Friday, August 17, 2018 12:13 PM
  • Got it.

    The XML works as required but cannot be displayed in the console or PowerShell.  The console does not support the character set.  You can still save the results to a file and it should remain OK.'


    \_(ツ)_/

    Friday, August 17, 2018 12:38 PM
  • This is all you need to do to get the data from these files:

    $dir_in = 'd:\uploads\*' #downloaded file path
    $dir_out = 'D:\CSV\\K01' + (Get-Date -Format MMddyyyyhhmm) + '.xml'
    
    Get-ChildItem -Path $dir_in  -Filter K0BIZ*.xml | 
        ForEach-Object {
            [xml]$xml = Get-Content $_.FullName
            $xml.SelectSingleNode('//DataPart')
        } | 
        select Businessname,Orderno,BusinessReceptNumber,Servicename,Serviceitem,
               Accesskey,Appointdate,AppointAmPm,DispatchDiv,So,Sostat |
        Export-Csv -Path $dir_out -NoTypeInformation -Encoding ASCII

    If you want to filter the records just add a "Where" after the "Select".


    \_(ツ)_/



    • Edited by jrv Friday, August 17, 2018 12:57 PM
    Friday, August 17, 2018 12:49 PM
  • Yes XML worked fine, even with my mistakes :), but when I tried to pipe to file, like this for example -  | Out-File -Path $dir_out the resulted XML has all fine inside with English but Japan is "????", and encoding seems changes to UTF8. If you open the file in notepad++ you`ll see the encoding is Shift-JIS.
    Friday, August 17, 2018 12:51 PM
  • The file you posted is encoded as ANSI.  The character set for the XML is Shift-Jis.  When displayed in an XML tool like IE the characters are interpreted correctly.  The console cannot do this.

    You export should look like this to retain the characters:

    Export-Csv -Path $dir_out -NoTypeInformation -Encoding ASCII


    \_(ツ)_/

    Friday, August 17, 2018 12:57 PM
  • The "ASCII" will retain the XML correctly but the characters will be converted incorrectly.  You will likely have to convert the XML to get it to save correctly in Unicode.


    \_(ツ)_/

    Friday, August 17, 2018 1:00 PM
  • Yes I tried ASCII also but your conclusion about converting it to Unicode is same as I came to.
    Friday, August 17, 2018 1:07 PM
  • Also Windows 10 does not have ab encoding for the one used in the original file.  It has JP and 2JP but they are not correct.


    \_(ツ)_/

    Friday, August 17, 2018 1:13 PM
  • Anyway thank you very much! You helped me much! Say Hi to New-York! Can buy you a coffee the best you like! Paypal?
    Monday, August 20, 2018 4:43 AM
  • Just to be sure this is what happens when trying to save the XML:

    PS D:\scripts> $xml.Save("$pwd\testfix.xml")
    Exception calling "Save" with "1" argument(s): "'Windows-31J' is not a supported encoding name. For information on defining a custom encoding, see the
    documentation for the Encoding.RegisterProvider method.
    Parameter name: name"
    At line:1 char:1
    + $xml.Save("$pwd\testfix.xml")


    \_(ツ)_/

    Monday, August 20, 2018 4:51 AM
  • Hi!

    Ok finally this one  Get-Content file.xml -Encoding Default | Set-Content -Encoding UTF8 newfile1.xml converts encoding to correct UTF8 and made the trick. Hope this will be useful for someone!

    Thanks for the help!

    • Marked as answer by aleksjal Friday, August 24, 2018 11:28 AM
    Friday, August 24, 2018 11:15 AM