Bug? Invoke-RestMethod and UTF-8 data

  • Question

  • Using PowerShell 3.0 I have a RESTful get service I'm calling that returns UTF-8 data and it appears I've found a bug, but I'm hoping someone has a work-around. The service returns a person's first name, and in one instance the name is Paulé.

    Among other settings in the headers I have indicated Accept-Charset = 'UTF-8' however when the data is returned like this:

    $data = Invoke-RestMethod -Uri $url -Headers $header -Method Get -ContentType 'application/json; charset=utf-8'

    the name appears like this in the variable: PaulÃ©

    However, if I change the invoke call to save the data to a file, the name comes back correctly:

    Invoke-RestMethod -Uri $url -Headers $header -Method Get -ContentType 'application/json;charset=utf-8' -OutFile "C:\temp\response.txt"
    $data = (gc c:\Temp\response.txt -Encoding UTF8) | ConvertFrom-Json
    Now when I inspect the name in the $data variable, it appears correct: Paulé

    This appears to be a bug to me, and I'd really like to avoid writing the output to a file and then reading it back in. Does anyone have any suggestions on how to get Invoke-RestMethod to work correctly when saving the results to a variable?  Short of abandoning Invoke-RestMethod and writing my own function, is there any way to avoid writing to disk?

    Thanks for your help!


    • Edited by wtgreen Tuesday, April 1, 2014 5:57 PM
    Tuesday, April 1, 2014 5:56 PM

Answers

  • Looks like the Invoke-RestMethod cmdlet bases its decoding on the HttpWebResponse.CharacterSet property, and if that is not set, it falls back to ISO-8859-1. Based on some tests, that fallback seems to be what's happening here:

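    # Encoding that the response appears to be decoded with.
    $defaultEncoding = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    
    $string = 'Paulé'
    
    # The bytes a server would send for this name in a UTF-8 response.
    $utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($string)
    
    # Decoding those UTF-8 bytes as ISO-8859-1 reproduces the garbled name.
    $decoded = $defaultEncoding.GetString($utf8Bytes)
    
    $object = New-Object psobject -Property @{
        Original = $string
        Decoded  = $decoded
    }
    
    $object | Format-Table -AutoSize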
    
    <#
    Output:
    
    Decoded Original
    ------- --------
    PaulÃ©  Paulé   
    #>

    I'm not sure where the HttpWebResponse class gets that CharacterSet information (which HTTP header it reads), or whether it's using ISO-8859-1 here because your web server told it to do that, or because the web server didn't give it anything, and it used its default.  It 'may' be possible to fix this on the server end if you can figure out how to set the proper HTTP header so that the WebRequest / Response classes see the stream as UTF-8.
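
    As far as I know, CharacterSet is derived from the charset parameter of the Content-Type response header, with ISO-8859-1 as the fallback when that parameter is missing. The sketch below only illustrates that lookup and is not the actual HttpWebResponse code. The function name Get-CharsetFromContentType is made up for this example, and it assumes the System.Net.Mime.ContentType class is an acceptable stand-in for parsing the header value.

```powershell
# Hypothetical helper: report the charset a Content-Type header value declares.
function Get-CharsetFromContentType {
    param([string]$ContentType)

    # System.Net.Mime.ContentType parses "type/subtype; name=value" strings.
    $parsed = New-Object System.Net.Mime.ContentType $ContentType

    if ($parsed.CharSet) {
        $parsed.CharSet
    }
    else {
        # Assumption: mirror the ISO-8859-1 fallback described above.
        'ISO-8859-1'
    }
}

# A header that declares its charset, and one that does not.
$declared   = Get-CharsetFromContentType 'application/json; charset=utf-8'
$undeclared = Get-CharsetFromContentType 'application/json'
```

    If the service sends the second form, asking the server side to append "; charset=utf-8" to its Content-Type would be the first thing to try.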

    Otherwise, you could do what I just did in reverse: encode the garbled string back to bytes using ISO-8859-1, then decode those bytes as UTF-8.  I still think that's a lot of work compared to what you've already done, which is just dumping the raw binary data to a file and then interpreting it as UTF-8 to begin with.
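
    A minimal sketch of that reverse conversion, assuming the string really was produced by decoding UTF-8 bytes as ISO-8859-1. The variable names are only illustrative, and the name is built from a code point so the script file's own encoding doesn't get in the way.

```powershell
$latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
$utf8   = [System.Text.Encoding]::UTF8

# The name we expect to end up with ("Paul" followed by U+00E9).
$expected = 'Paul' + [char]0xE9

# Simulate the problem: UTF-8 bytes decoded with the wrong encoding.
$garbled = $latin1.GetString($utf8.GetBytes($expected))

# Reverse it: recover the original bytes, then decode them as UTF-8.
$fixed = $utf8.GetString($latin1.GetBytes($garbled))

# True when the round trip restored the name.
$fixed -eq $expected
```

    Since Invoke-RestMethod has already parsed the JSON by the time you see the data, this would have to be applied to each affected string property rather than to the response as a whole.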


    Tuesday, April 1, 2014 9:57 PM

All replies

  • Have you tried using Unicode instead of UTF8?  .NET Strings are all stored as Unicode in memory, anyway, and it might avoid whatever conversion bug you seem to have discovered.

    Aside from that, I'm not sure how to proceed without being able to reproduce the issue on my own computer.  I don't suppose this URL is publicly accessible?

    Tuesday, April 1, 2014 6:44 PM
  • I'm not sure how exactly I could try using Unicode. The service I'm calling returns UTF-8 data so that's what I need to specify.

    And correct, unfortunately it's not a public URL. I wish I knew a public service where I could test UTF-8 output but unfortunately I don't.

     
    Tuesday, April 1, 2014 7:09 PM
  • Well, you could possibly try using the underlying .NET WebRequest class yourself, but that's likely to wind up being more work than the workaround you've already got (though it may perform slightly better, by working in memory instead of writing to disk.)
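
    A rough sketch of that in-memory approach, assuming the service always returns UTF-8 JSON. It uses System.Net.WebClient (which sits on top of WebRequest) to fetch the raw bytes so that no charset guessing takes place. Both function names are made up for illustration, and some header names may be rejected by WebClient, so treat the header loop as a starting point.

```powershell
# Hypothetical helper: decode raw response bytes as UTF-8, then parse the JSON.
function ConvertFrom-Utf8JsonBytes {
    param([byte[]]$Bytes)

    [System.Text.Encoding]::UTF8.GetString($Bytes) | ConvertFrom-Json
}

# Hypothetical helper: download the raw bytes and hand them to the decoder.
function Get-Utf8Json {
    param(
        [string]$Uri,
        [hashtable]$Headers = @{}
    )

    $client = New-Object System.Net.WebClient
    try {
        # Copy the caller's request headers onto the client.
        foreach ($name in $Headers.Keys) {
            $client.Headers.Add($name, [string]$Headers[$name])
        }

        ConvertFrom-Utf8JsonBytes -Bytes $client.DownloadData($Uri)
    }
    finally {
        $client.Dispose()
    }
}

# Usage, with $url and $header as in the original question:
# $data = Get-Utf8Json -Uri $url -Headers $header
```

    Everything stays in memory, so nothing is written to disk and the decoding is explicit instead of depending on what the response headers declare.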
    Tuesday, April 1, 2014 7:18 PM
  • Yeah, I might have to. What's particularly frustrating is that per the JSON spec, UTF-8 is the default. I don't understand why I have to jump through hoops to get it to work.

    3.  Encoding
    
       JSON text SHALL be encoded in Unicode.  The default encoding is
       UTF-8.

    RFC 4627


    • Edited by wtgreen Tuesday, April 1, 2014 9:35 PM corrected link text
    Tuesday, April 1, 2014 9:32 PM
  • I had a similar issue with Invoke-RestMethod and the encoding while using -Method PUT...

    but it's working fine on Windows 10 / Server 2016, so I moved my scripts there.

    Wednesday, July 12, 2017 8:51 AM