PS Unicode to Ascii conversion

    Question

  • Hello All,

    I am trying to convert a Unicode file to ANSI or ASCII.

    Neither get-content t1.txt nor gc t1.txt -encoding ascii works.

    I wish I could attach the file for reference. Maybe something is wrong with the file, but here is what I noticed:

    Notepad opens my problematic file like any other file and can save it in 'ansi' format, which is acceptable for my goals.

    Notepad can save any file in any of its Unicode formats, and the resulting file can then be converted by the 'gc' command.

    However, if Notepad is not involved, my file cannot currently be converted by the 'gc' command.


    gene

    Monday, September 10, 2012 3:01 PM

All replies

  • get-content UNICODE.txt | out-file -encoding ASCII ASCII.txt
    
    • Edited by Larry Weiss Monday, September 10, 2012 3:24 PM
    Monday, September 10, 2012 3:23 PM
  • Larry's response is good.

    If you want to do it another (more manual) way, you can do the following.

    (I have a unicode text file called utest.txt)

    $encoding= [System.Text.Encoding]::ASCII
    $uencoding = [System.Text.Encoding]::UNICODE
    
    [System.Text.Encoding]::Convert([System.Text.Encoding]::UNICODE, $encoding, $uencoding.GetBytes((Get-Content .\utest.txt))) | % { $myStr += [char]$_}
    
    $myStr  # this will now be ascii encoded
    
    

    Other variations of this type of logic can be used.
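
    One such variation, as a minimal sketch: convert the file's raw bytes instead of text that Get-Content has already decoded, then write the result back as bytes. It assumes the input really is UTF-16 LE (what .NET calls Unicode). The function name Convert-FileToAscii and the parameter names are hypothetical, not built-in.

```powershell
# Sketch: re-encode a UTF-16 LE file as ASCII by working on raw bytes.
# Convert-FileToAscii is a hypothetical helper name, not a built-in cmdlet.
function Convert-FileToAscii {
    param(
        [string]$Path,         # input file, ASSUMED to be UTF-16 LE
        [string]$Destination   # output file, written as ASCII bytes
    )
    # Pass full paths: .NET file APIs resolve relative paths against the
    # process working directory, which can differ from the PowerShell location.
    $source = [System.Text.Encoding]::Unicode
    $target = [System.Text.Encoding]::ASCII

    # Read the bytes directly so nothing guesses at the encoding first.
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # Skip the UTF-16 LE byte order mark (FF FE) if the file starts with one.
    $offset = 0
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) {
        $offset = 2
    }

    # Characters with no ASCII equivalent come out as '?'.
    $converted = [System.Text.Encoding]::Convert(
        $source, $target, $bytes, $offset, $bytes.Length - $offset)

    [System.IO.File]::WriteAllBytes($Destination, $converted)
}
```

    It could be called as Convert-FileToAscii -Path C:\temp\utest.txt -Destination C:\temp\utest-ascii.txt, where both paths are placeholders.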

    Documentation:

    http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx


    G. Samuel Hays

    Monday, September 10, 2012 3:30 PM
  • Thank you, Larry and Samuel, for your responses.

    Here is where I am with the problem:

    When I run edit t1.txt, each character is visually followed by a space, and at the end of each line I see funny characters (see the output below).

    After I run the line you recommended, I still see each character followed by a space. The funny characters have disappeared.

    The vi editor still does not understand the file. Maybe you could help me understand what the real problem is.

     C R E A T E   P R O C E D U R E   [ M H M a r g i n ] . [ p U p d a t e T r a░
                             @ m g p I d     u M a r g i n G r o u p I D ♪        ░
             A S ♪                                                                ░
             B E G I N ♪                                                          ░
     ♪                                                                            ░
             S E T   N O C O U N T   O N ♪                                        ░
       ♪                                                                          ░
     ♪                                                                            ░

    ---- vi editor:

    I^?F^? ^? ^?E^?X^?I^?S^?T^?S^? ^?(^?S^?E^?L^?E^?C^?T^? ^?*^? ^?F^?R^?O
    ^?M^? ^?s^?y^?s^?.^?o^?b^?j^?e^?c^?t^?s^? ^?W^?H^?E^?R^?E^? ^?o^?b^?j^
    ?e^?c^?t^?_^?i^?d^? ^?=^? ^?O^?B^?J^?E^?C^?T^?_^?I^?D^?(^?N^?'^?[^?M^?
    H^?M^?a^?r^?g^?i^?n^?]^?.^?[^?p^?U^?p^?d^?a^?t^?e^?T^?r^?a^?d^?e^?W^?i
    ^?t^?h^?S^?e^?c^?I^?n^?f^?o^?]^?'^?)^? ^?A^?N^?D^? ^?t^?y^?p^?e^? ^?i^
    ?n^? ^?(^?N^?'^?P^?'^?,^? ^?N^?'^?P^?C^?'^?)^?)^?

    Thank you in advance.


    gene

    Monday, September 10, 2012 4:31 PM
  • Are you sure the encoding of your file is Unicode?

    G. Samuel Hays

    Monday, September 10, 2012 4:35 PM
  • Is there a PowerShell script that, given read access to a file, will report the encoding of the file's contents?
     
    Monday, September 10, 2012 4:47 PM
  • http://poshcode.org/2153 that might do.
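
    For reference, a minimal sketch of the usual idea behind such scripts: look at the first few bytes of the file for a byte order mark (BOM). This is not the script linked above, and Get-BomName is a hypothetical name. A file with no BOM cannot be identified this way, so its encoding can only be guessed.

```powershell
# Sketch: report which byte order mark (BOM), if any, a file starts with.
# Get-BomName is a hypothetical helper name; it is not the linked script.
function Get-BomName {
    param([string]$Path)   # full path to the file to inspect

    # Reads the whole file, which is fine for small files.
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # Build a hex string from the first (up to) four bytes, e.g. "FFFE4300".
    $count = [Math]::Min(4, $bytes.Length)
    $head = ''
    for ($i = 0; $i -lt $count; $i++) { $head += $bytes[$i].ToString('X2') }

    # Check the 4-byte UTF-32 marks before the 2-byte UTF-16 marks,
    # because FF FE 00 00 (UTF-32 LE) also starts with FF FE (UTF-16 LE).
    if ($head.StartsWith('0000FEFF')) { return 'UTF-32 BE' }
    if ($head.StartsWith('FFFE0000')) { return 'UTF-32 LE' }
    if ($head.StartsWith('EFBBBF'))   { return 'UTF-8' }
    if ($head.StartsWith('FEFF'))     { return 'UTF-16 BE' }
    if ($head.StartsWith('FFFE'))     { return 'UTF-16 LE' }
    return 'no BOM'
}
```

    A result of 'no BOM' means only that the file carries no marker; the content may still be UTF-16, UTF-8, or a single-byte code page.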

    G. Samuel Hays

    • Proposed as answer by Larry Weiss Tuesday, September 11, 2012 3:36 AM
    Monday, September 10, 2012 4:49 PM
  • Hello Larry, Samuel, thank you again for your valuable input.

    Here is what I get from running get-encoding.ps1.

    Please explain my problem to me :)

    BodyName          : utf-7
    EncodingName      : Unicode (UTF-7)
    HeaderName        : utf-7
    WebName           : utf-7
    WindowsCodePage   : 1200
    IsBrowserDisplay  : False
    IsBrowserSave     : False
    IsMailNewsDisplay : True
    IsMailNewsSave    : True
    IsSingleByte      : False
    EncoderFallback   : System.Text.EncoderReplacementFallback
    DecoderFallback   : System.Text.UTF7Encoding+DecoderUTF7Fallback
    IsReadOnly        : True
    CodePage          : 65000


    gene

    Monday, September 10, 2012 5:02 PM
  • What version of PS are you using?

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Monday, September 10, 2012 5:14 PM
  • Looks like there is a sister to Get-Encoding:

    Set-Encoding


    Grant Ward, a.k.a. Bigteddy


    Edit:  Although it won't solve the OP's problem, because it uses Get-Content straight away.
    • Edited by Bigteddy Monday, September 10, 2012 5:22 PM
    Monday, September 10, 2012 5:18 PM
  • PS V3 added an -Encoding parameter for get-content.  The parameter value enum includes Unknown, but it isn't clear if that will auto-detect.  You can explicitly declare UTF7.

    get-content file.txt -encoding UTF7 | 
      set-content newfile.txt -encoding ASCII
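
    The same pattern works with any other declared input encoding. As a sketch only, assuming the file is actually UTF-16 LE with no byte order mark (one possible reading of the "character followed by a space" display earlier in the thread, which the thread does not confirm), the input could be declared as Unicode. The function name Convert-Utf16ToAscii is hypothetical.

```powershell
# Sketch: declare the input as UTF-16 LE ("Unicode" in PowerShell) explicitly,
# so Get-Content does not have to rely on a byte order mark.
# Convert-Utf16ToAscii is a hypothetical helper name; the UTF-16 assumption
# about the input file is NOT confirmed in the thread.
function Convert-Utf16ToAscii {
    param(
        [string]$Path,         # input file, assumed UTF-16 LE
        [string]$Destination   # output file, written as ASCII
    )
    Get-Content -Path $Path -Encoding Unicode |
        Set-Content -Path $Destination -Encoding ASCII
}
```

    For the file in this thread, that would be Convert-Utf16ToAscii -Path t1.txt -Destination t1-ascii.txt, where the output name is a placeholder.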


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Monday, September 10, 2012 5:23 PM
  • All, thank you for the time and attention to this matter.

    The last suggestion, with -encoding UTF7, did not convert the file into ASCII format.

    It may be silly, but at this time I am opening Notepad from VBScript for each of 5 to 30 files and saving each one as an ANSI file. Notepad does not care about the particular type of encoding as long as it's some type of Unicode file.

    I was trying to find a way to code some similar functionality with powershell.


    gene

    Monday, September 10, 2012 5:38 PM
  • I don't know if this'll work... but give it a try.

    $encoding = [System.Text.Encoding]::ASCII
    $uencoding = [System.Text.Encoding]::UTF7
    
    [System.Text.Encoding]::Convert($uencoding, $encoding, $uencoding.GetBytes((Get-Content .\utest.txt))) | % { $myStr += [char]$_}
    
    $myStr

    I'm really just kind of curious. :)


    G. Samuel Hays

    Monday, September 10, 2012 6:16 PM
  • Samuel,

    Can you help me construct this command? My input file is t1.txt. I understand I need to replace utest.txt --> t1.txt.

    Is there anything else that needs to be done to adapt it?


    gene

    Monday, September 10, 2012 6:23 PM
  • That's correct - just replace .\utest.txt with the path of your t1.txt. 

    If that works, you can then pipe the $myStr to a file.  ALSO - I ran into a similar problem some years back with weird encoding that I didn't want.  I simply did the following from the command prompt:

    type t1.txt >> t1-ascii.txt

    At the time (and I believe it's still the case), type dynamically converted to ascii.  So, in that case (before powershell), I just wrote a batch to "type" all of the files into new files.  Might want to give that a try as it could be a hell of a lot easier. :)



    G. Samuel Hays

    Monday, September 10, 2012 6:26 PM
  • Samuel,

    I believe the last } is extra. When I removed it, no output was produced.

    The 'type' command was the first thing I tried, then other things too. 'gc' converts every Unicode type that Win7 Notepad can create, but not this file in particular. I guess I will continue to call Notepad in a loop from VBS. Thank you for your help.


    gene

    Monday, September 10, 2012 7:12 PM