Search and replace for Unicode characters

  • Question

  • Hello,

    I have a function that searches and replaces characters in a file. It works with ASCII characters, but not when the string to be replaced contains Unicode characters (such as 'ā'). The source file is encoded as UTF-8.

    $file = "file.txt" 

    $SearchReplace = @($file)
    #Process files by performing a search and replace
    foreach ($file in $SearchReplace) 
    {
    #Select-Object -Skip 1 |
        (Get-Content $file) |
            ForEach-Object { $_ -replace 'unicode_string', ';' } |
            Out-File -Encoding Unicode $file
    }

    How can I get the search (and replace) function to work with Unicode characters?


    Thanks!



    • Edited by techcons Wednesday, July 30, 2014 10:10 AM
    Wednesday, July 30, 2014 10:02 AM

All replies

  • Does it work if you specify the encoding for Get-Content?

    (Get-Content $file -encoding UTF8)

    Wednesday, July 30, 2014 10:45 AM
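
A minimal sketch of the suggestion above, assuming the data file really is UTF-8: read with an explicit encoding, do the replacement, and write back with the same encoding. In PowerShell, `-Encoding Unicode` means UTF-16 LE, so the script in the question converts the file while rewriting it. The scratch file name is hypothetical, and the sample character is built from its code point (U+0101, 'a' with macron) so that the snippet does not depend on how the script itself is saved.

```powershell
# Hypothetical scratch file, used only for this illustration
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.txt'

# Write a UTF-8 sample line; U+0101 is 'a' with macron
$char = [string][char]0x0101
[System.IO.File]::WriteAllText($path, "abc $char def", [System.Text.Encoding]::UTF8)

# Read the whole file as UTF-8 first, then replace the character
$updated = Get-Content -Encoding UTF8 $path |
    ForEach-Object { $_ -replace [regex]::Escape($char), ';' }

# Write back as UTF-8 so the file keeps its original encoding
$updated | Set-Content -Encoding UTF8 $path

Get-Content -Encoding UTF8 $path   # abc ; def
```

Collecting the result in `$updated` before writing plays the same role as the parentheses around `Get-Content` in the question: the file is fully read before it is overwritten.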
  • No, it does not. I have verified that the script does not recognize characters with diacritics (such as 'ā') even when I specify UTF-8/Unicode encoding for all of the file operations.




    • Edited by techcons Wednesday, July 30, 2014 12:45 PM
    Wednesday, July 30, 2014 12:31 PM
  • Do you receive any errors? Are you able to share the contents of one of the files you are having trouble with?
    Wednesday, July 30, 2014 12:45 PM
  • $file = "test.txt" 
    $filenamesSearchReplace_1 = @($file)
    foreach ($file in $filenamesSearchReplace_1) 
    {
        (Get-Content -Encoding UTF8 $file) |
            ForEach-Object { $_ -replace 'tēst fīleūš', ';' } |
            Out-File -Encoding Unicode $file
    }

    Contents of test.txt:

    wret sddas tēst fīleūš dasd dsfcs 

    Update: I have not received any errors.


    • Edited by techcons Wednesday, July 30, 2014 1:06 PM
    Wednesday, July 30, 2014 12:54 PM
  • Using that file and your code worked perfectly for me - tēst fīleūš was replaced with a semicolon. Does it error at all when you run it?

    Wednesday, July 30, 2014 1:07 PM
  • No, it simply ignores the characters. 

    I have Windows Server 2003 with PowerShell 2.0. Could this be the problem?

    I have the language settings for Unicode programs set up correctly.



    • Edited by techcons Wednesday, July 30, 2014 1:12 PM
    Wednesday, July 30, 2014 1:11 PM
  • Potentially. I would say it has to be something to do with your environment although I have to admit I'm not aware of any reason it would fail. I was testing on Win7 Enterprise with PS V4 so there is a pretty big difference in terms of version. Have you tried running it through the console and the ISE? I think the console had some Unicode limitations in V2 - I'm not sure it would cause the script to fail though.
    • Marked as answer by techcons Wednesday, July 30, 2014 1:32 PM
    Wednesday, July 30, 2014 1:26 PM
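
One environment difference that can produce this symptom: as far as I know, Windows PowerShell reads a file that has no byte order mark using the system ANSI code page, so non-ASCII text in it may be decoded differently than in an editor that assumes UTF-8. The sketch below checks whether a file starts with the UTF-8 BOM. `Test-Utf8Bom` and the scratch file name are hypothetical, made up for this illustration.

```powershell
# Hypothetical helper: returns $true when the file starts with the
# UTF-8 byte order mark (bytes EF BB BF)
function Test-Utf8Bom {
    param([string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Hypothetical scratch file; a full path is used because .NET file APIs
# resolve relative paths against the process working directory
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-demo.ps1'

# UTF8Encoding $true emits a BOM when the file is written
[System.IO.File]::WriteAllText($demo, "'test'", (New-Object System.Text.UTF8Encoding $true))
$withBom = Test-Utf8Bom $demo

# UTF8Encoding $false writes the same text without a BOM
[System.IO.File]::WriteAllText($demo, "'test'", (New-Object System.Text.UTF8Encoding $false))
$withoutBom = Test-Utf8Bom $demo

"$withBom / $withoutBom"   # True / False
```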
  • You have pointed me in the right direction. When I opened the file in the ISE, it showed the ANSI charset, unlike in Notepad. Once I changed it, search and replace started working!

    Thanks!

    Wednesday, July 30, 2014 1:32 PM
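
If changing a file's encoding is not an option, a possible workaround (not something tested in this thread) is to keep the script pure ASCII and build the search text from Unicode code points, so the script reads the same under any encoding. The sketch assumes the characters in the sample are the precomposed forms U+0113, U+012B, U+016B and U+0161.

```powershell
# Build the search text from the thread out of code points:
# U+0113 e-macron, U+012B i-macron, U+016B u-macron, U+0161 s-caron
$search = "t$([char]0x0113)st f$([char]0x012B)le$([char]0x016B)$([char]0x0161)"

# Sample line from test.txt, assembled around the same search text
$line = "wret sddas $search dasd dsfcs"

# -replace takes a regular expression, so escape the search text
$result = $line -replace [regex]::Escape($search), ';'
$result   # wret sddas ; dasd dsfcs
```

`[regex]::Escape` is not strictly needed for this particular string, but it keeps the replacement literal if the search text ever contains characters such as `.` or `(`.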