How to use PowerShell -replace to remove the dreaded em dash from a file name?

  • Question

  • I have a bunch of .msg files in a directory and want to zip them up (to send them to our spam filtering company), but I keep getting errors on file names that contain the em dash (—).  I believe it's (char)0x2014. [edited]... the correct syntax is [char]0x2014.

    I wish to eliminate the em dash in the file name but I can't seem to find anything on the web about replacing em dashes.  When I paste '—', PowerShell replaces it with a '-'.

    I found this neat function that does a great job replacing multiple characters within a string.
    http://powershell.com/cs/blogs/tobias/archive/2011/04/28/multiple-text-replacement-challenge.aspx

    I am using the below code for my string replacement.

    Thank you for your help.

    function Replace-Text {
    param(
    [Parameter(Mandatory=$true)]
    $text,
    
    $replacementlist = "(-,,',,%,,$,,@,,#,,&,,’,"
    )
    Invoke-Expression ('$text' + -join $(
    foreach($e in $replacementlist.Split(',')) { 
    '.Replace("{0}","{1}")' -f $e, $(
    [void]$foreach.MoveNext()
    $foreach.Current) 
    } 
    )
    )
    }
    $path = '<path>'
    
    Get-ChildItem -Path $path | Rename-Item -NewName {(Replace-Text $_.Name).trim()}

    • Edited by pmabke Wednesday, September 24, 2014 6:55 PM
    Wednesday, September 24, 2014 2:08 PM

All replies

  • What is the question?


    ¯\_(ツ)_/¯

    Wednesday, September 24, 2014 3:35 PM
  • Decimal 8212 (hex 0x2014) is the Unicode code point for the em dash.  Depending on many things, file names are not stored as Unicode on all systems.

    Note that 0x8212 (hex) is not an em dash.  In the Windows-1252 code page the em dash is 0x97 (decimal 151).

    I do not think there is an ASCII character for the character you posted.


    ¯\_(ツ)_/¯

    Wednesday, September 24, 2014 3:41 PM
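
The decimal and hex forms are easy to mix up, so here is a minimal sketch that checks the numbers in PowerShell itself. The variable names are only illustrative; the em dash is built from its code point rather than pasted, since the console may swap a pasted em dash for a plain hyphen.

```powershell
# Build the em dash from its code point instead of pasting it.
$em = [char]0x2014

$decimal = [int]$em               # code point as a decimal number
$hex     = '{0:X4}' -f $decimal   # the same code point as four hex digits
$isAscii = $decimal -le 127       # ASCII only covers 0..127

"decimal=$decimal hex=0x$hex ascii=$isAscii"
```

This should print decimal=8212 hex=0x2014 ascii=False, which is why 8212 and 0x2014 refer to the same character while 0x8212 does not.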
  • Here is a simple filter that strips out every character other than ASCII letters and the dot (so digits, spaces and hyphens go too).

    $newname=$file.Name -replace '[^a-zA-Z\.]'

    I just tested it on Win 7, which has a Unicode filesystem, and it successfully strips the bad characters.

    We also need to do this on files uploaded to SharePoint and OneDrive.

    The ~ and other similar characters are not allowed in many systems.  You should keep a translation log and tag all renamed files with some obvious tag.


    ¯\_(ツ)_/¯

    • Marked as answer by pmabke Wednesday, September 24, 2014 4:32 PM
    Wednesday, September 24, 2014 3:49 PM
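
To make the effect of that filter concrete, here is a small sketch comparing it with a looser whitelist and showing one possible shape for a translation-log entry. The sample file name, the looser pattern and the log object are assumptions for illustration only; none of them come from the thread.

```powershell
# Hypothetical sample name containing an em dash (U+2014), digits and spaces.
$name = "Report 2014 $([char]0x2014) final.msg"

# The filter from the post: keeps only ASCII letters and dots.
$lettersOnly = $name -replace '[^a-zA-Z\.]'

# A looser whitelist (an assumption, adjust to taste): also keeps digits,
# spaces, underscores and hyphens. The hyphen is last so it is a literal.
$looser = $name -replace '[^a-zA-Z0-9 ._-]'

# One possible shape for a translation-log entry (old name -> new name).
$logEntry = [pscustomobject]@{ OldName = $name; NewName = $looser }

$lettersOnly   # Reportfinal.msg
$looser        # Report 2014  final.msg (the two spaces around the dash remain)
```

Entries like $logEntry could be collected and written out with Export-Csv so each rename can be traced back later.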
  • Thanks jrv for the tip.  It did remove the em dash as well as any other character that would cause a problem in a file name.  It would be helpful to be able to remove just the em dash, but your solution does work. 

    Thank you.

    Wednesday, September 24, 2014 4:32 PM
  • I think my answer is somewhere in the page (http://stackoverflow.com/questions/631406/what-is-the-difference-between-em-dash-151-and-8212) but I can't determine how to use that info using the -replace parameter.
    Wednesday, September 24, 2014 4:46 PM
  • $file.Name -replace ([char]0x2014)

    ¯\_(ツ)_/¯

    • Marked as answer by pmabke Wednesday, September 24, 2014 6:51 PM
    Wednesday, September 24, 2014 5:41 PM
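
A minimal sketch of how that one-liner might be applied to file names, removing only the em dash and nothing else. Remove-EmDash is a hypothetical helper name, and '<path>' is the placeholder from the question; the file-renaming part is left commented out and uses -WhatIf so nothing is renamed until it has been reviewed.

```powershell
$em = [string][char]0x2014

# Hypothetical helper: returns the name with every em dash removed.
function Remove-EmDash {
    param([Parameter(Mandatory = $true)][string]$Name)
    $Name.Replace($em, '')
}

$result = Remove-EmDash "spam$($em)sample.msg"
$result   # spamsample.msg

# Applied to files; -WhatIf only previews the renames.
# Get-ChildItem -Path '<path>' -Filter *.msg |
#     Where-Object { $_.Name.Contains($em) } |
#     Rename-Item -NewName { Remove-EmDash $_.Name } -WhatIf
```

The Where-Object step skips files that have no em dash, so Rename-Item is never asked to rename a file to its existing name.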
  • Perfect.  Thanks so much.

    I altered the original code as below:

    function Replace-Text {
    param(
    [Parameter(Mandatory=$true)]
    $text,
    $em = ([char]0x2014),
    $replacementlist = "$em,,-,,',,%,,$,,@,,#,,&,,’,"
    )
    Invoke-Expression ('$text' + -join $(
    foreach($e in $replacementlist.Split(',')) { 
    '.Replace("{0}","{1}")' -f $e, $(
    [void]$foreach.MoveNext()
    $foreach.Current) 
    } 
    )
    )
    }

    • Edited by pmabke Wednesday, September 24, 2014 6:56 PM
    Wednesday, September 24, 2014 6:51 PM
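
For comparison, the same multi-character cleanup can be sketched without Invoke-Expression by looping over the list and calling String.Replace directly. This is an alternative shape, not the function from the linked blog post; Remove-Characters and its parameter names are hypothetical, and the default list simply mirrors the characters in the post above.

```powershell
function Remove-Characters {   # hypothetical name
    param(
        [Parameter(Mandatory = $true)]
        [string]$Text,

        # Characters to delete: em dash, hyphen, apostrophe, %, $, @, #, &,
        # and the right single quotation mark (U+2019).
        [string[]]$Remove = @(
            [string][char]0x2014, '-', "'", '%', '$', '@', '#', '&',
            [string][char]0x2019
        )
    )

    # Plain string replacement, one entry at a time; no regex, no eval.
    foreach ($c in $Remove) {
        $Text = $Text.Replace($c, '')
    }
    $Text.Trim()
}

$cleaned = Remove-Characters "It$([char]0x2019)s 100% spam $([char]0x2014) really.msg"
$cleaned   # Its 100 spam  really.msg
```

Because nothing is evaluated as code, a file name containing quotes or other PowerShell syntax cannot change what the function executes.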
  • $fixed=$string -replace "[$em%`$@#&-]"   # hyphen last, so it is a literal and not a range

    ¯\_(ツ)_/¯

    Wednesday, September 24, 2014 7:04 PM
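
One thing to watch when building a character class like this: a hyphen between two characters is read as a range, and .NET rejects a range whose end points are in reverse order. The sketch below, using a made-up sample string, shows the failure and the usual way around it, which is to put the hyphen first or last in the class.

```powershell
$em = [char]0x2014
$sample = "a$($em)b-c%d"

# Hyphen in the middle: "em dash through %" is parsed as a range and is
# rejected because U+2014 sorts after U+0025.
$rangeFailed = $false
try { $null = $sample -replace "[$em-%]" } catch { $rangeFailed = $true }

# Hyphen last: it is now a literal member of the class.
$fixed = $sample -replace "[$em%-]"

$rangeFailed   # True
$fixed         # abcd
```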