MS Word 2010 apostrophe encodes to a different value starting Feb 21, 2012

  • Question

  • We have a news website that has a page that uses a text box to allow people to enter news stories and have them uploaded to our database.

    It is common for people to write their stories in MS Word then cut and paste them into the text box on our website.

    Starting on Feb 21, 2012, we noticed that apostrophes in stories written in MS Word were being converted to a strange sequence of characters after being pasted into the text box.  The characters are shown below between the dashed lines.

    ----------------------------------

    ï¿½

    -------------------------

    This causes a problem because we can no longer translate the MS Word apostrophe (’) into a text-based apostrophe (').  I had to add special code to translate the above string to a text apostrophe in order to get this to work as it did before Feb 21, 2012.

    This seems to indicate a problem with MS Word 2010.


    Gary Frickey

    Wednesday, February 22, 2012 7:35 PM

Answers

  • After further testing, it appeared that the encoding problems were occurring on a postback, after the textbox had been populated and before the textbox contents were captured with Request().

    I found a JavaScript solution - http://jhy.io/tools/convert-word-to-plain-text - which I modified so that it makes the necessary translation on the "onblur" event, and this has solved my problem.

    I am still not sure why we started having this problem now, because we have been allowing Word contents to be pasted into this same text box for a while and have not run into it before.  Perhaps it is because this particular Word document had smart quotes and smart apostrophes.

    Anyway, please consider this issue closed.

    Thanks for your time,

    Gary


    Gary Frickey

    • Marked as answer by gf1 Saturday, March 3, 2012 12:26 AM
    Saturday, March 3, 2012 12:26 AM
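
The modified script itself is not posted in the thread. As a rough illustration of the approach described above, here is a minimal sketch that replaces Word's "smart" punctuation with plain ASCII when the text box loses focus. The function name `toPlainText`, the element id `story`, and the exact list of characters are assumptions made for this example, not details from the original solution.

```javascript
// Hypothetical mapping from Word "smart" punctuation to plain ASCII.
// The character list is an assumption; extend it to match your content.
const SMART_PUNCTUATION = {
  "\u2018": "'",   // left single quotation mark
  "\u2019": "'",   // right single quotation mark (Word's apostrophe)
  "\u201C": '"',   // left double quotation mark
  "\u201D": '"',   // right double quotation mark
  "\u2013": "-",   // en dash
  "\u2014": "-",   // em dash
  "\u2026": "..."  // horizontal ellipsis
};

// Replace every character listed above with its ASCII stand-in.
function toPlainText(text) {
  return text.replace(
    /[\u2018\u2019\u201C\u201D\u2013\u2014\u2026]/g,
    (ch) => SMART_PUNCTUATION[ch]
  );
}

// Browser wiring: clean the text when the box loses focus ("onblur").
// Assumes a <textarea id="story">; the id is hypothetical.
if (typeof document !== "undefined") {
  const box = document.getElementById("story");
  if (box) {
    box.addEventListener("blur", () => {
      box.value = toPlainText(box.value);
    });
  }
}
```

Doing the replacement in the browser means the form only ever submits characters that ISO-8859-1 can carry, which may be why this approach sidesteps the postback problem.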

All replies

  • On or about February 9th, updates were pushed down to Office 2007 and Office 2010. I wonder if those updates caused this?  If they did, your only short-term fix is to do what you have done and allow for the problem the update caused.

    Kind Regards, Rich ... http://greatcirclelearning.com


    Friday, February 24, 2012 2:32 AM
  • Can you tell me why this was marked as "Answered"?  I really don't think Rich's comment is an answer, only a speculation on his part.

    Thanks,

    Gary


    Gary Frickey

    Wednesday, February 29, 2012 3:54 PM
  • I have done additional research and here are more details.

    When I paste contents from an MS Word document into an HTML textarea, the double quote, apostrophe and hyphen get translated to the � string of characters.

    In my ASP code for the HTML page, I convert the MS Word open quote (“), close quote (”), apostrophe (’) and hyphen (-) to the text equivalents of the same characters (", ", ', -). However, these translations don't occur, because the ASP code sees the text box value for each of them as ï¿½

    If I add logic to look for the character string � in my ASP code, I can detect it and translate it to a text value. However, my question is: why do the quote, apostrophe and hyphen all get converted to the same value � when I paste them from MS Word into a text box?


    Gary Frickey

    Thursday, March 1, 2012 4:31 PM
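
One plausible explanation, not confirmed anywhere in this thread, is that each of these characters was replaced by the Unicode replacement character U+FFFD somewhere along the way. If so, the original characters are already gone by the time the ASP code runs, which would explain why they all look identical. The three-character string ï¿½ is exactly what the UTF-8 bytes of U+FFFD look like when read as ISO-8859-1. The sketch below shows this in Node.js (`Buffer` is a Node API, not a browser one); the variable names are made up for the example.

```javascript
// U+FFFD is the character a decoder emits for input it cannot map.
const REPLACEMENT = "\uFFFD";

// Encoded as UTF-8, U+FFFD is the three bytes EF BF BD.
const utf8Bytes = Buffer.from(REPLACEMENT, "utf8");

// Reading those same three bytes as ISO-8859-1 (latin1) gives one
// character per byte: ï (EF), ¿ (BF), ½ (BD).
const misread = utf8Bytes.toString("latin1");

console.log(utf8Bytes.toString("hex")); // efbfbd
console.log(misread);                   // ï¿½
```

Where in the chain the replacement happens (browser, postback, or server) is not something this sketch can show.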
  • More info: if I take the text area that contains the MS Word contents and run Server.HTMLEncode on it, all of the quotes, apostrophes and hyphens convert to the same value   �

    My HTML page uses the ISO-8859-1 character set, which it had been using for a long time before this problem started occurring.


    Gary Frickey

    Thursday, March 1, 2012 6:15 PM
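
The charset matters here because ISO-8859-1 only covers code points up to U+00FF, while Word's smart quotes and dashes sit in the U+2010 to U+201F range. As a small diagnostic sketch, the hypothetical helper `findNonLatin1` below lists the characters in a string that ISO-8859-1 cannot represent. It is an illustration only, not part of the poster's code.

```javascript
// Hypothetical helper: returns the code points in `text` that fall
// outside ISO-8859-1 (anything above U+00FF), formatted as "U+XXXX".
function findNonLatin1(text) {
  const found = [];
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0xFF) {
      found.push("U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return found;
}

// Word-style apostrophe and double quotes are all reported.
console.log(findNonLatin1("It\u2019s \u201Cnews\u201D"));
// [ 'U+2019', 'U+201C', 'U+201D' ]
```

Running a check like this on pasted text before submitting it would show which characters are at risk of being lost.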