Scan from PDF to Word creates problems

  • Question

  • When I transfer a document from a scanned PDF into Word, I cannot fix spacing, line break, and indentation problems. It seems there is a code in the PDF scan that prevents me from modifying and editing my document. How can I fix this?
    Wednesday, April 8, 2015 4:01 PM

Answers

  • You would need to use an Optical Character Recognition (OCR) utility to convert the scanned image into text that you may be able to edit. Most scanners come with such a utility, but many of them are fairly rudimentary, and it's a bit of a case of "you get what you pay for."

    If you don't have such a utility, you could Google for OCR software.


    Doug Robbins - Word MVP dkr[atsymbol]mvps[dot]org

    • Marked as answer by Steve Fan Tuesday, April 21, 2015 9:22 AM
    Thursday, April 9, 2015 6:17 AM
  • First, I agree that you need OCR conversion before you can attempt editing. Otherwise, it is like trying to open a car door on a picture of a car. A scanned document that has not gone through OCR is only a picture of a document, whether it is a PDF or a JPG.

    Second, converted documents will always be problematic. Different conversion software makes different assumptions, but the formatting and layout are likely to range anywhere from somewhat difficult to "I would do better just retyping this." The conversion software aims primarily to produce something that can be opened in Word and edited while maintaining the same appearance. The underlying structure is likely to be filled with multiple unneeded section breaks, multiple styles or no use of styles, and crazy numbering. With anything big or complex, I will often copy and paste the text as plain text and then apply formatting using Styles.

    Also, be aware that no OCR program is perfect, and some are worse than others. Proofreading the result twice would be a very good idea.


    Charles Kenyon Madison, WI

    • Marked as answer by Steve Fan Tuesday, April 21, 2015 9:22 AM
    Thursday, April 9, 2015 1:00 PM
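Neither reply includes code, but the plain-text route described above can be partly automated before the text goes back into Word. The following is a minimal Python sketch that is not part of the original thread. It assumes the OCR output has already been saved or pasted as plain text, with blank lines between paragraphs. The helpers `clean_ocr_text` and `suspicious_tokens` are hypothetical names: the first rejoins hard-wrapped lines, rejoins words split by a line-end hyphen, and collapses extra spaces (the spacing and line-break problems from the question); the second flags words that mix letters and digits, such as "c1ear", as candidates for the proofreading pass.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Rejoin hard-wrapped OCR lines into paragraphs and normalize spacing.

    Assumption: paragraphs in the OCR output are separated by blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    cleaned = []
    for para in paragraphs:
        # Drop empty lines and the leading/trailing blanks the scan layout left
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if text.endswith("-") and line[:1].islower():
                # Heuristic: treat "exam-" + "ple" as one word split across lines
                text = text[:-1] + line
            elif text:
                # An ordinary hard line break inside a paragraph becomes a space
                text += " " + line
            else:
                text = line
        # Collapse runs of spaces/tabs into a single space
        cleaned.append(re.sub(r"[ \t]+", " ", text))
    return "\n\n".join(cleaned)


def suspicious_tokens(text: str) -> list[str]:
    """Return words that mix letters and digits, a common OCR misreading."""
    return [
        tok
        for tok in re.findall(r"\w+", text)
        if re.search(r"[A-Za-z]", tok) and re.search(r"\d", tok)
    ]


if __name__ == "__main__":
    raw = (
        "The quick  brown\nfox jumps over the exam-\nple text.\n\n"
        "Second   para-\ngraph with c1ear errors."
    )
    result = clean_ocr_text(raw)
    print(result)
    print(suspicious_tokens(result))  # prints ['c1ear']
```

The hyphen rule is only a rough heuristic: a genuine compound such as "well-known" that happens to break at its hyphen will come out as "wellknown", which is one more reason to proofread. The cleaned text can then be pasted into Word and formatted with Styles.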
