SharePoint Online not searching IN PDF files

    Question

  • Hi

    I have a standard SharePoint Online team site with a document library (in classic mode) that holds about 900 PDFs. Searching by name in the Find a file box appears to work fine, but searching for text within a PDF file returns no results.

    For example, each of the PDFs has an Assembly/Part # field that is filled in with text. Searching the library for that text never returns a PDF result (it does return Word or Excel files that contain the same text). Searching the entire site also returns no results.

    The site's Search and offline availability setting is Yes, and the library's setting to show items in search results is Yes. Neither approval nor publishing is turned on, and all users have at least Read access to the entire library and every item in it.

    I have used the Reindex site button and waited 24 hours, with the same result: no results returned.
    I have reindexed the library and waited 24 hours, again with no results returned.

    The PDFs are not scanned: each one is a PDF form that users fill in with Acrobat and then upload to the library.

    What am I missing? In my research, everything says this should happen automatically, and that as long as the PDF is not a scanned version (and therefore an image), SharePoint Online should be able to search within it.

    Any insight or help is greatly appreciated!

    Thanks,
    Stephanie

    Thursday, November 1, 2018 7:17 PM

All replies

  • Hi Stephanie,

    SharePoint Online already includes a PDF iFilter that allows SharePoint Online to index the text contents of PDF files.

    In my own test in SharePoint Online, searching for text inside a PDF file returned the correct PDF file.

    One common issue is that many PDF files are wholly or partially images, having originated from scanned documents or faxes.

    These documents are considered “dead content” because their contents are essentially images and, as a result, cannot be searched or indexed.

    To make these documents discoverable again, they need to be transformed into a format that can be searched and indexed by the SharePoint crawler.

    You could use Aquaforest Searchlight to convert them.

    You could also check the video.

    For more detailed information, refer to the article below.

    Configuring SharePoint for PDF Files.

    https://www.aquaforest.com/wp/index.php/configuring-sharepoint-for-pdf-files/

    There is a similar post:

    https://social.technet.microsoft.com/Forums/windows/en-US/a41a2c9f-1b29-43da-8733-c8e42f263da7/sharepoint-online-search?forum=onlineservicessharepoint

    Best regards,

    Sara Fan


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.


    Click here to learn more. Visit the dedicated forum to share, explore and talk to experts about Microsoft Teams.

    Friday, November 2, 2018 9:23 AM
    Moderator
  • Hi Stephanie,

    If the reply was helpful, you could mark it as the answer. Thanks for your understanding.

    Best regards,

    Sara Fan


    Monday, November 5, 2018 1:49 AM
    Moderator
  • Hi Stephanie,

    I am checking to see how things are going there on this issue. Please let us know if you would like further assistance.

    If the issue was resolved, you can mark the helpful post as the answer so other community members can find the information quickly.

    Best regards,

    Sara Fan


    Thursday, November 22, 2018 2:39 AM
    Moderator
  • Hi Sara,

    Sorry for taking so long to reply; I had also posted this in the SPO community and was trying to work with those folks as well.

    I'm still having issues with this, and I don't think partial images are the cause: the PDF file was created in Acrobat Pro and uploaded to the library. Users then download a copy, fill in their information, save it to their desktops, and re-upload that file.

    I'm still running more tests to get this working. Is there a way for me to check whether a PDF is considered an image or a partial image?

    Monday, November 26, 2018 7:12 PM
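    On the question of checking whether a PDF is image-only: outside SharePoint, one rough check is to look for the PDF text-showing operators `Tj` and `TJ` in the file's content streams. The Python sketch below is a hypothetical heuristic, not a SharePoint feature or an official tool. It decompresses FlateDecode streams with the standard `zlib` module, counts text operators, and counts image XObjects (`/Subtype /Image`). It assumes unencrypted PDFs and ignores other stream filters, so a "no text" result is a hint rather than proof.

```python
import re
import sys
import zlib

# Hypothetical heuristic: does a PDF carry a text layer (text-showing operators)
# or only images? Assumes an unencrypted file; only FlateDecode is decoded.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
TEXT_OP_RE = re.compile(rb"[)\]>]\s*T[jJ]\b")  # "(...) Tj", "[...] TJ", "<...> Tj"
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def _looks_like_content(buf: bytes) -> bool:
    """Page content streams are mostly printable ASCII; image pixel data is not."""
    if not buf:
        return False
    printable = sum(1 for b in buf if 32 <= b < 127 or b in b"\r\n\t")
    return printable / len(buf) > 0.7


def inspect_pdf(data: bytes) -> dict:
    text_ops = 0
    images = len(IMAGE_RE.findall(data))  # image dictionaries stored uncompressed
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj tolerates the trailing EOL left before "endstream"
            decoded = zlib.decompressobj().decompress(body)
            images += len(IMAGE_RE.findall(decoded))  # dictionaries inside object streams
        except zlib.error:
            decoded = body  # not Flate-encoded; inspect the raw bytes instead
        if _looks_like_content(decoded):
            text_ops += len(TEXT_OP_RE.findall(decoded))
    if text_ops:
        verdict = "has text layer"
    elif images:
        verdict = "image-only (likely scanned)"
    else:
        verdict = "no text or images found"
    return {"text_operators": text_ops, "images": images, "verdict": verdict}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, inspect_pdf(fh.read()))
```

    Usage (the script name is hypothetical): `python pdf_text_check.py part-form.pdf`. A "has text layer" result for the uploaded forms would point away from image-only content and toward the indexing side.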
  • Hi Stephanie,

    I would like to take a closer look at one of the PDF files. Can you email me one of them?

    Paul



    Monday, November 26, 2018 7:50 PM
  • Hi Paul,

    Thanks for posting in the TechNet forum.

    This is a public forum; to protect your data, please avoid sharing private information here.

    Best regards,

    Sara Fan



    Wednesday, November 28, 2018 2:05 AM
    Moderator
  • Hi Stephanie,

    I'm afraid there is no way in SharePoint Online to check whether the PDF files are treated as images.

    Best regards,

    Sara Fan




    Thursday, November 29, 2018 8:33 AM
    Moderator
  • So I'm pretty sure it's not an image: I created a Word document, saved it as a PDF, and search still does not pick up anything inside the document. I've tried this in a brand-new library and then also in a brand-new site.

    But the global search seems to work, which I don't understand.

    Do you have any other ideas on what else I can try or need to look at?

    Thanks,

    Stephanie


    • Edited by Rubyscye Monday, December 3, 2018 4:16 PM
    Monday, December 3, 2018 4:11 PM
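    Building on the Word-to-PDF control test above, another way to rule out the authoring tool is to upload a PDF whose text layer is known exactly. The Python sketch below is a hypothetical helper (the function name, output file name, and `SPTEST-` token format are illustrative assumptions). It writes a minimal one-page PDF 1.4 file that shows a unique token in the standard Helvetica font; once the library has been crawled, that token can be searched for.

```python
import sys
import uuid


def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF that shows `text` in the standard Helvetica font."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 18 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_at}\n%%EOF\n"
    ).encode()
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    token = f"SPTEST-{uuid.uuid4().hex[:8]}"  # hypothetical unique search term
    with open(sys.argv[1], "wb") as fh:
        fh.write(minimal_pdf(f"Assembly/Part # {token}"))
    print(f"Wrote {sys.argv[1]}; search for {token}")
```

    For example, `python make_search_test_pdf.py search-test.pdf` writes the file and prints the token to search for. A random token keeps other documents from matching the query.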
  • Hi Stephanie,

    You could reindex the document library: go to the document library > Library settings > Advanced settings > Reindex Document Library.

    Best regards,

    Sara Fan



    Tuesday, December 4, 2018 8:50 AM
    Moderator