Searchable PDF Files Not Showing Up in Search Results

  • Question

  • Several documents have been scanned and converted into searchable .pdf files. Before uploading to SharePoint, the documents can be searched within Adobe Acrobat and the search works as expected. Once the files are uploaded to SharePoint Online, they do not appear in search results when searching on a word contained in the document. I have waited more than 12 hours for indexing. Is this the expected behavior of SPOL, or could you please provide some insight?

    Thanks

    Terry
    Monday, September 21, 2009 1:57 PM

Answers

  • Hi Terry,

    PDF documents are supported and should work. Obviously, 12 hours with no hits is not good. At this point, I'd suggest opening a ticket with support. They will want to know details about your site collection so they can identify the farm, comb the logs, etc.
    Monday, September 21, 2009 9:22 PM

All replies

  • Do you happen to know if PDF/A files are searchable in SharePoint?  I need to store archivable versions of PDFs (using a tool that creates a PDF/A file), but when I upload them to SharePoint 2007 and search, I don't get any hits returned.  My 'normal' PDF files show up in the hit list just fine.

    KevinHou

    Tuesday, July 20, 2010 8:15 PM
  • I'm having the same problem. Support told me they're working on it. Seems to only affect PDFs for me, too.
    Tuesday, July 20, 2010 9:21 PM
  • I figured out my own problem.  I had built my search page by adding the Advanced Search Box and Search Core Results web parts (available OOTB in the Web Parts gallery).  In the Search Core Results web part properties (edit the web part), under "Results Query Options", the default setting is "Remove Duplicate Results".  When I unchecked that, everything started working for me.

    My test files all had the same content ("This is a test"), in Word and .txt files.  I had created PDF/A versions of these and uploaded them to my library as well.  So all 4 files had the same content, but all 4 had different file names.  What I don't know is what MS's definition of "Duplicate Results" is.  Sure, all my files had the same one-line content, but they were 4 DIFFERENT files, not duplicate files.

    Also, if I searched on a keyword in the "Name" property, I was only getting one result, not 4 (my file names were different but all contained a common word: "Test 1 - Word.doc", "Test 2 - Text.txt", "Test 1 - PDF-A.pdf", "Test 2 - PDF-A.pdf").  So how is it that these four "Names" are considered duplicates?  Screwy.  But when I turn off the "Remove Duplicate Results" option, I get all 4 files returned by my search.

    So the problem I reported in my post above, with no PDFs showing up, was because the Word or Text file was being returned by the search and all the others were being filtered out by SharePoint as "duplicates", and thus not showing up in my results web part.
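
    For anyone trying to picture the behavior described above, here is a minimal sketch in Python. It is only an illustration under a stated assumption: that duplicate removal compares the extracted text of each hit rather than its file name. It is not SharePoint's actual algorithm, and the names `content_key` and `trim_duplicates` are made up for this example.

```python
import hashlib


def content_key(text):
    """Hash the whitespace- and case-normalized text of a document.

    Assumption: duplicates are judged by content only, never by file name.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def trim_duplicates(results, remove_duplicates=True):
    """Return search hits, optionally keeping only the first hit per content key.

    `results` is a list of (file_name, extracted_text) tuples.
    """
    if not remove_duplicates:
        return list(results)
    seen = set()
    kept = []
    for name, text in results:
        key = content_key(text)
        if key not in seen:
            seen.add(key)
            kept.append((name, text))
    return kept


# The four test files from the post: same one-line content, different names.
hits = [
    ("Test 1 - Word.doc", "This is a test"),
    ("Test 2 - Text.txt", "This is a test"),
    ("Test 1 - PDF-A.pdf", "This is a test"),
    ("Test 2 - PDF-A.pdf", "This is a test"),
]

with_trimming = trim_duplicates(hits)
without_trimming = trim_duplicates(hits, remove_duplicates=False)
```

    Under that assumption, `with_trimming` holds a single hit (whichever file came first) and `without_trimming` holds all four, which matches what I saw after unchecking "Remove Duplicate Results".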

    KevinHou

    Wednesday, July 21, 2010 5:03 PM
  • Troy,

    I'm facing the same issue. The content of OCR'ed PDF documents is not searchable, though the contents of PDF documents uploaded earlier are.

    Can you suggest any solution for this?

    Thanks

     

    Thursday, June 16, 2011 7:38 PM