FAST Not Indexing PDF Documents

  • Question

  • Hello,

    I have a problem with SharePoint 2010 and FAST Search indexing PDF files. I've set up the PDF file type extension in SharePoint and added a document icon, which works as expected when viewing PDF files from SharePoint sites. I have also enabled the Advanced Filter Pack on the FAST server using the PowerShell script. When I crawl PDF files stored in SharePoint, they show up in the search results as DispForm.aspx. When I crawl PDF files stored on the file system, they do not get indexed. I can use docpush on the FAST server to push a PDF into the index; no errors are produced, and the PDF shows in the search results as it should, linked directly to the PDF with the PDF icon showing. The output of the command is as follows:

    PS C:\> docpush -c "sp" .\Document.pdf
    
    [2011-03-03 15:04:20.589] INFO  sp All add operations completed
    
    PS C:\>

    It seems like SharePoint doesn't know to pass the PDF file to the FAST content processor, but I am unsure what is required to get this working.

    Any help would be appreciated.

    Thursday, March 3, 2011 11:07 PM

Answers

  • Indexing PDFs is pre-configured and should work out of the box. Make sure pdf is on the file extension include list on the Query SSA, and _not_ listed in the file extensions on the Content SSA. The Content SSA list is an exclusion list, not an inclusion list like the one on the Query SSA and in standard SharePoint search.

    If you have edited either of these lists, you have most likely removed indexing of PDFs while thinking you added it.

    -m


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    • Marked as answer by tylercranston Tuesday, March 15, 2011 6:45 PM
    Saturday, March 5, 2011 6:45 PM
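
    The check described in the answer above can also be done from the SharePoint 2010 Management Shell. This is a minimal sketch, not a verified procedure: the name "FAST Content SSA" is a placeholder for whatever your Content SSA is called, and the script assumes, as the answer states, that the Content SSA treats its file extension list as an exclusion list.

```powershell
# Sketch only: run in the SharePoint 2010 Management Shell on a farm server.
# "FAST Content SSA" is a hypothetical name; substitute your own Content SSA.
$contentSsa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"

# Look for a pdf entry on the Content SSA's file extension list.
# Per the answer above, an entry here means the extension is excluded from crawls.
$entry = Get-SPEnterpriseSearchCrawlExtension -Identity "pdf" `
    -SearchApplication $contentSsa -ErrorAction SilentlyContinue

if ($entry) {
    # pdf is listed, so remove it; a full crawl is needed afterwards.
    Remove-SPEnterpriseSearchCrawlExtension -Identity "pdf" `
        -SearchApplication $contentSsa -Confirm:$false
    Write-Host "Removed pdf from the Content SSA list. Start a full crawl."
} else {
    Write-Host "pdf is not on the Content SSA list."
}
```

    The same list can be reviewed in Central Administration under the Content SSA's File Types page if you prefer not to script it.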

All replies

  • Make sure that the Content SSA and Query SSA are integrated properly. Are you able to feed any other type of content (Word, text, HTML, etc.)?
    Monday, March 7, 2011 10:42 PM
    Moderator
  • m0nk3yb0i,

    You are able to index and search PDF files from SP sites. This tells me that your Content SSA & iFilter setup is correct for PDF files.

    You indicated that "When I index PDF files stored on the file system, they do not get indexed." May I ask how you came to the conclusion that the PDF files do not get indexed? Can you provide details on how the PDF files are crawled from the file share, and on what the crawl logs report for those PDF files?

    My guess is that there are additional security ACLs on the PDF files, so they are not showing up in search results because of security trimming. However, before we come to this conclusion, it is best to verify whether the crawl of the PDF files from the file share succeeded.

    My 2 cents.

    Tuesday, March 8, 2011 5:39 AM
  • Thank you Mikael, I didn't realize this and I don't recall seeing it in the FAST documentation on TechNet. After removing PDF from the extension list on the query SSA, PDFs are now displaying correctly after performing a full crawl on the content.
    Tuesday, March 15, 2011 6:47 PM