PDF full text search

  • Question

  • Hi,

    I have configured FAST Search to index PDF files and it's working. A search returns results when I enter the full word I am looking for. For example, the word "hello" returns results, but the partial terms "hel" or "hell" return nothing.

    Do I have to do any special configurations to get this done?

    I tried

    1. FAST OOTB PDF search (the Advanced Filter Pack is enabled)

    2. Adobe iFilter with FAST Search

    BTW, this works well if I use SharePoint 2010 Enterprise Search with the Adobe iFilter.

    Appreciate any help on this.

    Thanks.


    Kolitha de Silva ----------------- www.dkolitha.com / dkolitha.wordpress.com


    • Edited by Kolitha Friday, October 19, 2012 8:20 PM
    Friday, October 19, 2012 8:18 PM

All replies

  • Hi,

    If you want partial matching you have to use wildcards. hel* or hell* should both match hello.

    Thanks,
    Mikael Svenson


    Search Enthusiast - SharePoint MVP/MCT/MCPD - If you find an answer useful, please up-vote it.
    http://techmikael.blogspot.com/
    Author of Working with FAST Search Server 2010 for SharePoint

    Saturday, October 20, 2012 7:03 PM
  • Thanks Mikael.

    Surprisingly, the wildcard does not work for one specific use case. The document contains a number with four leading zeros, and it is not found even when I use a wildcard such as 000054*. Users also want to find documents by entering the number without the leading zeros. Any pointers on why this is not working? When I checked the iFilter output with a downloaded tool, it showed that the PDF is processed correctly.


    Kolitha de Silva ----------------- www.dkolitha.com / dkolitha.wordpress.com

    Sunday, October 21, 2012 3:11 PM
  • Hi,

    How long are the numbers you are trying to match? The term could be dropped depending on wildcard expansion and other internal rules. If the numbers follow a pattern, I would suggest writing a custom pipeline extensibility stage that matches the numbers and writes them to a separate managed property. You can then search that managed property specifically. If you don't want users to have to append * to their queries, you will have to add it yourself with custom logic.
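    The matching step of such a stage can be sketched as follows. This is a minimal illustration only: the 10-digit pattern and the function name `extract_number_variants` are assumptions, and the plumbing that reads crawled properties and writes the result to a managed property in FS4SP is not shown.

```python
import re

# Assumed pattern: a standalone 10-digit number, possibly zero-padded.
NUMBER_RE = re.compile(r"\b\d{10}\b")

def extract_number_variants(text):
    """Return each 10-digit number found in text, followed by the same
    number with its leading zeros stripped, without duplicates."""
    variants = []
    for match in NUMBER_RE.finditer(text):
        padded = match.group(0)
        # An all-zero number would strip to "", so fall back to "0".
        stripped = padded.lstrip("0") or "0"
        for value in (padded, stripped):
            if value not in variants:
                variants.append(value)
    return variants

if __name__ == "__main__":
    print(extract_number_variants("Invoice 0000005441 issued"))
```

    Writing both forms to the dedicated managed property would let a query for the number with or without its leading zeros match the same document.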

    Also, FS4SP does not use an iFilter to extract text from PDFs; it uses a separate component. By turning on ffddumper or adding a spy stage you can see what text is actually used during indexing.

    Thanks,
    Mikael Svenson



    Sunday, October 21, 2012 8:13 PM
  • Hi,

    The number is 10 digits long. I tried installing the Foxit iFilter, and Windows Search gives correct results with it, but the issue is the same with FS4SP. I added a spy stage to the pipeline, but I was not clear on how to read its output and find the issue.

    When I try to do a docpush I also get the error below.

    [2012-10-23 14:05:57.685] ERROR      sp An error occured when submitting operation: : IOError: [Errno 2] No such file or directory

    Thanks.


    Kolitha de Silva ----------------- www.dkolitha.com / dkolitha.wordpress.com


    • Edited by Kolitha Tuesday, October 23, 2012 6:06 PM
    Tuesday, October 23, 2012 5:59 PM
  • Hi Kolitha,

    The fixml output shows that a long number like 0000005441 is processed as <a>0000005441T nn5441L 0000005441L nn5441T</a>. If you search for 5441 you won't see any results, because none of the indexed terms match the query.

    If you want results when searching for 5441, the query has to match one of the indexed terms. One possible solution is to rewrite the query: add a copy of the term prefixed with nn, combine it with the original query using an OR condition, and submit that. You can do this by extending the Core Results Web Part.
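    The rewrite itself could look roughly like this. It is a sketch of the logic only, written in Python rather than as web part code; the function name `expand_numeric_terms` is hypothetical, and it assumes that the nn form seen in the fixml above is the number with its leading zeros removed and that the keyword query syntax accepts parentheses with OR.

```python
import re

def expand_numeric_terms(query):
    """Rewrite each purely numeric term as (term OR nn<term without leading zeros>).

    Non-numeric terms are passed through unchanged.
    """
    rewritten = []
    for term in query.split():
        if re.fullmatch(r"[0-9]+", term):
            # Assumption: the indexed "nn" token drops the leading zeros.
            bare = term.lstrip("0") or "0"
            rewritten.append("(%s OR nn%s)" % (term, bare))
        else:
            rewritten.append(term)
    return " ".join(rewritten)

if __name__ == "__main__":
    print(expand_numeric_terms("invoice 5441"))
```

    Whether the nn form is also produced for numbers that have no leading zeros is not confirmed here, so the fixml output should be checked before relying on this.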

    Regards,

    Diluk J

    Tuesday, October 23, 2012 6:31 PM