How can we drop a document from indexing in FAST Search for SharePoint 2010?

  • Friday, May 06, 2011 6:58 AM
     
     

    Hi All,

    Has anyone worked on dropping a document from being indexed in FAST Search for SharePoint 2010?

    Does this have to be done during the pre-processing stages or the post-processing stages?

    Please let me know how this can be achieved in FS4SP 2010.

    Thanks,

    Ajay

All Replies

  • Friday, May 06, 2011 7:55 AM
     
     

    There are various ways to do this.

    Before indexing,

    1 - You can use crawl exclusion rules or

    2 - Under library settings - advanced settings, you can disable this specific library to appear in search results.

    After indexing,

    1 - You can create scopes to narrow the search results with powerful FQL queries.

     

    Hope this helps.


  • Friday, May 06, 2011 8:27 AM
     
     

    Is there an option of doing this "during indexing"?

    I haven't tried this, but can we do that during the pipeline extensibility stage?

    -A

  • Saturday, May 07, 2011 7:57 AM
    Moderator
     
     Proposed Answer

    Hi Ashwani

    You are not the first to ask this question, and I am sorry to say that, NO, that is not possible. You can in theory try to empty all content inside pipeline extensibility, so that no queries will match the given document, but I don't think that is a feasible solution. E.g., you cannot alter the URL, so queries matching the URL string would still match.

    Regards


    Thomas Svensen | Microsoft Enterprise Search Practice
  • Saturday, May 07, 2011 7:46 PM
     
     

    Thomas,
    if you return an exit code != 0 in a custom pipeline stage for a specific url I think it will drop indexing that item, as the pipeline fails. Haven't tried this myself, and not the prettiest solution, but do you think it would work?

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
  • Sunday, May 08, 2011 11:53 AM
     
     

    Hi Thomas/Mikael,

    Please let me know if you have any solution to drop documents being indexed in FS4SP 2010.

    Thanks,

    Ajay

  • Sunday, May 08, 2011 7:58 PM
     
     

    My suggestion did not work. Returning an error from a custom stage will only skip the stage, not abort processing of the pipeline.

    My next attempt tried to use the Offensive Content Filter, but there is no way to assign offensive words to the title/body in a custom stage. There is also a mention of a field called "ocfcontribution", but I have no idea how to set this.

    So, your only solution would be to write an unsupported stage in Python, and implement your drop logic there.

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
  • Monday, May 09, 2011 4:51 AM
    Moderator
     
     Proposed Answer

    Hi

    There is a solution which does not use anything unsupported and should work:

    In a pipeline extensibility stage, write the URL of the pages you want deleted to a file (it would have to be in the AppData/LocalLow folder of the user running FAST), and then have a scheduled task run at frequent intervals, read this file, and submit a "docpush -d <URL>" for each entry; see http://technet.microsoft.com/en-us/library/ee943508.aspx
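
    The two halves of this idea can be sketched roughly as follows. This is only an illustration under assumptions, not a tested FS4SP implementation: it assumes the stage input is a <Document> element with <CrawledProperty propertyName="..."> children and that a crawled property named "url" is passed to the stage. The function names, the should_drop callback, and the queue file are hypothetical. Only "docpush -d <URL>" comes from the post above; any other options docpush needs in your environment would go in extra_args, per the linked reference.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_url(input_xml):
    """Return the value of the crawled property named 'url', or None.

    Assumption: the stage input looks like
    <Document><CrawledProperty propertyName="url">...</CrawledProperty></Document>
    """
    root = ET.fromstring(input_xml)
    for prop in root.iter("CrawledProperty"):
        if prop.get("propertyName") == "url":
            return (prop.text or "").strip() or None
    return None


def queue_for_deletion(input_xml, queue_file, should_drop):
    """Stage side: append the item's URL to the queue file if it should be dropped."""
    url = extract_url(input_xml)
    if url is None or not should_drop(url):
        return False
    with Path(queue_file).open("a", encoding="utf-8") as fh:
        fh.write(url + "\n")
    return True


def process_queue(queue_file, run=subprocess.check_call, extra_args=()):
    """Scheduled-task side: run one 'docpush -d <URL>' per queued URL.

    The queue is emptied only after every command has succeeded, so a
    failing run leaves the URLs in place for the next attempt.
    """
    queue_file = Path(queue_file)
    if not queue_file.exists():
        return []
    lines = queue_file.read_text(encoding="utf-8").splitlines()
    # Drop blank lines and duplicates while keeping the original order.
    urls = list(dict.fromkeys(line.strip() for line in lines if line.strip()))
    commands = []
    for url in urls:
        cmd = ["docpush", *extra_args, "-d", url]
        run(cmd)
        commands.append(cmd)
    queue_file.write_text("", encoding="utf-8")
    return commands
```

    Note that this sketch does not guard against the stage appending to the file while the scheduled task is emptying it, so a real version would need some form of locking or file rotation.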

    I don't have an image/install available for testing it, but I am pretty sure it should work.

    Regards


    Thomas Svensen | Microsoft Enterprise Search Practice
  • Monday, May 09, 2011 7:01 AM
     
     

    Thomas,

    That's quite ingenious, and I'm sure it will work. And it's probably the only case where docpush is useful in production :)

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
  • Monday, May 09, 2011 7:23 AM
     
     

    I agree, this should work.

    We have done this quite a few times in production, for some skewed scenarios, with the old ESP. :)

    I hope that support for something like "ProcessorStatus.NotPassing" gets added to pipeline extensibility, or else docpush will become the de facto mechanism in quite a few cases.

    Thanks,

    Ashwani

  • Friday, May 20, 2011 3:51 PM
     
     

    Mikael,

    Could you please elaborate on what you meant by "...Offensive Content Filter, but there is no way to assign offensive words to the title/body in a custom stage."? What's the downside of assigning offensive content to a title and having it dropped in the OCF stage?

     

    Thank you,

     

    Mike.

  • Friday, May 20, 2011 7:44 PM
     
     

    Sadhak,

    There is no way you can modify the title or body of an indexed document during the pipeline (in a supported manner), therefore, you can't change words to get it dropped by the offensive content filter.

    If, however, you can assign words to the title or body during crawl/index time, then it will be dropped as expected.

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
  • Friday, May 20, 2011 8:45 PM
     
     

    Mikael,

     

    Thank you for clarifying.

     

    It seems pretty straightforward to me: you just write an "offensive" string into a title/body crawled property. And it looks "supported" as well :-).

    I tested rewriting the title and it works.

    I would feel more comfortable assigning certain words to the 'ocfcontribution' crawled property, but I haven't been able to find a way (supported or unsupported) to make it work. It would be nice of Microsoft to explain in more detail how to do it, particularly in a supported way.
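
    A rough sketch of what such a title rewrite could look like in a pipeline extensibility stage is below. It is only an illustration under assumptions: the input and output are taken to be <Document> elements with <CrawledProperty> children, the property set GUID and property name are placeholders that must be replaced with the crawled property mapped to the title in your deployment, and the marker must be a word the Offensive Content Filter actually drops. The function name and should_drop callback are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholders: use the property set GUID and name of the
# crawled property that feeds the title in your own deployment.
TITLE_PROPERTY_SET = "00000000-0000-0000-0000-000000000000"
TITLE_PROPERTY_NAME = "title"
STRING_VAR_TYPE = "31"  # VT_LPWSTR, the variant type for Unicode strings


def build_stage_output(input_xml, should_drop, marker):
    """Return the stage output XML as a string.

    If should_drop(url) is true, the output overwrites the title crawled
    property with `marker`; otherwise the output document is left empty.
    """
    root = ET.fromstring(input_xml)
    url = ""
    for prop in root.iter("CrawledProperty"):
        if prop.get("propertyName") == "url":
            url = (prop.text or "").strip()
            break
    out = ET.Element("Document")
    if should_drop(url):
        title = ET.SubElement(
            out,
            "CrawledProperty",
            propertySet=TITLE_PROPERTY_SET,
            varType=STRING_VAR_TYPE,
            propertyName=TITLE_PROPERTY_NAME,
        )
        title.text = marker
    return ET.tostring(out, encoding="unicode")
```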

     

    Thank you,

     

    Mike.

  • Friday, May 20, 2011 11:08 PM
    Moderator
     
     

    Hi Mike

    If you are able to change the title/body of your documents at crawl time, then why not add a "noindex" tag in the body instead (assuming your content is HTML, of course)? That would be a cleaner and more officially supported approach.
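
    For HTML content, one way to read the "noindex" suggestion is the standard robots meta tag. The sketch below assumes that interpretation and that the crawler honours the tag; the helper name add_noindex is hypothetical and the regular expressions are deliberately simple rather than a full HTML parser.

```python
import re

NOINDEX_META = '<meta name="robots" content="noindex">'

# Matches an existing robots meta tag that already contains "noindex".
_HAS_NOINDEX = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*noindex', re.IGNORECASE
)
# Matches the opening <head> tag (with or without attributes), not <header>.
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_noindex(html):
    """Return html with a robots noindex meta tag right after <head>.

    The input is returned unchanged if such a tag is already present.
    If there is no <head> element, the tag is prepended to the document.
    """
    if _HAS_NOINDEX.search(html):
        return html
    new_html, count = _HEAD_OPEN.subn(
        lambda m: m.group(0) + NOINDEX_META, html, count=1
    )
    return new_html if count else NOINDEX_META + html
```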

    Regards


    Thomas Svensen | Microsoft Enterprise Search Practice
  • Monday, July 25, 2011 6:25 AM
     
     

    Dear Ajay,

     

    Documents can be excluded during pre-processing and post-processing.

     

    1. During pre-processing, i.e. during indexing, you can add exclusion rules in the crawler itself via advanced settings, or through a document processing pipeline.
    2. During post-processing, i.e. at query time, you can use advanced FQL features to narrow down the scope.
    3. In the worst case, where documents are already indexed and re-indexing is very inconvenient, you can use the Search Business Center to apply rules that drop the documents so that they are not returned in the result set.

    Regards,

    Chirag Shah

    Enterprise Search


  • Saturday, December 10, 2011 8:12 PM
     
     

    Hi,

    I solved it!

    Read my blogpost: http://techmikael.blogspot.com/2011/12/how-to-prevent-item-from-being-indexed.html

    You can use the Offensive Content Filter to help you out.

    -m


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
  • Sunday, December 11, 2011 4:32 PM
     
     
    Great blog, Mikael. Thanks for posting this.
    Sriram S