Setting specific content in body in FS4SP 2010

  • Question

  • Hi,

    I want to remove common content such as the header and footer from the body by applying some regular expressions. The issue I am facing is that body is a read-only crawled property, so I can't modify it. Is there any other crawled property to which I can map the body content, so that the managed property body reflects the change?


    Ankit Gupta
    Tuesday, September 27, 2011 10:53 AM

All replies

  • Hi Ankit,

    You cannot modify what is written to the managed property "body". However, if you have control over the HTML you are crawling, you can wrap the content you want to exclude in a div with class="noindex".

    For example:

    <div>
       some text
       <div class="noindex">
          this is excluded
       </div>
    </div>
    


    The "noindex" class name can be combined with other class names, for example class="header noindex", so you can add it to div tags that already surround the content. This method works for the out-of-the-box SharePoint search as well as for FAST.

    Your other option is to create a custom pipeline extensibility stage that extracts only what you want from the HTML and writes it to a crawled property, which you then map to a managed property of your own creation (I do not think it will help to map it to "body"). The next step would be a custom full-text index that uses your managed property.
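
    As a rough illustration of the extraction step only, here is a minimal sketch in Python. It uses an HTML parser rather than regular expressions, and it assumes the boilerplate can be recognised by tag name or by an id/class value. The names extract_main_text, BodyTextExtractor, SKIP_TAGS and SKIP_MARKERS are hypothetical, and the "header"/"footer" markers are placeholders for whatever your pages actually use. Reading the stage's input file and writing the result to your own crawled property is left out.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Assumption: boilerplate lives in these tags ...
SKIP_TAGS = {"script", "style", "header", "footer", "nav"}
# ... or in elements whose id or class contains one of these (hypothetical) names.
SKIP_MARKERS = {"header", "footer"}


class BodyTextExtractor(HTMLParser):
    """Collects text while ignoring everything inside boilerplate elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a boilerplate element
        self._chunks = []

    def _is_boilerplate(self, tag, attrs):
        if tag in SKIP_TAGS:
            return True
        attr_map = dict(attrs)
        names = set((attr_map.get("class") or "").split())
        names.add(attr_map.get("id") or "")
        return bool(names & SKIP_MARKERS)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside boilerplate, count every nested element so we know
        # when the outermost boilerplate element closes.
        if self._skip_depth or self._is_boilerplate(tag, attrs):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_main_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

    The depth counter assumes reasonably well-formed markup. Unclosed non-void tags inside a boilerplate element would throw the count off, so test it against your real pages before relying on it.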

    Hopefully, my first suggestion is an option for you as it involves less work and complexity.

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Tuesday, September 27, 2011 12:04 PM