Remove menus from body

    Question

  • Hi all,

    I'm indexing an external site whose rendered HTML I have no way of changing. It contains menus, headers and footers, which the SharePoint crawler indexes along with the relevant content.

    I can't add <noindex> tags to it.

    I can create a content enrichment web service, but it can only parse the "body" managed property, where there is nothing I can "lock on to" to find the menus and remove them. Is the original HTML saved in some other managed property?

    Any ideas/options?

    Thanks,

    Amir

    Sunday, December 30, 2012 9:16 AM


  • Amir,

    isn't this your thread on the same question? http://social.technet.microsoft.com/Forums/en-US/sharepointsearch/thread/eb9cdfc7-5899-4ed4-a9cf-485e44013991

    The managed property 'body' is read-only and cannot be overwritten by the web service callout during indexing.

    You could try the following: create another managed property (MP), say "body_my", and save your modified body content there. Then use the new property in the index. I'm not sure the original "body" MP can be unmapped, but you could reduce its context weight group (in Advanced Searchable Settings) or put it into another full-text index to exclude it from default searching.

    Sunday, December 30, 2012 10:22 AM
  • To access the original HTML markup of your pages, consider using the RawData property of the input item (remember to enable the SendRawData setting for your web service). The Body property contains text already parsed from the HTML, because document parsing happens before the custom stages (as it did in FS4SP).

    • Marked as answer by Amir at eWave Sunday, December 30, 2012 12:25 PM
    Sunday, December 30, 2012 11:25 AM
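The RawData approach can be sketched as a small helper that turns the page's raw HTML into plain text with the site chrome removed. This is a minimal sketch, not part of the SharePoint API: the class and method names are hypothetical, and it assumes the external site wraps its menus, headers and footers in <nav>, <header> and <footer> elements (the thread doesn't show the real markup, so adapt the pattern, e.g. to a <div id="menu">). It uses regular expressions rather than an HTML parser, so it only suits simple, predictable markup.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not a SharePoint API): extracts indexable text from raw
// page HTML after dropping the site chrome.
public static class ChromeStripper
{
    // ASSUMPTION: the chrome lives in <nav>, <header> and <footer> elements;
    // replace these names with whatever the external site actually uses.
    // <script> and <style> are dropped so their code does not leak into the text.
    // The \1 backreference makes the closing tag match the opening tag name.
    // Limitation: nested elements with the same tag name are NOT handled.
    private static readonly Regex ChromeBlocks = new Regex(
        @"<(nav|header|footer|script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag, e.g. <p>, </div>, <a href="...">.
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>");

    // Runs of whitespace (including non-breaking spaces), collapsed to one space.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ExtractContentText(string html)
    {
        // 1. Remove whole chrome blocks, contents included.
        string withoutChrome = ChromeBlocks.Replace(html, " ");
        // 2. Remove the remaining tags but keep their text.
        string text = AnyTag.Replace(withoutChrome, " ");
        // 3. Decode entities: &amp; -> &, &nbsp; -> U+00A0, and so on.
        text = WebUtility.HtmlDecode(text);
        // 4. Normalize whitespace.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Inside the enrichment service, the input would be the decoded RawData (for example Encoding.UTF8.GetString(item.RawData)), and the returned string would go into a custom output property such as the "body_my" suggested earlier, since "body" itself cannot be overwritten. For pages with deeply nested or irregular markup, a real HTML parser would be a safer choice than these patterns.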
  • Hi Alexey,

    Thank you, that was exactly what I was looking for.

    All I had to do was:

    string html = Encoding.UTF8.GetString(item.RawData);

    Much appreciated!

    Amir

    Sunday, December 30, 2012 12:26 PM
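Amir's one-liner works as long as the pages are UTF-8. One caveat: Encoding.UTF8.GetString does not strip a UTF-8 byte-order mark, so a page saved with a BOM yields a string that starts with an invisible U+FEFF character. A minimal BOM-aware alternative is sketched below; the helper name is hypothetical, and it assumes that pages without a BOM really are UTF-8 (a charset declared only in an HTTP header or <meta> tag is not inspected).

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: a BOM-aware replacement for Encoding.UTF8.GetString(item.RawData).
public static class RawDataDecoder
{
    public static string Decode(byte[] rawData)
    {
        // StreamReader honours a byte-order mark when one is present and strips
        // it from the result; without a BOM it falls back to UTF-8, the same
        // assumption as the one-liner in the thread.
        // ASSUMPTION: charsets declared only in HTTP headers or <meta> tags
        // are not inspected here.
        using (var reader = new StreamReader(
            new MemoryStream(rawData), Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would mirror the original line: string html = RawDataDecoder.Decode(item.RawData);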