Can I add my linguistic processor in FAST?

  • Question

  • I have a C++ module that can stem and normalize Arabic, and I want to use it inside FAST.
    Does anyone know how to do that?
    Thanks
    Thursday, December 13, 2012 2:51 PM

Answers

All replies

  • Hi,

    Yes and no. You cannot add it so that it works on the "body" content directly, but you can add a custom processing extensibility stage which reads the "body" field and writes it to another crawled property which you include in the full text index.
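
    A minimal sketch of what such a stage does, under stated assumptions: the stage runs an external program that receives an input XML file with the crawled properties it asked for and writes an output XML file with the crawled properties it produces. The `<Document>`/`<CrawledProperty>` layout is written from memory of the pipeline extensibility documentation and should be checked against it. The output property set GUID, the property name `arabicbody`, and `normalize_arabic` are hypothetical placeholders. Python is used only for brevity; this is an external program, not an internal pipeline stage.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholders: replace with the values from your own configuration.
BODY_NAME = "body"
OUTPUT_SET = "00000000-0000-0000-0000-000000000000"
OUTPUT_NAME = "arabicbody"


def normalize_arabic(text):
    """Stand-in for the real C++ stemmer/normalizer (assumption)."""
    # Only strips the tatweel (kashida) character as a trivial demonstration.
    return text.replace("\u0640", "")


def process_document(input_xml):
    """Read the body crawled property and return output XML with the new property."""
    root = ET.fromstring(input_xml)
    body = ""
    for cp in root.iter("CrawledProperty"):
        if cp.get("propertyName") == BODY_NAME:
            body = cp.text or ""
            break

    out_root = ET.Element("Document")
    out_cp = ET.SubElement(out_root, "CrawledProperty", {
        "propertySet": OUTPUT_SET,
        "propertyName": OUTPUT_NAME,
        "varType": "31",  # assumed to be the text type
    })
    out_cp.text = normalize_arabic(body)
    return ET.tostring(out_root, encoding="unicode")
```

    In a real stage the program would read the input file path and write the output file path it is given on the command line; only the transformation is shown here.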

    Thanks,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/MCT/MCPD - If you find an answer useful, please up-vote it.
    http://techmikael.blogspot.com/
    Author of Working with FAST Search Server 2010 for SharePoint

    Sunday, December 16, 2012 8:45 PM
  • Hi Ihab, Hi Mikael,

    It IS possible to overwrite the body managed property, and I have successfully done so - and it is documented and supported, kind of.

    Take a look at the following link:

    http://msdn.microsoft.com/en-us/library/ff795813(v=office.14).aspx

    To map specific XML content to the body managed property

    1. Specify an XML Mapper configuration that maps specific parts of the XML to a new crawled property with a unique name.

    2. Specify a mapping of this crawled property to the managed property named body. For more information, see Manage Crawled Properties by Using Windows PowerShell (FAST Search Server 2010 for SharePoint) on Microsoft TechNet.

    3. Ensure that body has the MergeCrawledProperties flag set in the index schema. For more information, see Manage Managed Properties by Using Windows PowerShell (FAST Search Server 2010 for SharePoint) on TechNet.

    I've done so myself with my own linguistic processor, but do notice one thing:

    Let's say that your pipeline extension outputs a crawled property named _output_cp.

    You map it to body managed property and restart a crawl.

    For all items where _output_cp exists - this will work and body will be overwritten. When _output_cp doesn't exist - the body is left empty, even when MergeCrawledProperties is set to true.

    To fix this, you have to make sure that your pipeline extension writes the original body crawled property to _output_cp when it makes no changes to it.
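
    The fallback can be sketched roughly as follows. This is only an illustration of the rule "always write something to _output_cp"; the function name and the `processor` callable are hypothetical and stand in for whatever your extension actually calls.

```python
def text_for_output_property(original_body, processor):
    """Return the text to write to the output crawled property.

    Falls back to the original body when the processor fails or
    returns nothing, so the mapped body is never left empty.
    """
    try:
        processed = processor(original_body)
    except Exception:
        # Broad catch is deliberate in this sketch: any failure means "keep the original".
        processed = None

    if not processed or not processed.strip():
        return original_body
    return processed
```

    The extension would then write the returned value to _output_cp for every item, whether or not the processor changed anything.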

    Good luck!

    Amir

    Monday, December 17, 2012 1:52 PM
  • Hi Amir,

    You are of course right that you can use the XMLMapper for this in certain cases.

    The issue in Ihab's case is that he has to run some custom code, and the XMLMapper runs quite a bit earlier than the CustomerExtensibility stage in the pipeline. So if he wants to follow supported routes, he will have to write the stemmed output to a new crawled property which is made searchable (either by indicating that it should be included in the index, or by mapping it to the body managed property once that is set to "MergeCrawledProperties").

    The unsupported way would be to add a Python stage to replace the body property with the new content.

    Thanks,
    Mikael Svenson



    Monday, December 17, 2012 6:59 PM
  • Hi All,

    thank you Mikael and Amir for your great help.

    I think Mikael's solution is very close to what I need: I want to read the body and write the output of my linguistic processor to another crawled property. I would appreciate any sample code or examples for this. I found a lot of examples, but they only add metadata to the index, such as the size of the document or the number of words.

    thanks

    Wednesday, December 19, 2012 12:37 PM
  • Hi,

    I have some code to get you started at http://techmikael.blogspot.no/2010/12/how-to-debug-and-log-fast-search.html. There's a zip file download at the bottom of the post.

    Basically you will write out your stemmed/normalized text instead of the word count.

    Thanks,
    Mikael Svenson



    • Marked as answer by Ihab Ramadan Wednesday, December 19, 2012 12:56 PM
    Wednesday, December 19, 2012 12:44 PM
  • I will try this.

    Thanks, Mikael

    Wednesday, December 19, 2012 12:57 PM
  • Hi Mikael,

    I tried to run your sample in my SharePoint environment, but I ran into some problems:

    1- The wordcount property did not appear when I ran the command Get-FASTSearchMetadataCrawledProperty -name "wordcount"

    2- I followed the steps in the example, but the WordCount process did not appear in the process list when I ran a full crawl.

    Can you advise?

    thanks

    Tuesday, December 25, 2012 9:26 AM
  • Hi,

    If you read the blog post, the module creates a crawled property named "wordcount", and there are PowerShell commands to create a managed property with the same name and to map the crawled property to the managed property.

    After a recrawl, you should have values in the managed property named "wordcount" if you request it to be returned in the results. The .exe runs very quickly when it is called, so it is hard to spot manually (hence adding a sleep and hooking in to it for debugging).

    Also check your crawl log for any other errors which might have occurred.
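
    One way to confirm that the executable is being invoked at all is to have it append a line to a log file on every run. The sketch below is only an illustration in Python, not the code from the blog post; the function name and the log path are hypothetical, and the stage may run with restricted permissions, so pick a folder the process is allowed to write to.

```python
import datetime
import os


def log_invocation(log_path, input_path, output_path):
    """Append one line per run so you can confirm the stage was called."""
    stamp = datetime.datetime.now().isoformat()
    line = "{0} pid={1} input={2} output={3}\n".format(
        stamp, os.getpid(), input_path, output_path
    )
    # Append mode keeps earlier runs, so a full crawl leaves one line per item.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
    return line
```

    If the log stays empty after a full crawl, the stage is not being called, which points at the pipeline configuration rather than at the program itself.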

    Thanks,
    Mikael Svenson



    Tuesday, December 25, 2012 2:34 PM