Indexing in FS4SP

  • Question

  • Is it possible to feed the indexer of one FS4SP farm with the FIXML files from another FS4SP farm over HTTP?

    My scenario needs content from two geographically separated locations to be crawled and indexed. Can somebody suggest a better way to get all of the indexed content in the same place?


    Tuesday, March 29, 2011 12:11 PM


All replies

  • Short answer: no. There is no direct support for re-feeding FIXML in FS4SP.

    Can you describe your topology and what data sources you are indexing?

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Tuesday, March 29, 2011 6:52 PM
  • We are trying to implement a multi-node deployment of FS4SP.

    We have installed the FS4SP and SharePoint farms in one location, say A. There are various types of content sources that need to be indexed. One of them resides in location A itself, whereas the others are in a remote location with low bandwidth, say B. We are planning to have the central SharePoint farm and the FS4SP farm deployed at location A.

    We want to bring the crawled content from B to A. But location B has very low bandwidth, so we are not sure whether we can crawl the content at B from A. Can you suggest a better way by which we can bring either the crawled content or the index from B to A, so that the whole index ends up at location A?

    Wednesday, March 30, 2011 3:55 AM
  • If the documents are of a different nature, I would perhaps do federated search between the locations and show them as two result sets. That way you don't have to crawl across the low-bandwidth line.

    If you have to merge the sets, you could always transfer the FIXML files and use the Content API to push the FIXMLs in. You just need to make sure you handle deletes in some fashion, as they are not part of the FIXML.
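
    For the push on the receiving end, one concrete option is a small script around the docpush test tool that ships with FS4SP, standing in for a full Content API client. This is only a sketch: the install path, incoming folder and collection name ("sp") are assumptions, and note that docpush sends content through the full pipeline, so it suits plain dumped content better than raw FIXML (see also the caveats further down in the thread).

        # feed.py - sketch: push transferred files into an FS4SP content
        # collection on the receiving farm using the docpush test tool.
        # Install path, folder and collection name are assumptions.
        import glob
        import subprocess

        DOCPUSH = r"C:\FASTSearch\bin\docpush.exe"  # default install path assumed

        for path in glob.glob(r"C:\ext\incoming\*.txt"):
            # docpush -c <collection> <file> feeds the file through the pipeline
            subprocess.check_call([DOCPUSH, "-c", "sp", path])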

    Another possibility is to create a custom pipeline stage which dumps all your fields to a file, which you transfer, and then again use the Content API to insert on the other end. This way you will be able to handle updates and deletes (provided you edit a configuration file to allow custom stages to run on delete operations); a rough sketch of such a stage follows below.
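
    To make that idea concrete, here is a minimal sketch of such a dump stage in Python. It assumes the stage is registered in etc\pipelineextensibility.xml and receives the input/output XML file paths as arguments; the spool path and the flat tab-separated format are purely illustrative.

        # dumpstage.py - sketch of an FS4SP pipeline extensibility stage that
        # spools incoming crawled properties to a local file for transfer.
        # Assumed registration in etc\pipelineextensibility.xml, roughly:
        #   <Run command="C:\Python27\python.exe C:\ext\dumpstage.py %(input)s %(output)s">
        import sys
        import xml.etree.ElementTree as ET

        SPOOL = r"C:\ext\spool\dump.txt"  # assumed local drop folder

        def main(input_file, output_file):
            # FS4SP invokes the stage once per document, handing it an XML file
            # with the crawled properties declared in the <Input> section.
            root = ET.parse(input_file).getroot()
            with open(SPOOL, "a") as spool:
                for prop in root.findall("CrawledProperty"):
                    name = prop.get("propertyName")
                    value = (prop.text or "").replace("\n", " ")
                    spool.write("%s\t%s\n" % (name, value))
                spool.write("---\n")  # document separator
            # Hand back an empty output document; this stage only observes.
            ET.ElementTree(ET.Element("Document")).write(output_file)

        if __name__ == "__main__":
            main(sys.argv[1], sys.argv[2])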

    If you are not up for any coding, then crawling the remote location is the only supported way. Maybe you can add some hardware compression on the line to reduce the data being sent.

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Wednesday, March 30, 2011 11:32 AM
  • Remote crawling and federated search are really the only viable options.

    Although you could in theory feed using the Content API, the content will still go through the standard content pipeline. Feeding FIXML to the Content API would probably yield quite strange results, and dumping all data from a custom pipeline stage would be pretty complex. Properly supporting security trimming and deletes would probably not be possible at all.

    Regards


    Thomas Svensen | Microsoft Enterprise Search Practice
    Thursday, March 31, 2011 8:10 AM
    Moderator
  • I agree; all in all, it's a difficult task to do efficiently.

    Ideally, you would want the content extraction to happen before the data is sent over the wire, to reduce bandwidth, while keeping all the other benefits of crawling with the default crawlers and pipeline.

    Alternatively, imagine two index columns and two rows, one in each location, where each location has a master column that is synced over to the other location afterwards, and of course two crawl components, one in each location.

    Would this be possible in some way?

    -m


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Thursday, March 31, 2011 8:16 AM