Sharepoint 2010 FAST search and crawling in a distributed environment

  • Question

  • hi guys

    Q re Sharepoint 2010 FAST search and specifically the crawling component in a distributed environment

    environment details:
    * Sharepoint 2010 installed in AD resource forest
    * Sharepoint 2003/2007 servers in AD accounts forest
    * multiple intranet/content web servers in accounts forest
    * one way AD trust b/w resource forest and accounts forest
    * high speed WAN links connect accounts and resource forests

    Goal:
    want to leverage 2010 FAST capability to index content from all sharepoint and intranet based servers
    BUT
    want to avoid crawling across WAN links

    Q: can we use a crawler on the accounts forest side (does that require a 2010 instance there?) to keep crawl traffic local, and then present the results back to the 2010 farm on the resource forest side as a web part, site, etc.?

    thanks in advance for any thoughts

    regards,

    Nick

    Thursday, January 12, 2012 3:18 AM

All replies

  • Hi,

    Does it really matter if the crawler pulls the item over the WAN, or if it crawls "locally" and pushes the item from the SSA to FS4SP over the WAN?

    No matter how you do it, the whole binary item has to be sent over to FS4SP, and I assume FS4SP is in the resource forest.

    Both options:

    1. account farm <- SSA in account farm pulls content and pushes it -> WAN -> FS4SP Content distributor

    2. account farm <- WAN <- SSA in resource farm pulls content and pushes it -> FS4SP Content distributor

    Let me know if I missed some point here.
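
A rough sketch of the comparison between the two options, assuming FS4SP sits in the resource forest as stated. The `Hop` type, the hop labels, and the item count and size are hypothetical values chosen for illustration, not measurements from this environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One leg of the crawl path; crosses_wan marks legs between the forests."""
    source: str
    target: str
    crosses_wan: bool


# Option 1: Content SSA in the accounts forest crawls locally, then pushes over the WAN.
OPTION_1 = [
    Hop("account farm content", "Content SSA (accounts forest)", False),
    Hop("Content SSA (accounts forest)", "FS4SP content distributor", True),
]

# Option 2: Content SSA in the resource forest pulls over the WAN, then pushes locally.
OPTION_2 = [
    Hop("account farm content", "Content SSA (resource forest)", True),
    Hop("Content SSA (resource forest)", "FS4SP content distributor", False),
]


def wan_bytes(hops, item_count, avg_item_bytes):
    """Bytes crossing the WAN for a full crawl: every item travels each WAN hop once."""
    wan_hops = sum(1 for hop in hops if hop.crosses_wan)
    return wan_hops * item_count * avg_item_bytes


if __name__ == "__main__":
    # Hypothetical corpus: 1,000 items averaging 500 KB each.
    for name, hops in (("option 1", OPTION_1), ("option 2", OPTION_2)):
        print(name, wan_bytes(hops, 1000, 500_000))
```

Under these assumptions both options move the same volume across the WAN; they differ only in which side initiates the connection.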

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Thursday, January 12, 2012 8:06 PM
  • Hi Mikael,

    thanks for the reply.

    re your first q you state that the full binary still has to transfer across the WAN to the FAST box:

    By full binary do you mean that the FAST crawl is pulling the actual indexed file/Word doc etc. across and storing it locally as well? Or is the binary reference just registering a FAST pointer to the source on a remote file server/MOSS farm etc.?

    re if it matters if the crawl is local and push back or remote and pull across the WAN:

    The driver in this environment is a security one. The resource forest is a less trusted environment, so we want to avoid opening up dozens of HTTP/HTTPS ports through to our internal servers based in the accounts forest. If we can keep it to one push rule traversing the two, i.e. the accounts forest local crawler box pushes to the resource forest FAST server, then we would be more comfortable allowing this functionality.

    based on the above does this mean your proposed option 1) is the best way to achieve this?

    thanks again

    regards

    Nick

    Thursday, January 12, 2012 10:52 PM
  • Nick,

    Do I have it right that you don't need to crawl any content from the resource forest, and only need to crawl content from the accounts forest and present it to the resource forest 2010 farm?

     

    If that's the case, you could do something like this (but you would need a FAST for SharePoint 2010 farm on the accounts forest side): crawl the local content via the FAST Content SSA to get it into the FAST index, then publish your FAST Query SSA from the accounts forest and subscribe to that SSA from the SharePoint 2010 farm on the resource side.

     

    http://technet.microsoft.com/en-us/library/ee704558.aspx

    http://technet.microsoft.com/en-us/library/ff621100.aspx

     

    Really depends on what you are looking for.


    Igor Veytskin
    Tuesday, January 17, 2012 3:37 AM
    Moderator
  • Hi Igor,

    thanks for the reply - yes that is correct the intention is to keep the accts forest crawling traffic local to that side of the WAN and behind firewall and then present the results back across the WAN to the resource forest based Sharepoint 2010 farm.

    If I read the 2nd article correctly, FAST does not require a 2-way trust to present the FAST results, so we should be OK with the 1-way trust?

    Note:

    If the server farms are located in different domains, the User Profile service application requires both domains to trust one another. For the Business Data Connectivity and Secure Store service application administration features to work from the consuming farm, the domain of the publishing farm must trust the domain of the consuming farm. Other cross-farm service applications work without a trust requirement between domains.
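
The quoted note can be read as a small lookup table. The sketch below encodes only what the note says; the `Trust` enum and `required_trust` helper are hypothetical names, and treating the FAST Query SSA as one of the "other cross-farm service applications" is an assumption taken from the question above, not something the note states explicitly.

```python
from enum import Enum


class Trust(Enum):
    NONE = "no trust requirement between domains"
    PUBLISHER_TRUSTS_CONSUMER = "publishing farm's domain must trust consuming farm's domain"
    TWO_WAY = "both domains must trust one another"


# Requirements exactly as listed in the quoted note. For Business Data
# Connectivity and Secure Store, the note scopes the requirement to the
# administration features used from the consuming farm.
TRUST_BY_SERVICE_APP = {
    "User Profile": Trust.TWO_WAY,
    "Business Data Connectivity": Trust.PUBLISHER_TRUSTS_CONSUMER,
    "Secure Store": Trust.PUBLISHER_TRUSTS_CONSUMER,
}


def required_trust(service_app):
    """Return the trust the note lists; anything unlisted falls under 'other'."""
    return TRUST_BY_SERVICE_APP.get(service_app, Trust.NONE)


if __name__ == "__main__":
    # Assumption: search is not named in the note, so it is treated as 'other'.
    for app in ("User Profile", "Secure Store", "Search"):
        print(app, "->", required_trust(app).value)
```

If search does fall under "other", the note itself places no trust requirement on publishing it across farms, which is consistent with the one-way trust reading.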

    thanks again

    regards,

     

    Nick

    Tuesday, January 17, 2012 4:53 AM
  • Hi,

    If I get this right, the FAST servers are in the resource forest and the data is in the account forest. So one way or another the whole binary file will be sent across the WAN from account to resource, as the text extraction happens on the FAST server and not in the Content SSA.

    I have another question as well: as you have 2003/2007 servers in the account forest, how do you intend to present the results? As a custom search page going against a 2010 server, or will users be redirected to a search center on a 2010 server?

    Thanks,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Tuesday, January 17, 2012 7:30 AM
  • Hi Igor/Mikael

    thanks for the replies - yes we are mixing up Sharepoint 2010 provided by a remote 3rd party supplier with our existing on premise Sharepoint 2003/2007 infrastructure.

    so as per your links is my understanding of the solution to avoid the full binary traversing the WAN correct?:


    1) we keep FAST search local to the target data being crawled (requires 2010 FAST server locally in the accounts forest where the 2003/07 MOSS and intranet resides)

    2) we publish the FAST query/results as a service application to the remote 2010 Sharepoint Farm

    3) users searching SharePoint 2010 in the remote resource forest will see results from the published FAST service application in the accounts forest

    4) search results/files/binaries will display in the resource forest 2010 farm but logically would remain local to the accounts forest, i.e. binaries will never traverse the WAN until a result/file is clicked?

    - is 4) correct?

    thanks again for clarifying the expected behaviour - it is unclear from any of the online technet articles etc I have read.

    regards,

    Nick

    Tuesday, January 24, 2012 6:05 AM
  • Hi Nick,

    I think you got it :D 4) is correctly assumed; only the search result xml/html will be transferred over the WAN during search. By hosting the FS4SP server(s) in the same network location as the SharePoint data/database you avoid sending the full binary over WAN.
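
A back-of-the-envelope sketch of why hosting FS4SP next to the content shifts WAN traffic from crawl time to query time. The function names and all numbers are hypothetical, and the model assumes searching users sit in the resource forest.

```python
def crawl_wan_bytes(item_count, avg_item_bytes, fs4sp_local_to_content):
    """Full binaries cross the WAN at crawl time only if FS4SP is remote from the content."""
    return 0 if fs4sp_local_to_content else item_count * avg_item_bytes


def query_wan_bytes(query_count, avg_result_page_bytes, fs4sp_local_to_content):
    """Result XML/HTML crosses the WAN only if the published Query SSA is remote from the users."""
    return query_count * avg_result_page_bytes if fs4sp_local_to_content else 0


if __name__ == "__main__":
    # Hypothetical figures: 1,000 items of 500 KB, 200 queries returning 50 KB each.
    for local in (True, False):
        crawl = crawl_wan_bytes(1000, 500_000, local)
        query = query_wan_bytes(200, 50_000, local)
        print(f"FS4SP local to content={local}: crawl={crawl} query={query}")
```

The model ignores the traffic generated when a user clicks a result: opening the document still fetches that one file across the WAN, as point 4) describes.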

    Regards,
    Mikael Svenson 

     


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Tuesday, January 24, 2012 10:14 AM
  • thanks for clarifying that guys, that's a great help!

     

    regards,

     

    Nick

    Wednesday, January 25, 2012 12:00 AM