FAST Enterprise Web Crawler practical usage

  • Question

  • Hi,

    I have a couple of questions regarding the FAST Enterprise Web Crawler. I know it offers a lot of configuration options compared to the standard Indexing Connector, but I wanted to know how often, in reality, you feel the need to use and configure the Enterprise crawler. Is it something people usually configure when they have an F4SP10 installation, or are they usually satisfied with what the Indexing Connector offers?

    Also, I wanted to ask how one should go about configuring a distributed enterprise crawler.

     

    Regards,

    Fahad

    Monday, August 29, 2011 3:12 PM

All replies

  • Hello Fahad,

    I would like to quote TechNet for your first question (http://technet.microsoft.com/en-us/library/ff383278.aspx):

    • Use the FAST Enterprise Web Crawler when:
    • Use when you have many web sites to crawl.
    • Use when the web site content contains dynamic data, including JavaScript.
    • Use when the organization needs access to advanced web crawling, configuration and scheduling options.
    • Use when you want to crawl RSS web content.
    • Use when the web site content uses advanced logon options.

    So, in most cases I go with the SharePoint web crawler if I can, and use the FAST crawler as a "last resort". This is mainly because I like to work with one crawler framework for maintainability reasons.

    As for configuring a distributed enterprise crawler, you first have to edit your deployment.xml file. There is a sample at http://technet.microsoft.com/en-us/library/ff354931.aspx#element_crawler. By default, all start URIs will be distributed among the different servers in your distributed setup, but you can target domains to specific hosts with configuration options.
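
    To picture what "distributed among the servers, with some domains targeted to specific hosts" can look like, here is a minimal Python sketch. It is only an illustration of the idea, not the crawler's actual algorithm or configuration format: the hashing scheme, the function names (assign_host, distribute), the pinned mapping, and the node names are all hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def assign_host(start_uri, crawler_hosts, pinned=None):
    """Pick a crawler node for one start URI (hypothetical scheme)."""
    domain = urlsplit(start_uri).hostname
    if domain is None:
        raise ValueError(f"start URI has no host part: {start_uri!r}")
    # An explicit domain-to-node mapping wins over the default spread.
    if pinned and domain in pinned:
        return pinned[domain]
    # Default: a stable hash of the domain, so every URI of one site
    # ends up on the same node no matter in which order URIs arrive.
    digest = hashlib.md5(domain.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(crawler_hosts)
    return crawler_hosts[index]


def distribute(start_uris, crawler_hosts, pinned=None):
    """Group start URIs by the crawler node that would handle them."""
    plan = {host: [] for host in crawler_hosts}
    for uri in start_uris:
        node = assign_host(uri, crawler_hosts, pinned)
        plan.setdefault(node, []).append(uri)
    return plan


if __name__ == "__main__":
    # Node names and sites below are made up for the example.
    nodes = ["node1.example.com", "node2.example.com"]
    uris = [
        "http://a.example.org/",
        "http://a.example.org/news",
        "http://b.example.org/",
    ]
    pinned = {"b.example.org": "node2.example.com"}
    for node, assigned in distribute(uris, nodes, pinned).items():
        print(node, assigned)
```

    Keeping all URIs of one domain on one node is a design choice in this sketch: it makes per-site politeness limits and duplicate detection easier to reason about. How the real crawler splits the work should be checked against the TechNet documentation.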

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    • Marked as answer by Fahad Owais Tuesday, August 30, 2011 1:29 PM
    • Unmarked as answer by Fahad Owais Wednesday, August 31, 2011 8:01 AM
    Monday, August 29, 2011 7:05 PM
  • Hi Fahad,

    In practice, I've noticed that the FAST Enterprise Crawler does a better job with "problematic" web pages that have dynamic data, as Mikael mentioned.

    It dramatically increases recall because it creates more documents, and it also produces more relevant documents; the SharePoint crawler simply can't create either.

    When I'm using the FAST Enterprise Crawler, the "Top-10" search results I get are more relevant.

    Do note that if the web site is mostly filled with dynamic data, the browser engine that processes the web pages becomes a very tight bottleneck and slows down the entire crawling process.

     

    Regards,

    Amir Ben Ari

    • Marked as answer by Fahad Owais Tuesday, August 30, 2011 1:29 PM
    • Unmarked as answer by Fahad Owais Wednesday, August 31, 2011 8:01 AM
    Tuesday, August 30, 2011 7:05 AM
  • Thanks Mikael and Amir for clearing that up. Really appreciate it.
    Tuesday, August 30, 2011 1:29 PM
  • OK, one more thing I wanted to ask...

    In a distributed crawler setup, if I add a crawler configuration using "crawleradmin" on any one of the hosts that are part of the setup, will the configuration be propagated automatically to all the other servers?

    I unmarked the answers temporarily... I figured the thread might not get noticed if it's marked as answered.
    Tuesday, August 30, 2011 3:59 PM