Enterprise Crawler Rewrite Rules Performance Impact

  • Question

  • Hi,

    According to the documentation (Enterprise_Crawler_Guide.pdf), using rewrite_rules has the following performance impact:

    Rewrite rules can be used to rewrite links parsed out of documents by applying
    regular expression and repeating captured text. This implies that all rewrite rules
    are attempted applied for every link, and it can therefore be very expensive in
    terms of CPU usage depending on the number of rules and their complexity. It
    is therefore advised to limit the amount of rewrite rules for large scale crawls.
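    To see why the cost grows with both the number of rules and the number of links, here is a minimal Python sketch of applying a list of regex rewrite rules to every extracted link. It is only an illustration, not the Enterprise Crawler's actual implementation: the two rules, the example URLs and the function name rewrite_links are hypothetical.

```python
import re

# Hypothetical rewrite rules: (compiled pattern, replacement template).
# Compiling once up front mirrors rules being "loaded into memory".
RULES = [
    (re.compile(r"^http://"), "https://"),  # hypothetical: force HTTPS
    (re.compile(r"#.*$"), ""),              # hypothetical: drop the #fragment
]

def rewrite_links(links, rules=RULES):
    """Apply every rule, in order, to every link.

    The work is roughly len(links) * len(rules) regex evaluations, which is
    why many or complex rules become expensive on a large crawl.
    """
    rewritten = []
    for link in links:
        for pattern, replacement in rules:
            link = pattern.sub(replacement, link)
        rewritten.append(link)
    return rewritten

print(rewrite_links(["http://example.com/a#top", "https://example.com/b"]))
# → ['https://example.com/a', 'https://example.com/b']
```

    With two or three simple, precompiled patterns, the per-link cost is a handful of short regex matches.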

    If I add a rewrite rule that appends the query parameter "?crawl=true" to every URL, how big a performance hit should I expect? In total there will be at most two or three rewrite rules.
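    As a rough illustration of what such a rule has to do, here is a Python sketch. The crawler's own rewrite-rule syntax is not shown in this thread, so the pattern and the function name append_crawl_param are assumptions. It also handles two details a naive "append ?crawl=true" rule can miss: a URL that already has a query string needs "&crawl=true" instead of a second "?", and a #fragment must stay at the end.

```python
import re

# Hypothetical pattern: split a URL into everything before the first '#'
# and the optional '#fragment' that follows it.
_URL = re.compile(r"^(?P<base>[^#]*?)(?P<frag>#.*)?$")
# Detects a crawl=true parameter that is already present.
_HAS_PARAM = re.compile(r"[?&]crawl=true(?:&|#|$)")

def append_crawl_param(url: str) -> str:
    """Append crawl=true to a URL, keeping any #fragment at the end."""
    if _HAS_PARAM.search(url):
        return url  # already rewritten; avoid adding the parameter twice
    m = _URL.match(url)  # always matches: base may be the whole string
    base, frag = m.group("base"), m.group("frag") or ""
    sep = "&" if "?" in base else "?"
    return f"{base}{sep}crawl=true{frag}"

print(append_crawl_param("http://example.com/page?id=7"))
# → http://example.com/page?id=7&crawl=true
```

    The guard keeps the rule idempotent, which matters if a page links to a URL that already carries the parameter.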

    Thanks in advance!


    GoodbyeWorld
    Thursday, February 16, 2012 3:56 AM

Answers

  • Hello,

    I agree with Dan. A couple of rewrite rules shouldn't have a big impact. The rules are loaded into memory and applied to every URL. It is easy to test:

    Crawl without the rewrite rules and run crawleradmin --status; this should give you an approximate documents/sec rate.

    Then crawl with the rewrite rules and compare the two rates.
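    The comparison can be reduced to a single percentage. This sketch assumes you have noted several approximate documents/sec readings from crawleradmin --status during each crawl; the function name throughput_change and the sample numbers are hypothetical, not measured data. Averaging a few readings per run helps smooth out normal crawl-rate noise.

```python
from statistics import mean

def throughput_change(baseline_dps, with_rules_dps):
    """Percent change in documents/sec after enabling the rewrite rules.

    Each argument is a list of documents/sec readings taken during one crawl.
    A negative result means the crawl with rewrite rules was slower.
    """
    before = mean(baseline_dps)
    after = mean(with_rules_dps)
    return (after - before) / before * 100.0

# Hypothetical readings, for illustration only.
print(round(throughput_change([40.0, 42.0, 41.0], [39.0, 40.0, 41.0]), 2))
```

    A difference of a few percent between two otherwise identical crawls may be within normal run-to-run variation rather than caused by the rules.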

    Best Wishes,

    Michael Puangco | Senior Support Escalation Engineer | US Customer Service & Support

    Customer Service & Support | Microsoft Services


    Sunday, April 1, 2012 5:20 PM
    Moderator

All replies

  • How big is your crawl? I wouldn't worry at all about two or three rules.

    Dan Gøran Lunde

    Friday, February 24, 2012 12:18 PM