High memory usage in custom crawling component

  • Question

  • We are running into some performance issues with a custom crawl component that we built, and we're wondering if anyone has suggestions for how to rein in our memory usage, or if we simply need more memory.

    Setup:  1 WFE, 2 App Servers (crawl components) with 8GB memory each.  Dataset to be crawled is about 24 million records.

    We have a custom crawling component that provides an 'IdEnumerator' method and a 'SpecificFinder'.  For small sets of data (tens of thousands), the IdEnumerator is called, and a couple minutes after the call completes, the crawl components start calling away at the SpecificFinder and indexing all the documents.
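    For context, the two methods are shaped roughly like this (heavily simplified; the entity and method names are placeholders, and the real data access goes against our database):

        using System;
        using System.Collections.Generic;

        // Placeholder entity; the real one has many more fields.
        public class Record
        {
            public int Id { get; set; }
            public string Title { get; set; }
            public DateTime Modified { get; set; }
        }

        public static class RecordService
        {
            // IdEnumerator: returns only the identifiers of the records to crawl.
            public static IEnumerable<int> GetAllRecordIds()
            {
                // The real implementation streams IDs from the database;
                // yield return keeps the enumeration lazy rather than
                // building one giant in-memory list.
                for (int id = 1; id <= 100; id++)
                {
                    yield return id;
                }
            }

            // SpecificFinder: returns the full record for a single identifier.
            public static Record GetRecordById(int id)
            {
                // The real implementation looks the record up in the database.
                return new Record { Id = id, Title = "Record " + id, Modified = DateTime.UtcNow };
            }
        }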

    However, when we try to do a full crawl of our dataset (24+ million records), the server sits at 100% CPU for almost two hours after the IdEnumerator completes.  Memory climbs steadily until it maxes out the system, and then the search process crashes with an out-of-memory error.  The SpecificFinder is never called.

    Is it normal for that much pre-processing to happen before the SpecificFinder method actually starts indexing data, and if so, do we just need more memory in our servers?  Or is anyone aware of a way to throttle that process down?

    I know we can throttle the portion of the crawl that looks up specific records by using Crawler Impact Rules.  However, I can't find anything to help with the initial processing of the IdEnumerator result set.

    Any help at all would be really appreciated!

    Thanks,

    Matt


    Thursday, December 8, 2011 3:21 PM

All replies

  • I'm assuming you are using the BCS framework. Have you implemented batching? If not, you can pass the records to FAST batch by batch using the BCS framework.
    Sriram S
    Thursday, December 8, 2011 5:28 PM
  • I tried batching back when we were only on 4GB of memory, and were using a ReadList function instead of IdEnumerator.  Perhaps the reason it failed at that time was that we were just severely short on memory.

    I will give batching another shot with the IdEnumerator and see what happens.

     

    As an update to my original post - we upgraded the servers to 16GB of memory each this morning, and the mssearch process still outgrew what was available and crashed.

    Thursday, December 8, 2011 5:41 PM
  • I implemented batching via the LastId filter, and it is now chunking away at the data.
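    Roughly, the batched IdEnumerator is now shaped like this (simplified; LastId is wired up as a filter in the model, and the placeholder data source stands in for our database query):

        using System.Collections.Generic;
        using System.Linq;

        public static class RecordService
        {
            private const int BatchSize = 10000;

            // IdEnumerator with a LastId filter: each call returns the next
            // batch of IDs greater than the last ID from the previous call,
            // and the crawler keeps calling until it gets back an empty batch.
            public static IEnumerable<int> GetRecordIdBatch(int lastId)
            {
                // Placeholder for something like:
                // SELECT TOP 10000 Id FROM Records WHERE Id > @lastId ORDER BY Id
                IEnumerable<int> allIds = Enumerable.Range(1, 100000);

                return allIds.Where(id => id > lastId)
                             .OrderBy(id => id)
                             .Take(BatchSize)
                             .ToList();
            }
        }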

    While I expected to see memory usage rise and fall with each batch, it seems to be growing steadily at the same pace as it was without the batching, which doesn't leave me too confident.

    Should know more in a couple hours.

     

    UPDATE:

    It crashed with the same memory error at about 13 million records.  The mssearch process was up to about 15GB of memory usage.  It looks like, whether I am batching or not, something huge is being loaded into memory before the actual crawling of specific results ever starts.

    Our entity object is pretty big, so I trimmed it down to just 3 fields and kicked off the crawl again, thinking maybe it is provisioning instances of the objects right away, but that doesn't seem to be it.  Even with the very small entity, memory use is growing at the same pace.

    Thursday, December 8, 2011 7:58 PM