FAST Search crawl fails, running out of resources

  • Question

  • We are having trouble getting a full crawl to complete with the FAST Search connector in our environment. We are attempting to crawl ~24 million records from SQL Server 2008 R2. The crawl process (mssearch.exe) slowly consumes all the memory on the server, and once memory is exhausted it starts spilling files into the Search Service account's temp folder until the disk fills up as well. When the server completely runs out of resources, mssearch.exe crashes and restarts the crawl from the beginning.

    Our environment consists of two SharePoint 2010 servers with 16 GB of memory each, both set up as crawl components pointing at the same crawl database.

    We tried the out-of-the-box SQL connector and also a custom .NET connector assembly, implemented to perform batching based on a "LastID" filter and an "IDEnumerator". I can see that the connector is batching records properly, so I would expect the SharePoint server to stream each batch over to the FAST server as it reads them; however, it still seems to hold everything in memory or on disk. No connections are made from mssearch.exe to the FAST server during this time.
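    For context, the "LastID" batching the connector performs is the common keyset-pagination pattern. The sketch below is illustrative Python only (our real connector is a .NET BCS assembly; `fetch_batch` and the in-memory rows are stand-ins for the SQL table), showing why the crawler should only ever need one batch in memory at a time:

    ```python
    # Minimal sketch of the LastID batching pattern (illustrative Python;
    # the real connector is a .NET assembly reading from SQL Server).
    def enumerate_batches(fetch_batch, batch_size=10000):
        """Yield batches of records, each filtered by the last ID seen.

        fetch_batch(last_id, batch_size) must return records with
        id > last_id, ordered by id, at most batch_size of them.
        """
        last_id = 0
        while True:
            batch = fetch_batch(last_id, batch_size)
            if not batch:
                break
            yield batch                 # only one batch alive at a time
            last_id = batch[-1]["id"]

    # In-memory stand-in for the SQL table:
    rows = [{"id": i, "title": f"record {i}"} for i in range(1, 25)]

    def fetch_batch(last_id, batch_size):
        # Equivalent to: SELECT TOP (@BatchSize) ... WHERE Id > @LastId ORDER BY Id
        matching = [r for r in rows if r["id"] > last_id]
        return matching[:batch_size]

    batches = list(enumerate_batches(fetch_batch, batch_size=10))
    # 24 rows in batches of 10 -> 3 batches of sizes 10, 10, 4
    ```

    The point is that the enumerator never materializes more than one batch, which is why we expected the crawler's memory footprint to stay flat regardless of the total record count.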

    What else have we tried?

    A custom .NET connector using a "ReadList" operation and batching filters based on BatchId and HasMoreData.

    Out-of-the-box SQL Connector hitting stored procedures

    Out-of-the-box SQL Connector hitting the SQL table directly

    Unfortunately, we see exactly the same behavior with all of these.

    Has anyone out there seen anything similar, and does anyone have an idea how to get around it?

    When we crawl 4 million records or fewer, the job completes successfully. One thing I noticed is that the FAST Search connector crawls all of the data and keeps everything in memory; only once the crawl is done does it push the index files to the FAST server and FAST databases. I would expect this push to happen periodically, in batches, so that we don't run out of resources on the SharePoint application server where the crawl component resides.
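    The difference between the behavior we observe (everything buffered until the crawl finishes) and the behavior we expected (each batch flushed as soon as it is read) can be sketched like this. This is illustrative Python only, not actual mssearch.exe internals; `push` stands in for whatever hands content to the FAST server:

    ```python
    # Illustrative contrast (not actual mssearch.exe internals): buffering
    # every batch until the end versus flushing each batch as it is read.

    def crawl_buffered(batches, push):
        # What we appear to see: memory grows with the whole crawl.
        held = []
        for batch in batches:
            held.extend(batch)       # peak memory ~ total record count
        push(held)                   # single push at the end of the crawl

    def crawl_streaming(batches, push):
        # What we expected: memory bounded by one batch.
        for batch in batches:
            push(batch)              # peak memory ~ batch size

    pushed = []
    crawl_streaming([[1, 2], [3, 4], [5]], pushed.append)
    # pushed == [[1, 2], [3, 4], [5]]
    ```

    With the buffered behavior, the resources needed scale with the total record count, which matches what we see: 4 million records fit, 24 million do not.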

    In our production environment we need to crawl more than 80 million records, and there is no way we can add enough resources to the crawl server to hold everything in memory; at that scale, I estimate it would need more than 120 GB of RAM and 300 GB of disk space.

    Any suggestions would be appreciated; thank you in advance.



    Wednesday, March 21, 2012 3:50 PM

All replies

  • I came across your post while searching for information on using FAST Search with 20+ million records. Was this ever resolved? I'm curious to know how it was handled.

    Thanks, DC SharePointer

    Thursday, December 5, 2013 8:02 PM