Unable to Complete Indexing: Running out of memory on 64-Bit, 34 GB indexer machine

  • Question

  • We have a cluster with multiple indexer columns, and recent changes to the number of documents we're feeding have bumped the number of documents per indexer from 4.5 million to 6.2 million (37% growth).

    One or more of the indexers is having problems completing indexing. We've tried several things, from increasing the number of partitions, to adjusting the maximum amount of RAM that can be used, to other rtsearchrc.xml settings, all to no avail. The message we're seeing in the Web UI System Logs for that indexer is the following:

    Time     Level     Message
    2012-10-02 03:23:45     Info     ft::session_holder: Session 2130 has been terminated, closing.
    2012-10-02 03:23:45     Info     ft::session_holder: Session 2129 has been terminated, closing.
    2012-10-02 01:23:09     Info     ft::session_holder: Session 2129 has been terminated, closing.
    2012-10-02 01:17:10     Info     master_indexing_thread (p:6,j:1): Setting new max limit for partition 4 to 1567077 docs.
    2012-10-02 01:02:48     Info     master_indexing_thread (p:6,j:1): Indexing failed due to out of memory.
    2012-10-02 01:02:48     Warning     index_worker (4_0): Indexing failed.
    2012-10-02 01:02:48     Info     simple_index_producer_callback (4): Indexing failed with error message: Indexing failed, check index producer log.
    2012-10-01 22:34:45     Info     work_order (4_0): Index State Reset - Not using incremental indexing
    2012-10-01 19:57:57     Info     work_order (5_0): Index State Reset - Not using incremental indexing
    2012-10-01 16:40:00     Info     work_order (6_0): Index State Reset - Not using incremental indexing
    2012-10-01 16:20:42     Info     state::runtime: Indexing resumed
    2012-10-01 16:20:42     Info     indexer_admin_servant: Reset index requested.

    If we examine the index producer logs, we see stuff like this:

    [2012-10-02 23:33:49.553] VERBOSE    index_producer_5 processing string vector 'bavntrcountry'
    [2012-10-02 23:33:49.553] VERBOSE    index_producer_5 attribute vector size is 7557904 chunksize is 52428800
    [2012-10-02 23:33:49.782] VERBOSE    index_producer_5 processing string vector 'bavnnavcat'
    [2012-10-02 23:33:49.782] VERBOSE    index_producer_5 processing string vector 'bavnnavfam'
    [2012-10-02 23:33:49.782] VERBOSE    index_producer_5 attribute vector size is 1075573140 chunksize is 52428800
    [2012-10-02 23:35:12.613] ERROR      index_producer_5 indexer_holder: Ran out of memory. Indexing failed
    [2012-10-02 23:35:12.613] INFO       index_producer_5 indexer_holder: FIXML files: 174942, Documents total: 0, Documents ok: 2088889, Documents w/error: 0

    The attribute vector size of 1075573140 is, to use the scientific term, ginormous compared to the others. If those numbers represent bytes, then it is ~1026 MB, which seems curiously close to 1024 MB (1 GB).
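
    To put those numbers in perspective, here is the conversion we're doing (a minimal sketch, assuming the "attribute vector size" and "chunksize" values in the log are byte counts):

    # Convert the byte counts from the index producer log above
    # (assumes the logged sizes are in bytes).
    for label, nbytes in [("large attribute vector", 1075573140),
                          ("chunksize", 52428800)]:
        print(f"{label}: {nbytes / 2**20:,.1f} MB ({nbytes / 2**30:.2f} GB)")
    # large attribute vector: 1,025.7 MB (1.00 GB)
    # chunksize: 50.0 MB (0.05 GB)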

    We have already been consulting the "How to configure indexer partitions for FAST ESP" article and have looked at other articles on these forums regarding indexing issues, but were not able to find anything to help us. According to the article, if we allocate 3 GB of the 4 GB that the docsDistributionMaxMB setting should be able to handle per partition, we can theoretically store ~4 million documents per large indexing partition, which doesn't seem to be the case.

    We've done several iterations over the course of the last week or so, modifying the number of indexing partitions for a column, the docsDistributionMaxMB value, and the document partition triggers, and haven't been able to make headway on resolving this issue. Here are the relevant bits of our current rtsearchrc.xml config:

    ...
    docsDistributionPst="100,100,100,100,100,50,33"
    docsDistributionMax=""
    docsDistributionMaxMB="3072"
    numberDocsPerFixml="200"
    fsearchCachePstDist="1,13,27,13,1,13,7,5,5,5,5,5"
    maxMBCacheSize="600"
    maxSetDirs="30000"
    maxFixmlFiles="4000000"
    diskspaceMBWarning="3000"
    indexerSwapDetect="-W -a 40000 -t 1000 -k 250 -K 500"
    performCleanup="true"
    cleanupTimes="3-5"
    useSequencing="false"
    docIndexType="filehash"
    debugLog="false"
    maxStaticPartitions="7"
    numberPartitions="7"
    blacklistInterval="20"
    compressFixml="true"
    compactFixml="true"
    indexingThreads="2"
    maxActiveIndexingJobs="1"
    ...
    <!-- Index triggers -->
    <index-scheduling type="docCount" triggers="10000,100000,1750000,1750000,1750000,1750000"/>
    ...

    One small note: on our latest attempt to resolve the memory issues we changed docsDistributionMaxMB from docsDistributionMaxMB="-1" to docsDistributionMaxMB="3072", again to no avail.

    We do not have much time for more configuration-change iterations; we need to understand the root issue so that we can resolve it. It takes at least 12 hours or so for a reset index to complete and for us to know whether the latest change worked. Based on our current config, we should be able to support up to ~7 million documents on this indexer column, and each large partition SHOULD be able to support up to ~4 million documents on its own, even though we limit it to 1.75 million via doc triggers.
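
    For reference, the ~7 million figure is just the sum of the doc triggers (a back-of-the-envelope check, assuming each index-scheduling trigger value is a per-partition document cap, which is how we read it):

    # Sum the index-scheduling triggers from rtsearchrc.xml above to get the
    # theoretical per-column capacity (assumes each trigger is a per-partition cap).
    triggers = [10000, 100000, 1750000, 1750000, 1750000, 1750000]
    print(f"column capacity from triggers: {sum(triggers):,} docs")  # 7,110,000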

    We've read that lowering the number of indexing threads can reduce out-of-memory errors, but we do not really want to make things slower if we can help it. We're at a bit of a loss as to a next step. We're contemplating having 9 partitions (2 small, 7 large) and limiting the large partitions to 1 million documents each. From a memory standpoint these machines shouldn't have an issue (34 GB RAM, 64-bit CentOS). We're doing fine on disk storage as well.

    Any suggestions for what the issue is or how to resolve it? I've worked with ESP for over 8 years and this is the first time we've run into an indexer issue like this that we couldn't resolve on our own in a short period. Mercy. Thank you for your time!

    -Michael

    Wednesday, October 3, 2012 6:21 PM

All replies

  • We tried it with 9 partitions last night, with the 7 large partitions limited to 1 million docs each, and we're still having memory issues:

    Time	Level	Message
    2012-10-04 08:03:08	Info	master_indexing_thread (p:8,j:1): Setting new max limit for partition 6 to 1566675 docs.
    2012-10-04 07:51:29	Info	master_indexing_thread (p:8,j:1): Indexing failed due to out of memory.
    2012-10-04 07:51:29	Warning	index_worker (6_0): Indexing failed.
    2012-10-04 07:51:29	Info	simple_index_producer_callback (6): Indexing failed with error message: Indexing failed, check index producer log.
    2012-10-04 05:29:01	Info	work_order (6_0): Index State Reset - Not using incremental indexing
    

    We reviewed index_producer_6.log and found the following:

    [2012-10-04 07:51:19.575] VERBOSE    index_producer_6 Calling PrepareAttrVectors
    [2012-10-04 07:51:22.138] VERBOSE    index_producer_6 processing string vector 'bavntrdocsrc'
    [2012-10-04 07:51:22.138] VERBOSE    index_producer_6 processing string vector 'bavntrdocflags'
    [2012-10-04 07:51:22.138] VERBOSE    index_producer_6 attribute vector size is 79538299 chunksize is 52428800
    [2012-10-04 07:51:24.093] VERBOSE    index_producer_6 processing string vector 'bavncnitemtype'
    [2012-10-04 07:51:24.093] VERBOSE    index_producer_6 attribute vector size is 7599 chunksize is 52428800
    [2012-10-04 07:51:24.093] VERBOSE    index_producer_6 processing string vector 'bavncnassetcontexts'
    [2012-10-04 07:51:24.093] VERBOSE    index_producer_6 attribute vector size is 10714 chunksize is 52428800
    [2012-10-04 07:51:24.093] VERBOSE    index_producer_6 processing string vector 'bavntrstates'
    [2012-10-04 07:51:24.093] VERBOSE    index_producer_6 attribute vector size is 6657954 chunksize is 52428800
    [2012-10-04 07:51:24.522] VERBOSE    index_producer_6 processing string vector 'bavntrcertifications'
    [2012-10-04 07:51:24.522] VERBOSE    index_producer_6 attribute vector size is 1241439 chunksize is 52428800
    [2012-10-04 07:51:24.587] VERBOSE    index_producer_6 processing string vector 'bavntrowners'
    [2012-10-04 07:51:24.587] VERBOSE    index_producer_6 attribute vector size is 434967 chunksize is 52428800
    [2012-10-04 07:51:24.604] VERBOSE    index_producer_6 processing string vector 'bavntrcountry'
    [2012-10-04 07:51:24.604] VERBOSE    index_producer_6 attribute vector size is 7762420 chunksize is 52428800
    [2012-10-04 07:51:24.812] VERBOSE    index_producer_6 processing string vector 'bavnnavcat'
    [2012-10-04 07:51:24.812] VERBOSE    index_producer_6 processing string vector 'bavnnavfam'
    [2012-10-04 07:51:24.812] VERBOSE    index_producer_6 attribute vector size is 1290862324 chunksize is 52428800
    [2012-10-04 07:51:29.422] ERROR      index_producer_6 indexer_holder: Ran out of memory. Indexing failed
    [2012-10-04 07:51:29.422] INFO       index_producer_6 indexer_holder: FIXML files: 193922, Documents total: 0, Documents ok: 2088901, Documents w/error: 0

    This makes the largest attribute vector 1.2 GB. If we add all of the vectors together, it's about 1.29 GB. I do not know what the chunksize is and cannot find a reference to it in the ESP docs. If it's in bytes as well, it is exactly 50 MB.

    At the moment it doesn't seem that adjusting partition size makes any difference to the memory issue. We're going to revert to our original config and adjust the doc triggers to support up to 6.5 million docs for the moment.

    We are wondering if there is some odd content in the dataset causing issues. We tried to plot the attribute vector sizes from the producer logs and noticed that, across the cluster, there was a spike: one attribute vector roughly double the size (~2 GB) of anything else. The biggest we've been seeing in the latest iterations is ~1.2 GB. It's very, very strange.
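
    For what it's worth, this is roughly the scrape we've been using to pull the vector sizes out of the producer logs (a sketch only; the log file glob is illustrative, so point it at the index_producer_*.log files on each indexer node):

    # Pull every "attribute vector size is N" value out of the index producer
    # logs and report the total plus the ten largest vectors seen.
    import glob
    import re

    pattern = re.compile(r"attribute vector size is (\d+)")
    sizes = []
    for path in glob.glob("index_producer_*.log"):  # illustrative; adjust to your log directory
        with open(path, errors="replace") as fh:
            for line in fh:
                match = pattern.search(line)
                if match:
                    sizes.append((int(match.group(1)), path))

    sizes.sort(reverse=True)
    print(f"total across all vectors seen: {sum(s for s, _ in sizes) / 2**30:.2f} GB")
    for nbytes, path in sizes[:10]:
        print(f"{nbytes / 2**20:10.1f} MB  {path}")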

    Thursday, October 4, 2012 4:35 PM
  • Hi Michael,

    While adding partitions is a short-term solution to give you more breathing room, as a long-term solution you might want to look into adding a new column and refeeding the content with fixmlfeeder, if you anticipate more growth and if attribute vector memory remains high. The procedure for adding a column is outlined in the operations guide.

    Wednesday, October 31, 2012 8:17 PM
  • Hi Michael,

    I would suggest that you open a ticket with our Technical Support team. There could be something content-specific involved in the behavior that you are seeing. Also, I would recommend that you enable the index producer logs to log in debug mode, so that we can get more information about what is taking place when the out-of-memory issue is encountered.

    To enable debug logging for the index producer, edit $FASTSEARCH/etc/LoggerConfig.xml on the indexer node. In this file, find the indexproducer entry and change the "file" threshold to debug. After saving the file, restart the indexer component for the change to take effect. Let us know your results.

    Thanks!

    Rob Vazzana | Sr Support Escalation Engineer | US Customer Service & Support

    Customer Service & Support | Microsoft Services

    Friday, November 16, 2012 10:40 PM