rtsearchrc.xml index trigger changes when adding extra index partitions

  • Question

  • Is it necessary to add index trigger values as more index partitions are added?

    We use FAST ESP 5.3 SP4 and our documents are updated frequently.

    Assumption: Default rtsearchrc.xml values of:
        useSequencing="true"
        docsDistributionPst="100,100,100,100"
        docsDistributionMax="200000,200000,200000,200000"
        numberPartitions="4"
        <index-scheduling type="docCount" triggers="10000,100000,200000"/>

    Would these be the recommended changes to add a new partition?:
        useSequencing="true"
        docsDistributionPst="100,100,100,100,100"
        docsDistributionMax="200000,200000,200000,200000,200000"
        numberPartitions="5"
        <index-scheduling type="docCount" triggers="10000,100000,200000,200000"/>

     


    • Edited by Churchill729 Thursday, October 13, 2011 7:35 PM corrected value
    Thursday, October 13, 2011 7:02 PM

All replies

  • When adding partitions, I assume the goal is to decrease the size of the largest partition (part 2), so you would need to tune docsDistributionPst:

    docsDistributionPst="100,100,100,50,33"

    This gives you three large partitions of roughly the same size and two small ones that can index new documents quickly.

    Part 4 takes 33% of all documents (about 1/3), part 3 takes 50% of the remaining 2/3, and part 2 takes 100% of the rest (the final 1/3).

    The triggers depend on how many documents you have in your index.
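
    The arithmetic behind that split can be sketched in a few lines of Python. This is only a model of the description in this reply (each partition, from the highest index down, takes its percentage of whatever is left); the function name `steady_state_shares` is hypothetical and this is not ESP's actual implementation.

```python
def steady_state_shares(pcts):
    """Return {partition_index: fraction_of_all_documents}.

    Assumption: partitions are filled from the highest index down, and each
    one takes pcts[i] percent of the documents not yet claimed by a higher
    partition. This mirrors the reply's explanation, nothing more.
    """
    remaining = 1.0
    shares = {}
    for idx in range(len(pcts) - 1, -1, -1):
        take = remaining * pcts[idx] / 100.0
        shares[idx] = take
        remaining -= take
    return shares


if __name__ == "__main__":
    # docsDistributionPst="100,100,100,50,33"
    for idx, share in sorted(steady_state_shares([100, 100, 100, 50, 33]).items()):
        print(f"part {idx}: {share:.1%}")
```

    In this model parts 2, 3 and 4 each end up with about a third of the documents, and parts 0 and 1 hold no steady-state share, so they only carry recently submitted documents until the triggers move them along.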

    Friday, October 14, 2011 8:00 AM
  • So docsDistributionPst and triggers perform similar functions: both control the waterfall of documents from partition to partition.

     

    When I load large volumes of documents using my 5-partition configuration example, I notice that once partition 3 is fully indexed, these documents simply roll into partition 4 and are indexed again.

    Would it make more sense to configure a version of archive indexing (Ch. 4, ESP Configuration Guide), such as:

    1) deleting the docsDistributionPst attribute

    2) using docCountArchive instead of docCount

    3) only using triggers for partitions 0,1,2 

    4) setting the reindexLimit attribute to 100 in order to disable automatic re-indexing?


    • Edited by Churchill729 Friday, October 14, 2011 8:27 PM typo correction
    Friday, October 14, 2011 8:26 PM
  • Archive indexing is intended for content that does not change very often. If most of your content is updated on a frequent basis, then archive indexing would not be the best index scheduling option.

     

    Thanks!

    Rob Vazzana | Sr Support Escalation Engineer | US Customer Service & Support

    Customer Service & Support                          Microsoft | Services

    Wednesday, October 19, 2011 2:15 PM
    Moderator
  • Rob, Thank you for the reply.

    Assuming a single node installation using docCountArchive index partitions:

    The FAST ESP documentation seems to indicate that when a document is updated by resubmitting it to FAST (in partition 0), the old version is still blacklisted even if it is in a higher archive partition.

    The downside of this approach seems to be that the old, blacklisted version of the document is not cleaned up until that partition is re-indexed. Only the newest, correct version of the document would be returned as a search result.

    Is this interpretation correct? 

    Friday, October 21, 2011 8:51 PM
  • Glad to assist. You are correct. The old version is still indexed and using up disk space until the partition containing it is re-indexed (either through triggers or through resetindex). Due to blacklisting, only the new version is searchable.
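
    As a rough illustration of that behavior, here is a toy Python model. Everything in it (the `ToyIndex` class and its methods) is hypothetical and written only to mirror the description above; it is not how ESP stores or blacklists documents internally.

```python
class ToyIndex:
    """Toy model: old copies stay on disk, but blacklisted copies are hidden."""

    def __init__(self, num_partitions):
        # One dict per partition, mapping doc_id -> text.
        self.partitions = [dict() for _ in range(num_partitions)]
        # (partition, doc_id) pairs that must not be returned by search.
        self.blacklist = set()

    def submit(self, doc_id, text):
        # New and updated documents always land in partition 0.
        for p, docs in enumerate(self.partitions):
            if p != 0 and doc_id in docs:
                self.blacklist.add((p, doc_id))  # hide the old copy
        self.partitions[0][doc_id] = text

    def search(self, term):
        return [(p, doc_id)
                for p, docs in enumerate(self.partitions)
                for doc_id, text in docs.items()
                if term in text and (p, doc_id) not in self.blacklist]

    def stored_copies(self):
        # Proxy for disk usage: blacklisted copies still count.
        return sum(len(docs) for docs in self.partitions)

    def reindex(self, p):
        # Rebuilding a partition drops its blacklisted copies.
        for entry in [e for e in self.blacklist if e[0] == p]:
            del self.partitions[p][entry[1]]
            self.blacklist.discard(entry)
```

    In this model, updating a document that lives in partition 2 leaves two stored copies but only one search hit; calling `reindex(2)` is what brings the stored count back down to one.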

     

    Thanks!

    Rob Vazzana | Sr Support Escalation Engineer | US Customer Service & Support

    Customer Service & Support                          Microsoft | Services

    • Marked as answer by Churchill729 Wednesday, October 26, 2011 3:13 PM
    Tuesday, October 25, 2011 5:01 PM
    Moderator
  • It's probably just a typo, but the trigger values for the new partitioning scheme have the exact same number for the last two triggers. So when partition number 3 gets over 200,000 docs, it'll trigger a build of partition 4, and when partition 4 gets over 200,000 (which will happen any time the previous trigger is tripped), it'll trigger a build of partition 5.

    You probably want the last two values to be something like 200,000 and 2,000,000, so that you aren't just shuffling the index from one partition to another, but rather are aggregating into a larger partition.
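
    A quick way to spot this in a trigger list, as a minimal Python sketch. The helper name `flat_trigger_positions` is hypothetical and not part of ESP; it only flags triggers that are not strictly larger than the one before them, which is the situation described above.

```python
def flat_trigger_positions(triggers_attr):
    """Return positions (0-based) of triggers that do not exceed the previous one.

    triggers_attr is the comma-separated value of the triggers attribute,
    e.g. "10000,100000,200000,200000".
    """
    values = [int(v) for v in triggers_attr.split(",")]
    return [i for i in range(1, len(values)) if values[i] <= values[i - 1]]


if __name__ == "__main__":
    print(flat_trigger_positions("10000,100000,200000,200000"))
    print(flat_trigger_positions("10000,100000,200000,2000000"))
```

    An empty result means every trigger is larger than the previous one, so each level aggregates into a bigger partition instead of passing the same documents along.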


    Garth Grimm
    Avery Ranch Consulting
    www.averyranchconsulting.com
    Tuesday, November 1, 2011 9:51 PM
  • Garth,

    That is the heart of my original question. We actually have 10+ partitions and have found that, for our hardware, the maximum partition size should not exceed 2,000,000 documents. It seems silly to have a 10-level pyramid of triggers.

    With a trigger structure of 10K, 100K, 1M, 2M, 2M, 2M, etc., the eventual shuffling from one partition to the next seems to serve the purpose of cleaning up the blacklist.

    I am considering using docCountArchive (with only 2 or 3 levels of triggers) and then setting the reindexLimit attribute to 20%. Hopefully this will keep the blacklist small without all the constant partition shuffling.

    Wednesday, November 2, 2011 8:59 PM
  • Hi Rob, I have a question for you about index partitions.

    Do you know whether it is possible to attach each index partition to a particular hard drive? Let's say I'd like faster storage for my first two index partitions, with the rest of them on a regular HDD. Is this scenario possible in FAST Search for SharePoint 2010?

    Thanks

    Saturday, May 26, 2012 8:17 PM