FAST ESP collection overview document count issue

  • Question

  • Hi,

    I'm using JDBCConnector to index data coming from a SQL Server database. The document count for my collection usually stops at about one and a half million processed documents. If I run a quick search for documents beyond the count reported on the collection overview, I can find them without any trouble.

    Is it an ESP issue?

    Thursday, March 29, 2012 2:43 PM

Answers

  • Hello Dana,



    The standard workaround to address many duplicates/corruption in fixml is to
    clear the fixmldb. The correct process will revalidate the fixml based on the
    actual documents found in the index. The process requires a resetindex to
    be run to create the new partitions.



    The process should not cause any search outage and would only require you to stop
    feeding.



    I would also recommend upgrading to the latest searchengine patch (patch10).
    Several duplicate issues have been addressed in the previous patches.



    How to clear fixmldb and resetindex

    1. Stop all feeding applications
    2. Stop all contentdistributor(s) (backups before the master)
      nctrl stop contentdistributor
      Please verify that the process is completely stopped.
      You can also check %FASTSEARCH%\var\log\contentdistributor for further log
      entries

    3. indexeradmin -a suspendindexing
      Please make sure the AdminGUI>Matching Engines lists the indexing
      partitions as idle

    4. Stop all indexers (backups before the master)
      Please make sure that indexers are idle in the AdminGUI>Matching Engines
      nctrl stop indexer
      Please verify that the process is completely stopped.
      You can also check %FASTSEARCH%\var\log\indexer for further log entries

    5. Modify $FASTSEARCH/etc/config_data/RTSearch/webcluster/rtsearchrc.xml on the
      admin node and add verifyMagicFileEnabled="true" to the options section

    6. Delete or move $FASTSEARCH/data/data_fixml/hashFlushed on
      all indexer nodes

    7. Delete or move $FASTSEARCH/data/data_fixml/*.dat on all
      indexer nodes

    8. Start all the indexers (masters before backup indexers)
      nctrl start indexer
      Please check that the processes started.
      Please make sure that the indexers remain idle in the AdminGUI>Matching Engines
      before continuing, or check the indexer log (%FASTSEARCH%\var\log\indexer\indexer.log)
      for the message
      INFO   indexer RTSearch: Started Indexer

    9. indexeradmin -a resetindex (this should not affect search)
      Please make sure that the indexers have completed resetindex in the
      AdminGUI>Matching Engines before continuing

    10. Start contentdistributor
      nctrl start contentdistributor

    11. Start feeding applications
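
    For the "delete or move" steps above, moving the files aside keeps a copy in case something goes wrong. The following is only a minimal sketch of that idea and is not part of ESP: the function name move_fixml_state, the backup directory, and the file names in the demo are hypothetical, and it assumes the indexer on that node is already stopped.

```python
import shutil
import tempfile
from pathlib import Path


def move_fixml_state(fixml_dir: Path, backup_dir: Path) -> list[str]:
    """Move hashFlushed and every *.dat entry from fixml_dir into backup_dir.

    Returns the sorted names of the entries that were moved. Anything else
    in fixml_dir is left untouched.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Collect the *.dat entries first, then hashFlushed if it is present.
    candidates = sorted(fixml_dir.glob("*.dat"))
    hash_flushed = fixml_dir / "hashFlushed"
    if hash_flushed.exists():
        candidates.append(hash_flushed)

    moved = []
    for entry in candidates:
        # shutil.move works for both files and directories.
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)
    return sorted(moved)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not on a real installation.
    # The file names below are placeholders made up for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        fixml = Path(tmp) / "data_fixml"
        fixml.mkdir()
        for name in ("hashFlushed", "part0.dat", "part1.dat"):
            (fixml / name).write_text("placeholder")
        print(move_fixml_state(fixml, Path(tmp) / "fixml_backup"))
```

    On a real node the first argument would point at the data_fixml directory under $FASTSEARCH/data, and the backup directory should be on a volume with enough free space.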

    If you leave verifyMagicFileEnabled="true" on your system, it will take longer to recover when the indexer was improperly shut down. I prefer to leave this option on, as it always checks for possible fixml corruption when the indexer is improperly shut down.


    Best Wishes,

    Michael Puangco | Senior Support Escalation Engineer | US Customer Service & Support

    Customer Service & Support                        Microsoft| Services




    Tuesday, April 3, 2012 7:37 PM
    Moderator

All replies

  • Hi Rogério,

    I have seen issues with the document counts on the collection overview screen.  What version of ESP are you using, including service pack and patches?  You can see what patches you have installed at %FASTSEARCH%/var/installer/patch-history.xml.  Do you see a discrepancy between the output of this command:
      indexerinfo doccount collectionname

    and the hit count of a search at the qrserver for the below:
      meta.collection:collectionname

    Let us know your results.
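
    Once both numbers are in hand, the comparison itself is simple arithmetic. The sketch below is only an illustration: the function name compare_doc_counts and its labels are hypothetical, and the two counts are assumed to have been read off manually from the indexerinfo output and the qrserver hit count.

```python
def compare_doc_counts(indexer_count: int, search_hits: int) -> tuple[int, str]:
    """Compare the indexer's document count with the search hit count.

    Returns (difference, label), where difference is
    search_hits - indexer_count and label says which side is higher.
    """
    difference = search_hits - indexer_count
    if difference == 0:
        return difference, "match"
    if difference > 0:
        return difference, "search_higher"
    return difference, "indexer_higher"
```

    For example, compare_doc_counts(1500000, 1501200) reports a difference of 1200 with the label "search_higher".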

    Thanks!
    Rob Vazzana

    Friday, March 30, 2012 8:33 PM
    Moderator
  • Hey Rob,

    I am actually experiencing this problem with one of my clients and was meaning to post about it. I am seeing what appears to be duplicate documents on one of our production environments. 

    Just as you mentioned above we are doing a doc count via "indexerinfo doccount <collection name>" and comparing it to meta.collection:collection_name on the QR server.

    For some of our collections the returned hit list on the QR server is higher than the indexerinfo count. Sometimes it is just one duplicate document but in most scenarios the QR server is posting duplicates in excess of 1000.

    We are using the JDBC connector to do nightly feeds and are running on the following production environment:

    - ESP 5.3 SP4 with searchengine patch 02

    - 3 node setup (one admin node, 2 QR+indexer nodes)

    Any ideas on how we could resolve this issue? I have seen you mention in other posts that rebuilding the fixml db should solve the issue but I don't believe we have this option because it is on a production environment.

    Thanks for your help.

    -- Dana

    Monday, April 2, 2012 1:45 PM