Public Documents count is greater than the indexed document count

  • Question

  • After a bulk feed using the content API, the public document count is greater than the indexed document count. Is there a way to clean this up?

     

    Thank you very much.

    Monday, June 27, 2011 3:33 PM

Answers

  • Hi Fanis,

     

    Duplicates were a known issue in ESP 5.2 and 5.3. To prevent duplicates, I would recommend making sure all nodes are at ESP 5.3 SP4, applying the latest searchengine patch (currently searchengine.patch06), and performing a fixmldb rebuild with resetindex. There is a KB article about this known issue here:

    http://support.microsoft.com/kb/2518644

     

    Let us know your findings.

    Thanks!

    Rob Vazzana | Microsoft | Enterprise Search Group | Sr Support Escalation Engineer |  http://www.microsoft.com/enterprisesearch 

    Monday, July 25, 2011 9:34 PM

All replies

  • Have you tried to clean the collection with collection-admin -m clear -n <collection_name>?

     

    Félix

    Thursday, July 7, 2011 9:40 PM
  • Hi Félix,

     

    The number of documents is about 2 million, and if I clean the collection everything will be lost. I was wondering if there is a way to remove only the duplicate documents.

    Tuesday, July 12, 2011 10:56 AM
  • Thanks Rob,

    I will try it the next time the issue appears. I had to move ahead with a project, so I re-fed FAST from scratch.

    Wednesday, July 27, 2011 7:57 AM
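
    For anyone who wants to confirm that duplicates account for the gap between the two counts before rebuilding, here is a minimal Python sketch. It assumes you already have a flat list of document IDs exported from the index; how to produce that export is not covered in this thread, and the function names and sample IDs below are hypothetical, for illustration only.

```python
from collections import Counter


def find_duplicate_ids(doc_ids):
    """Return {doc_id: occurrences} for every ID that appears more than once."""
    counts = Counter(doc_ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def surplus_count(doc_ids):
    """Return how many extra copies exist: total entries minus distinct IDs."""
    ids = list(doc_ids)
    return len(ids) - len(set(ids))


# Hypothetical sample data; a real list would come from your own export.
sample_ids = ["doc-1", "doc-2", "doc-2", "doc-3", "doc-3", "doc-3"]

duplicates = find_duplicate_ids(sample_ids)
extra = surplus_count(sample_ids)

print("Duplicated IDs:", duplicates)
print("Surplus entries:", extra)
```

    If the surplus matches the difference between the two counts, duplicates are the likely cause and the patch plus rebuild described in the answer is the relevant fix.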