Crawl log error: "The item may be too large or corrupt. You may also verify that you have the latest version of this IFilter."

  • Question

  • I'm having a problem crawling one of our SharePoint sites. When I start a full crawl it ends almost immediately, and in the crawl log I see the error: 
    "The item may be too large or corrupt. You may also verify that you have the latest version of this IFilter."

    My first thought was that it must have something to do with the Foxit PDF IFilter we are using, but it works splendidly on another site collection on the same server.

    I have verified that the site is accessible from the index server using the crawler access account.
    Then I thought it might have something to do with large documents or document libraries, so I increased the timeout value to 120 s and changed the MaxGrowFactor and MaxDownloadSize registry keys, but sadly this didn't help (a sketch for checking these values follows this post).

    When looking in the event log on the index server I can see this information message:
     "The search service stopped the filter daemon because it was consuming too many resources. A new daemon will automatically be started, and no user action is required." ... which might have something to do with my problem. However, the index server is not short on either disk space or RAM.

    While playing around with the search settings trying to fix the problem, I noticed that when I added a crawler impact rule limiting the number of simultaneous requests to 1 with a 60 s wait between requests, I get a different behaviour. The crawler no longer stops immediately (as it did before) but instead continues to process content. However, I'm still getting the above error on most of the sites, even on sites that don't contain any documents at all and have minimal content.

    Help with fixing this problem would really be appreciated :)

    Regards
    Daniel

    Friday, August 29, 2008 7:48 AM
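As a quick way to confirm what the crawler is actually using for the keys mentioned above, here is a minimal sketch using Python's standard winreg module. It assumes MOSS 2007 (the "12" hive) and that MaxDownloadSize and MaxGrowFactor sit under the same GatheringManager key quoted in the accepted answer below; verify the exact path in regedit on your own index server.

```python
# Minimal sketch, not from the original thread: read the gatherer registry values
# mentioned above so you can confirm what the crawler is actually using.
# Assumption: MaxDownloadSize and MaxGrowFactor sit under the same GatheringManager
# key quoted later in this thread; verify the exact path on your own server.
import winreg

GATHERER_KEY = r"SOFTWARE\Microsoft\Office Server\12\Search\Global\GatheringManager"

def read_gatherer_value(name):
    """Return the named value from the gatherer key, or None if it is missing."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GATHERER_KEY) as key:
            value, _reg_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        return None

if __name__ == "__main__":
    for name in ("MaxDownloadSize", "MaxGrowFactor"):
        print(f"{name} = {read_gatherer_value(name)}")
```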

Answers

  • I'm also experiencing the problem above. I've implemented the following registry updates and completed an incremental crawl:

    Key                                                                                                       | Default Value | New Value
    HKLM\SOFTWARE\Microsoft\Office Server\12\Search\Global\GatheringManager\DedicatedFilterProcessMemoryQuota | 104,857,600   | 209,715,200
    HKLM\SOFTWARE\Microsoft\Office Server\12\Search\Global\GatheringManager\FilterProcessMemoryQuota          | 104,857,600   | 209,715,200
    HKLM\SOFTWARE\Microsoft\Office Server\12\Search\Global\GatheringManager\FolderHighPriority                | 50            | 500

    The crawl is still in progress, but I no longer get the error described above (a sketch of applying these values follows this post). Thanks to Josh Gaffey for posting this: http://blogs.msdn.com/joshuag/archive/2009/10/05/crawling-large-lists-in-sharepoint-2007.aspx


    http://technicallead.wordpress.com/
    • Proposed as answer by Francois H. Pienaar Thursday, April 29, 2010 10:21 AM
    • Unproposed as answer by Mike Walsh FIN Thursday, April 29, 2010 10:31 AM
    • Proposed as answer by mfmh Wednesday, May 5, 2010 11:07 AM
    • Marked as answer by Mike Walsh FIN Wednesday, May 5, 2010 11:39 AM
    Thursday, April 29, 2010 10:21 AM
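For reference, a minimal sketch of applying the three changes in the table above with Python's standard winreg module, run elevated on the index server. The key path and numbers are copied verbatim from the post; back up the key first and restart the Office Search service afterwards.

```python
# Minimal sketch, not from the original post: apply the three registry changes from
# the table above. Key path and values are copied verbatim from the post; back up
# the key and restart the Office Search service after making the change.
import winreg

GATHERER_KEY = r"SOFTWARE\Microsoft\Office Server\12\Search\Global\GatheringManager"

NEW_VALUES = {
    "DedicatedFilterProcessMemoryQuota": 209715200,  # default 104,857,600
    "FilterProcessMemoryQuota": 209715200,           # default 104,857,600
    "FolderHighPriority": 500,                       # default 50
}

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GATHERER_KEY, 0, winreg.KEY_SET_VALUE) as key:
    for name, value in NEW_VALUES.items():
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)
        print(f"{name} set to {value}")
```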

All replies

  • Hello
    I'm having the same problem crawling a 360 MB PDF file; after setting a wait time of up to 300 seconds it now works.
    There is still some strange behaviour, though: the PDF has 22k pages and only the first 3k pages are indexed.
    It's not the best solution, but it's better than nothing.
    Regards
    Gerly
    Friday, September 5, 2008 7:39 AM
  • The registry entry CB_chunkBufferSizeInMegabytes should be checked ... I'm thinking the answer is there (a sketch for locating it follows below).
    Gerly
    Tuesday, November 4, 2008 9:59 AM
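Since the thread never says where CB_chunkBufferSizeInMegabytes lives, here is a minimal sketch that, instead of guessing a path, walks the Office Search registry hive (the root is taken from the paths quoted elsewhere in this thread) and reports every key holding a value with that name.

```python
# Minimal sketch, not from the original reply: search the Office Search registry
# hive for a value named CB_chunkBufferSizeInMegabytes rather than assuming a path.
# SEARCH_ROOT follows the "Office Server\12\Search" paths quoted in this thread.
import winreg

SEARCH_ROOT = r"SOFTWARE\Microsoft\Office Server\12\Search"
TARGET = "CB_chunkBufferSizeInMegabytes"

def find_value(subkey, hits):
    """Recursively scan subkey for a value named TARGET, collecting (path, data)."""
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey)
    except OSError:
        return
    with key:
        i = 0
        while True:  # enumerate values directly under this key
            try:
                name, data, _reg_type = winreg.EnumValue(key, i)
            except OSError:
                break
            if name.lower() == TARGET.lower():
                hits.append((subkey, data))
            i += 1
        i = 0
        while True:  # recurse into child keys
            try:
                child = winreg.EnumKey(key, i)
            except OSError:
                break
            find_value(subkey + "\\" + child, hits)
            i += 1

hits = []
find_value(SEARCH_ROOT, hits)
for path, data in hits or [("(value not found)", None)]:
    print(path, "=", data)
```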
  • Hello,

    Can you let us know whether you discovered anything further on this issue or where we may find details?

    thx
    f
    mfn
    Wednesday, May 20, 2009 8:58 AM
  • Hi,

    Anybody been able to resolve this issue?

    The crawler in our case is able to crawl all the other web applications but fails on just one giving the error:

    "The item may be too large or corrupt. You may also verify that you have the latest version of this IFilter"

    However, the crawler is able to crawl the other site collections in that web application. It is failing for the root site collection.

    Thanks,
    Sanjeev
    Monday, August 10, 2009 3:59 PM