
Filtering process could not be initialized

  • Wednesday, January 02, 2008 5:43 PM, mcwoods
     
    I am receiving the following crawl log error on a handful of documents (125 of our 3.7 million indexed).

    "The filtering process could not be initialized. Verify that the file extension is a known type and is correct."

    This mostly happens on XLS documents, but also on several DOC and PPT documents.  I have the new IFilter installed, and it seems to be functioning correctly because it indexes the other 3.699 million documents without a hitch.  I am assuming that it is a document corruption issue, but there is a separate crawl log error that indicates a corrupted document.

    Any ideas on why this would be happening?  Anyone else seeing this?

Answers

  • Monday, January 07, 2008 2:44 PM, mcwoods
     Answer

    No, the names don't have any characters that are causing problems. 

     

    But I did find out what is causing the error to be thrown in 80 to 90% of the cases: the file is corrupt.  I started checking the files that were returning this error, and 80 to 90% were corrupt and had to be repaired.  After repair (where possible), the files are crawled successfully (with the same file name).  There are a few files, however, that return the error but are not corrupted.  I am not sure about those, but at least I have an answer for most of the errors!

     

    I am not sure why it does not throw the "file is corrupt" error that is usually seen in the logs, but am satisfied to know that in most cases the "filtering process could not be initialized" error indicates a file corruption error.   
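A quick way to pre-screen the failing files is to check their headers. The following is a minimal sketch, assuming the documents are legacy binary Office files (.doc, .xls, .ppt), which are OLE2 compound files and begin with a fixed 8-byte signature. The function names are hypothetical, and the check only catches damage at the start of the file; a file with a valid header can still be corrupt further in.

```python
# Signature that begins every OLE2 compound file (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(path):
    """Return True if the file starts with the OLE2 compound-file signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_MAGIC)) == OLE2_MAGIC


def suspect_files(paths):
    """Return the paths whose header is unreadable or not OLE2, in input order."""
    suspects = []
    for path in paths:
        try:
            ok = looks_like_ole2(path)
        except OSError:
            ok = False  # missing or unreadable files are treated as suspects
        if not ok:
            suspects.append(path)
    return suspects
```

Feeding it the local paths of the documents listed in the crawl log gives a short list of files to open and repair by hand first.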

     

All Replies

  • Thursday, January 03, 2008 6:43 AM, Mike Walsh (MVP, Moderator)
     Proposed Answer
    Just a thought, but are there any peculiarities in the *names* (rather than the file extensions) of the files it doesn't like?

    Mike Walsh
    • Proposed As Answer by Rickomet, Wednesday, November 05, 2008 9:00 PM
  • Sunday, January 20, 2008 6:42 PM, Nitin Chandola
     
    Are there any peculiarities in the file path, such as

    file://us01-flsvr01/keith's%20stuff/xrf/xrf_102407.xls

    (notice the apostrophe)
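To check a batch of crawl log URLs for characters like this, one option is to percent-decode each path and list anything outside an allow-list. This is a rough sketch: the allow-list below is an arbitrary assumption for illustration, not a list of characters the crawler is known to accept or reject.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical allow-list: letters, digits, and a few common path characters.
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789/._- "
)


def unusual_characters(url):
    """Return the sorted decoded path characters that are outside SAFE_CHARS."""
    decoded_path = unquote(urlsplit(url).path)
    return sorted(set(decoded_path) - SAFE_CHARS)


# The example path above is flagged only for its apostrophe.
flagged = unusual_characters("file://us01-flsvr01/keith's%20stuff/xrf/xrf_102407.xls")
```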

  • Monday, January 21, 2008 2:08 PM, mcwoods
     

    Thanks for the reply. 

     

    No, no peculiar characters in the file path.  Most of the files were just corrupt, as you can read in my answer above. 

     

    Thanks.

     

  • Friday, March 28, 2008 3:26 PM, Charftong
     

    I'm having a very similar issue.  After a crawl I get about 80-90 error messages that state "The filtering process could not be initialized."  Here is an example path of one of the documents this happens on:

    http://site.com/shared%20documents/training/archive/exercises/3.4%20service%20request%20manager_ex.doc

     

    The file is not corrupt, but I noticed that all the links that return this error have an 'exe' somewhere in the path (in this case, in the "exercises" folder).  Could this be causing an issue?  Has anyone else seen this?  Is there a way around this?
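One way to test this hunch is to compare how often 'exe' appears in the failing URLs with how often it appears in URLs that crawled successfully. If both groups contain it at a similar rate, the substring is probably a coincidence. A small sketch, assuming the two URL lists have been exported from the crawl log by hand (the function name is hypothetical):

```python
from urllib.parse import unquote


def share_containing(urls, needle):
    """Return the fraction of URLs whose decoded, lower-cased form contains needle."""
    if not urls:
        return 0.0
    hits = sum(1 for url in urls if needle in unquote(url).lower())
    return hits / len(urls)


# The failing URL quoted above contains "exe" (in "exercises").
failed = [
    "http://site.com/shared%20documents/training/archive/exercises/"
    "3.4%20service%20request%20manager_ex.doc",
]
failed_share = share_containing(failed, "exe")
```

Running the same call on the successfully crawled URLs gives the baseline to compare against.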

     

  • Wednesday, October 15, 2008 3:44 PM, Gene Magerr
    I was receiving these errors in the crawl log as well. When I clicked on the links to any of the documents in the crawl log, I was taken to a page that said a virus was found. It turns out Forefront for SharePoint had tagged 12,000 documents as having viruses. I knew these documents were clean, so I opened a ticket with the Forefront team. It turns out that one of the scan engines (the Command engine, from Authentium) was tagging these documents (migrated from SharePoint 2003). Here's an email sent to me with an explanation and resolution. The first resolution worked (look at the bottom as well). I did a full crawl after this and my errors went from 166 to 15. Hope this helps someone else.

    Issue:

    Forefront blocked multiple files as “Virus= is based on a remote template (Command)”.  This caused access to a large number of files to be blocked.

     

    Cause:

    This detection is only seen when using the Command engine, from Authentium.  They have decided to make this and other somewhat similar detections on those types of files.  They state that these types of detections refer to a template that is not local.  As it is impossible to determine whether the remote template is good or bad, they flag the document referring to it as suspicious.  They also state that Microsoft SharePoint 2003 generates these possibly dangerous files by default.
    This type of behavior can also be seen with Forefront for Exchange.  The same steps and registry key will resolve the issue there as well.

     

    Resolution:

     

    There are two different ways that this can be handled. The first and fastest way to take care of this issue is a workaround.

     

    This can easily be changed by choosing a different engine to replace Command.  To do this, we went into the Forefront Server Security Administrator, “Settings” > “Antivirus”, and selected the scan job by highlighting it: first the Realtime Scan Job, and later the Manual Scan Job.  Below the scan job list in that section are check boxes for the engines used by the selected scan job.  Another engine can be selected in place of Command if desired.

    The other option, which allows you to continue running the Command engine, requires a registry change. Go to the registry path:

    HKLM\SOFTWARE\Wow6432Node\Microsoft\Forefront Server Security\SharePoint     (64-bit)

     

    HKLM\SOFTWARE\Microsoft\Forefront Server Security\SharePoint     (32-bit)

     

    You will have to create a DWORD registry value named:

    CommandRemoteTemplateReturnNotInfected

    The value for this should be set to 1. After making this change you will have to recycle the Forefront services to make it active.
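The registry change can also be prepared as a .reg file and reviewed before it is imported. The sketch below only builds the file text and does not touch the registry. The helper name is hypothetical, and the key path shown is the 64-bit one from the email, so confirm the path on your own server before importing anything.

```python
def build_reg_file(key_path, value_name, dword_value):
    """Return .reg file text that sets one DWORD value under key_path."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key_path + "]",
        # DWORD data is written as eight hexadecimal digits.
        '"{}"=dword:{:08x}'.format(value_name, dword_value),
        "",
    ]
    return "\r\n".join(lines)


# 64-bit key path as given in the email above.
KEY_64_BIT = (
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Wow6432Node\\Microsoft"
    "\\Forefront Server Security\\SharePoint"
)

reg_text = build_reg_file(KEY_64_BIT, "CommandRemoteTemplateReturnNotInfected", 1)
```

Saving `reg_text` to a file with a .reg extension produces something that can be inspected in a text editor and then imported. The Forefront services still need to be recycled afterwards, as described above.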

     

    As for removing the virus flags from the files, which are the cause of the SharePoint pages being blocked, another step was needed.  For this change we went into the Forefront Server Security Administrator, “Settings” > “General Options” > “Scanning” > “Scan On Scanner Update”.  This box was checked so that the current virus flags become outdated with the next virus engine update.  The next time the same file or web page is accessed, it will be scanned again correctly.