The FAST Search backend reported warnings when processing the item. (Document conversion failed: Access is denied.)


  • I receive this warning from my crawl of a repository full of Office documents. While much of the content is crawled correctly, there are a number of documents that report the error message "The FAST Search backend reported warnings when processing the item. ( Document conversion failed: Access is denied. )".

    I've watched the crawl from the network and it looks like the documents with processing warnings are crawled properly. So my thought is to look at the FAST backend. Is my document converter getting overloaded? What should I be looking at?

    Thanks,

    Kevin

    Friday, July 2, 2010 6:26 PM


  • Hello Kevin

    The document conversion failed message is quite generic and can be logged for multiple reasons. When it appears, the issue could be with the converter itself.

    Can you check the disk space on the FAST box? If the FS14 server is low on disk space, it can cause the indexer to halt.
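    A quick way to check free space from the FAST server is sketched below in Python. The 10% threshold and the use of the FASTSEARCH environment variable to find the install volume are assumptions for illustration, not FAST requirements.

```python
import os
import shutil

# Hypothetical threshold, not a FAST requirement: flag volumes with <10% free.
MIN_FREE_FRACTION = 0.10


def low_on_disk(path: str, min_free_fraction: float = MIN_FREE_FRACTION) -> bool:
    """Return True when the volume holding `path` is below the free-space threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free < usage.total * min_free_fraction


def report(path: str) -> str:
    """One-line summary of free space on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1024**3
    status = "LOW" if low_on_disk(path) else "ok"
    return f"{path}: {free_gb:.1f} GB free ({status})"


if __name__ == "__main__":
    # Assumes %FASTSEARCH% points at the FAST install folder; falls back to cwd.
    print(report(os.environ.get("FASTSEARCH", ".")))
```

    Checking the volume that holds the FAST data folders matters more than the system drive if they differ.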

    We also need to find out whether the issue happens only with a specific type of document, such as OneNote documents.

    To get more information on the converter, run the converter directly against a few docs that are causing the message to appear.

    Steps:
    ============

    Copy a few of the docs that cause the message to a temp directory on the FAST server. From a command prompt on the FAST server, navigate to the directory you copied the files to, and run the command below:

    IFilter2Html.exe (file_name) > (file_name)_convert_test.txt

    Then look for errors in (file_name)_convert_test.txt. The conversion issue should appear in the output file.
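    With more than a handful of files, the same test can be scripted. This is a minimal Python sketch, assuming IFilter2Html.exe is on PATH (for example, when started from %FASTSEARCH%\bin) and that you want stdout and stderr captured in the same file. Whether IFilter2Html sets a non-zero exit code on failure is not confirmed, so each _convert_test.txt file still needs to be read.

```python
import subprocess
from pathlib import Path

# Assumption: IFilter2Html.exe can be found on PATH, e.g. when the script is
# started from %FASTSEARCH%\bin. Change this if it lives elsewhere.
IFILTER2HTML = "IFilter2Html.exe"
SUFFIX = "_convert_test.txt"


def output_path_for(doc: Path) -> Path:
    """Follow the naming used above: <file_name>_convert_test.txt beside the file."""
    return doc.with_name(doc.name + SUFFIX)


def convert_all(temp_dir: str) -> dict:
    """Run IFilter2Html on every file in temp_dir and save what it prints.

    Returns {file name: exit code}. A zero exit code is not proof of a clean
    conversion, so the saved output still needs to be read.
    """
    results = {}
    # sorted() lists the folder up front, so new output files are not picked up.
    for doc in sorted(Path(temp_dir).iterdir()):
        if not doc.is_file() or doc.name.endswith(SUFFIX):
            continue  # skip sub-folders and output from earlier runs
        with open(output_path_for(doc), "wb") as out:
            # Same as: IFilter2Html.exe <file> > <file>_convert_test.txt 2>&1
            proc = subprocess.run([IFILTER2HTML, str(doc)],
                                  stdout=out, stderr=subprocess.STDOUT)
        results[doc.name] = proc.returncode
    return results
```

    If IFilter2Html.exe is not on PATH, subprocess.run raises FileNotFoundError on the first file.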

    ==========================================================
    Manas -MSFT

    Wednesday, July 7, 2010 1:29 AM
  • Hi Manas,

    Thanks for the reply. I checked disk space and I have plenty of it. As an aside, I moved several of the documents with issues to another SharePoint site I control, and I received the same crawl errors when I crawled the content.

    I ran the conversion on several documents, but I did not see any error messages. I did see character issues and I've noticed that many of the documents with issues are in German or contain German characters. Let me know if this helps and I'll look into the issue further.

    Thanks

    [Update]

    I reviewed one of the simpler files that is not being crawled properly and I've included some of the ifilter2html output below. 

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#11/11" content="False" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#12/4108" content="Title ǂ 1 ǂ " scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#13/4126" content="Address Line One ǂ " scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#15/30" content="Landor Associates" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#16/11" content="False" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#17/3" content="886" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#19/11" content="False" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#22/11" content="False" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#23/3" content="730895" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#5/3" content="6" scheme="MSSCrawledProperty" />
    <meta name="IFilter/D5CDD502-2E9C-101B-9397-08002B2CF9AE/#6/3" content="1" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#10/64" content="1601-01-01T00:07:00Z" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#11/64" content="2006-03-02T07:53:00Z" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#12/64" content="2009-07-20T08:19:00Z" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#13/64" content="2009-08-13T11:04:00Z" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#14/3" content="1" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#15/3" content="140" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#16/3" content="747" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#18/30" content="Microsoft Office Word" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#19/3" content="0" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#2/30" content="Address Line One" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#4/30" content="Larssex" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#7/30" content="ABB_1823_Brev_A4_Rubrik_Ny" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#8/30" content="Emma Enfors" scheme="MSSCrawledProperty" />
    <meta name="IFilter/F29F85E0-4FF9-1068-AB91-08002B27B3D9/#9/30" content="4" scheme="MSSCrawledProperty" />
    </head>
    <body>

    Friday, July 9, 2010 5:22 PM
  • Hello Kevin,

    Thank you for the update. So is the problem that the document does not end up in the index?

    To narrow down the issue further, you may want to run the following from C:\FASTSearch\bin:

    psctrl doctrace on

    psctrl debug on

    Then use doclog to see which stage throws the error: run doclog -a after you have crawled a few problematic URIs. The logs are in %FASTSEARCH%\var\log\.

    Please let us know what you see in the logs. Once you are done gathering them, remember to turn both settings off again to avoid excessive logging. If you find that the issue happens only with German documents, it will need further analysis and a repro on our end.
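    A doclog dump can be long, so it can help to pull out only the warnings together with the pipeline stage that emitted them. The Python sketch below assumes the line layout seen in the doclog dumps in this thread (each stage starts with "INFO   Processor <Name>:" and warnings are lines starting with "WARNING"); save the doclog -a output to a text file first and pass its contents in.

```python
import re

# Assumed from the doclog dumps in this thread: each pipeline stage starts with
# "INFO   Processor <Name>:" and warnings are lines beginning with "WARNING".
PROCESSOR_RE = re.compile(r"^\s*INFO\s+Processor\s+(\S+?):\s*$")
WARNING_RE = re.compile(r"^\s*WARNING\s+(.*\S)\s*$")


def warnings_by_processor(doclog_text: str) -> list:
    """Return (processor name, warning text) pairs in the order they appear."""
    current = None  # no processor seen yet
    found = []
    for line in doclog_text.splitlines():
        stage = PROCESSOR_RE.match(line)
        if stage:
            current = stage.group(1)
            continue
        warning = WARNING_RE.match(line)
        if warning:
            found.append((current, warning.group(1)))
    return found
```

    For example, with the output saved to a hypothetical doclog.txt: open it with encoding="utf-8", errors="replace" and print warnings_by_processor(fh.read()). The stage name shows which converter (for instance PDFConverter or IFilterConverter) reported the failure.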

    Monday, July 12, 2010 3:44 PM
  • Hi Manas,

    I'm having a similar problem indexing files, but slightly different; I hope you don't mind me butting in.

    My issue is 'Document conversion failed: 0'. To be exact, the FAST Search Server pipeline fails to process PDF documents.
    A couple of things: most other file types work, and I have installed the Advanced Filter Pack. I reinstalled cleanly and have no other errors or warnings in any of the var logs that I can see.

    Here is a dump of the doclog, which shows a PDF failing to convert. I think whoever gets this fixed, short of debugging the .py modules, is on the cutting edge of FAST :)

    One more detail: ifilter2html.exe -l does not list .pdf among the supported extensions (FAST RTM).

     

      INFO   Get "63E90878-0292-490D-8B7C-F3905A8B65FD:http_lastmodified:31"
      INFO   Processor Decompressor:
      INFO   GetValue "encoding"
      INFO   Processor CrawlerDemarshaller:
      INFO   GetValue "extra_data"
      INFO   Processor CrawlerRSSProcessor:
      INFO   GetValue "url"
      INFO   GetValue "extra_data"
      INFO   No extra data for http://www.citycol.com/computing/ecdl/v4m1pt.pdf
      INFO   Processor CrawlerSitemapProcessor:
      INFO   GetValue "url"
      INFO   GetValue "extra_data"
      INFO   No sitemap data for http://www.citycol.com/computing/ecdl/v4m1pt.pdf
      INFO   Processor AttachmentsHandler:
      INFO   Has "html"? Attribute not found
      INFO   GetValue "28636AA6-953D-11D2-B5D6-00C04FD918D0:#11:31"
      INFO   GetValue "28636AA6-953D-11D2-B5D6-00C04FD918D0:#11:31"
      INFO   GetValue "C82BF597-B831-11D0-B733-00AA00A1EBD2:LINK.OFFICECHILD:31"
      INFO   GetValue "C82BF597-B831-11D0-B733-00AA00A1EBD2:LINK.OFFICECHILD:31"
      INFO   GetValue "00020386-0000-0000-C000-000000000046:EntityNamespace:31"
      INFO   GetValue "00020386-0000-0000-C000-000000000046:EntityNamespace:31"
      INFO   Get "00140329-0000-0140-C000-000000141446:DAV:contentclass:31"
      INFO   Has "data"? Attribute found
      INFO   Has "data_1"? Attribute not found
      INFO   Doing nothing, not a multi-attachment type item
      INFO   Processor UTFDetectorConverter:
      INFO   Has "data"? Attribute found
      INFO   Get "data"
      INFO   Document is not in utf-16 or utf-32 - no conversion
      INFO   Processor FastFormatDetector:
      INFO   GetValue "00020386-0000-0000-C000-000000000046:EntityNamespace:31"
      INFO   GetValue "00020386-0000-0000-C000-000000000046:EntityNamespace:31"
      INFO   Has "data"? Attribute found
      INFO   Has "html"? Attribute not found
      INFO   Get "0B63E343-9CCC-11D0-BCDB-00805FCCCE04:FileExtension:31"
      INFO   GetValue "url"
      INFO   Has "data_utf8"? Attribute not found
      INFO   Get "data"
      INFO   Has "mime"? Attribute not found
      INFO   Set "mime": '<Missing>' ---> 'application/pdf'
      INFO   Set "extension": '<Missing>' ---> '.pdf'
      INFO   Set "7262A2F9-30D0-488F-A34A-126584180F74:ext:31": '<Missing>' ---> ''
      INFO   SetMeta "7262A2F9-30D0-488F-A34A-126584180F74:ext:31[crawled property]": '<Missing>' ---> 'CP'
      INFO   Get "7262A2F9-30D0-488F-A34A-126584180F74:ext:31"
      INFO   Add "format": '<Missing>' ---> 'Adobe PDF'
      INFO   Processor FormatDetector:
      INFO   Has "html"? Attribute not found
      INFO   Has "data"? Attribute found
      INFO   Get "data"
      INFO   Set "outsidein_mime": '<Missing>' ---> 'application/pdf'
      INFO   Set "outsidein_format": '<Missing>' ---> 'Adobe PDF'
      INFO   Processor XMLMapper:
      INFO   Processor SimpleConverter:
      INFO   Has "converter"? Attribute not found
      INFO   Has "data"? Attribute found
      INFO   Has "html"? Attribute not found
      INFO   Get "mime"
      INFO   Processor PDFConverter:
      INFO   Has "converter"? Attribute not found
      INFO   Has "data"? Attribute found
      INFO   Get "mime"
      INFO   Has "html"? Attribute not found
      INFO   Set "converter": '<Missing>' ---> 'PDFConverter'
      INFO   Get "data"
      WARNING Document conversion failed:  (warning code 0)
      INFO   Processor IFilterConverter:
      INFO   Has "converter"? Attribute found
      INFO   Get "converter"
      INFO   Document has been designated a different converter, skipping conversion
      INFO   Processor SearchExportConverter:
      INFO   Has "converter"? Attribute found
      INFO   Get "converter"
      INFO   Document has been designated a different converter, skipping conversion

     

    Tuesday, July 13, 2010 2:41 PM
  • Hi Manas,

    I've isolated a few good and bad files on a file share.

    I have a known good PDF, and it was crawled without error.

    I have a bad Excel spreadsheet that had errors when crawling. The error message wasn't helpful:

      INFO   Processor IFilterConverter:
      INFO   Running conversion in a child process
      WARNING Document conversion failed: Access is denied. (warning code 0)

     

    I also tried a docpush on the same excel file and received these messages:

    PS E:\search\fastsearch\var\log> docpush -c sp 'E:\Downloads\TestDocs\pv_obj_id_E5EE706C74FF6E2042407B8B85197789007A0000.xls'

    [2010-07-13 10:40:34.859] WARNING    sp Reported warning with http://cohowinery.com/E:\Downloads\TestDocs\pv_obj_id_E5EE706C74FF6E2042407B8B85197789007A0000.xls: processing::Document conversion failed: Access is denied.

    [2010-07-13 10:40:34.859] INFO       sp All add operations completed


    I'm going to proceed with some other documents to see if there are similar failures, but this is what I have so far.

    Thanks,
    Kevin

     

    Addendum:

    Found these messages in the doc_warnings_sp file in the data_fixml directory:

    fb078fb79cb99d715895dbb585b29392 87 Document conversion failed: Access is denied.
    processing
    6efd8096372ba2fbed5c4570dda876b8 87 Document conversion failed: Access is denied.
    processing
    7c79afba42e6ab88879ddf2d1e1cfabb 87 Document conversion failed: Access is denied.
    processing
    6efd8096372ba2fbed5c4570dda876b8 87 Document conversion failed: Access is denied.
    processing
    7624f6ec0a621900884f5c4c744a1bc6 87 Document conversion failed: Access is denied.
    processing
    7c79afba42e6ab88879ddf2d1e1cfabb 87 Document conversion failed: Access is denied.
    processing
    7c79afba42e6ab88879ddf2d1e1cfabb 87 Document conversion failed: Access is denied.
    processing
    7c79afba42e6ab88879ddf2d1e1cfabb 87 Document conversion failed: Access is denied.
    processing
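    The same document IDs show up several times, so counting warnings per document can tell you how many distinct files are affected. The raw doc_warnings file mixes binary bytes with text; the Python sketch below assumes, based only on the excerpt above, that each record holds a 32-hex-digit document ID, a number, some binary bytes, and then the warning text. That record layout is a guess, not a documented FAST format.

```python
import re
from collections import Counter

# Guessed record shape (from the excerpt above only): 32 hex digits, a number,
# some binary bytes on the same line, then the warning text.
RECORD_RE = re.compile(
    r"([0-9a-f]{32}) (\d+) [^\n]*?(Document conversion failed:[^\r\n\x00]*)")


def summarize_doc_warnings(raw: bytes) -> Counter:
    """Count conversion-failure warnings per (document id, message)."""
    # latin-1 maps every byte to a character, so decoding never fails.
    text = raw.decode("latin-1")
    counts = Counter()
    for doc_id, _number, message in RECORD_RE.findall(text):
        counts[(doc_id, message.strip())] += 1
    return counts
```

    Read the file in binary mode (open(path, "rb").read()) and pass the bytes in; the most_common() method of the returned Counter lists the documents that fail most often.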

     

    Tuesday, July 13, 2010 3:43 PM
  • Hello Kevin,

    This looks like an issue with specific documents. You may want to capture a Process Monitor (procmon) trace while you run the docpush and see whether it shows anything useful, such as ACCESS DENIED results. As I mentioned, if you find that the issue is specific to one type of document, or, as you said earlier, only to German documents, it will need further analysis and a repro on our end.

    Wednesday, July 14, 2010 4:41 PM
  • Hello Robjay,

    This looks like a different issue from the one Kevin is describing. The issue seems to be with specific documents.

    From the bin folder, you may want to run, for instance:

    pdftotext -raw <input> <output>

    This should give you a more specific error message.
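    To run that check over a whole folder of problem PDFs, here is a Python sketch. The exit-code meanings come from the xpdf pdftotext documentation; that the pdftotext in the FAST bin folder follows them is an assumption, as is writing each .txt file next to its PDF.

```python
import subprocess
from pathlib import Path

# Exit codes documented for xpdf's pdftotext; assumed, not confirmed, to
# apply to the pdftotext.exe in the FAST bin folder.
EXIT_CODES = {
    0: "no error",
    1: "error opening the PDF file",
    2: "error opening the output file",
    3: "error related to PDF permissions",
    99: "other error",
}


def describe_exit(code: int) -> str:
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def check_pdfs(folder: str, exe: str = "pdftotext") -> list:
    """Run 'pdftotext -raw <input> <output>' on each PDF; return the failures."""
    failures = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out = pdf.with_suffix(".txt")  # output written beside the PDF
        proc = subprocess.run([exe, "-raw", str(pdf), str(out)],
                              capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append((pdf.name, describe_exit(proc.returncode),
                             proc.stderr.strip()))
    return failures
```

    Each failure keeps pdftotext's own stderr text, which is usually the "more specific error message" worth posting.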

    Wednesday, July 14, 2010 4:58 PM
  • Hi Manas,

    The issue has been solved by resolving permissions issues on my %FASTSEARCH% root folder.
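    The thread does not say which permissions were wrong, so a generic check can help find folders or files under %FASTSEARCH% that the service account cannot reach. This Python sketch is an assumption-laden starting point, not the fix itself: it must run as the account the FAST services use, the write test creates and deletes a temporary file in every folder (pass check_write=False to skip it), and files locked by running FAST processes may be reported as unreadable because Windows reports sharing violations as access denied.

```python
import os
import tempfile


def find_access_problems(root: str, check_write: bool = True) -> list:
    """Walk `root` and list what the *current* account cannot list, read or write.

    Run this as the account the FAST services use; results for any other
    account say little about what the document processors can reach.
    """
    problems = []

    def on_error(err: OSError) -> None:
        # os.walk calls this when it cannot list a directory.
        problems.append((err.filename, "cannot list directory"))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        if check_write:
            try:
                # A real create/delete test; on Windows os.access() only looks at
                # the read-only attribute, not at the folder's ACL.
                with tempfile.NamedTemporaryFile(dir=dirpath):
                    pass
            except OSError:
                problems.append((dirpath, "cannot create files"))
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    fh.read(1)
            except OSError:
                problems.append((path, "cannot read file"))
    return problems
```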

    • Marked as answer by Kevin-SP2010 Friday, July 16, 2010 5:49 PM
    Friday, July 16, 2010 5:49 PM
  • Hi Manas,

    I am now getting desperate. I have found that .doc, .xls and .pdf files are also not converting, and I am getting the same error code:

    WARNING Document conversion failed:  (warning code 0)

    I have run the pdftotext command and the output looked reasonable: it was the text contained within the PDF.

    ifilter2html also works on the Word document and produces the text within the .doc file.

    The DOC_WARNINGS file is full of non-ASCII characters, as shown below:

    00b02090c4be6a847a6b0f3eb95d7dd5 70 ó^`ª&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;Document conversion failed: &#0;&#0;&#0;

    processing&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;

     

    Wednesday, July 28, 2010 7:10 AM
  • New update.

    Word, PDF and, as far as I can see, most documents are not being indexed full-text and come up with this error. They are being indexed on file name and metadata only.

    The good news is that I have rebuilt a personal VM farm with FAST, and there it does work. Where does this leave me?

    1. I know that it works on my test VM domain, FAST server and Moss2010 server.

    2. I don't know why it's falling over in production.

     

    So I am still stuck.

    Thursday, August 12, 2010 3:09 PM
  • Solved for me also: I reapplied permissions to the %FASTSEARCH% root folder.

    Thursday, August 26, 2010 7:10 AM
  • Kevin,

    What permission issues did you resolve?

     

    Thanks


    Lalit Joshi
    Tuesday, September 27, 2011 6:28 PM