Error while crawling XML file

  • Question

  • Hi,

     

    I followed these steps while crawling an XML file:

     

    1. I added an IFilter rule in user_converter_rules.xml:

    <IFilter>

      <trust>

        <ext name=".xml" mimetype="text/xml" />

      </trust>

    </IFilter>

    2. Enabled the XML mapper in pipelineconfig.xml

     

    3. Created an XMLMapper.XML file in D:\FASTSearch\etc\config_data\DocumentProcessor with the following entry:

    <XMLPropertiesCreator>

      <propset>d6ee4933-09c4-46e3-a5e4-b3787cb4a090</propset>

      <type>31</type>

      <XMLMappings>

        <Mapping attr="body" path="//Title"/>

      </XMLMappings>

    </XMLPropertiesCreator>

     

     

    4. After each step I ran psctrl reset.

     

    5. My XML file format is:

    <Document>

      <Title>xxx</Title>

      <Title>yyy</Title> 

    </Document>

     

    But I am getting the following error:

     

    The FAST Search backend reported warnings when processing the item. ( Document conversion failed: )

    Can anyone suggest what the problem could be and how to solve it?

    Thanks,

    Saranya

    Thursday, November 10, 2011 8:14 AM

All replies

  • Hi Saranya,

    Do not enable XMLMapper in pipelineconfig.xml. Instead, set XMLMapper to "Yes" in the optionalprocessing.xml file:

    C:\FASTSearch\etc\config_data\DocumentProcessor\optionalprocessing.xml
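
    As a rough sketch, the relevant entry in optionalprocessing.xml typically looks something like the fragment below. The root element name and the "active" attribute are assumptions based on a default FAST Search Server 2010 for SharePoint installation, so compare against your own file and change only the existing XMLMapper entry rather than pasting this in.

    ```xml
    <optionalprocessing>
      <!-- Assumed layout: each optional stage is a <processor> entry toggled via "active" -->
      <processor name="XMLMapper" active="yes" />
    </optionalprocessing>
    ```

    After saving the file, run psctrl reset so the document processors reload the configuration.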

    Hope this helps!

    Regards,

    Ajay Shivarathri

     

    Friday, November 11, 2011 7:38 AM
  • Hi,

    I am having the same issue. Ajay, I did what you suggested but I still get the same error:

    The FAST Search backend reported warnings when processing the item. ( Document conversion failed: )

    Regards,

    -R-A-M-

    Monday, November 14, 2011 12:17 PM
  • You can get a more detailed description of the problem by increasing the logging verbosity:

     

    PS>psctrl doctrace on

    PS>psctrl debug on

    Then use doclog to examine the logs. You can run doclog -a after you have crawled a few problematic URIs. The logs are located under %FASTSEARCH%\var\log\

    Turn the logging verbosity off again after the investigation, otherwise the logs will quickly fill up the disks.

    Monday, November 14, 2011 4:57 PM
  • Hi,

    One way to resolve this issue is to remove the entry <ext name=".xml" mimetype="text/xml" /> from the IFilter section in user_converter_rules.xml. If possible, delete the entry rather than commenting it out.

    This approach solved my problem.

    I hope it solves your issue as well.

    Regards,

    -R-A-M-

     


    • Edited by -R-A-M- Wednesday, November 16, 2011 8:07 AM
    • Proposed as answer by -R-A-M- Wednesday, November 16, 2011 8:08 AM
    Wednesday, November 16, 2011 8:06 AM
  • Check this MSDN Article. 

    The item processing pipeline detects the type of content by analyzing the actual data in the retrieved item. If it contains valid XML, it will be treated as XML and converted by using the XML Mapper. Some XML content may not have valid XML declarations and may contain element names that are frequently used in HTML. In such cases, the crawled XML items might be mistaken for HTML items. One solution to this problem is to bypass format detection for crawled items with ".xml" as the file name extension. You can do that by adding the following conversion rule in the configuration file user_converter_rules.xml.

    <ConverterRules>
       <IFilter>
          <trust>
             <ext name=".xml" mimetype="text/xml" />
          </trust>
       </IFilter>
    </ConverterRules>
    

    For more information, see http://msdn.microsoft.com/en-us/library/ff795813.aspx

     


    Sriram S
    Thursday, November 17, 2011 11:37 AM