BCS, binary BLOB and ANSI text files

  • Question

  • Hi Guys,

    I ran into the strangest thing:

    I have a BCS model which parses binary data from a column, using a StreamAccessor method to do so.

    Everything works as long as the file type is supported by FAST/SharePoint, except for .txt files encoded in ANSI. All other file types are fine, and so are all other encodings (UTF-8, for example).

    An entry is added to the index, but with no crawled properties, neither those of the binary column nor those of the other columns, which suggests a failure during indexing.

    The table is located on a MS SQL 2008 R2 server.

    When the same file is crawled through a "file share" content source, it works.

    The only idea I can think of is that it somehow relates to IFilters: perhaps a BCS crawl and a file share crawl don't use the same IFilters. Maybe the BCS crawl uses the IFilters installed on the SharePoint server, while a "file share" crawl uses the IFilters installed on the FAST server.

    Any ideas?


    Thursday, August 30, 2012 12:51 PM

Answers

  • Well, after much effort we found the problem, or at least I think we did:

    The test files my client tried to index were too short; they contained only a word or two.

    Checking the indexing process with the spy files, I found that because the files were so short, FAST could not identify the language and therefore did not process the text files correctly.

    According to the FAST documentation, the language recognition algorithm is a statistical one, which requires a minimum amount of text to work.

    It had nothing to do with encoding and/or BCS.

    Feel free to ignore this post.

    Amir


    • Marked as answer by Amir at eWave Thursday, September 6, 2012 11:25 AM
    Thursday, September 6, 2012 11:25 AM
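
The answer above attributes the failure to a statistical language recognizer that needs a minimum amount of text. As a rough illustration of why very short files can defeat such a detector, here is a toy character-trigram sketch in Python. It is not FAST's algorithm: the training samples, the `MIN_TRIGRAMS` threshold and the `detect_language` function are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Tiny illustrative training samples (assumption: real detectors use far larger corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then runs through "
          "the field where the other animals are sleeping",
    "de": "der schnelle braune fuchs springt hinter den faulen hund und rennt "
          "dann durch das feld wo die anderen tiere schlafen",
}

# Hypothetical cut-off for this sketch; not a value taken from FAST.
MIN_TRIGRAMS = 20


def trigrams(text):
    """Count character trigrams of the lower-cased, whitespace-normalized text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def detect_language(text):
    """Return (language, score), or (None, 0.0) when there is too little text."""
    grams = trigrams(text)
    total = sum(grams.values())
    if total < MIN_TRIGRAMS:
        # Too few trigrams to compare against the profiles in a meaningful way.
        return None, 0.0
    # Score = share of the input's trigrams that also occur in each profile.
    scores = {
        lang: sum(n for g, n in grams.items() if g in profile) / total
        for lang, profile in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


short_result = detect_language("hello")
long_result = detect_language(
    "the dog runs over the field and the fox jumps over the lazy animals"
)
```

With a one-word input the sketch declines to guess, while the longer sentence yields enough trigrams to pick a language. When preparing test files for a crawl, it therefore seems safer to use documents with at least a few sentences of real text.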

All replies

  • I am trying to crawl BLOB content stored in a binary column. After crawling, I get all the fields except the binary field, so I am unable to map it, and I get no FAST search results when I search on the file contents.


    Ananth

    Friday, December 7, 2012 3:44 AM