FAST for SharePoint 2010 - Searching for escape characters returns incorrect results

  • Question

  • We are using FAST for SharePoint 2010 to perform specific field searches.
    When the search term contains escape characters, the results seem to indicate that these escape characters are removed before the search is executed.
    So, when I search for "-1", my result set contains items with both "-1" and "1".

    I am using the KeywordQuery object with EnableFQL=true.
    When I search for items that equal "-1", I also get results where the value equals "1".
    The same happens when I search for items equal to "+1".

    My FQL is pretty simple, e.g. fieldname:equals("-1")
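For reference, a query string of this shape can be assembled programmatically. The Python sketch below is an illustration only: the helper name `fql_equals` is hypothetical, and the rule that `\` and `"` inside an FQL string literal are escaped with a backslash is an assumption to verify against the FQL reference. Escaping only keeps the FQL syntax intact; it does not stop the engine from tokenizing the quoted value, which is where the replies in this thread locate the problem.

```python
def fql_equals(field: str, value: str) -> str:
    """Build a field:equals("value") FQL expression (hypothetical helper).

    Assumption: backslash and double quote are escaped with a backslash
    inside an FQL string literal. This protects the query syntax only;
    the value is still tokenized by the search engine afterwards.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:equals("{escaped}")'


if __name__ == "__main__":
    print(fql_equals("fieldname", "-1"))
```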

    Is there a way to pass escape characters through to the FAST search engine?

    Thanks in advance.

    Alex

    Friday, December 16, 2011 4:40 AM

All replies

  • All special characters are ignored by FAST when indexing the content.
    Sriram S
    Friday, December 16, 2011 6:40 AM
  • Hi,

    If you want to search for words containing non-letters/digits, you can create a custom extensibility stage and replace those characters with a word sequence. Then you can implement the same rewrite logic for your queries.

    As you cannot modify the body of the content, you have to create a copy of the text with the modified words. Depending on whether your scenario requires multi-word or phrase searching, you can get away with storing only the modified words and not the whole text. This will save index space.

    Example:

    I like to program in C#.


    Here you can replace '#' with 'sharp', giving Csharp.
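A minimal Python sketch of this rewrite idea, assuming a hypothetical substitution table: `#` becomes `sharp`, and a `-` or `+` sign directly before a digit at the start of a word becomes `minus` or `plus`. In FAST the content-side call would sit inside a custom extensibility stage and the query side would need the identical rewrite; here both are plain functions whose names are illustrative only. `modified_words` shows the space-saving variant of keeping only the rewritten words in a separate field.

```python
import re

# Hypothetical substitution table: each special character is replaced by a
# word so that the tokenizer keeps it as part of an alphanumeric term.
SUBSTITUTIONS = [
    (re.compile(r"#"), "sharp"),
    # A sign at the start of a word, directly before a digit, e.g. "-1" or "+1".
    (re.compile(r"(?<![0-9A-Za-z])-(?=[0-9])"), "minus"),
    (re.compile(r"(?<![0-9A-Za-z])\+(?=[0-9])"), "plus"),
]


def rewrite(text: str) -> str:
    """Apply the same substitutions to content (index time) and to queries."""
    for pattern, word in SUBSTITUTIONS:
        text = pattern.sub(word, text)
    return text


def modified_words(text: str) -> list[str]:
    """Return only the words changed by rewrite(), for a compact copy field."""
    changed = []
    for word in text.split():
        new = rewrite(word)
        if new != word:
            changed.append(new)
    return changed


if __name__ == "__main__":
    print(rewrite("I like to program in C#."))
    print(modified_words("values: -1 +1 1-1"))
```

The key design constraint is symmetry: if queries are not passed through the same `rewrite` as the content, `minus1` in the index will never match a query for `-1`.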

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Friday, December 16, 2011 9:03 PM
  • Thanks for the info.

    We are just using expression based searching and not pure content searches.
    In the resultset I can confirm that "-1" is present, so the data I am looking for is in the Index.
    As well as this, I am able to search and return data for strings like "BF1:BA1", which contain special characters.

    So, basically, I was wondering which "special" characters cannot be searched for, and how to search for them.
    I was hoping for a replacement system like "\-1" or the like so reserved words can be submitted for searching.

    The custom extensibility stage approach would be a last resort for us as we want to keep customisations to a minimum.

    Thanks again,

    Alex

    Sunday, December 18, 2011 9:31 AM
  • The technical term for the functionality to "blame" here is tokenization. All content that is indexed, as well as the queries that are sent to the index, is tokenized, i.e. stripped of characters such as punctuation, hyphens, etc. Basically, everything that is not alphanumeric is removed. This is to ensure good recall for your search queries.
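To make the recall argument concrete, here is a toy Python tokenizer that keeps only runs of letters and digits. It is a deliberate simplification: the real FAST tokenizer is configurable and language-aware, and this sketch only mimics the "everything that is not alphanumeric is removed" behaviour described above.

```python
import re


def tokenize(text: str) -> list[str]:
    # [^\W_] matches a word character that is not "_", i.e. a letter or digit,
    # so each match is one run of alphanumerics; everything else is dropped.
    return re.findall(r"[^\W_]+", text.lower())


if __name__ == "__main__":
    for sample in ["-1", "+1", "1-1", "C#", "BF1:BA1"]:
        print(f"{sample!r:10} -> {tokenize(sample)}")
```

Because the query goes through the same step as the content, "BF1:BA1" still finds its item (both sides become `bf1 ba1`), while "-1" and "+1" both collapse to the single token `1`.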

    Although it is possible to change the list of characters that are stripped (from memory, might be wrong: <FASTSEARCH>\etc\tokenization.xml), this is neither supported nor very helpful if you want to keep customization to a minimum.

    I'd say Mikael's approach is what you want.

    Cheers,

    Marcus


    Marcus Johansson | Search Nerd | comperiosearch.com | linkedin.com/in/marcusjohansson
    Sunday, December 18, 2011 10:00 PM
  • Thanks for all the info here...

    I have performed some further tests and it seems that the "tokenization" is not consistent.

    I now have two search indexes: a FAST index and a standard Office Search index.
    Here are my test results:

    I have 3 items in the index:
    <FieldName>="-1"
    <FieldName>="1"
    <FieldName>="1"
    FAST index, FQL search: <FieldName>:equals("-1") returns 3 results - Incorrect
    FAST index, keyword search: <FieldName>:"-1" returns 3 results - Incorrect
    SharePoint index, keyword search: <FieldName>:"-1" returns 1 result - Correct

    When I update the "-1" to "1-1", the results are different:
    <FieldName>="1-1"
    <FieldName>="1"
    <FieldName>="1"
    FAST index, FQL search: <FieldName>:equals("1-1") returns 1 result - Correct
    FAST index, keyword search: <FieldName>:"1-1" returns 1 result - Correct
    SharePoint index, keyword search: <FieldName>:"1-1" returns 1 result - Correct

    So, from this, there seems to be a bug in the way the search is sent to the index.
    If the search term starts with "-", the first character is not sent through as part of the search.
    If the search term contains a "-" in the middle, it is sent through as is and the search works as expected.
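These counts are also consistent with plain tokenization rather than a query-submission bug. The Python sketch below assumes the tokenization described earlier in the thread (drop everything that is not a letter or digit) and uses a whole-field token comparison as a rough stand-in for `equals`; the function names are hypothetical and this is not FAST's actual matching logic. Under that model "-1" becomes the single token `1` and matches all three items, while "1-1" becomes the two tokens `1 1`, which only the first item contains.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenization: keep runs of letters/digits, drop everything else.
    return re.findall(r"[^\W_]+", text.lower())


def equals_match(field_value: str, query: str) -> bool:
    # Rough stand-in for equals(): the whole field must tokenize to exactly
    # the same token sequence as the query.
    return tokenize(field_value) == tokenize(query)


def count_matches(items: list[str], query: str) -> int:
    return sum(equals_match(value, query) for value in items)


if __name__ == "__main__":
    print(count_matches(["-1", "1", "1"], "-1"))
    print(count_matches(["1-1", "1", "1"], "1-1"))
```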

    It seems strange that SharePoint Server Search behaves more consistently than FAST does.

    Not sure where to go now, other than changing the way we store data, which means our data is no longer search-engine independent...

    Alex

    Monday, December 19, 2011 12:05 AM
  • Hi Alex,

    It might be inconsistent, but it won't solve your case :)

    You can spend time trying a bunch of scenarios to figure out what works and what doesn't, or spend the time doing the workaround and move on.

    As for SP search being more consistent, that only means it uses a different engine which happens to handle this particular case. Special characters are tokenized away, and that's how we as search programmers should think about it (even though it might work in some special cases). It saves a lot of headaches.

    Regards,
    Mikael Svenson

    Monday, December 19, 2011 7:05 PM
  • Thanks for the reply.

    I have moved on and accepted that FAST will tokenize the query before it is searched.
    I will treat these as data issues and try to catch them before they are stored in our sites...
    I know this isn't the best, but since altering the tokenization for FAST is not really supported, I have no other choice.
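If the data-side route is taken, a check along these lines could run before values are stored. This is a hypothetical Python sketch built on the assumption described earlier in the thread that non-alphanumeric characters are dropped; `loses_information` flags a single value, and `find_collisions` groups existing values that would become indistinguishable in the index.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed approximation of FAST tokenization: keep runs of letters/digits only.
    return re.findall(r"[^\W_]+", text.lower())


def loses_information(value: str) -> bool:
    """True if tokenization would change the value beyond lowercasing/whitespace."""
    return tokenize(value) != value.lower().split()


def find_collisions(values: list[str]) -> dict[tuple[str, ...], list[str]]:
    """Group distinct values that tokenize to the same token sequence."""
    groups = defaultdict(set)
    for value in values:
        groups[tuple(tokenize(value))].add(value)
    return {tokens: sorted(vals) for tokens, vals in groups.items() if len(vals) > 1}


if __name__ == "__main__":
    for v in ["-1", "1", "1-1", "BF1:BA1"]:
        print(v, loses_information(v))
    print(find_collisions(["-1", "1", "+1", "2"]))
```

Flagged values could then be rejected, or normalized with a rewrite like the one Mikael suggested, before they reach the site.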

    Do you know if the tokenization replacement behaviour is documented?
    I found this, but it is difficult to read and doesn't explicitly explain the use cases.
    http://msdn.microsoft.com/en-us/library/ee626405(v=office.12).aspx
    It would be nice if MS stated something like "Your query will be tokenized during the search and characters will be replaced in the following scenarios..."

    Again, thanks for all your help,

    Alex

    Monday, December 19, 2011 10:01 PM