FAST ESP lemmatization


  • Searching for:

          Assembly (10638 results)

    • Seems to also return docs containing 'assemblies'
    • Returns docs matching 'asm' (per my synonym list)

          Assemblies (9835 results)

    • Seems to also return docs containing 'assembly'
    • Does not return docs matching 'asm'

    Another, similar example:

    Search for “Listen”:

          List (8296)

          Listen (514)

          Listener (388)

          Listeners (379)

          Listening (271)

    Which setting should I be looking at in this case?

    Thanks in advance

    Friday, February 10, 2012 7:03 PM

All replies

  • Regarding the first query, the results seem to make sense: as you stated yourself, the first one includes the synonyms. You could also add "asm" as a synonym for "assemblies" to make the two queries return more consistent results.
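    One way to read the asymmetry is that synonyms are looked up on the literal query term, while lemmatization expands the term to every form of its lemma. The Python sketch below models that assumption only; the `LEMMA_FORMS` table and the `expand_query` function are hypothetical and are not FAST ESP's API or synonym-dictionary format.

```python
# Hypothetical model (not FAST ESP internals): lemmatization maps a term to
# all forms of its lemma, while synonyms are keyed on the exact query term.
LEMMA_FORMS = {
    "assembly": {"assembly", "assemblies"},
    "assemblies": {"assembly", "assemblies"},
}


def expand_query(term, synonyms):
    """Return the set of index terms a single query term would match."""
    term = term.lower()
    # Lemmatization: every inflected form of the lemma (or the term itself).
    forms = set(LEMMA_FORMS.get(term, {term}))
    # Synonym expansion: looked up on the literal query term only (assumption).
    forms |= set(synonyms.get(term, ()))
    return forms


synonyms = {"assembly": ["asm"]}
print(sorted(expand_query("Assembly", synonyms)))    # ['asm', 'assemblies', 'assembly']
print(sorted(expand_query("Assemblies", synonyms)))  # ['assemblies', 'assembly']

# Registering the synonym for the plural too makes both queries equivalent.
synonyms["assemblies"] = ["asm"]
print(sorted(expand_query("Assemblies", synonyms)))  # ['asm', 'assemblies', 'assembly']
```

    Under this model, adding the synonym entry for "assemblies" is enough to make both queries match the same set of terms.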

    On the second part: "listen" is a verb, and verbs are not in the default lemmatizer dictionaries; the default for English is NA (nouns and adjectives). Lemmatization in ESP is dictionary-based, not based on stemming. Check the ESP Advanced Linguistics Guide for more details:

    The default level is normalization of nouns and adjectives (NA).

    More dictionary types are available, depending on the language. But the words you are listing are of different types (e.g. nouns, verbs), so they belong to different lemmas and will always have different recall.
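    To illustrate the difference between dictionary-based lemmatization restricted to nouns and adjectives and plain suffix stemming, here is a toy Python sketch. The `DICTIONARIES` contents, the `enabled` parameter and `crude_stem` are all hypothetical; this is not ESP's lemmatizer or its dictionary format, just a model of the behaviour described above.

```python
# Toy part-of-speech dictionaries (illustrative data, not ESP's dictionaries).
# Each entry maps a lemma to the set of its inflected forms.
DICTIONARIES = {
    "noun": {
        "assembly": {"assembly", "assemblies"},
        "listener": {"listener", "listeners"},
        "list": {"list", "lists"},
    },
    "verb": {
        "listen": {"listen", "listens", "listened", "listening"},
    },
}


def lemma_forms(term, enabled=("noun", "adjective")):
    """Dictionary-based lemmatization: expand only via enabled dictionaries."""
    term = term.lower()
    for pos in enabled:
        for forms in DICTIONARIES.get(pos, {}).values():
            if term in forms:
                return set(forms)
    # Not found in any enabled dictionary: the term matches only itself.
    return {term}


def crude_stem(term):
    """Very crude suffix stripper, shown only for contrast with lemmatization."""
    term = term.lower()
    for suffix in ("ers", "er", "ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


print(sorted(lemma_forms("listening")))  # ['listening'] (verbs not enabled)
print(sorted(lemma_forms("listening", enabled=("noun", "adjective", "verb"))))
# ['listen', 'listened', 'listening', 'listens']
print(sorted(lemma_forms("Listeners")))  # ['listener', 'listeners'] (a noun lemma)
print({crude_stem(w) for w in ["listen", "listener", "listeners", "listening"]})
# {'listen'}
```

    Even with verbs enabled, "listener" (noun) and "listen" (verb) remain separate lemmas, whereas a stemmer would collapse them; that is why lemmatized queries for these words keep different recall.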

    Thursday, February 23, 2012 10:08 PM