Customize character normalization for German umlauts

  • Question

  • Hello,

    We are using FAST for SharePoint 2010 and want to customize the character normalization. The standard FAST normalization maps ä -> a, but for German users ä should be mapped to ae. The same applies to the other umlauts. Where can I set these custom mappings?

    Thanks for your help.

     Lutz

    Wednesday, April 4, 2012 7:18 AM

All replies

  • Hi,

    Basically you cannot, at least not in a supported way, but if you get your hands on some FAST ESP documentation you will probably be able to do it. The file to start looking at is C:\FASTSearch\etc\tokenizer\tokenization.xml.

    The other option, which avoids meddling with internal config files we are not supposed to touch, is to create a custom extensibility stage where you do the replacement yourself and output the data to a new crawled property that is also searchable. This way you will get hits for both forms of the normalization.
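
    The replacement step inside such a stage could look roughly like the following. This is only a minimal sketch in Python of the ä -> ae mapping itself; the names UMLAUT_MAP and expand_umlauts are hypothetical, and reading the item from the pipeline and writing the result back as a new crawled property are deliberately left out.

```python
import unicodedata

# Hypothetical mapping table (an assumption for illustration, not part of FAST):
# each German umlaut is expanded to its two-letter form.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}
_TRANSLATION_TABLE = str.maketrans(UMLAUT_MAP)


def expand_umlauts(text: str) -> str:
    """Return text with German umlauts expanded (ä -> ae, ö -> oe, ü -> ue)."""
    # Compose "a" + combining diaeresis into a single "ä" first, so both
    # Unicode spellings of an umlaut are handled by the same table.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TRANSLATION_TABLE)


if __name__ == "__main__":
    print(expand_umlauts("Müller kauft Käse und Öl"))
```

    The expanded text would go into the new crawled property, while the original field stays untouched and keeps the standard ä -> a normalization.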

    Regards,
    Mikael Svenson


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/

    Wednesday, April 4, 2012 7:41 AM
  •  Hi Mikael,

    Thanks for your fast response. On Stack Overflow I found an example of customizing tokenization.xml:

    <normalizationlist name="German to Norwegian">
       <normalization description="German u with diaeresis, to Norwegian u">
          <input>x75</input>
          <output>xFC</output>
          <output>x75</output>
       </normalization>
    </normalizationlist>

    But if I understand you correctly, this is not a Microsoft-supported way to do the customization?

    Thanks,

     Lutz

    Wednesday, April 4, 2012 10:57 AM
  • Hi,

    You are correct that this is unsupported, but if it solves your issue, go for it, as long as you document the change and make sure the file is not overwritten by service packs or updates.
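
    One way to notice an overwrite after a service pack or update is to record a checksum of the customized file together with the documented change and compare it later. A minimal sketch in Python; the function names are hypothetical, and the documented digest is whatever you stored when you made the change.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def file_was_replaced(path, documented_digest: str) -> bool:
    """True if the file no longer matches the digest recorded with the change."""
    return sha256_of_file(path) != documented_digest.lower()
```

    Run sha256_of_file once on the customized tokenization.xml, store the result in your change documentation, and call file_was_replaced with that value after each update.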

    Regards,
    Mikael Svenson

    Wednesday, April 4, 2012 11:56 AM