Name matching - Fuzzy

  • Question

  • Hi,

    My client would like to use SSIS to match names, both to combine data from multiple sources and to prevent duplicates.

    What methods can I use to perform fuzzy name matching in SSIS across different sources?

    For example, "o'brian" vs "obrian".

    How do I handle "William" vs "Bill",
    Rob, Bob, etc.?

    A quick description of what to do, or pointers to related material, would be helpful.

    Thanks

    Thursday, August 11, 2011 10:11 PM

Answers

  • Thanks Muqadder. I was able to get through similarity and confidence, etc. However, how could I handle the William vs Bill and Robert vs Bob scenarios?

    I don't think it will match Bill to William or Bob to Robert. Those names differ too much. But if you compare more columns, such as last name, residence, etc., the records will match. Just make the first name column less important.
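    To illustrate the idea of comparing several columns while giving the first name less weight, here is a minimal Python sketch. It is not how Fuzzy Lookup computes its score: `difflib.SequenceMatcher` stands in for the similarity measure, and the column names, weights, and sample records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical column weights: the first name counts less than the other columns.
WEIGHTS = {"first_name": 0.2, "last_name": 0.4, "residence": 0.4}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0 (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(left: dict, right: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-column similarities."""
    total = sum(weights.values())
    weighted = sum(w * similarity(left[col], right[col]) for col, w in weights.items())
    return weighted / total


# Made-up sample records for illustration only.
a = {"first_name": "Bill", "last_name": "O'Brian", "residence": "Seattle"}
b = {"first_name": "William", "last_name": "OBrian", "residence": "Seattle"}
c = {"first_name": "Bill", "last_name": "Smith", "residence": "Boston"}

print(round(record_score(a, b), 2))  # high, despite Bill vs William
print(round(record_score(a, c), 2))  # low, despite identical first names
```

    With these weights, the weak Bill/William score is outweighed by the close last name and identical residence, while two unrelated people who share a first name stay far apart.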
    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com
    • Marked as answer by Eileen Zhao Thursday, August 25, 2011 8:39 AM
    Friday, August 12, 2011 6:18 AM
    Moderator

All replies

  • SSIS has two built-in components that support fuzzy matching. The Fuzzy Lookup transformation is the simpler one to configure and use: it matches "similar" values based on a configurable match percentage. It covers most scenarios where a fuzzy match is needed (you will need to tweak the match percentage until the results meet your requirements). The Fuzzy Grouping transformation is the other option in the SSIS toolbox, but it needs a more thorough understanding of the fuzzy algorithms themselves.
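    As a rough illustration of what tuning a match threshold means, here is a small Python sketch. It does not reproduce the algorithm SSIS uses: `difflib.SequenceMatcher` is only a stand-in similarity measure, and the 0.8 threshold is an arbitrary value chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0 (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two values as the same when their similarity reaches the threshold."""
    return similarity(a, b) >= threshold


pairs = [("o'brian", "obrian"), ("William", "Bill"), ("Robert", "Bob")]
for left, right in pairs:
    print(left, right, round(similarity(left, right), 2), is_match(left, right))
```

    Spelling variants such as "o'brian" and "obrian" score high and pass the threshold, while nickname pairs such as William/Bill score low. Lowering the threshold far enough to accept them would also let in many unrelated names.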

    You can read about how to configure Fuzzy Grouping here: http://www.bimonkey.com/2009/11/the-fuzzy-grouping-transformation/ and Fuzzy Lookup here: http://www.bimonkey.com/2009/06/the-fuzzy-lookup-transformation/

    I like the two posts since they elaborate the concept and usage via examples.

    Hope this helps!

    Cheers!!

    Muqadder.

    Thursday, August 11, 2011 10:22 PM
  • Thanks Muqadder. I was able to get through similarity and confidence, etc. However, how could I handle the William vs Bill and Robert vs Bob scenarios?
    Friday, August 12, 2011 12:51 AM