Fuzzy Matching in SSIS

  • Question

  • Hi there!

    I'm trying to do a 'fuzzy logic lookup', if that makes any sense.

    I have two tables, 'Actuals' and 'RequiresFuzzy'.

    Actuals looks like this:

    Model                                                            ID
    FORD FOCUS HATCHBACK 1.6 Zetec 5dr 2002                          53265
    RENAULT MEGANE SCENIC ESTATE 1.6 16V Privilege Monaco 5dr 2001   86954
    VAUXHALL VECTRA HATCHBACK 1.8 LS 5dr                             67674

    RequiresFuzzy looks like this:

    Model                                               ID
    ABARTH 500 HATCHBACK 1.4 16V T-Jet 3dr 2012         NULL
    ABARTH GRANDE PUNTO HATCHBACK 1.4 Turbo 3dr 2008    NULL

    There are about 90,000 records in Actuals and 150,000 in RequiresFuzzy.

    I need to do a fuzzy lookup of RequiresFuzzy against Actuals and return the most likely ID into the RequiresFuzzy.ID field.

    I notice there are two data flow transformations in SSIS called Fuzzy Grouping and Fuzzy Lookup.

    Has anyone used these before, or know where I can get an idiot's walkthrough?

    Cheers,

    Zoe

    Tuesday, May 29, 2012 2:18 PM

Answers

  • Fuzzy Lookup is a possible solution, but you would need to set it up carefully. I suggest you even use several Fuzzy Lookups with different settings (for example, different similarity thresholds) and compare the quality of the matches each one returns.

    And see a walk-through: http://www.youtube.com/watch?v=-yGe88Q6Zk0
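
    To get a feel for how the similarity threshold changes what comes back, here is a rough sketch in Python. It is not what SSIS does internally: Fuzzy Lookup uses its own matching algorithm, and `difflib.SequenceMatcher` is only a stand-in here. The `similarity` and `best_match` helpers, the misspelled query string, and the threshold values are all made up for illustration; only the three Actuals rows come from the question.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(model, actuals, threshold):
    """Return (id, score) for the most similar Actuals row.

    The id is None when the best score falls below the threshold,
    which mimics leaving RequiresFuzzy.ID as NULL for weak matches.
    """
    best_id, best_score = None, 0.0
    for candidate, candidate_id in actuals:
        score = similarity(model, candidate)
        if score > best_score:
            best_id, best_score = candidate_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Reference rows taken from the Actuals sample in the question.
actuals = [
    ("FORD FOCUS HATCHBACK 1.6 Zetec 5dr 2002", 53265),
    ("RENAULT MEGANE SCENIC ESTATE 1.6 16V Privilege Monaco 5dr 2001", 86954),
    ("VAUXHALL VECTRA HATCHBACK 1.8 LS 5dr", 67674),
]

# Hypothetical, slightly different spelling of a model that is in Actuals.
query = "FORD FOCUS HATCHBACK 1.6 ZETEC 5 door 2002"

# Run the same lookup with several (arbitrary) thresholds and compare.
for threshold in (0.5, 0.8, 0.95):
    print(threshold, best_match(query, actuals, threshold))
```

    Comparing the results across thresholds is the same idea as running several differently configured Fuzzy Lookups side by side: a low threshold returns more matches but more wrong ones, and a high threshold leaves more rows unmatched.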


    Arthur

    • Marked as answer by Zoe.Ohara Wednesday, May 30, 2012 11:45 AM
    Tuesday, May 29, 2012 2:28 PM
    Moderator

All replies