none
Fuzzy Matching - Scores RRS feed

  • Question

  • Hi there,

    After setting up the threshold to perform a fuzzy matching merge between two data sets, is it possible to get the individual similarity scores per match? Fuzzy Lookup for excel does something similar and is very useful for my current scenario. 

    I. E 

    Threshold: 0.80
    Left Table Right Table Similarity Score
    Umbrell@ Umbrella 0.95
    Umbrellaaa Umbrella 0.82

    Don't see any option in Power Query (Fuzzy Merge) to get this info. 

    Many thanks, 

    Joaquin

    Monday, May 4, 2020 9:30 AM

Answers

  • Hi,

    This info is currently not available but will be available soon in Power Query. We are planning to make this option available by October 2020.

    Thanks,
    Srinidhi

    • Proposed as answer by Lz._ Wednesday, May 6, 2020 4:32 AM
    • Marked as answer by Imke FeldmannMVP Sunday, June 14, 2020 6:02 AM
    Tuesday, May 5, 2020 9:20 PM
  • Hi, Jqarroyo.

    You want something like this?

    let
        fn = (text1 as text, text2 as text) as number =>
            let
                text1 = Text.Upper(text1),
                text2 = Text.Upper(text2),
                matching_chars = List.Count(List.Intersect({Text.ToList(text1), Text.ToList(text2)})),
                average_length = (Text.Length(text1) + Text.Length(text2)) / 2,
                coef =  matching_chars / average_length
            in
                coef,

        Source = Table.FromRows(
           {{"Umbrell@",   "Umbrella"},
            {"Umbrellaaa", "Umbrella"}},
            type table [Left_Table = text, Right_Table = text] ),
        AddedCustom = Table.AddColumn(Source, "Similarity score", each fn( [Left_Table], [Right_Table] ), Number.Type)
    in
        AddedCustom


    Wednesday, May 6, 2020 8:40 AM

All replies

  • Hi,

    This info is currently not available but will be available soon in Power Query. We are planning to make this option available by October 2020.

    Thanks,
    Srinidhi

    • Proposed as answer by Lz._ Wednesday, May 6, 2020 4:32 AM
    • Marked as answer by Imke FeldmannMVP Sunday, June 14, 2020 6:02 AM
    Tuesday, May 5, 2020 9:20 PM
  • Hi, Jqarroyo.

    You want something like this?

    let
        fn = (text1 as text, text2 as text) as number =>
            let
                text1 = Text.Upper(text1),
                text2 = Text.Upper(text2),
                matching_chars = List.Count(List.Intersect({Text.ToList(text1), Text.ToList(text2)})),
                average_length = (Text.Length(text1) + Text.Length(text2)) / 2,
                coef =  matching_chars / average_length
            in
                coef,

        Source = Table.FromRows(
           {{"Umbrell@",   "Umbrella"},
            {"Umbrellaaa", "Umbrella"}},
            type table [Left_Table = text, Right_Table = text] ),
        AddedCustom = Table.AddColumn(Source, "Similarity score", each fn( [Left_Table], [Right_Table] ), Number.Type)
    in
        AddedCustom


    Wednesday, May 6, 2020 8:40 AM
  • Hi Srinidhi, 

    Was this released in October? I spotted something in the "table fuzzyjoin" documention

    https://docs.microsoft.com/en-us/powerquery-m/table-fuzzyjoin

    SimilarityColumnName  , will that do it?

    Any help on how to implement it would be appreciated!

    Thanks

    Craig



    • Edited by UnifyMitch Friday, November 20, 2020 4:09 AM
    Friday, November 20, 2020 4:00 AM
  • Yes, the SimilarityColumnName option would allow you to get the similarity scores. Using this option is quite simple.

    Earlier you would do something like:
    Table.FuzzyJoin(Table1, {"Key1"}, Table2, {"Key2"}, JoinKind.LeftOuter, [IgnoreCase=true, Threshold = 0.8]).

    You would now need to do:
    Table.FuzzyJoin(Table1, {"Key1"}, Table2, {"Key2"}, JoinKind.LeftOuter, [IgnoreCase=true, SimilarityColumnName="Similarity scores", Threshold = 0.8]). With this, you should see a new column being added that has the similarity scores for every match.

    Thanks,
    Srinidhi

    Friday, November 20, 2020 8:57 PM