Answered by:
Fuzzy Matching - Scores

Question
-
Hi there,
After setting up the threshold to perform a fuzzy matching merge between two data sets, is it possible to get the individual similarity scores per match? Fuzzy Lookup for excel does something similar and is very useful for my current scenario.
I. E
Threshold: 0.80 Left Table Right Table Similarity Score Umbrell@ Umbrella 0.95 Umbrellaaa Umbrella 0.82 Don't see any option in Power Query (Fuzzy Merge) to get this info.
Many thanks,
Joaquin
Answers
-
Hi,
This info is currently not available but will be available soon in Power Query. We are planning to make this option available by October 2020.
Thanks,
Srinidhi- Proposed as answer by Lz._ Wednesday, May 6, 2020 4:32 AM
- Marked as answer by Imke FeldmannMVP Sunday, June 14, 2020 6:02 AM
-
Hi, Jqarroyo.
You want something like this?
let
fn = (text1 as text, text2 as text) as number =>
let
text1 = Text.Upper(text1),
text2 = Text.Upper(text2),
matching_chars = List.Count(List.Intersect({Text.ToList(text1), Text.ToList(text2)})),
average_length = (Text.Length(text1) + Text.Length(text2)) / 2,
coef = matching_chars / average_length
in
coef,
Source = Table.FromRows(
{{"Umbrell@", "Umbrella"},
{"Umbrellaaa", "Umbrella"}},
type table [Left_Table = text, Right_Table = text] ),
AddedCustom = Table.AddColumn(Source, "Similarity score", each fn( [Left_Table], [Right_Table] ), Number.Type)
in
AddedCustom
- Proposed as answer by Aleksey Semiletov Thursday, May 7, 2020 7:24 AM
- Marked as answer by Imke FeldmannMVP Sunday, June 14, 2020 6:02 AM
All replies
-
Hi,
This info is currently not available but will be available soon in Power Query. We are planning to make this option available by October 2020.
Thanks,
Srinidhi- Proposed as answer by Lz._ Wednesday, May 6, 2020 4:32 AM
- Marked as answer by Imke FeldmannMVP Sunday, June 14, 2020 6:02 AM
-
Hi, Jqarroyo.
You want something like this?
let
fn = (text1 as text, text2 as text) as number =>
let
text1 = Text.Upper(text1),
text2 = Text.Upper(text2),
matching_chars = List.Count(List.Intersect({Text.ToList(text1), Text.ToList(text2)})),
average_length = (Text.Length(text1) + Text.Length(text2)) / 2,
coef = matching_chars / average_length
in
coef,
Source = Table.FromRows(
{{"Umbrell@", "Umbrella"},
{"Umbrellaaa", "Umbrella"}},
type table [Left_Table = text, Right_Table = text] ),
AddedCustom = Table.AddColumn(Source, "Similarity score", each fn( [Left_Table], [Right_Table] ), Number.Type)
in
AddedCustom
- Proposed as answer by Aleksey Semiletov Thursday, May 7, 2020 7:24 AM
- Marked as answer by Imke FeldmannMVP Sunday, June 14, 2020 6:02 AM
-
Hi Srinidhi,
Was this released in October? I spotted something in the "table fuzzyjoin" documention
https://docs.microsoft.com/en-us/powerquery-m/table-fuzzyjoin
SimilarityColumnName , will that do it?
Any help on how to implement it would be appreciated!
Thanks
Craig
- Edited by UnifyMitch Friday, November 20, 2020 4:09 AM
-
Yes, the SimilarityColumnName option would allow you to get the similarity scores. Using this option is quite simple.
Earlier you would do something like:
Table.FuzzyJoin(Table1, {"Key1"}, Table2, {"Key2"}, JoinKind.LeftOuter, [IgnoreCase=true, Threshold = 0.8]).You would now need to do:
Table.FuzzyJoin(Table1, {"Key1"}, Table2, {"Key2"}, JoinKind.LeftOuter, [IgnoreCase=true, SimilarityColumnName="Similarity scores", Threshold = 0.8]). With this, you should see a new column being added that has the similarity scores for every match.Thanks,
Srinidhi