locked
Advanced column matching between two tables (pattern) RRS feed

  • Question

  • I have one table containing a number of columns (attributes) related to a service, and to determine the service I need to match these attributes with a "mastertable" to determine the service

    I need to be able to use

    "XX*" = startswith XX

    * = Any value

    *XX* = Contains XX

    *XX = Ends with XX

    XX = Equal to

    I am thinking also to weight how relevant each match is by adding a value between 0-1, where 0 is no match and 1 is full match

    I realise that this is very ML like and could probably be done with som ML service but I dont know how...

    Example of my tables below.

    Anyone got an idea?

    Tuesday, November 27, 2018 3:37 PM

Answers

  • Power Query does not support wildcard matches, but in this case you can create a custom function that evaluates the wildcards to equivalent Power functions.

    For example, the following is a custom function called Text_WildcardMatch.

    (inputText as text, textToMatch as text) as logical =>
    if Text.StartsWith(textToMatch, "*") and Text.EndsWith(textToMatch, "*") then Text.Contains(inputText, Text.Remove(textToMatch, "*"))
    else if Text.StartsWith(textToMatch, "*") then Text.EndsWith(inputText, Text.Remove(textToMatch, "*"))
    else if Text.EndsWith(textToMatch, "*") then Text.StartsWith(inputText, Text.Remove(textToMatch, "*")) 
    else if textToMatch = "*" then true
    else if textToMatch = inputText then true
    else false

    Now you can use this function to match rows in the Data table with rows in the Master table.

    let
        Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
        ChangedType = Table.TransformColumnTypes(Source,{{"Assetname", type text}, {"Operating System", type text}, {"ClientAssetTypeUser", type text}}),
        AddedTable = 
            Table.AddColumn(
                ChangedType,
                "Column1",
                (i) => Table.SelectRows(
                           MasterTable, 
                           (j) => Text_WildcardMatch(i[Assetname], j[Assetname]) and
                                  Text_WildcardMatch(i[Operating System], j[Operating System]) and
                                  Text_WildcardMatch(i[ClientAssetTypeUser], j[ClientAssetTypeUser])
                       )
            ),
        ExpandedColumn1 = Table.ExpandTableColumn(AddedTable, "Column1", {"Service"})
    in
        ExpandedColumn1
    As far as adding a match weight to the Master table, the purpose of such a column is unclear, since it does participate in influencing how the match is performed (like is done in weighted index fuzzy lookups, for example).

    In your example, does the 0,8 weight or the 1 weight take into account all three columns that must be matched? How do you determine that a column value with no wildcard will result in an exact match, or will not match at all? On the other hand, please let me know if I have missed something fundamental with this weight stuff.

    • Marked as answer by Wernerheim Thursday, November 29, 2018 6:27 AM
    Wednesday, November 28, 2018 1:33 PM

All replies

  • Power Query does not support wildcard matches, but in this case you can create a custom function that evaluates the wildcards to equivalent Power functions.

    For example, the following is a custom function called Text_WildcardMatch.

    (inputText as text, textToMatch as text) as logical =>
    if Text.StartsWith(textToMatch, "*") and Text.EndsWith(textToMatch, "*") then Text.Contains(inputText, Text.Remove(textToMatch, "*"))
    else if Text.StartsWith(textToMatch, "*") then Text.EndsWith(inputText, Text.Remove(textToMatch, "*"))
    else if Text.EndsWith(textToMatch, "*") then Text.StartsWith(inputText, Text.Remove(textToMatch, "*")) 
    else if textToMatch = "*" then true
    else if textToMatch = inputText then true
    else false

    Now you can use this function to match rows in the Data table with rows in the Master table.

    let
        Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
        ChangedType = Table.TransformColumnTypes(Source,{{"Assetname", type text}, {"Operating System", type text}, {"ClientAssetTypeUser", type text}}),
        AddedTable = 
            Table.AddColumn(
                ChangedType,
                "Column1",
                (i) => Table.SelectRows(
                           MasterTable, 
                           (j) => Text_WildcardMatch(i[Assetname], j[Assetname]) and
                                  Text_WildcardMatch(i[Operating System], j[Operating System]) and
                                  Text_WildcardMatch(i[ClientAssetTypeUser], j[ClientAssetTypeUser])
                       )
            ),
        ExpandedColumn1 = Table.ExpandTableColumn(AddedTable, "Column1", {"Service"})
    in
        ExpandedColumn1
    As far as adding a match weight to the Master table, the purpose of such a column is unclear, since it does participate in influencing how the match is performed (like is done in weighted index fuzzy lookups, for example).

    In your example, does the 0,8 weight or the 1 weight take into account all three columns that must be matched? How do you determine that a column value with no wildcard will result in an exact match, or will not match at all? On the other hand, please let me know if I have missed something fundamental with this weight stuff.

    • Marked as answer by Wernerheim Thursday, November 29, 2018 6:27 AM
    Wednesday, November 28, 2018 1:33 PM
  • Thanks a lot! Nice solution!

    I realize I didn´t think the Match/Probability through that much.... If I am going to use it I would need to put a rank for each field/column. And a '0' would mean "definetly no match" and override other ranks. I'll skip that for now and use your nice solution as a start :)

    In paralleil I have set up a ML algoritm in Azure. Quite easy actually, but need a bit of work to evaluate the result...

    Thursday, November 29, 2018 6:33 AM