none
Help in complex transformation from source table based on criteria table with Power Query RRS feed

  • Question

  • Hello to all,

    Maybe someone could help me with this complex (I think) problem. I have the following 2 input tables.

    TABLE 1   TABLE 2
             
    DESCRIPTION  VALUE    PREFIX  CODE
    ID 0   7235  ABX1
    NAME  JFMSC   3553  POWQ
    TYPE  UHELQ   7459  UWEER
    DFRUL  F4   10012  ABX1
    ADDR 10012002   430  ABX1
    RRUL  P1      
    ADDR 723      
    RRUL  P1      
    ID 2      
    NAME  PLLSJS      
    TYPE  UHELQ      
    DFRUL  P3      
    ID 4      
    NAME  AAAARR      
    TYPE  UHELQ      
    DFRUL  T7      
    ADDR 35531156      
    RRUL  P1      
    ADDR 72358      
    RRUL  P1      
    ADDR 86401      
    RRUL  K9      
    ID 0      
    NAME  PPROOA      
    TYPE  RRHN      
    DFRUL  P1      
    ADDR 43001      
    RRUL  T8      
    ADDR 7459001      
    RRUL  D4      
    ADDR 430457      
    RRUL  W2      
    ADDR 745913      
    RRUL  P1      
    ADDR 74598001      
    RRUL  Y5      

    I´d like to get the an output table like below, that shows the CODE that is THE MOST similar compared with each number of field "ADDR" based on criterias of "TABLE 2". If there are repeated CODEs for each ID, I only want to show once each CODE (unique codes list).

    In summary, I want to Transform the data from Input Table 1 taken criterias based on Table 2 to get an output table like this:

    The value 86401 since is not related with any CODE, then show as it is to know that was not found in Criteria Table (Table 2) 

    ID NAME TYPE DFRUL CODE RRUL
    0 JFMSC UHELQ F4 ABX1 P1
    2 PLLSJS UHELQ P3
    4 AAAARR UHELQ T7 POWQ P1
    ABX1 P1
    86401 K9
    0 PPROOA RRHN P1 ABX1 P1
    UWEER P1


    Thanks in advance for any help.


    • Edited by cgkmal Monday, March 18, 2019 5:54 AM
    Sunday, March 17, 2019 5:01 AM

Answers

  • As Colin said, Table.Split is a new function... This code below doesn;t use this function.

    Still draft.
    PrefTbl the same as previous.
    DataTbl code:

    let
        fxCheck2 = (t as table) =>
            let
                First4 = Table.FirstN(t, 4),
                Custom2 = Table.Skip(t, 4),
                #"Added Index1" = Table.AddIndexColumn(Custom2, "Index", 0, 1),
                #"Integer-Divided Column" = Table.TransformColumns(#"Added Index1", {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
                #"Grouped Rows1" = Table.Group(#"Integer-Divided Column", {"Index"}, {{"tbl", each Table.PromoteHeaders(Table.Transpose(_[[DESCRIPTION], [VALUE]])), type table}}),
                Custom3 = Table.Combine(#"Grouped Rows1"[tbl]),
                Custom1 = Table.FromRows({First4[VALUE] & {Custom3}}, First4[DESCRIPTION] & {"tbl"} ),
                #"Expanded {0}" = Table.ExpandTableColumn(Custom1, "tbl", {"ADDR", "RRUL"}, {"ADDR", "RRUL"})
            in
                #"Expanded {0}",
            
    
        Source = Excel.CurrentWorkbook(){[Name="DataTbl"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"DESCRIPTION", type text}, {"VALUE", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Indeks", 0, 1),
        #"Added Conditional Column" = Table.AddColumn(#"Added Index", "SplitHere", each if [DESCRIPTION] = "ID" then [Indeks] else null),
        #"Filled Down" = Table.FillDown(#"Added Conditional Column",{"SplitHere"}),
        #"Grouped Rows" = Table.Group(#"Filled Down", {"SplitHere"}, {{"tbl", each fxCheck2(_[[DESCRIPTION],[VALUE]]), type table }}),
        Custom1 = Table.Combine(#"Grouped Rows"[tbl]),
        #"Changed Type1" = Table.TransformColumnTypes(Custom1,{{"ADDR", type text}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type1", "CODE", (x) => if x[ADDR] = null then null else try (try (Table.SelectRows(PrefTbl, each Text.StartsWith(x[ADDR], [PREFIX])){0}[CODE] ) otherwise Table.SelectRows(PrefTbl, each Text.StartsWith([PREFIX], x[ADDR] )){0}[CODE]) otherwise "Not Found", type text     )
    in
        #"Added Custom"

    Thanks, Colin, for your kind words :-)

    • Marked as answer by cgkmal Wednesday, March 20, 2019 4:01 AM
    Wednesday, March 20, 2019 12:28 AM
  • First, I'd like to congratulate Andrey for a perfect score and a clever solution.

    I must confess that I "cheated" a bit with my solution. Being weary of what appeared to be ever changing requirements, I simply tailored my solution to match what turns out to be an incorrect reference output. It's was really my bad for ignoring the requirements, which were consistent, and bowing to the reference output values instead.

    So by changing how the prefix value is selected in the case of duplicates, I believe that I now match the real output :)

    let    
        source1 = Excel.CurrentWorkbook(){[Name="Table9"]}[Content],
        source2 = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
        addedIndex = Table.AddIndexColumn(source1, "Index"),
        replacedIndexValues = Table.ReplaceValue(
            addedIndex,
            each [Index],
            each if [DESCRIPTION] = "ID" then [Index] else null,
            Replacer.ReplaceValue,
            {"Index"}
        ),
        filledDown = Table.FillDown(replacedIndexValues,{"Index"}),
        groupedIndex = Table.Group(
            filledDown,
            {"Index"},
            {"Tables",
              each
              let
                  rowCount = Table.RowCount(Table.SelectRows(_, each [DESCRIPTION] = "ADDR")),
                  repeatedValues = if rowCount = 0 then _ else Table.Repeat(Table.FirstN(_, 4), rowCount) & Table.Skip(_, 4)
              in
                  repeatedValues
            }
        ),
        groupedDescription = Table.TransformColumns(
            groupedIndex,
            {"Tables", (i) => Table.Group(i, {"DESCRIPTION"}, {"Tables", (j) => Table.AddIndexColumn(j, "Index2")})}
        ),
        combinedTables = Table.Combine(Table.Combine(groupedDescription[Tables])[Tables]),
        addedPivotIndex = Table.AddColumn(combinedTables, "PivotIndex", each [Index] + [Index2]),
        pivotedColumn = Table.Pivot(
            addedPivotIndex,
            List.Distinct(addedPivotIndex[DESCRIPTION]), "DESCRIPTION", "VALUE"
        ),
        tableToMerge = Table.Buffer(source2),
        mergedTables = Table.AddColumn(
            pivotedColumn,
            "CODE",
            (i) =>
            Table.SelectRows(
                tableToMerge,
                (j) =>
                try
                    if i[ADDR] = j[PREFIX] then true
                    else if Text.StartsWith(i[ADDR], j[PREFIX]) then true
                    else if Text.StartsWith(j[PREFIX], i[ADDR]) then true
                    else false
                otherwise null
            )
        ),
        removedColumns = Table.RemoveColumns(mergedTables,{"Index", "Index2", "PivotIndex"}),
        transformedRows = Table.FromRecords(
            Table.TransformRows(
                removedColumns,
                (rowOuter) =>
                Record.TransformFields(
                    rowOuter,
                    {"CODE",
                       each
                       if Table.RowCount(rowOuter[CODE]) > 1 then
                           Table.First(Table.SelectRows(rowOuter[CODE], (rowInner) =>  rowInner[PREFIX] >= rowOuter[ADDR]))
                       else if Table.RowCount(rowOuter[CODE]) = 1 then Table.First(rowOuter[CODE])
                       else if Table.RowCount(rowOuter[CODE]) = 0 then
                          if rowOuter[ADDR] <> null then "NOT FOUND"
                          else null
                       else null
                    }
                )
            )
        ),
        codeValues = Table.TransformColumns(transformedRows, {"CODE", each try Record.Field(_, "CODE") otherwise _})
    in
        codeValues

    Pretty sure that it would be an easy mod for Bill also. Remember that his last posted code was still in draft. :)

    • Marked as answer by cgkmal Wednesday, March 27, 2019 8:18 PM
    Wednesday, March 27, 2019 4:48 PM
  • Hi Ger!

    I optimized my initial approach. Please see the code below. The code formatting is more suitable for viewing in Power BI than to see right here.

    Based on your bigsample, it seems that my approach is the slowest among the 3, but it would be interesting to know timing of the full dataset.

    I'll try to play with some 4-th approach to solve the task after you provide the performance of the current solution.

    Just in case - Pivot Table can provide you with the view like your first variant of result table, if you have Excel new enough.

    let
            Source = #"Source data",
        #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1),
    
            #"Added Custom" = Table.AddColumn(
                #"Added Index1", 
                "Main Key", 
                each 
                    if [DESCRIPTION] = "ID" then 
                        [Index]
                    else 
                        null,
            type number),
        #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null),
        #"Filled Down" = Table.FillDown(#"Added Custom10",{"Main Key", "Last notADDR"}),
        #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + (if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] -2 else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Index", "Main Key", "Last notADDR"}),
        #"Pivoted Column1" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"),
        #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each if [ADDR] = null then null else let t = Table.SelectRows(Criterias, (x)=> 
    	    let s=List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_)) in Text.StartsWith(s{1}, s{0})) 
        in if Table.RowCount(t) > 0 then Table.First(Table.Sort(t, (y)=> Number.Abs(Text.Length([ADDR]) - Text.Length(y[CODE]))))[CODE] 
    	   else "Not Found"),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"Key", "ADDR"}),
        #"Filled Down1" = Table.FillDown(#"Removed Columns1",{"ID", "NAME", "TYPE", "DFRUL"})
    in
        #"Filled Down1"

    • Edited by Andrey Minakov Sunday, March 24, 2019 12:02 AM
    • Marked as answer by cgkmal Wednesday, March 27, 2019 8:19 PM
    Saturday, March 23, 2019 10:36 PM

All replies

  • Why does DFRUL = F4 not appear in the output?
    Sunday, March 17, 2019 7:57 PM
  • Why does DFRUL = F4 not appear in the output?
    Hi Colin. It was a typo. I´ve fixed. Thanks
    Monday, March 18, 2019 5:53 AM
  • Hi cgkmal,

    Please try the following code:

    Table2

    let
        Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
        ChangedType = Table.TransformColumnTypes(Source,{{"PREFIX", type text}})
    in
        ChangedType

    and then Table1

    let
        source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        groupedRows = Table.Group(source, {"DESCRIPTION"}, {{"Table", each Table.AddIndexColumn(_,"Index"), type table}}),
        expandedTable = Table.ExpandTableColumn(groupedRows, "Table", {"VALUE", "Index"}),
        pivotedColumn = Table.Pivot(expandedTable, List.Distinct(expandedTable[DESCRIPTION]), "DESCRIPTION", "VALUE"),
        changedType = Table.TransformColumnTypes(pivotedColumn,{{"ADDR", type text}}),
        mergedTables = Table.AddColumn(
            changedType, 
            "CODE1", 
            (i) => Table.SelectRows(
                Table2, 
                (j) => Text.Contains(i[ADDR], j[PREFIX]) or 
                       Text.Contains(j[PREFIX], i[ADDR])
            )
        ),
        expandedColumn = Table.ExpandTableColumn(mergedTables, "CODE1", {"PREFIX", "CODE"}, {"PREFIX", "CODE"}),
        replacedValue = Table.ReplaceValue(
            expandedColumn,
            each [CODE],
            each if [CODE] = null then [ADDR] else [CODE],
            Replacer.ReplaceValue,
            {"CODE"}
        ),
        removedColumns = Table.RemoveColumns(replacedValue,{"Index", "ADDR", "PREFIX"})
    in
        removedColumns

    Monday, March 18, 2019 1:56 PM
  • Hello Colin,

    Thanks for your help. I've tried but I get the error below in this step "expandedColumn"

    Expression.Error: The field 'PREFIX' of the record wasn't found.
    Details:
        &#160;PREFIX=7235
        &#160;CODE=&#160;ABX1

    I'm not sure if I missed to make some previous step relating or merging the tables. I did like this:

    1-) I select table2 and then I load in Power Query going to Menu Data-->From Table.

    2-) I pasted your code for Table2, then I click on close and select "Keep"

    3-)  I select table1 and then I load in Power Query going to Menu Data-->From Table.

    4-) I pasted your code for Table1 and I receive error in step "expandedColumn"

    What I'm dping wrong? I'm using Excel 2016

    Thanks for the help. Regards


    Monday, March 18, 2019 3:47 PM
  • That's odd. I'm using your sample tables and it works fine. What's mysterious is that the merged step works, which uses the PREFIX column for the conditional join, and in the next step the column isn't recognized??? I have no idea what the cause for the error might be.
    Monday, March 18, 2019 5:26 PM
  • I found a leading space in PREFIX and VALUE fields. Now I get the output but is not like the output I'd like to get.

    I'm receiveing this output 

    ID NAME TYPE DFRUL RRUL CODE
    0 JFMSC UHELQ F4 P1 ABX1
    2 PLLSJS UHELQ P3 P1 ABX1
    4 AAAARR UHELQ T7 P1 POWQ
    0 PPROOA RRHN P1 P1 ABX1
    K9 86401
    T8 ABX1
    D4 UWEER
    W2 ABX1
    P1 UWEER
    Y5 UWEER

    and expected output is 

    ID NAME TYPE DFRUL CODE RRUL
    0 JFMSC UHELQ F4 ABX1 P1
    2 PLLSJS UHELQ P3
    4 AAAARR UHELQ T7 POWQ P1
    ABX1 P1
    86401 K9
    0 PPROOA RRHN P1 ABX1 P1
    UWEER P1


    Thanks for help so far

    Monday, March 18, 2019 5:57 PM
  • So add the following step at the end (after removedColumns). Note that you can drag the columns in the Power Query editor to rearrange then.

        reorderedColumns = Table.ReorderColumns(removedColumns,{"ID", "NAME", "TYPE", "DFRUL", "CODE", "RRUL"})
    in
        reorderedColumns

    Monday, March 18, 2019 6:11 PM
  • So add the following step at the end (after removedColumns). Note that you can drag the columns in the Power Query editor to rearrange then.

        reorderedColumns = Table.ReorderColumns(removedColumns,{"ID", "NAME", "TYPE", "DFRUL", "CODE", "RRUL"})
    in
        reorderedColumns

    Hi Colin,

    I'm not taking about the columns order, I mean the rows order and rows content and the codes related with each ID field. For example.

    1-) for ID=2 there is no CODE related because ID=2 block doesn't have any ADDR value that can be related with Table2 CODES.

    2-) ID=4 block has 3 ADDR values. One is related with POWQ code, the other is related with ABX1 code and 86401 is not related with any CODE from Table2, so instead of CODE appears the ADDR value itself that is 86401.

    3-) ID=0 block where TYPE=UHELQ has two values related with code ABX1, so ABX1 appears only once. That row is correct in your output.

    4-)  ID=0 block where TYPE=RRHN has five ADDR values. 2 ADDR values related with ABX1 and 3 related with code UWEER.

    I hope make sense.


    • Edited by cgkmal Monday, March 18, 2019 6:56 PM
    Monday, March 18, 2019 6:53 PM
  • I see. Apologies for not noticing the issue.

    However, there is a major problem. One of the steps we perform in Table1 is pivoting the description column so that the values appear in columns. After pivoting the column, the values in rows depend on the corresponding values in columns before the pivot step. Because we have multiple values for ADDR and RRUL, there is no way to know what corresponding values to put in the rows after the pivot, as thus the pivot operation fails. 

    As an example, the pivoted values for the first ID = 0 would have a row containing ID = 0, NAME = JFMSC, TYPE = UHELQ, DFRUL = F4. But which VALUE for ADDR and RRUL will appear in the corresponding ADDR and RRUL columns?


    Monday, March 18, 2019 8:33 PM
  • Hi Colin, no problem, maybe my explanation wasn't good enough previously.

    To answer your question, in the output desired I'm not considering the ADDR values to appear in table, but if would be easy to get an output having the ADDR values, the output could be changed to a format like this:

    ID NAME TYPE DFRUL ADDR RRUL CODE
    0 JFMSC UHELQ F4 10012002 P1 ABX1
    0 JFMSC UHELQ F4 723 P1 ABX1
    2 PLLSJS UHELQ P3
    4 AAAARR UHELQ T7 35531156 P1 POWQ
    4 AAAARR UHELQ T7 72358 P1 ABX1
    4 AAAARR UHELQ T7 86401 K9 NF
    0 PPROOA RRHN P1 43001 T8 ABX1
    0 PPROOA RRHN P1 7459001 D4 UWEER
    0 PPROOA RRHN P1 430457 W2 ABX1
    0 PPROOA RRHN P1 745913 P1 UWEER
    0 PPROOA RRHN P1 74598001 Y5 UWEER

    Do you think could be easier tho get a table like this? I think could be a previous step of my final desired output.

    Having this table I could do many things using a Pivot Table from this.

    Thanks for the support.



    • Edited by cgkmal Monday, March 18, 2019 10:31 PM
    Monday, March 18, 2019 10:30 PM
  • this is draft only....but should works :-)

    PrefTbl

    let
        Source = Excel.CurrentWorkbook(){[Name="PrefTbl"]}[Content],
        #"Changed Type" = Table.Buffer(Table.TransformColumnTypes(Source,{{"PREFIX", type text}, {"CODE", type text}}))
    in
        #"Changed Type"

    DataTbl

    let
    
        fxCheck = (t as table) =>
            let
                First4 = Table.FirstN(t, 4),
                Custom2 = Table.Skip(t, 4),
                TabOk = Table.Combine(List.Transform(Table.Split(Custom2, 2), each Table.PromoteHeaders(Table.Transpose(_)))),
                Custom1 = Table.FromRows({First4[VALUE] & {TabOk}}, First4[DESCRIPTION] & {"tbl"} ),
                #"Expanded {0}" = Table.ExpandTableColumn(Custom1, "tbl", {"ADDR", "RRUL"}, {"ADDR", "RRUL"})
            in
                #"Expanded {0}",          
    
        Source = Excel.CurrentWorkbook(){[Name="DataTbl"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"DESCRIPTION", type text}, {"VALUE", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Indeks", 0, 1),
        #"Added Conditional Column" = Table.AddColumn(#"Added Index", "SplitHere", each if [DESCRIPTION] = "ID" then [Indeks] else null),
        #"Filled Down" = Table.FillDown(#"Added Conditional Column",{"SplitHere"}),
        #"Grouped Rows" = Table.Group(#"Filled Down", {"SplitHere"}, {{"tbl", each fxCheck(_[[DESCRIPTION],[VALUE]]), type table }}),
        Custom1 = Table.Combine(#"Grouped Rows"[tbl]),
        #"Changed Type1" = Table.TransformColumnTypes(Custom1,{{"ADDR", type text}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type1", "CODE", (x) => try (try (Table.SelectRows(PrefTbl, each Text.StartsWith(x[ADDR], [PREFIX])){0}[CODE] ) otherwise Table.SelectRows(PrefTbl, each Text.StartsWith([PREFIX], x[ADDR] )){0}[CODE]) otherwise x[ADDR], type text     )
    in
        #"Added Custom"

    Tuesday, March 19, 2019 11:16 AM
  • Hello Bill,

    Thanks for your help.

    The code for PrefTbl if fine, but I get error in step Custom1 in code for DataTbl.

    "Expression.Error: The import Table.Split matches no exports. Did you miss a module reference?"

    When I click on "Go to Error" points to

    = (t as table) =>

    and says

    "Expression.Error: The name 'Table.Split' wasn't recognized.  Make sure it's spelled correctly."

    Does it work for you?

    Best regards


    • Edited by cgkmal Tuesday, March 19, 2019 4:16 PM
    Tuesday, March 19, 2019 4:08 PM
  • Hi cgkmal,

    I'm been a bit busy, so thanks to Bill for stepping in. Table.Split is a fairly new function in Office 365, so it may not exist in your version of Excel. I'll leave it up to Bill to use alternative code.

    The solution I ended up with is more extensive than my previous attempt and Bill's code. I'm posting it anyway, but I'm sure that Bill's amended solution will be much better. I'll repost if I think of a shorter solution.

    Table2 is the same as before, so only Table1 has changed.

    let
        source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        addedIndex = Table.AddIndexColumn(source, "Index", 0, 1),
        replacedIndexValues = Table.ReplaceValue(
            addedIndex, 
            each [Index],
            each if [DESCRIPTION] = "ID" then [Index] else null,
            Replacer.ReplaceValue,
            {"Index"}
        ),
        filledDown = Table.FillDown(replacedIndexValues,{"Index"}),
        ADDRCountPerID = Table.AddColumn(
            filledDown, 
            "Count", (i) => Table.RowCount(
                Table.SelectRows(
                    filledDown, 
                    (j) => i[Index] = j[Index] and j[DESCRIPTION] = "ADDR"
                )
             )
        ),
        groupedIndex = Table.Group(
            ADDRCountPerID, 
            {"Index"}, 
            {"Tables", each if [Count]{0} = 0 then _ else Table.Repeat(Table.FirstN(_, 4), [Count]{0}) & Table.Skip(_, 4), type table}
        ),
        groupedDescription = Table.TransformColumns(
            groupedIndex, 
            {"Tables", (i) => Table.Group(i, {"DESCRIPTION"}, {"Tables", (j) => Table.AddIndexColumn(j, "Index2")})}
        ),
        combinedTables = Table.Combine(Table.Combine(groupedDescription[Tables])[Tables]),
        changedType = Table.TransformColumnTypes(
            combinedTables, 
            {{"VALUE", type text}, {"Index", type text}, {"Index2", type text}}
        ),
        addedPivotIndex = Table.AddColumn(changedType, "PivotIndex", each [Index] & [Index2]),
        pivotedColumn = Table.Pivot(
            addedPivotIndex, 
            List.Distinct(addedPivotIndex[DESCRIPTION]), "DESCRIPTION", "VALUE"
        ),
        mergedTables = Table.AddColumn(
            pivotedColumn, 
            "CODE1", 
            (i) => Table.SelectRows(
                Table2, 
                (j) => try Text.Contains(i[ADDR], j[PREFIX]) or 
                       Text.Contains(j[PREFIX], i[ADDR]) otherwise null
            )
        ),
        expandedColumn = Table.ExpandTableColumn(mergedTables, "CODE1", {"PREFIX", "CODE"}),
        replacedValue = Table.ReplaceValue(
            expandedColumn,
            each [CODE],
            each if [CODE] = null then [ADDR] else [CODE],
            Replacer.ReplaceValue,
            {"CODE"}
        ),
        sortedRows = Table.Sort(replacedValue, {each Number.FromText([Index]), Order.Ascending}),
        removedColumns = Table.RemoveColumns(sortedRows,{"Index", "Count", "Index2", "PivotIndex", "ADDR", "PREFIX"})
    in
        removedColumns

    Tuesday, March 19, 2019 7:33 PM
  • As Colin said, Table.Split is a new function... This code below doesn;t use this function.

    Still draft.
    PrefTbl the same as previous.
    DataTbl code:

    let
        fxCheck2 = (t as table) =>
            let
                First4 = Table.FirstN(t, 4),
                Custom2 = Table.Skip(t, 4),
                #"Added Index1" = Table.AddIndexColumn(Custom2, "Index", 0, 1),
                #"Integer-Divided Column" = Table.TransformColumns(#"Added Index1", {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
                #"Grouped Rows1" = Table.Group(#"Integer-Divided Column", {"Index"}, {{"tbl", each Table.PromoteHeaders(Table.Transpose(_[[DESCRIPTION], [VALUE]])), type table}}),
                Custom3 = Table.Combine(#"Grouped Rows1"[tbl]),
                Custom1 = Table.FromRows({First4[VALUE] & {Custom3}}, First4[DESCRIPTION] & {"tbl"} ),
                #"Expanded {0}" = Table.ExpandTableColumn(Custom1, "tbl", {"ADDR", "RRUL"}, {"ADDR", "RRUL"})
            in
                #"Expanded {0}",
            
    
        Source = Excel.CurrentWorkbook(){[Name="DataTbl"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"DESCRIPTION", type text}, {"VALUE", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Indeks", 0, 1),
        #"Added Conditional Column" = Table.AddColumn(#"Added Index", "SplitHere", each if [DESCRIPTION] = "ID" then [Indeks] else null),
        #"Filled Down" = Table.FillDown(#"Added Conditional Column",{"SplitHere"}),
        #"Grouped Rows" = Table.Group(#"Filled Down", {"SplitHere"}, {{"tbl", each fxCheck2(_[[DESCRIPTION],[VALUE]]), type table }}),
        Custom1 = Table.Combine(#"Grouped Rows"[tbl]),
        #"Changed Type1" = Table.TransformColumnTypes(Custom1,{{"ADDR", type text}}),
        #"Added Custom" = Table.AddColumn(#"Changed Type1", "CODE", (x) => if x[ADDR] = null then null else try (try (Table.SelectRows(PrefTbl, each Text.StartsWith(x[ADDR], [PREFIX])){0}[CODE] ) otherwise Table.SelectRows(PrefTbl, each Text.StartsWith([PREFIX], x[ADDR] )){0}[CODE]) otherwise "Not Found", type text     )
    in
        #"Added Custom"

    Thanks, Colin, for your kind words :-)

    • Marked as answer by cgkmal Wednesday, March 20, 2019 4:01 AM
    Wednesday, March 20, 2019 12:28 AM
  • Hello Colin and Bill, I hope you're fine!

    Thanks so much for your great help and time.

    I've tried solutions of both of you and this is the result of testing with real size tables.

    Real Table1 has 1844 rows
    Real Table2 has 2061 rows

    The output it seems to be correct with both solutions, only differs some sorting, but that is ok.

    Regarding the performance,

    With Colin's solution happens like this:
    It takes 02:00 min to finish the processing when I execute code for Table1 within "Advance Editor" of Power Query Editor.
    It takes 01:28 min when I do Close & Load

    With Bill's solution happens like this:
    It takes 00:05 seg to finish the processing when I execute code for Table1 within "Advance Editor" of Power Query Editor.
    It takes 00:57 seg when I do Close & Load

    So, to execute the query, Bill's code runs 24 times faster and 1.5 times faster to Close & Load.

    I'm new to Power Query and I see that some of the steps you use can be done without knowing anything of programming, but many other steps have lot of programming knowledge involved, then I'd like your advice please.

    Do you know a good site/resource where I can see examples of the expressions like those you use in your steps?

    For example, steps that I think cannot be done only clicking in Transform options are

    fxCheck2 = (t as table) => (What does this mean?)

    or when you use "each", "if, else", "_", "Index", etc.

    One more question,

    Is possible to define the fields names or some values in "DESCRIPTION" column as variables? this in order to change easier if in Table1 changes some texts, for example?

    let 
    
       // define variables
    	field1 = "DESCRIPTION"
    	field2 = "VALUE"
    	v1="ID"
    	v2="NAME"
    	v3="DFRUL"
    	v4="ADDR"
    	v5="RRUL"
    	
    	Some other code
    	
    	// And then use them like this?
    	#"Expanded {0}" = Table.ExpandTableColumn(Custom1, "tbl", {v4, v5}, {v4, v5})
    	#"Added Conditional Column" = Table.AddColumn(#"Added Index", "SplitHere", each if [field1] = v1 then [Indeks] else null),
            
    in

    Thanks again for you great support.

    Best regards





    • Edited by cgkmal Wednesday, March 20, 2019 4:06 AM
    Wednesday, March 20, 2019 3:56 AM
  • I've tried solutions of both of you and this is the result of testing with real size tables.

    Real Table1 has 1844 rows
    Real Table2 has 2061 rows

    Hey, that's not fair! My solution was only designed to work with your sample data. :)

    With Colin's solution happens like this:
    It takes 02:00 min to finish the processing when I execute code for Table1 within "Advance Editor" of Power Query Editor.
    It takes 01:28 min when I do Close & Load

    That's brutal, not to mention embarrassing. Had I real data to work with, I would never have posted my solution because with those numbers, it's not a practical solution.

    With Bill's solution happens like this:
    It takes 00:05 seg to finish the processing when I execute code for Table1 within "Advance Editor" of Power Query Editor.
    It takes 00:57 seg when I do Close & Load

    So, to execute the query, Bill's code runs 24 times faster and 1.5 times faster to Close & Load.

    Okay, that last sentence was uncalled for. :) I haven't figured out how he does it, but Bill often "cheats" on performance. Remember the Volkswagen emission scandal? You have been forewarned! :)   

    Do you know a good site/resource where I can see examples of the expressions like those you use in your steps?

    The best starting point is the Power Query online documentation. You won't be able to do advanced stuff like in this post by reading the documentation, but it will give you some explanation of how the functions work. Also, in the Power Query Editor, if you enter something like = Table.Group in the function bar, details of function will be returned. Both the documentation and Query Editor assistance are inconsistent though. For many functions their is either no explanation nor examples whatsoever. There are various resources with tutorials to learn Power Query.

    I should state categorically that for a beginner, the solutions presented in this post is not a good place to start. The problem is hard for even advanced users to solve.

    fxCheck2 = (t as table) => (What does this mean?)

    or when you use "each", "if, else", "_", "Index", etc.

    Okay, all of that and more is explained in the Power Query Language Specification, which you can find online.

    Is possible to define the fields names or some values in "DESCRIPTION" column as variables? this in order to change easier if in Table1 changes some texts, for example?

    You can do that for your v1, v2, etc. variables. However anything in square brackets is taken literally. So [field1] will look for a column named "field1" and not "DESCRIPTION". However, [DESCRIPTION] represents the value for DESCRIPTION in the current row, and each row is a record. Therefore, another way to express the same thing is using the function Record.Field. So the equivalent expression would be: each if Record.Field(_, field1) = v1 then [Indeks] else null.

    Wednesday, March 20, 2019 2:54 PM
  • Hey, that's not fair! My solution was only designed to work with your sample data. :)

    That's brutal, not to mention embarrassing. Had I real data to work with, I would never have posted my solution because with those numbers, it's not a practical solution.


    So, to execute the query, Bill's code runs 24 times faster and 1.5 times faster to Close & Load.

    Okay, that last sentence was uncalled for. :) I haven't figured out how he does it, but Bill often "cheats" on performance. Remember the Volkswagen emission scandal? You have been forewarned! :)   

    Do you know a good site/resource where I can see examples of the expressions like those you use in your steps?

    The best starting point is the Power Query online documentation. You won't be able to do advanced stuff like in this post by reading the documentation, but it will give you some explanation of how the functions work. Also, in the Power Query Editor, if you enter something like = Table.Group in the function bar, details of function will be returned. Both the documentation and Query Editor assistance are inconsistent though. For many functions their is either no explanation nor examples whatsoever. There are various resources with tutorials to learn Power Query.

    I should state categorically that for a beginner, the solutions presented in this post is not a good place to start. The problem is hard for even advanced users to solve.

    fxCheck2 = (t as table) => (What does this mean?)

    or when you use "each", "if, else", "_", "Index", etc.

    Okay, all of that and more is explained in the Power Query Language Specification, which you can find online.

    Is possible to define the fields names or some values in "DESCRIPTION" column as variables? this in order to change easier if in Table1 changes some texts, for example?

    You can do that for your v1, v2, etc. variables. However anything in square brackets is taken literally. So [field1] will look for a column named "field1" and not "DESCRIPTION". However, [DESCRIPTION] represents the value for DESCRIPTION in the current row, and each row is a record. Therefore, another way to express the same thing is using the function Record.Field. So the equivalent expression would be: each if Record.Field(_, field1) = v1 then [Indeks] else null.

    Hello Colin,

    I am very grateful with you help, support and time you've used to think in my question.

    It was not my intention to embarrass you hehe, only report what happens. I think it could help yourself to enhanced performance, not for this problem but for future cases in my humble opinion.

    Thanks for answer about where to begin to learn about that misterious functions/code use in these solutions provided that are not so simple than press a button hehe. I think I need to look for that in order to understand the logic you both have used and be able to make modifications if needed for my own.

    Thanks again.

    Best regards

    PS: I'm not aware about Volkswagen emission scandal hehe


    • Edited by cgkmal Wednesday, March 20, 2019 4:32 PM
    Wednesday, March 20, 2019 4:31 PM
  • PS: I'm not aware about Volkswagen emission scandal hehe

    Welcome to our planet :)

    So I decided to work with a much larger data set to see where the bottlenecks were located in the code. I just copied down the original tables, so it's not clear whether there will be a major difference in performance with compared to your real data.

    In the following revision, I included the Table2 source in Table1, removed the replacedValue step and the sort step. If the performance is good, you add back those steps and recheck the performance:

    let
        source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        source2 = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
        changedTypeSource2 = Table.TransformColumnTypes(source2,{{"PREFIX", type text}}),
        addedIndex = Table.AddIndexColumn(source, "Index", 0, 1),
        replacedIndexValues = Table.ReplaceValue(
            addedIndex, 
            each [Index],
            each if [DESCRIPTION] = "ID" then [Index] else null,
            Replacer.ReplaceValue,
            {"Index"}
        ),
        filledDown = Table.FillDown(replacedIndexValues,{"Index"}),
        groupedIndex = Table.Group(
            filledDown, 
            {"Index"}, 
            {"Tables", 
              each let 
                       rowCount = Table.RowCount(Table.SelectRows(_, each [DESCRIPTION] = "ADDR")),
                       repeatedValues = if rowCount = 0 then _ else Table.Repeat(Table.FirstN(_, 4), rowCount) & Table.Skip(_, 4)
                   in
                       repeatedValues
            }
        ),
        groupedDescription = Table.TransformColumns(
            groupedIndex, 
            {"Tables", (i) => Table.Group(i, {"DESCRIPTION"}, {"Tables", (j) => Table.AddIndexColumn(j, "Index2")})}
        ),
        combinedTables = Table.Combine(Table.Combine(groupedDescription[Tables])[Tables]),
        changedType = Table.TransformColumnTypes(
            combinedTables, 
            {{"VALUE", type text}, {"Index", type text}, {"Index2", type text}}
        ),
        addedPivotIndex = Table.AddColumn(changedType, "PivotIndex", each [Index] & [Index2]),
        pivotedColumn = Table.Pivot(
            addedPivotIndex, 
            List.Distinct(addedPivotIndex[DESCRIPTION]), "DESCRIPTION", "VALUE"
        ),
        tableToMerge = Table.Buffer(changedTypeSource2),
        mergedTables = Table.AddColumn(
            pivotedColumn, 
            "CODE1", 
            (i) => Table.SelectRows(
                tableToMerge, 
                (j) => try Text.Contains(i[ADDR], j[PREFIX]) or 
                       Text.Contains(j[PREFIX], i[ADDR]) otherwise null
            )
        ),
        expandedColumn = Table.ExpandTableColumn(mergedTables, "CODE1", {"PREFIX", "CODE"}),
        removedColumns = Table.RemoveColumns(expandedColumn,{"Index", "Index2", "PivotIndex", "ADDR", "PREFIX"})
    in
        removedColumns

    Wednesday, March 20, 2019 7:14 PM
  • PS: I'm not aware about Volkswagen emission scandal hehe

    Welcome to our planet :)

    Hehehe

    I've tried your new code and runs in 2 seconds and tried your first code and stil it takes 2 min.

    But in output table the number of rows should be = to number of ADDR values and with your new code it has more rows than Table2.

    For you to give an idea,

    Real Table1 has 1844 rows and 894 ADDR values

    Real Table2 has 2061 rows

    With your new code the output table has 2801 rows and should be 896 (894+1 blank row + header row)

    It seems your code is printing a number of rows more similar than the rows of Table2.

    Best regards and thanks again for your great help and time!


    • Edited by cgkmal Wednesday, March 20, 2019 10:17 PM
    Wednesday, March 20, 2019 10:17 PM
  • Hmmm, can you check to see whether there are duplicates in the output? Need to create more representative source data somehow.

    Edit: Would it be possible for you to send me your two tables, assuming that it's worth your while to have a 2 sec or so refresh?

    Wednesday, March 20, 2019 10:32 PM
  • Hmmm, can you check to see whether there are duplicates in the output? Need to create more representative source data somehow.

    Edit: Would it be possible for you to send me your two tables, assuming that it's worth your while to have a 2 sec or so refresh?

    Hi Colin,

    Below the link with tables in csv file.

    https : // www dot dropbox dot com /s/03z7do9wacc4o8x/sample.csv?raw=1

    PS: I put like this because I receieved error to share links. You just need to remove spaces and "dot" by "."

    Wednesday, March 20, 2019 11:43 PM
  • Hi Guys!

    Please notice, that both the solutions here doesn't provide the exact result Ger described. They has problems of different importance.

    I have to admit, that the solutions are very graceful, I learned several important lessons from them - thanks a lot, Bill and Colin! They a really worth investigation.

    The solution below is far from ideal from any point of view (I did it with do-it-ASAP newest 21-th century approach in mind :-)), but it provides exactly the same result Ger described.

    The problem with Colin's solution is that it provides just wrong results for 2 PLLSJS row. The solution is the most vulnerable for any "structure" data lost - if any of types of Description are missing - it provides wrong results, like for 2 PLLSJS.

    Bill's solution doesn't provide digital code instead of CODE in case of absent row in Table 2, and doesn't left repeated rows empty. While for ID, Name and Type it may be not so important, filling DFRUL with data where it should be empty may be misleading in some cases. Whether these are critical problems - Ger guess :-).

    And it may be difficult for Ger to use user functions, and not only because it is a new construction for him. He will not see the data in case of any non-standard situation with them. And in real-life it may be much more difficult to find the source of a bug, if you have a lot of functional in PQ user functions and not in steps of Queries.

    I have to stress that all that was saying above not to offend anybody, but for the sake of the user. Again - both the solutions are very nice and it is important to learn the ideas inside them for most of PQ professionals. I'm really sorry for any inconvenience.

    It would be great if Ger would lauch the code below on the real data, and provided the timing. Thanks a lot in advance.

    let
            Source = #"Source data",
            
            #"Added Index1" = Table.AddIndexColumn(Source, "Initial Order", 0, 1),
    
            #"Added Custom" = Table.AddColumn(
                #"Added Index1", 
                "Custom", 
                each 
                    if [DESCRIPTION] = "ID" then 
                        [VALUE] 
                    else 
                        null
            ),
    
            #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
    
            #"Added Custom2" = Table.AddColumn(
                #"Filled Down", 
                "UniqueID", 
                each 
                    if [DESCRIPTION] = "NAME" then 
                        [Custom] & "_" & [VALUE] 
                    else 
                        if [DESCRIPTION] = "ID" then 
                            "" 
                        else 
                            null
            ),
    
            #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Custom"}),
            #"Filled Down1" = Table.FillDown(#"Removed Columns",{"UniqueID"}),
    
            #"Replaced Value" = Table.ReplaceValue(
                #"Filled Down1",
                "",
                null,
                Replacer.ReplaceValue,
                {"UniqueID"}
            ),
    
            #"Filled Up" = Table.FillUp(#"Replaced Value",{"UniqueID"}),
    
            #"Added Custom9" = Table.AddColumn(
                #"Filled Up", 
                "DFRUL", 
                each 
                    if [DESCRIPTION] = "DFRUL" then 
                        [VALUE] 
                    else 
                        ""
            ),
    
            #"Added Custom4" = Table.AddColumn(
                #"Added Custom9", 
                "RRUL", 
                each 
                    if [DESCRIPTION] = "RRUL" then 
                        [VALUE] 
                    else 
                        if [DFRUL] <> "" then 
                            null 
                        else 
                            ""
            ),
    
            #"Added Custom1" = Table.AddColumn(
                #"Added Custom4", 
                "Custom", 
                each 
                    if [DESCRIPTION] = "ADDR" then 
                        [VALUE] 
                    else 
                        null, 
                type text
            ),
    
            #"Added Custom3" = Table.AddColumn(
                #"Added Custom1", 
                "CODE", 
                each 
                    if [Custom] = null then 
                        if Text.Combine({[Custom], [DFRUL], [RRUL]}) <> "" then 
                            null 
                        else 
                            "" 
                    else 
                        let 
                            t = Table.SelectRows(
                                Criterias, 
                                (x)=> 
                                    let 
                                        s=List.Sort(
                                            {x[PREFIX], [Custom]}, 
                                            each Text.Length(_)
                                        ) 
                                    in 
                                        Text.StartsWith(s{1}, s{0})
                            ) 
                        in 
                            if Table.RowCount(t) > 0 then 
                                Table.SingleRow(t)[CODE] 
                            else 
                                [VALUE]
            ),
            
            #"Added Custom5" = Table.AddColumn(
                #"Added Custom3", 
                "Value", 
                each 
                    if [CODE] = "" then 
                        [VALUE] 
                    else 
                        null
            ),
    
            #"Added Custom6" = Table.AddColumn(
                #"Added Custom5", 
                "Description", 
                each 
                    if [CODE] = "" then 
                        [DESCRIPTION] 
                    else 
                        "ADDR"
            ),
    
            #"Removed Columns2" = Table.RemoveColumns(#"Added Custom6",{"Custom", "VALUE"}),
    
            #"Reordered Columns" = Table.ReorderColumns(
                #"Removed Columns2",
                {"UniqueID", "Description", "Value", "DFRUL", "RRUL", "CODE"}
            ),
    
            #"Duplicated Column" = Table.DuplicateColumn(
                #"Reordered Columns", 
                "CODE", 
                "CODE - Copy"
            ),
    
            #"Filled Down3" = Table.FillDown(#"Duplicated Column",{"CODE"}),
    
            #"Merged Columns1" = Table.CombineColumns(
                #"Filled Down3",
                {"CODE", "CODE - Copy"}, 
                each 
                    List.First(
                        List.Sort(
                            _, 
                            each 
                                if _ = null then 
                                    1 
                                else 
                                    if Text.Length(_) > 0 then 
                                        0 
                                    else 
                                        2
                        )
                    ), 
                    "CODE"
            ),
    
            #"Filled Up2" = Table.FillUp(#"Merged Columns1",{"CODE"}),
            #"Filtered Rows1" = Table.SelectRows(#"Filled Up2", each ([DESCRIPTION] <> "ADDR")),
            #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"DESCRIPTION"}),
            #"Filled Up3" = Table.FillUp(#"Removed Columns1",{"RRUL"}),
            #"Removed Duplicates" = Table.Distinct(
                #"Filled Up3", 
                {"UniqueID", "Value", "RRUL", "CODE"}
            ),
    
            #"Reordered Columns1" = Table.ReorderColumns(
                #"Removed Duplicates",
                {"UniqueID", "Description", "Value", "DFRUL", "CODE", "RRUL"}
            ),
    
            #"Grouped Rows1" = Table.Group(
                #"Reordered Columns1", 
                {"UniqueID", "Value", "Description"}, 
                {{"Count", each _, type table}}
            ),
    
            #"Added Custom7" = Table.AddColumn(
                #"Grouped Rows1", 
                "Custom", 
                each 
                    [Value] meta (
                        if [Value] = null then 
                            [ADDR = [Count]] 
                        else 
                            []
                    )
            ),
    
            #"Removed Columns4" = Table.RemoveColumns(#"Added Custom7",{"Count", "Value"}),
    
            #"Pivoted Column" = Table.Pivot(
                #"Removed Columns4", 
                List.Distinct(#"Removed Columns4"[Description]), 
                "Description", 
                "Custom"),
    
            #"Removed Columns3" = Table.RemoveColumns(#"Pivoted Column",{"UniqueID"}),
    
            #"Added Custom8" = Table.AddColumn(
                #"Removed Columns3", 
                "Custom", 
                each Value.Metadata([ADDR])[ADDR]?
            ),
    
            #"Removed Columns5" = Table.RemoveColumns(#"Added Custom8",{"ADDR"}),
    
            #"Expanded Custom" = Table.ExpandTableColumn(
                #"Removed Columns5", 
                "Custom", 
                {"Initial Order", "DFRUL", "CODE", "RRUL"}, 
                {"Initial Order", "DFRUL", "CODE", "RRUL"}
            ),
    
            #"Grouped Rows" = Table.Group(
                #"Expanded Custom", 
                {"ID", "NAME", "TYPE"}, 
                {
                    {
                        "Min", 
                        each 
                            List.Min(
                                [Initial Order]
                            ), 
                        type number
                    }, 
                    {
                        "Data", 
                        each _, 
                        type table
                    }
                }
            ),
    
            #"Expanded Data" = Table.ExpandTableColumn(
                #"Grouped Rows", 
                "Data", 
                {"Initial Order", "DFRUL", "CODE", "RRUL"}, 
                {"Initial Order", "DFRUL", "CODE", "RRUL"}
            ),
    
            Custom1 = Table.FromRecords(
                Table.TransformRows(
                    #"Expanded Data", 
                    each 
                        Record.TransformFields(
                            _, 
                            List.Transform(
                                {"ID", "NAME", "TYPE"}, 
                                (fName)=> 
                                    {
                                        fName, 
                                        (fValue)=> if [Initial Order] <> [Min] then 
                                            "" 
                                        else 
                                            fValue
                                    }
                            )
                        )
                )
            ),
    
            #"Sorted Rows" = Table.Sort(Custom1,{{"Initial Order", Order.Ascending}}),
            #"Removed Columns6" = Table.RemoveColumns(#"Sorted Rows",{"Min", "Initial Order"})
        in
            #"Removed Columns6"



    Thursday, March 21, 2019 11:58 AM
  • Hello Andrey,

    Very appreciated to take some of your time for this problem I posted.

    I don't see in your code any reference to Table1 nor Table2, so I assume you take as input both tables as one.

    With this assumption I selected all the range from A1 to E37 for sample tables in original post and created (Ctrl + T) one table (TableX), then I did Data--> From Table and I selected TableX.

    Now in Power Query Advance Editor I enter your code, changing only the source line to this:

    let
            Source = Excel.CurrentWorkbook(){[Name="TableX"]}[Content],
    
    .
    .
    .
            

    and I get several errors:

    The first in step "Added column3" says

    "Expression.Error: The name 'Criterias' wasn't recognized.  Make sure it's spelled correctly."

    and then it comes the error below for rest of steps

    "Expression.Error: The field 'CODE' already exists in the record.
    Details:
        Name=CODE
        Value=

    "

    I'm not sure if I enter your code in correct way or what I did wrong. Please your advice.

    Thanks again.

    Best regards


    • Edited by cgkmal Thursday, March 21, 2019 2:08 PM
    Thursday, March 21, 2019 2:06 PM
  • Sorry - I didn't describe the full solution.
    Create 2 Queries:
    "Source data" for Table 1
    "Criterias" for Table 2
    Populate the Queries with your appropriate data.

    Create empty Query, open Advanced Editor, click in the text area, press Ctrl-A, and paste my code there.
    Close Advanced Editor - you'll see the results.

    Please feel free to ask any questions.


    Thursday, March 21, 2019 6:29 PM
  • Hi cgkmal,

    There is a flaw in my logic, but unlike Andrey, neither Bill nor I had the benefit of working with a full dataset.

    Anyway, there are reasons why more than 894 will be returned (but not as many as I've produced previously!)

    For one thing, in your sample output, where ADDR does not exist with an ID, the row is shown with ADDR = null. There are two such rows belonging to the two appearances of ID = 5. The number of rows thus increase to 896.

    Another example is where there are duplicates in ADDR and/or PREFIX. For example, there are 2 rows where ADDR = 55031 and 2 rows where PREFIX = 55031 (with different codes). Thus for each ADDR = 55031, both PREFIX will be returned. If only one value of PREFIX is to be returned, what logic should be used to make the determination? Another example: The are 5 values of ADDR = 57300 and 2 values of PREFIX = 57300

    Beyond the above examples, it occurred to me that I don't really know what the following means, beyond exact matches:

    that shows the CODE that is THE MOST similar compared with each number of field "ADDR" based on criterias of "TABLE 2

    I've made certain assumptions about the meaning of the above statement, which I am now questioning.

    For example, should ADDR = 1307 match PREFIX = 130770200? Should ADDR = 13188545 match PREFIX = 1318854?

     


    Thursday, March 21, 2019 9:56 PM
  • Hi Colin! I do not have access to the full dataset either - BUT - it's a great idea, it would be cool to have it, isn't it?

    The task is interesting, but having ability not only to implement it in some way, but also to polish the performance would be much better. And Ger would have much more useful result, right?

    Such an approach is widespread in Data Science world (Kaggle.com, etc.). It's possible to send the set privately to participants...

    What do you think, guys (all the three :-))? I would spend some time for it :-).

    Thursday, March 21, 2019 10:56 PM
  • Hi Colin! I do not have access to the full dataset either - BUT - it's a great idea, it would be cool to have it, isn't it?

    The task is interesting, but having ability not only to implement it in some way, but also to polish the performance would be much better. And Ger would have much more useful result, right?

    Such an approach is widespread in Data Science world (Kaggle.com, etc.). It's possible to send the set privately to participants...

    What do you think, guys (all the three :-))? I would spend some time for it :-).

    Sorry for my delay to answer, but I don't know why I'm not notified when somebody answers me :(

    Hello Andrey, thanks for your kindness.

    I already shared a sample with the number of rows of the real file.

    Below again the link with tables in csv file.

    https : // www dot dropbox dot com /s/03z7do9wacc4o8x/sample.csv?raw=1


    PS: I put like this because I receieved error to share links. Only need to remove spaces and replace "dot" by "."





    • Edited by cgkmal Friday, March 22, 2019 1:49 AM
    Friday, March 22, 2019 1:31 AM
  • Hi Andrey,

    I was able to test your code this time and is very close to the first output expected in my first post. The difference between my 1rst output and yours is that yours shows all CODEs and my 1rst desired output shows the CODEs only once.

    Why I'm talking about "my 1rst desired output"? because after first solution provided by Colin, I saw that to make easier the things, the output could change a little bit and changing in that way the output estructure, this "2nd desired output", would be more useful to use it as input of a Pivot Table.

    I must say Bill's solution shows correctly the "2nd output" estructure.

    The 2nd output table for which Bill's solution shows correct output is this (you can see it in previous posts):

    ID NAME TYPE DFRUL ADDR RRUL CODE
    0 JFMSC UHELQ F4 10012002 P1 ABX1
    0 JFMSC UHELQ F4 723 P1 ABX1
    2 PLLSJS UHELQ P3
    4 AAAARR UHELQ T7 35531156 P1 POWQ
    4 AAAARR UHELQ T7 72358 P1 ABX1
    4 AAAARR UHELQ T7 86401 K9 Not Found
    0 PPROOA RRHN P1 43001 T8 ABX1
    0 PPROOA RRHN P1 7459001 D4 UWEER
    0 PPROOA RRHN P1 430457 W2 ABX1
    0 PPROOA RRHN P1 745913 P1 UWEER
    0 PPROOA RRHN P1 74598001 Y5 UWEER

    Then, I think this output requires less steps in your solution, but since my poor knowledge, I'm still not able to modify you code to obtain this output. May you help me to change your code to get this new output please?

    Thanks in advance.


    • Edited by cgkmal Friday, March 22, 2019 7:23 AM
    Friday, March 22, 2019 6:39 AM
  • Hi Colin,

    For one thing, in your sample output, where ADDR does not exist with an ID, the row is shown with ADDR = null. There are two such rows belonging to the two appearances of ID = 5. The number of rows thus increase to 896.

    // You're right. The thing is there are 2 TYPES in sample.csv (MLRR and RRHN). Blocks that begin with ID could be part of the 2 TYPES. Then I could have a TABLE1 where ID=5 only has one TYPE=MLRR or one TYPE=RRHN or both.

    Another example is where there are duplicates in ADDR and/or PREFIX. For example, there are 2 rows where ADDR = 55031 and 2 rows where PREFIX = 55031 (with different codes). Thus for each ADDR = 55031, both PREFIX will be returned. If only one value of PREFIX is to be returned, what logic should be used to make the determination? Another example: The are 5 values of ADDR = 57300 and 2 values of PREFIX = 57300

    // About the why there are 2 ADDR=55031 is because this ADDR belongs to ID=4 and ID=4 has the 1 TYPEs (MLRR and RRHN)
    Regarding repeated values of 55031 in PREFIX column with different CODEs, the query could select the first CODE. In this case PREFIX=55031 is related with CODE=ABTBRT and ATMBRM, then query could select the first CODE that appears in column that is ABTBRT.
    Regarding the case of the 5 ADDR=57300 and 2 PREFIX=57300, the query should take the first CODE as in case for 55031.


    Beyond the above examples, it occurred to me that I don't really know what the following means, beyond exact matches:

    that shows the CODE that is THE MOST similar compared with each number of field "ADDR" based on criterias of "TABLE 2

    I've made certain assumptions about the meaning of the above statement, which I am now questioning.

    For example, should ADDR = 1307 match PREFIX = 130770200? Should ADDR = 13188545 match PREFIX = 1318854?
    // Yes in both cases.
    Normally the ADDR values have larger length than PREFIX numbers, so if there are the prefixes 123, 122, 121 and ADDR= 1229001, then the CODE to select is that related with PREFIX 122.
    If happens that ADDR length is shorter than all PREFIX, for example ADD=1307 and PREFIXES=1307, 1307111, 1308000, then CODE to select would be that related with PREFIX 1307 and if PREFIXES=1307111, 1308000, then CODE to select would be that related with PREFIX 1307111 because is the most similar to ADDR=1307. In this case ADDR could be a "PREFIX" in order to look for the actual PREFIX in table2 that is most similar.

    I hope I answered your doubts.

    Thanks for the help.

    Friday, March 22, 2019 7:21 AM
  • Sorry guys, I was a little bussy. So, thanks Colin and Andrey for stepping in.

    Back to discussion.
    Do our solutions meet your needs?
    Maybe you want something else that we should present.
    I still do not understand how you achieve the result in your first post here. I mean, the last two lines on the last table in Your first post (why RRUL's are P1). And do you want the same layout of data as on last table in first post?

    Friday, March 22, 2019 9:56 AM
  • Hi Ger!

    Regarding the situation with CODEs in the first variant.

    1. My code intencially shows all the pairs of CODE + DFRUL + RRUL. Is that incorrect?

    2. Does "pairing" of CODE with DFRUL / RRUL important, or I can mix any CODE, DFRUL and RRUL in a row?

    3. If I have 2 CODE and 5 RRULs, I can ignore any of RRULs but 2, and I can pick up any of 2 from 5?

    I'll provide code for variant 2 tomorrow.


    Friday, March 22, 2019 1:32 PM
  • Sorry guys, I was a little bussy. So, thanks Colin and Andrey for stepping in.

    Back to discussion.
    Do our solutions meet your needs?
    Maybe you want something else that we should present.
    I still do not understand how you achieve the result in your first post here. I mean, the last two lines on the last table in Your first post (why RRUL's are P1). And do you want the same layout of data as on last table in first post?

    Hi Bill,

    Your post was marked as answered, so I assumed that your code met the requirements. My subsequent work on this was intended as an exercise in optimization. In the past, I would simply reject a slow refreshing solution. Lately, however, I've decided to closely examine the code I've written to see if I could remove roadblocks. In the current case, I could only do so with a full dataset.  As it turns out, only a couple of tweaks were needed, but then realized that I didn't have the right logic in a merge step. Now that that issue has been clarified, and assuming that I implemented the logic correctly, I'm down to a refresh of 2sec.

    Andrey doesn't think that your code is correct (and gives his reason). I have no comment on his assessment - it's up to the OP to decide.

    Friday, March 22, 2019 6:06 PM
  • Hi cgkmal,

    I've taken into consideration your clarifications. Please let me know if the following code meets your requirements. If not, let me know where the issues lie - I could probably tweak the code further to get it exactly right. Performance should be similar to my previous effort that returned too many rows.

    let
        source1 = Excel.CurrentWorkbook(){[Name="Table_1"]}[Content],
        source2 = Excel.CurrentWorkbook(){[Name="Table_2"]}[Content],
        addedIndex = Table.AddIndexColumn(source1, "Index", 0, 1),
        replacedIndexValues = Table.ReplaceValue(
            addedIndex, 
            each [Index],
            each if [DESCRIPTION] = "ID" then [Index] else null,
            Replacer.ReplaceValue,
            {"Index"}
        ),
        filledDown = Table.FillDown(replacedIndexValues,{"Index"}),
        groupedIndex = Table.Group(
            filledDown, 
            {"Index"}, 
            {"Tables", 
              each let 
                       rowCount = Table.RowCount(Table.SelectRows(_, each [DESCRIPTION] = "ADDR")),
                       repeatedValues = if rowCount = 0 then _ else Table.Repeat(Table.FirstN(_, 4), rowCount) & Table.Skip(_, 4)
                   in
                       repeatedValues
            }
        ),
        groupedDescription = Table.TransformColumns(
            groupedIndex, 
            {"Tables", (i) => Table.Group(i, {"DESCRIPTION"}, {"Tables", (j) => Table.AddIndexColumn(j, "Index2")})}
        ),
        combinedTables = Table.Combine(Table.Combine(groupedDescription[Tables])[Tables]),
        changedType = Table.TransformColumnTypes(
            combinedTables, 
            {{"VALUE", type text}, {"Index", type text}, {"Index2", type text}}
        ),
        addedPivotIndex = Table.AddColumn(changedType, "PivotIndex", each [Index] & [Index2]),
        pivotedColumn = Table.Pivot(
            addedPivotIndex, 
            List.Distinct(addedPivotIndex[DESCRIPTION]), "DESCRIPTION", "VALUE"
        ),
        tableToMerge = Table.Buffer(source2),
        mergedTables = Table.AddColumn(
            pivotedColumn, 
            "Code Table", 
            (i) => Table.First(Table.SelectRows(
                tableToMerge, 
                (j) => try if i[ADDR] = j[PREFIX] then true
                           else if Text.StartsWith(i[ADDR], j[PREFIX]) and i[ADDR] = j[PREFIX] then true
                           else if Text.StartsWith(j[PREFIX], i[ADDR]) then true
                           else false
                       otherwise null
            )
        )),
        removedColumns = Table.RemoveColumns(mergedTables,{"Index", "Index2", "PivotIndex", "ADDR"}),
        expandedCodeTable = Table.ExpandRecordColumn(removedColumns, "Code Table", {"CODE"}, {"CODE"})
    in
        expandedCodeTable

    Friday, March 22, 2019 6:36 PM
  • Hello Bill/Colin/Andrey, I hope you're fine!


    As mentioned previously, Bill's solution it works for the second output for what I've checked so far.

    We can forget about first output on my first post since I realize is more complex to get, more confusing and useless to apply as input of a Pivot Table.

    In order to try to answer the doubts of all of you I'm sharing a big sample file that I've just made almost manually. The CODE related with each ADDR is obtained with an Array Formula.


    In the output I show the Fields I want in the output, this is

    ID, NAME, TYPE, DFRUL, ADDR, RRUL, CODE 

    But to the rigth I put another column (PREFIX) in order for you to have an idea which PREFIX was matched for an especific ADDR. In this sample there are several ADDR that are NOT FOUND in TABLE2.

    Below the link for the file, only remove spaces, replace dot with ".", mmm with www and "\" with "/"

    Sorry, but they still don't allowed me to share links.

     https : \\ mmm dot dropbox dot com \ s\ 3m7f1up3y9qyp2q \ bigsample.xlsx ? raw = 1


    Thanks so much for your help.


    Best regards
    • Edited by cgkmal Friday, March 22, 2019 7:11 PM
    Friday, March 22, 2019 7:07 PM
  • Hi cgkmal,

    I've taken into consideration your clarifications. Please let me know if the following code meets your requirements. If not, let me know where the issues lie - I could probably tweak the code further to get it exactly right. Performance should be similar to my previous effort that returned too many rows.



    Hi Colin,

    I've tested your new code with my las sample file that I shared in my previous post. This time it shows the correct information and correct number of rows, the only thing is that shows different order than the output in the file bigsample.xlsx

    The field ADDR should appear in output and order of ADDR in output should be the same as input Table1.

    Thanks for the help again.

    Best regards


    • Edited by cgkmal Saturday, March 23, 2019 5:14 AM
    Saturday, March 23, 2019 5:09 AM
  • Sorry guys, I was a little bussy. So, thanks Colin and Andrey for stepping in.

    Back to discussion.
    Do our solutions meet your needs?
    Maybe you want something else that we should present.
    I still do not understand how you achieve the result in your first post here. I mean, the last two lines on the last table in Your first post (why RRUL's are P1). And do you want the same layout of data as on last table in first post?

    Hi Bill,

    I've tested your code with Table1 and Table2 of bigsample.xlsx I shared previously. The information of all fields and order of rows is correct but it seems for all ADDR values the CODE appears like "Not Found".

    Below the link for the file bigsample.xlsx, only remove spaces, replace dot with ".", mmm with www and "\" with "/"

    https : \\ mmm dot dropbox dot com \ s\ 3m7f1up3y9qyp2q \ bigsample.xlsx ? raw = 1

    Sorry, but they don't allowed me to share links.

    Thanks a lot. Best regards

    Saturday, March 23, 2019 5:19 AM
  • Hi Ger!

    I optimized my initial approach. Please see the code below. The code formatting is more suitable for viewing in Power BI than to see right here.

    Based on your bigsample, it seems that my approach is the slowest among the 3, but it would be interesting to know timing of the full dataset.

    I'll try to play with some 4-th approach to solve the task after you provide the performance of the current solution.

    Just in case - Pivot Table can provide you with the view like your first variant of result table, if you have Excel new enough.

    let
            Source = #"Source data",
        #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1),
    
            #"Added Custom" = Table.AddColumn(
                #"Added Index1", 
                "Main Key", 
                each 
                    if [DESCRIPTION] = "ID" then 
                        [Index]
                    else 
                        null,
            type number),
        #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null),
        #"Filled Down" = Table.FillDown(#"Added Custom10",{"Main Key", "Last notADDR"}),
        #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + (if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] -2 else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Index", "Main Key", "Last notADDR"}),
        #"Pivoted Column1" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"),
        #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each if [ADDR] = null then null else let t = Table.SelectRows(Criterias, (x)=> 
    	    let s=List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_)) in Text.StartsWith(s{1}, s{0})) 
        in if Table.RowCount(t) > 0 then Table.First(Table.Sort(t, (y)=> Number.Abs(Text.Length([ADDR]) - Text.Length(y[CODE]))))[CODE] 
    	   else "Not Found"),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"Key", "ADDR"}),
        #"Filled Down1" = Table.FillDown(#"Removed Columns1",{"ID", "NAME", "TYPE", "DFRUL"})
    in
        #"Filled Down1"

    • Edited by Andrey Minakov Sunday, March 24, 2019 12:02 AM
    • Marked as answer by cgkmal Wednesday, March 27, 2019 8:19 PM
    Saturday, March 23, 2019 10:36 PM
  • Hello Andrey, 

    Thanks for share this new solution. I haven't test perfomance with a clock but it took less than 5 seconds with the bigsample.xlsx

    I'll try to test with real file, but I think the performance wouldn't be an issue if is less than 10 seconds.

    I see that all values in your output match my output in bigsample.xlsx, but I'm not sure why in your code there is only one CODE that seems to be wrong.

    You can see in your output for row 55 that has this record set.

    ID,NAME,TYPE,DFRUL,RRUL,CODE
    3,NAME_11,WXYZ,SIMPLE,VV5,D02HN2

     For this row the ADDR related is 57300, but,

    For PREFIX=57300 there are 2 CODES, LCOCOO and LTICOI
    For PREFIX=5730066900 there are 3 CODES, MCMGTM, D02HN2 and YTCPRC

    In this case the CODE more similar to ADDR=57300 is PREFIX 57300, then the CODE in output should be LCOCOO and not D02HN2.

    I'll try to tests times asap and let you know.

    PS: I made a questions several days ago in how to get the first variant of the output with Pivot Table and I asked if it was possible or only with a VBA macro and the answer I received was. "Maybe you can do it with Power Query" and I didn´t know what  was Power Query at that moment, then I asked how to do it in PQ hehe. It would be interesting to know how to do the first variant with Pivot Table, if not too much complicated hehe.

    Thanks so much for your help and all of you guys.


    • Edited by cgkmal Sunday, March 24, 2019 4:47 AM
    Sunday, March 24, 2019 2:55 AM
  • Hi Ger!

    Below is the updated solution. It was mistake in the previous code which led to wrong data only in Excel (in Power BI all was OK :-), at least for the particular row, mentioned by you).

    In general, I compiled the solution in order to be as less vulnerable to problems with data, as possible. The only constrains to data are:

    1. Field sets must have ID field, which must be the first field of set.

    2. all the RRUL and ADDR have to be in pairs,

    3. Duplicates of RRUL/ADDR pairs inside one ID are acceptable or absent.

    Don't forget to assess the vulnerability of other solutions.

    I also compiled the solution in a way to correctly find the closest value in all possible variants of ADDR and PREFIX. By the way - there is one case, not covered in your bigsample - when PREFIX is shorter then ADDR but not equal to it. If there are such cases - my solution handles them correctly but demands some performance overhead for this particular situation.

    Regarding Pivot Tables - I didn't mean that it's possible to solve your task with it, I meant that for tasks, solvable by Pivot Tables, you can configure the look of the result like your first variant, i.e. with empty rows for repeatable values.

    Regarding possibility to implement the solution through PT - I also think that's impossible w/o VBA, first of all  because Lookup functions cannot fit your demands.

    let
            Source = #"Source data",
        #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1),
    
        #"Added Custom" = Table.AddColumn(#"Added Index1", "Main Key", each if [DESCRIPTION] = "ID" then [Index] else null, type number),
    
        #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each 
            if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null),
    
        #"Filled Down" = Table.FillDown(#"Added Custom10",{"Main Key", "Last notADDR"}),
    
        #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + (
            if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] - 2 
                else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)),
    
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Index", "Main Key", "Last notADDR"}),
    
        #"Pivoted Column1" = Table.Pivot(#"Removed Columns", 
            List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"),
    
        #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each if [ADDR] = null then null else let t = Table.AddIndexColumn(Table.SelectRows(Criterias, (x)=> 
    	    let s=List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_)) in Text.StartsWith(s{1}, s{0})), "Index")
                in if Table.RowCount(t) > 0 then Table.First(Table.Sort(t, (y)=> Number.BitwiseShiftLeft(Number.Abs(Text.Length([ADDR]) - Text.Length(y[PREFIX])), 32) + y[Index]))[CODE] 
    	        else "Not Found"),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"Key", "ADDR"}),
        #"Filled Down1" = Table.FillDown(#"Removed Columns1",{"ID", "NAME", "TYPE", "DFRUL"})
    in
        #"Filled Down1"

    PS. I forgot to mention (thanks Colin for your update), that there is a controversy in your demands, Ger. You wrote somewhere upper that if there are equal prefixes meet the criteria, the first CODE should be used. But seems your output is based (at least in case of 34609 PREFIX) on some other principles, plus I support Colin's questions as well.


    Sunday, March 24, 2019 3:12 PM
  • Hi cgkmal,

    I can reproduce your bigsample output exactly, with the exception of 4 cases. In all of these cases, your output is based on approximate matches instead of exact matches:

    There are exact matches for the following values:

    ADDR/PREFIX = 31653
    ADDR/PREFIX = 52554999
    ADDR/PREFIX = 57300
    ADDR/PREFIX = 1303319

    However, instead your output returns:

    ADDR = 31653, PREFIX = 3165308411
    ADDR = 52554999, PREFIX = 525549997
    ADDR = 57300, PREFIX = 5730066900
    ADDR = 1303319, PREFIX = 13033197

    Of course, the codes are different for the approximate matches.

    After you provide the logic for choosing these approximate matches, I might be able to adjust my output to match all of your output values.

    Sunday, March 24, 2019 9:44 PM

  • Below I show the possibilities to compare ADDR and PREFIX values.

    Regarding the logic would be like this:

    If in ADDR column there are the values:
    12345, 456, 7890235,897

    And in PREFIX column these:
    1123, 123, 12344, 456, 45601, 456321, 789023, 7890235, 897023, 89967

    Then the PREFIX more similar to each ADDR would be:
       ADDR           PREFIX
    • 12345     --> 123          (PREFIX shorter than ADDR. This is the most probable event,
                                             that the PREFIX be shorter than ADDR)

    • 456          -->  456         (Found exact match between PREFIX and ADDR,
                                              having 2 more PREFIXes with larger length)

    • 7890235 --> 7890235  (Found exact match between PREFIX and ADDR,
                                               having 1 more PREFIX with shorter length)

    • 897          --> 897023    (ADDR shorter than PREFIX. This is the less probable event,
                                              that the ADDR be shorter than PREFIX)

    • If in PREFIX column are found more than one value that match any of the above possibilities, then select as PREFIX the first found.

    So, if your code found exact matches, that is the correct output and in my bigsample.xlsx is wrong, probably because I had to did the output almost manually and maybe the array formula used in my sample doesn't analyses all these possibilities.

    I think there no more possibilities.

    Thanks for your help.




    • Edited by cgkmal Monday, March 25, 2019 6:22 AM
    Monday, March 25, 2019 12:35 AM
  • My solution fits that demand.
    Monday, March 25, 2019 1:20 PM
  • Ok, my solution is still lengthy, but should meet the requirements. With bigsample, the refresh time in Excel is negligible, but with your full tables (896 rows), the refresh time is four times slower. We've gone from 2 sec to a whopping 8 sec. That's the price we pay for correctness I suppose. :)

    let    
        source1 = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        source2 = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
        addedIndex = Table.AddIndexColumn(source1, "Index"),
        replacedIndexValues = Table.ReplaceValue(
            addedIndex, 
            each [Index],
            each if [DESCRIPTION] = "ID" then [Index] else null,
            Replacer.ReplaceValue,
            {"Index"}
        ),
        filledDown = Table.FillDown(replacedIndexValues,{"Index"}),
        groupedIndex = Table.Group(
            filledDown, 
            {"Index"}, 
            {"Tables", 
              each let 
                       rowCount = Table.RowCount(Table.SelectRows(_, each [DESCRIPTION] = "ADDR")),
                       repeatedValues = if rowCount = 0 then _ else Table.Repeat(Table.FirstN(_, 4), rowCount) & Table.Skip(_, 4)
                   in
                       repeatedValues
            }
        ),
        groupedDescription = Table.TransformColumns(
            groupedIndex, 
            {"Tables", (i) => Table.Group(i, {"DESCRIPTION"}, {"Tables", (j) => Table.AddIndexColumn(j, "Index2")})}
        ),
        combinedTables = Table.Combine(Table.Combine(groupedDescription[Tables])[Tables]),
        addedPivotIndex = Table.AddColumn(combinedTables, "PivotIndex", each [Index] + [Index2]),
        pivotedColumn = Table.Pivot(
            addedPivotIndex, 
            List.Distinct(addedPivotIndex[DESCRIPTION]), "DESCRIPTION", "VALUE"
        ),
        tableToMerge = Table.Buffer(source2),
        mergedTables = Table.AddColumn(
            pivotedColumn, 
            "CODE", 
            (i) => Table.SelectRows(
                       tableToMerge, 
                       (j) => try 
                                 if i[ADDR] = j[PREFIX] then true
                                 else if Text.StartsWith(i[ADDR], j[PREFIX]) then true
                                 else if Text.StartsWith(j[PREFIX], i[ADDR]) then true
                                 else false
                              otherwise null
                       )
        ),
        transformedRows = Table.FromRecords( 
            Table.TransformRows(
                mergedTables,
                (row) =>
                Record.TransformFields(
                    row,
                    {
                       "CODE", 
                       each if Table.RowCount(row[CODE]) > 1 then
                                Table.Min(Table.SelectRows(
                                    row[CODE], 
                                    (j) =>  j[PREFIX] >= row[ADDR]
                                ), "PREFIX")
                             else Table.Min(row[CODE], "PREFIX")
                    }
                )
            )
        ),
        removedColumns = Table.RemoveColumns(transformedRows,{"Index", "Index2", "PivotIndex"}),
        recordValues = Table.TransformColumns(removedColumns, {"CODE", each try Record.Field(_, "CODE") otherwise "NOT FOUND"})
    in
        recordValues

    For easy testing, I simply aligned my output table with your table. Then in a separate cell (lined up with the first value in both tables), I entered the formula =G2=O2, where G2 is the first CODE value in my table and O2 is the first CODE value in your table. Then I dragged the formula down. The only FALSE values that were returned were exact matches in my table and approximate matches in your table (the four values that I mentioned in my previous post).

     
    Monday, March 25, 2019 3:05 PM
  • Hi Colin :-)

    Here is my approach (should be fast enough). Still have no time to clean the code (so, still draft). There are several differences compared to Your result table. I do not want to decide which one is the expected result.

    let
        fxCheck2 = (t as table) =>
            let
                First4 = Table.FirstN(t, 4),
                Custom2 = Table.Skip(t, 4),
                #"Added Index1" = Table.AddIndexColumn(Custom2, "Index", 0, 1),
                #"Integer-Divided Column" = Table.TransformColumns(#"Added Index1", {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
                #"Grouped Rows1" = Table.Group(#"Integer-Divided Column", {"Index"}, {{"tbl", each Table.PromoteHeaders(Table.Transpose(_[[DESCRIPTION], [VALUE]])), type table}}),
                Custom3 = Table.Combine(#"Grouped Rows1"[tbl]),
                Custom1 = Table.FromRows({First4[VALUE] & {Custom3}}, First4[DESCRIPTION] & {"tbl"} ),
                #"Expanded {0}" = Table.ExpandTableColumn(Custom1, "tbl", {"ADDR", "RRUL"}, {"ADDR", "RRUL"})
            in
                #"Expanded {0}",
            
    
        fxExact = (a as nullable text) =>
            let
                Step1 = Table.SelectRows(PrefTbl, each [PREFIX] = a ),
                IsEmpty = if Table.IsEmpty(Step1) then fxAprox(a) else Table.First(Step1)
            in
                IsEmpty,
    
        fxAprox = (a as nullable text) =>
            let
                Step1 = Table.SelectRows(PrefTbl, each Text.StartsWith(a, [PREFIX])),
                IsEmpty = if Table.IsEmpty(Step1) then fxAprox2(a) else Table.First(Step1)
            in
                IsEmpty,
    
        fxAprox2 = (a as nullable text) =>
            let
                Step1 = Table.SelectRows(PrefTbl, each Text.StartsWith([PREFIX], a )),
                IsEmpty = if Table.IsEmpty(Step1) then [CODE = "Not Found", PREFIX = "Not Found"] else Table.Last(Step1)
            in
                IsEmpty,
    
    
        Table2 = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
        ChType = Table.TransformColumnTypes(Table2,{{"PREFIX", type text}, {"CODE", type text}}),
        CleanedText = Table.TransformColumns(ChType,{{"PREFIX", Text.Clean, type text}, {"CODE", Text.Clean, type text}}),
        #"Removed Duplicates" = Table.Distinct(CleanedText),
        #"Inserted Text Length" = Table.AddColumn(#"Removed Duplicates", "Length", each Text.Length([PREFIX]), Int64.Type),
        SortedRows = Table.Sort(#"Inserted Text Length",{{"Length", Order.Ascending}, {"PREFIX", Order.Ascending}}),
        RemovedColumns = Table.RemoveColumns(SortedRows,{"Length"}),
        PrefTbl = Table.Buffer(RemovedColumns),
    
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"DESCRIPTION", type text}, {"VALUE", type text}}),
        #"Cleaned Text" = Table.TransformColumns(#"Changed Type",{{"DESCRIPTION", Text.Clean, type text}, {"VALUE", Text.Clean, type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Cleaned Text", "Indeks", 0, 1),
        #"Added Conditional Column" = Table.AddColumn(#"Added Index", "SplitHere", each if [DESCRIPTION] = "ID" then [Indeks] else null),
        #"Filled Down" = Table.FillDown(#"Added Conditional Column",{"SplitHere"}),
        #"Grouped Rows" = Table.Group(#"Filled Down", {"SplitHere"}, {{"tbl", each fxCheck2(_[[DESCRIPTION],[VALUE]]), type table }}, 0),
        Custom1 = Table.Combine(#"Grouped Rows"[tbl]),
        #"Changed Type1" = Table.TransformColumnTypes(Custom1,{{"ADDR", type text}}),
        #"Trimmed Text" = Table.TransformColumns(#"Changed Type1",{{"ADDR", Text.Trim, type text}}),
        #"Added Custom" = Table.AddColumn(#"Trimmed Text", "Niestandardowe", each if [ADDR] = null then [CODE = null] else fxExact([ADDR]) ),
        #"Expanded {0}" = Table.ExpandRecordColumn(#"Added Custom", "Niestandardowe", {"CODE", "PREFIX"}, {"CODE", "PREFIX"})
    in
        #"Expanded {0}"

    Monday, March 25, 2019 10:26 PM
  • Hi Bill,

    Don't think that I've ever seen you solve any problem with more than about ten lines of code. Losing your edge? :D

    "There are several differences compared to Your result table. I do not want to decide which one is the expected result."

    You don't need to decide. An expected result is provided in an Excel workbook called bigsample. You may have missed the following link posted by cgkmal previously:

    https://www.dropbox.com/s/3m7f1up3y9qyp2q/bigsample.xlsx

    Monday, March 25, 2019 11:42 PM
  • Hello guys,

    Thanks so much for the help in this question. I never received such great help from 3 experts and with the kindness and passion to help others that you don't know like you have done to me.

    I will try your 3 solutions again and check performance and values and let you know.

     I hope this thread helps other in the future too.

    Best regards

    Tuesday, March 26, 2019 2:10 PM
  • Hello again guys,

    I've tested the 3 solutions that each of you have kindly provided.

    I've fix the output because even mine had issues hehe. After that I tested your solutions and these are the results.

    Regarding perfomance:

    Is fine enough for the 3 solutions. Below times aprox I took.

    Bill_Output  --> 1.00 seg 
    Colin_Output      --> 1.40 seg
    Andrey_Output   --> 2.46 seg

    Regarding the output:

    Andrey_Output   -->   0 differences 
    Colin_Output      -->  11 differences 
    Bill_Output  -->  12 differences  

    The differences in Bill's and Colin's output for what I see is regarding the condition when there are 2 o more exact matched between ADDR and PREFIX. When this happens, the CODE to select should be the first in column CODE, sorted ascending, for example.

    For ADDR=54079 there are the following 2 PREFIXES with the same value 54079

    54079 G01AR1
    54079 GTMARM

    In Bill's and Colin's output is selected the CODE GTMARM, but sorting ascending column PREFIX and then sorting ascending column CODE results in that the first CODE for ADDR=54079 is G01AR1.

    I think is only that condition is missing, after that all output should be the same for the 3 solutions.

    I share with you the bigsample_v1.xlsx, where you can see the comparisons for each solution. The link is:

    https : // www . dropbox . com /s/2mi2wwr85lnzc69/bigsample_v1.xlsx?raw=1

    * Only remove the spaces from link

    Many thanks again for such awesome help, I've learned from you and I'll learn more when I could dominate/master in better way the techniques you've used.

    Best regard



    • Edited by cgkmal Wednesday, March 27, 2019 4:55 AM
    Wednesday, March 27, 2019 4:54 AM
  • Hi, Ger! You.re welcome! Thank you for the info regarding the perfomance results. And thank Bill and Colin for their interesting solutions!
    Wednesday, March 27, 2019 10:21 AM
  • First, I'd like to congratulate Andrey for a perfect score and a clever solution.

    I must confess that I "cheated" a bit with my solution. Being weary of what appeared to be ever changing requirements, I simply tailored my solution to match what turns out to be an incorrect reference output. It's was really my bad for ignoring the requirements, which were consistent, and bowing to the reference output values instead.

    So by changing how the prefix value is selected in the case of duplicates, I believe that I now match the real output :)

    let    
        source1 = Excel.CurrentWorkbook(){[Name="Table9"]}[Content],
        source2 = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
        addedIndex = Table.AddIndexColumn(source1, "Index"),
        replacedIndexValues = Table.ReplaceValue(
            addedIndex,
            each [Index],
            each if [DESCRIPTION] = "ID" then [Index] else null,
            Replacer.ReplaceValue,
            {"Index"}
        ),
        filledDown = Table.FillDown(replacedIndexValues,{"Index"}),
        groupedIndex = Table.Group(
            filledDown,
            {"Index"},
            {"Tables",
              each
              let
                  rowCount = Table.RowCount(Table.SelectRows(_, each [DESCRIPTION] = "ADDR")),
                  repeatedValues = if rowCount = 0 then _ else Table.Repeat(Table.FirstN(_, 4), rowCount) & Table.Skip(_, 4)
              in
                  repeatedValues
            }
        ),
        groupedDescription = Table.TransformColumns(
            groupedIndex,
            {"Tables", (i) => Table.Group(i, {"DESCRIPTION"}, {"Tables", (j) => Table.AddIndexColumn(j, "Index2")})}
        ),
        combinedTables = Table.Combine(Table.Combine(groupedDescription[Tables])[Tables]),
        addedPivotIndex = Table.AddColumn(combinedTables, "PivotIndex", each [Index] + [Index2]),
        pivotedColumn = Table.Pivot(
            addedPivotIndex,
            List.Distinct(addedPivotIndex[DESCRIPTION]), "DESCRIPTION", "VALUE"
        ),
        tableToMerge = Table.Buffer(source2),
        mergedTables = Table.AddColumn(
            pivotedColumn,
            "CODE",
            (i) =>
            Table.SelectRows(
                tableToMerge,
                (j) =>
                try
                    if i[ADDR] = j[PREFIX] then true
                    else if Text.StartsWith(i[ADDR], j[PREFIX]) then true
                    else if Text.StartsWith(j[PREFIX], i[ADDR]) then true
                    else false
                otherwise null
            )
        ),
        removedColumns = Table.RemoveColumns(mergedTables,{"Index", "Index2", "PivotIndex"}),
        transformedRows = Table.FromRecords(
            Table.TransformRows(
                removedColumns,
                (rowOuter) =>
                Record.TransformFields(
                    rowOuter,
                    {"CODE",
                       each
                       if Table.RowCount(rowOuter[CODE]) > 1 then
                           Table.First(Table.SelectRows(rowOuter[CODE], (rowInner) =>  rowInner[PREFIX] >= rowOuter[ADDR]))
                       else if Table.RowCount(rowOuter[CODE]) = 1 then Table.First(rowOuter[CODE])
                       else if Table.RowCount(rowOuter[CODE]) = 0 then
                          if rowOuter[ADDR] <> null then "NOT FOUND"
                          else null
                       else null
                    }
                )
            )
        ),
        codeValues = Table.TransformColumns(transformedRows, {"CODE", each try Record.Field(_, "CODE") otherwise _})
    in
        codeValues

    Pretty sure that it would be an easy mod for Bill also. Remember that his last posted code was still in draft. :)

    • Marked as answer by cgkmal Wednesday, March 27, 2019 8:18 PM
    Wednesday, March 27, 2019 4:48 PM
  • Hello Colin,

    Apologies for any inconvenience regarding the change of the expected output from my side. Thank you for the patience you have had with me, usually in other forums they are not so kind and patient.

    When I asked the question, my desired output was one, after Colin's first response and when I saw a video of the correct form of a table to apply to a PivotTable, I changed the output. Until then there should have been only 1 change, but when I almost manually made the example with the reference values, I was wrong in some values and Colin helped me to see thise errors. In fact, testing the 3 solutions I tuned my desired output.

    I confirm that the output with the new Colin's solution has zero differences, that is, a correct output. :)

    Now I have an issue, I don't know how to put the question as "solved" but assign the solution to all of you, not only one person.I think all of you deserves the recognition!!!

    Thanks a lot


    • Edited by cgkmal Wednesday, March 27, 2019 5:39 PM
    Wednesday, March 27, 2019 5:36 PM
  • Glad to know that I got it right - hopefully with no worse performance.

    Now I have an issue, I don't know how to put the question as "solved" but assign the solution to all of you, not only one person.

    I believe than you can mark multiple posts as answered, or unmark posts previously marked as answered.

    Bill's corrected solution hopefully will be forthcoming. You can mark that as answered too.

    Wednesday, March 27, 2019 6:39 PM