none
Really tough Power Query List Generate Problem RRS feed

  • Question

  • Hi - I put this question up on the MS PowerBI forums but wasn't able to get the help I needed, and I know from previous searches this is where the likes of Imke, Marcel and the other PQ geniuses hang out so maybe I'll get lucky.  I don't HAVE to do this with List.Generate but I just have the feeling this is a good way of solving it - pity I haven't fully got to grips with List.Generate yet. Here is the problem - thanks in advance:

    Hi

     

    I'm struggling with a problem I have tried to solve myself but I am now running out of time, so I turn to this community brain !

     

    SET-UP

     

    I have a list of points on a network, some of which intersect with each other.  These interactions can be shown in two columns as follows:

     

    ABCD | EFGH

    AABB | HGYT

    TTYF | UYWR

    HGYT | PQLU

    WOWI | PLAS

    SSEW | PQLU

     

    ... and so on.

     

    There are hundreds of these pairings and there are two important things to consider.

     

    Firstly, points connect with each other and as an example (above), we can see that point AABB connects to HGYT and then (further down the list) we can see that point HGYT connects to point PQLU.

     

    Secondly, each point may or may not "terminate".  So, the above points would have the following status:

     

    ABCD | T

    EFGH |

    AABB |

    HGYT| 

    TTYF | 

    UYWR | T

    PQLU | 

    SSEW | T

    WOWI | T

    PLAS |

     

     

    If they "terminate", they have a "T" in the column next to them and if they do not terminate they have a blank (currently this status is in a separate list, and instead of blank it could be anything).  Actually, for my current attempts to solve the problem I have used 1 for "terminate" and 0 for "not terminate".

     

    PROBLEM

     

    I need to look at each point in a list and track its connections until it "terminates".  So, for a very simple example, taking point AABB, we can see that it connects to HGYT, then refer to our status list and can see that HGYT does NOT terminate, so we continue to consider where HGYT connects to, and see that PQLU does NOT terminate.  PQLU also connects to SSEW, which does terminate.  In this simple case, we have the "paths" for AABB: which is AABB | HGYT and HGYT | PQLU.  

     

    RULES

     

    We don't need to consider whether the starting point has a "T" flag ("T" for terminate) or not, since we always want to include the first path (in the case above, AABB | HGYT.

     

     

    We need to keep going down all paths until we hit a "T">

     

    There will sometimes be mutiples branches from the starting point AND along the way.  We don't get crazy numbers of permutations but we could branch 6 times in one direction and 3 times down another path before hitting a "T".

     

    My progress so far has been to lookup the termination status of each point in the pairs (using 1 for "T" and 0 for "non T"), like this:

     

    ABCD | EFGH | 1 | 0

    AABB | HGYT | 0 | 0

    TTYF | UYWR | 0 | 1

     

    ... and so on.

     

    I can then take the current point I am interested in (AABB), define that as a variable, and then use that to filter each column of points so I only get the other points where AABB is listed.

     

    My problem, is then moving on to any subsequent connections (so, going down the HGYT branch, etc).

     

    I thought about using a looping function or trying to work with List.Generate but the truth is, I need some help at this stage.

     

    I'd like to return a list of "pairs of points" for each starting point: so for the example above, for considering the point AABB I'd like to return:

     

    AABB | AABB | HGYT

    AABB | HGYT | PQLU

     

    where the first column contains the point we are working with and the next two columns are the path (containing the two points in the first list I showed at the start).

    Wednesday, March 28, 2018 7:49 PM

Answers

  • I think I solve it - basically I add index to Start & End nodes to cover multiple successors. The chain created in the end has artificial index every 5th character, but this is easy to clean

    NodesRelation

    let
        Source = #table({"Start","End"},{{"ABCD","EFGH"},{"AABB","HGYT"}, {"HGYT","UYWR"},{"TTYF","UYWR"},{"HGYT","PQLU"},{"WOWI","PLAS"},{"SSEW","PQLU"},{"PQLU","ASDF"}}),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start", type text}, {"End", type text}}),
        Start_Select = Table.SelectColumns(#"Changed Type",{"Start"}),
        Start_Unique = Table.Distinct(Start_Select),
        Start_Merge = Table.NestedJoin(Start_Unique,{"Start"},#"Changed Type",{"Start"},"StartRef",JoinKind.LeftOuter),
        Start_Index_Add = Table.AddColumn(Start_Merge, "Custom", each Table.AddIndexColumn([StartRef], "Index", 1, 1)),
        Start_Expand = Table.ExpandTableColumn(Start_Index_Add, "Custom", {"End", "Index"}, {"End", "Index"}),
        Start_New = Table.AddColumn(Start_Expand, "NewStart", each [Start]&Number.ToText([Index])),
        Start_Clean = Table.SelectColumns(Start_New,{"Start", "End", "NewStart"}),
        End_Select = Table.SelectColumns(Start_Clean,{"End"}),
        End_Unique = Table.Distinct(End_Select),
        End_Merge = Table.NestedJoin(End_Unique,{"End"},Start_Clean,{"Start"},"EndRef",JoinKind.Inner),
        End_Index_Add = Table.AddColumn(End_Merge, "Custom", each Table.AddIndexColumn([EndRef], "Index", 1, 1)),
        End_Expand = Table.ExpandTableColumn(End_Index_Add, "Custom", {"Index"}, {"Index"}),
        End_New = Table.AddColumn(End_Expand, "NewEnd", each [End] & Number.ToText([Index])),
        End_Clean = Table.SelectColumns(End_New,{"End","NewEnd"}),
        Final_Ref_Start = Start_Clean,
        Final_Merge = Table.NestedJoin(Final_Ref_Start,{"End"},End_Clean,{"End"},"Custom",JoinKind.LeftOuter),
        Final_Expand = Table.ExpandTableColumn(Final_Merge, "Custom", {"NewEnd"}, {"NewEnd_temp"}),
        Final_NewEnd = Table.AddColumn(Final_Expand, "NewEnd", each if [NewEnd_temp] = null then [End] & "1" else [NewEnd_temp]),
        Final_Clean = Table.SelectColumns(Final_NewEnd,{"Start", "End", "NewStart", "NewEnd"})
    in
        Final_Clean

    fnFindNextNode

    (chain as text) => 
    let
        nodeLength = 5,
        EndNode = Text.End(chain,nodeLength),
        PositionOfLastNode = List.PositionOf(NodesRelations[NewStart],EndNode),
        End = if PositionOfLastNode = -1 then "" else NodesRelations{PositionOfLastNode}[NewEnd]
    in
        End

    Output

    let
        Source = NodesRelations,
        #"Added Custom2" = Table.AddColumn(Source, "Chain", each fnCreateChain([NewStart]&[NewEnd]))
    in
        #"Added Custom2"



    Wednesday, April 4, 2018 1:33 PM

All replies

  • This looks like a parent-child pattern.

    Have you checked this blogpost?: http://www.thebiccountant.com/2017/02/14/dynamically-flatten-parent-child-hierarchies-in-dax-and-powerbi/


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Wednesday, March 28, 2018 9:07 PM
    Moderator
  • Thank you for replying. Yes, I bookmarked that for when I understand Power Query better but I couldn't adapt it to suit my needs right now (I'm finding it hard to make the leap to more complex solutions in PQ and was hoping that seeing the solution to this problem will also help me to bridge what seems like a chasm)  
    Wednesday, March 28, 2018 10:16 PM
  • Please apply this function to your data and check the result. Then tell me what is different to your expected results.

    If that doesn't work, please post sample data can be used to determine before and after, as I don't understand your description above. The problem with the sample data above for me is that: I don't know how many source tables are involved and that it contains data that is not relevant to the end result you've posted here, so I'm having difficulties to follow along.

    Thanks.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Thursday, March 29, 2018 5:33 AM
    Moderator
  • Thank you again - I will try and do what you requested tomorrow and will post what I find.  In the meantime, I will try to offer a better explanation. There is one input table and one output table. 

    INPUT DATA:

    ABCD | EFGH | 1 | 0

    AABB | HGYT | 0 | 0

    HGYT | PQLU | 0 | 0

    TTYF | UYWR | 0 | 1

    PQLU | EDFG | 0 | 0

    YYTT | EDFG | 0 | 0

    YYTT | GFHD | 0 | 1

    SSEW | PQLU | 0 | 0

    FWQA | SSEW | 1 | 0


    Column 1 is 1st node in node pair and 3rd col indicates whether the node in col 1 terminates (1) or does not terminate (0). Likewise, column 2 is second node in node pair and 4th column indicates whether the node in column 3 terminates or not.

    I have a list of hundreds of node pairs which describe the network topography and this is one way I could present the input data. 

    For each node THAT DOES NOT TERMINATE (that has status "0") I need to find a string of "paths" to describe the connections until it hits a node that has termination status "1" (which is the end of the line for that part of the network). 

    So above, node AABB would have this output as a row in a table:

    AABB HGYT PQLU EDFG YYTT SSEW

    which means AABB is the starting point (or starting node in consideration) and the other nodes describe the "path" or nodes that are in the chain until it gets to SSEW and GFHD (which both terminate). 

    In this example AABB goes through HGYT PQLU EDFG YYTT, before terminating at GFHD. AABB's path also splits off at PQLU, which goes to SSEW before terminating at FWQA.

    We need an output line for every node that has status "0".

    Hope that is more clear. 



    • Edited by PQ_slave Thursday, March 29, 2018 8:29 PM
    Thursday, March 29, 2018 7:37 PM
  • Hi

    Cant see in the last sample, how is "In this example AABB goes through HGYT PQLU EDFG YYTT". There is no "EDFG | YYTT" pair.

    Or it is just swapped accidently? Or connection could be considered "both ways", i.e. YYTT | EDFG means that EDFG connected to YYTT and we should move right to left?


    Maxim Zelensky Excel Inside

    Monday, April 2, 2018 1:58 PM
  • Maxim,

    "Cant see in the last sample, how is "In this example AABB goes through HGYT PQLU EDFG YYTT". There is no "EDFG | YYTT" pair."

    Since the dataset contains hundreds of pairs, I suspect that there are many pairs that appear as in this example. It is therefore impossible to apply a function that outputs a materialized path (as in Imke's function, or my own Path function, which duplicates the DAX PATH function in Power Query). To realize a materialized path, pairs would have to be swapped such that the first column has no duplicates, and second column would have to be blank at the end of the path (signifying termination - without the need for termination flags). The data is clearly not set up with this "parent-child" hierarchy in mind.

    Monday, April 2, 2018 2:55 PM
  • Maxim, thank you for your response.

    "Or connection could be considered "both ways", i.e. YYTT | EDFGmeans that EDFG connected to YYTT and we should move right to left?"

    Exactly !  The pairings in the input table just show that one node is connected to another node - it doesn't matter which one appears first.

    I don't think I explained my problem very well at any point, so I apologise for that. 

    I have been trying to figure out how I could adapt my input data to match Imke's dynamic parent-child hierarchy but I got the flu over the weekend so didn't spend much time on it yet.  If you don't mind, I'll add my last thought in my reply to Colin's post below.


    • Edited by PQ_slave Monday, April 2, 2018 5:02 PM
    Monday, April 2, 2018 3:04 PM
  • Hi Colin - thanks for your comment.

    Last night I was trying to figure out how I could adapt my data and if Imke's solution might in fact work.  My main thoughts were:

     * I remember someone suggesting that a similar problem could provide the paths even if one person reported to two or more people higher up the hierarchy - that is essential for my problem because one node could be connected to several other nodes

     * I was wondering if I could just come up with a value for [NodeKey] that is arbitrary and still use Imke's solution.  I thought I could have one column (or a list) of all of the individual nodes -- sort them A-Z -- assign an index column against that and say that is the NodeKey for each node -- then calculate the parent key for each node, adding duplicate rows where a node has more than one connection

     * I also thought I could mimic what happens in the example on Imke's hierarchy for Bill and Annabel - have parent key blank if that node terminates (has state "1").  I thought this would perhaps give the same outcome

    So, I think I agree - at the very least, I must adapt the data :D

    I welcome any further thoughts because I still feel a distinct confusion about how to use List.Generate to solve this but I can't help thinking I am making this more complicated than it needs to be.  After all, I see that it is easy to look at the problem and solve it by hand - but I don't know how to feed that process to List.Generate to implement it properly.

    Monday, April 2, 2018 3:22 PM
  • Drat !  That didn't work as planned.

    I reshaped the input data like this and ran it with Imke's solution:

    
    NodeKey Node ParentKey
    1 AABB 7
    2 ABCD 4
    3 EDFG 8
    3 EDFG 12
    4 EFGH 2
    5 FWQA 9
    6 GFHD 12
    7 HGYT 1
    7 HGYT 8
    8 PQLU 7
    8 PQLU 3
    8 PQLU 9
    9 SSEW 5
    9 SSEW 8
    10 TTYF 11
    11 UYWR 10
    12 YYTT 3
    12 YYTT 6

    and Power Query didn't do anything for quite a long time so I interrupted it.  It's then that I noticed that I can't have duplicates.

    I then removed duplicates, just to see if it would work but it doesn't even like the weird ParentKey relationship.

    Just to check my machine works OK, I entered Imke's data and it returned the same results instantly, so I know that my data (and the problem) does not match Imke's solution.

    Edit: also tried it by making ParentKey blank where termination state = "1"
    Monday, April 2, 2018 4:50 PM
  • not full solution, but I think it's a step in right direction - the way I see the problem is to have recursive concatenation of the nodes to form a chain

    my approach has few asumptions, i.e.:

    • each relation has 2 elements, start and end, that are different from each other
    • each node has 4 characters - to separate last node in a chain, can be replaced with a delimiter
    • one node can have multiple predecessesors, but only single successor - funtion only lookups the first occurence of the node

    input table 

    NodesRelations

    let
        Source = #table({"Start","End"},{{"ABCD","EFGH"},{"AABB","HGYT"}, {"HGYT","UYWR"},{"TTYF","UYWR"},{"HGYT","PQLU"},{"WOWI","PLAS"},{"SSEW","PQLU"},{"PQLU","ASDF"}})
    in
        Source

    function to find next node for a chain
    fnFindNextNode

    (prmChain as text) => 
    let
        nodeLength = 4,
        EndNode = Text.End(chain,nodeLength),
        ChainSoFar = Text.Start(chain,Text.Length(chain)-nodeLength),
        PositionOfLastNode = List.PositionOf(NodesRelations[Start],EndNode),
        End = if PositionOfLastNode = -1 then "" else NodesRelations{PositionOfLastNode}[End]
    in
        End

    function creating chains (recursive)
    fnCreateChain

    (chain as text)=>
    if fnFindNextNode(chain) = "" then
        chain    
    else 
        @fnCreateChain(chain & fnFindNextNode(chain))

    output table

    Output

    let
        Source = NodesRelations,
        #"Added Custom2" = Table.AddColumn(Source, "Chain", each fnCreateChain([Start]&[End]))
    in
        #"Added Custom2"

    please let me know if that helps


    Monday, April 2, 2018 11:18 PM
  • I see this problem as a network(graph) traversal problem. With Sources as the unique Starting nodes and Sinks as the nodes with a termination flag. 

    A few details I might have missed.

    1. Is the assumption that there are no cyclical relationships?

    2. and all of the paths have at least one termination path?

    If that is the case then recursion is your your friend. Can you share a slightly bigger sample with termination flags appended to the dataset?

    Tuesday, April 3, 2018 6:48 AM
  • Hi Marcin

    Thanks for your ideas.  I don't fully understand your solution because "chain" is undefined for fnFindNextNode - it looks like it would come from fnCreateChain but Power Query gives an error.  I'm not sure if I am missing something.

    But, I did like your approach so I need a bit more time to try and work something with it, unless you beat me to it in the meantime.

    Tuesday, April 3, 2018 1:45 PM
  • sorry, I was using parameter earlier, and forgot to replace it, try the following for fnFindNextNode

    (chain as text) => 
    let
        nodeLength = 4,
        EndNode = Text.End(chain,nodeLength),
        PositionOfLastNode = List.PositionOf(NodesRelations[Start],EndNode),
        End = if PositionOfLastNode = -1 then "" else NodesRelations{PositionOfLastNode}[End]
    in
        End

    Tuesday, April 3, 2018 4:15 PM
  • Hi Ogor

    Thanks very much for your input. 

    1.  There could be cyclical or circular connections  - for instance you could have:

      ABCD---EFGH---IJKL---MNOP---QRST---IJKL

    But I don't think this is a problem (at least for the way I was hoping it would get solved) because the connections only appear in the list once:

      ABCD EFGH

      EFGH IJKL

      IJKL MNOP

      MNOP QRST

      QRST IJKL

    2.  Apologies, I don't know what you mean by termination path, and I don't actually know what the whole network (visually) looks like.  But, same for above cyclical relationships, I feel like we don't need to know if all paths have at least one termination path.  I know that the full list describes ALL of the connections that we care about.

    Unfortunately I don't have the full dataset to hand but how would you use recursion to find the (correct) path AABB HGYT PQLU EDFG YYTT SSEW for node AABB ?  Here is the dataset (copied from above):

    ABCD   EFGH  1 0
    AABB   HGYT  0 0
    HGYT   PQLU  0 0
    TTYF   UYWR  0 1
    PQLU   EDFG  0 0
    YYTT   EDFG  0 0
    YYTT   GFHD  0 1
    SSEW   PQLU  0 0
    FWQA 
     SSEW  1 0
    Tuesday, April 3, 2018 4:57 PM
  • Thanks - I should have realised that.

    Maybe you or someone can develop it further - when I fed it the dataset in my response to Igor above it didn't like it and only returned the concatenation of the pairs.

    Start End Chain
    ABCD   EFGH  ABCD  EFGH 
    AABB   HGYT  AABB  HGYT 
    HGYT   PQLU  HGYT  PQLU 
    TTYF   UYWR  TTYF  UYWR 
    PQLU   EDFG  PQLU  EDFG 
    YYTT   EDFG  YYTT  EDFG 
    YYTT   GFHD  YYTT  GFHD 
    SSEW   PQLU  SSEW  PQLU 
    FWQA   SSEW  FWQA  SSEW 

    Tuesday, April 3, 2018 5:02 PM
  • hmm, strange - for the table you provided

    let
        Source = #table({"Start","End"},{{"ABCD","EFGH"},{"AABB","HGYT"}, {"HGYT","PQLU"},{"TTYF","UYWR"},{"PQLU","EDFG"},{"YYTT","EDFG"},{"YYTT","GFHD"},{"SSEW","PQLU"},{"FWQA","SSEW"}})
    in
        Source
    and Output table as
    let
        Source = NodesRelations,
        #"Added Custom" = Table.AddColumn(Source, "Chain", each fnCreateChain([Start]))
    in
        #"Added Custom"
    I get following:

    could you post syntax of your queries?
    Also, please note that row 3 is subchain of row 2, and row 5 is a subchain of rows 2&3, so it may make sense to only run the script for 'starting nodes' - the one that do not ocur in [End] column

    Also, regarding cyclical chains - I think it's about sth like
    AABB-HGYT
    HGYT-AABB
    which forms infinite chain AABB-HGYT-AABB-HGYT-AABB-HGYT-AABB-HGYT...
    Tuesday, April 3, 2018 8:28 PM
  • :|    OK  !  I only just understood what your script does !  I like where it is going.

    I could see that adding the "chain" for certain connected nodes might give me the result - because as a last step, I could remove duplicates.

    Your solution doesn't take account of the "state" of the node but maybe I can take account of this by altering my input table appropriately i.e. somehow ignore a node with state "1"

    As far as the difference in my output, you are probably right, it was maybe to do with how I loaded it (?)

    Tuesday, April 3, 2018 9:52 PM
  • I think I solve it - basically I add index to Start & End nodes to cover multiple successors. The chain created in the end has artificial index every 5th character, but this is easy to clean

    NodesRelation

    let
        Source = #table({"Start","End"},{{"ABCD","EFGH"},{"AABB","HGYT"}, {"HGYT","UYWR"},{"TTYF","UYWR"},{"HGYT","PQLU"},{"WOWI","PLAS"},{"SSEW","PQLU"},{"PQLU","ASDF"}}),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start", type text}, {"End", type text}}),
        Start_Select = Table.SelectColumns(#"Changed Type",{"Start"}),
        Start_Unique = Table.Distinct(Start_Select),
        Start_Merge = Table.NestedJoin(Start_Unique,{"Start"},#"Changed Type",{"Start"},"StartRef",JoinKind.LeftOuter),
        Start_Index_Add = Table.AddColumn(Start_Merge, "Custom", each Table.AddIndexColumn([StartRef], "Index", 1, 1)),
        Start_Expand = Table.ExpandTableColumn(Start_Index_Add, "Custom", {"End", "Index"}, {"End", "Index"}),
        Start_New = Table.AddColumn(Start_Expand, "NewStart", each [Start]&Number.ToText([Index])),
        Start_Clean = Table.SelectColumns(Start_New,{"Start", "End", "NewStart"}),
        End_Select = Table.SelectColumns(Start_Clean,{"End"}),
        End_Unique = Table.Distinct(End_Select),
        End_Merge = Table.NestedJoin(End_Unique,{"End"},Start_Clean,{"Start"},"EndRef",JoinKind.Inner),
        End_Index_Add = Table.AddColumn(End_Merge, "Custom", each Table.AddIndexColumn([EndRef], "Index", 1, 1)),
        End_Expand = Table.ExpandTableColumn(End_Index_Add, "Custom", {"Index"}, {"Index"}),
        End_New = Table.AddColumn(End_Expand, "NewEnd", each [End] & Number.ToText([Index])),
        End_Clean = Table.SelectColumns(End_New,{"End","NewEnd"}),
        Final_Ref_Start = Start_Clean,
        Final_Merge = Table.NestedJoin(Final_Ref_Start,{"End"},End_Clean,{"End"},"Custom",JoinKind.LeftOuter),
        Final_Expand = Table.ExpandTableColumn(Final_Merge, "Custom", {"NewEnd"}, {"NewEnd_temp"}),
        Final_NewEnd = Table.AddColumn(Final_Expand, "NewEnd", each if [NewEnd_temp] = null then [End] & "1" else [NewEnd_temp]),
        Final_Clean = Table.SelectColumns(Final_NewEnd,{"Start", "End", "NewStart", "NewEnd"})
    in
        Final_Clean

    fnFindNextNode

    (chain as text) => 
    let
        nodeLength = 5,
        EndNode = Text.End(chain,nodeLength),
        PositionOfLastNode = List.PositionOf(NodesRelations[NewStart],EndNode),
        End = if PositionOfLastNode = -1 then "" else NodesRelations{PositionOfLastNode}[NewEnd]
    in
        End

    Output

    let
        Source = NodesRelations,
        #"Added Custom2" = Table.AddColumn(Source, "Chain", each fnCreateChain([NewStart]&[NewEnd]))
    in
        #"Added Custom2"



    Wednesday, April 4, 2018 1:33 PM
  • Hi Marcin - that's great to hear.

    I can't wait to test it out.  I'll try it with my bigger data set as soon as I can get it together.

    We'll have to see about getting it to take in to account the "state" of each node so that the chain stops when it gets to a node with state "1".

    Thank you again


    Wednesday, April 4, 2018 7:22 PM
  • Hi PQ_slave,

    it's been a while since we last heard from you: Have you been able to test the solutions out?

    Please mark the suitable solution as answer or provide more details of what's still missing. Thanks!


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Sunday, April 15, 2018 8:10 AM
    Moderator
  • Imke, thanks for asking for feedback about the help that has been offered to solve my query. I feel terrible for not having been able to respond. Unfortunately I seem to have had the Australian flu, which seemed like normal (horrible) flu at the beginning and  just when I started feeling better it then turned really bad (I've gone from really healthy and active to extremely tired and inactive).

    Especially to Marcin, and also to anybody else who put forward some help, I am very sorry for not having responded. It is still a problem I need to solve and when I am back on track physically and have caught up on the other pressing things that I haven't attended to I will come back to the thread and explain what I have found from the solutions put forward. 

    Saturday, April 21, 2018 11:23 AM