locked
Using M to selectively evaluate data beyond the current record based on current-record data points RRS feed

  • Question

  • I need to figure out a way of performing operations on a limited subsets of my data in order to populate a new column with useful information (i.e., the data has to come from a record in the same data file other than the row on which it will be deposited in the new column). The relevant subset of data to evaluate is dependent on a data point in a field in the current data row; I can't figure out how to do this in M.

    The code below is a start to my end solution, but I am getting hung up on the #"Added Conditional Column2" step. The objective for this newly created column is to put a 'Y' in it if the row contains the smallest value in the "Distance" column--but not the entire column; rather, only the column for a subset of the table, specifically the rows of the table where the "Type" column = 'EOM balance' and the "Grouping Table.Activity target EOM row reference value" column equals whatever value is present in that column on the current row (a row-sensitive subset).

    I'm working with the following sample data:

    Row Type	Some data field	Index	EOM row reference value	Activity target EOM row reference value
    BOM balance	Bill	8		
    New account	Melva	9		
    Closed account	Scott	10		
    EOM balance	Rob	14	14.444	
    EOM balance	Steve	14	14.222	
    EOM balance	Jonathan	14	14.333	
    EOM balance	Blaine	9	9.111	
    EOM balance	Patty	9	9.333	
    EOM balance	Barbara	9	9.555	
    EOM balance	John	9	9.111	
    Activity	Neil	9		9.333
    Activity	Tom	9		9.444
    Activity	Jeremy	9		9.777
    Activity	Jason	10		10.555
    Activity	Kate	14		14.111
    


    And here is my M code:

    let
        Source = Csv.Document(File.Contents("H:\Misc\Power Query experiment\Data.csv"),[Delimiter=",", Columns=5, Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Row Type", type text}, {"Some data field", type text}, {"Index", type number}, {"EOM row reference value", type number}, {"Activity target EOM row reference value", type number}}),
        #"Sorted Rows" = Table.Sort(#"Changed Type",{{"EOM row reference value", Order.Ascending}}),
        #"Added Custom" = Table.AddColumn(#"Sorted Rows", "Grouping Index", each 1),
        #"Grouped Rows" = Table.Group(#"Added Custom", {"Grouping Index"}, {{"Grouping Table", each _, type table [Row Type=text, Some data field=text, Index=number, EOM row reference value=number, Activity target EOM row reference value=number, Grouping Index=number]}, {"Distribution Table", each _, type table [Row Type=text, Some data field=text, Index=number, EOM row reference value=number, Activity target EOM row reference value=number, Grouping Index=number]}}),
        #"Expanded Grouping Table" = Table.ExpandTableColumn(#"Grouped Rows", "Grouping Table", {"Row Type", "Some data field", "Index", "EOM row reference value", "Activity target EOM row reference value", "Grouping Index"}, {"Grouping Table.Row Type", "Grouping Table.Some data field", "Grouping Table.Index", "Grouping Table.EOM row reference value", "Grouping Table.Activity target EOM row reference value", "Grouping Table.Grouping Index"}),
        #"Removed Columns" = Table.RemoveColumns(#"Expanded Grouping Table",{"Grouping Index", "Grouping Table.Grouping Index"}),
        #"Expanded Distribution Table" = Table.ExpandTableColumn(#"Removed Columns", "Distribution Table", {"Index", "EOM row reference value"}, {"Distribution Table.Index", "Distribution Table.EOM row reference value"}),
        #"Filtered Rows" = Table.SelectRows(#"Expanded Distribution Table", each ([Distribution Table.EOM row reference value] <> null)),
        #"Added Conditional Column" = Table.AddColumn(#"Filtered Rows", "Distribution Table.Index Targeted", each if [Distribution Table.Index] = [Grouping Table.Index] then [Distribution Table.Index] else null),
        #"Reordered Columns" = Table.ReorderColumns(#"Added Conditional Column",{"Grouping Table.Row Type", "Grouping Table.Some data field", "Grouping Table.Index", "Grouping Table.EOM row reference value", "Grouping Table.Activity target EOM row reference value", "Distribution Table.Index", "Distribution Table.Index Targeted", "Distribution Table.EOM row reference value"}),
        #"Filtered Rows1" = Table.SelectRows(#"Reordered Columns", each ([Distribution Table.Index Targeted] <> null)),
        #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Distribution Table.Index", "Distribution Table.Index Targeted"}),
        #"Added Conditional Column1" = Table.AddColumn(#"Removed Columns1", "Distance", each if [Grouping Table.Row Type] <> "Activity"
                                                                                          then null
                                                                                          else if [Grouping Table.Activity target EOM row reference value] = [Distribution Table.EOM row reference value]
                                                                                             then 0
                                                                                             else if [Grouping Table.Activity target EOM row reference value] > [Distribution Table.EOM row reference value]
                                                                                                then [Grouping Table.Activity target EOM row reference value] - [Distribution Table.EOM row reference value]
                                                                                                else 1 + [Distribution Table.EOM row reference value] - [Grouping Table.Activity target EOM row reference value]),
        #"Changed Type2" = Table.TransformColumnTypes(#"Added Conditional Column1",{{"Distance", type number}}),
    
        #"Added Conditional Column2" = Table.AddColumn(#"Changed Type2", "Keep", each if [Distance] = List.Min(Table.SelectRows(#"Changed Type2",
                                                                                                                                each [Grouping Table.Activity target EOM row reference value] = [Grouping Table.Activity target EOM row reference value])[Distance])
                                                                                                then "Y"
                                                                                                else "N"),
        #"Changed Type1" = Table.TransformColumnTypes(#"Added Conditional Column2",{{"Keep", type text}})
    
    in
        #"Changed Type1"

    Please help.

    Wednesday, May 15, 2019 7:51 PM

Answers

  • Thanks Aleksei. Sorry the data was unclear. For the last five rows (the ones of “Row Type” = Activity, the numbers in the penultimate column actually belong in the last column. So you have different “Row Type” data rows in the flat file, and depending on which type it is, different columns will be populated. Looking at the raw data for a moment and trying to envision what we are trying to do with it, the “Index” column denotes the different customer accounts I have in my data. There are only four: 8, 9, 10, and 14. You can also see that each one of these has one or more rows, and these rows may be of various different types, as indicated in “Row Type”. We are trying to add a column that provides another data point for the Activity type rows (the new column’s values can just be null for all of the other row types). Each new data point will come from the “EOM row reference column”, and we will only need to focus on the EOM balance type rows in order to find it. As for the exact row that will contain the needed data, I will try to explain it logically and then give you examples: (1) if there are one or more EOM balance type rows that contains in the “EOM row reference value” the exact same value as the current Activity type row has in the “Activity target EOM row reference value”, then we want that value in our new column (for instance, row 11 would have 9.333 in the new column, because there is an exact match for this value); otherwise (2) if there are any EOM balance type rows that contain in the “EOM row reference value” values that are less than the current Activity type row has in the “Activity target EOM row reference value”, but that have the same value in the “Index” column that the current Activity type row has in the “Index” column, then we want the largest of those values in the our new column (for instance, row 12 would also have 9.333 in the new column, because although there is no exact match for 9.444, there are smaller values than this on applicable rows, and the largest of those is 9.333); otherwise (3) we want the smallest value in the “EOM row reference value” column of any EOM balance type rows that have the same “Index” column value as the current Activity type row (for instance, row 16 would have 14.222 in the new column, because although there is no exact match for 14.111 and there are no smaller values on applicable rows, there are nonetheless some EOM balance type rows with the same index as this row, and the smallest of these is 14.222). (For now, I’m going to ignore dealing with what I want to do with rows of “Row Type” = Activity where there are no rows of “Row Type” = EOM balance that bear the same Index.) In order to get there, if you walk through some of the transformations, you’ll see that I ‘double grouped’ the data and then expanded the resulting table columns, essentially “squaring” my number of rows so that I could get those data points discussed above together on the same rows; after deleting rows and columns that are just in the way, I proceeded to address the question of which row in a given block should be kept—I.e, which one or ones fits the criteria above. To do that, I reduced the entire criteria set down to one measurement I put in a new column called “Distance”. You can eyeball the distance column for each set of rows to see which is the smallest and therefore should be kept; but I need to do more than eyeball, I need the “Keep” column to evaluate each set of related rows and make that determination for me, flagging it with a “Y”. But it seems like the way I’ve written the code, the “Keep” column seems to be evaluating all of the rows together instead of in separate blocks, so all but the ones with the single lowest “Distance” value (zero) are receiving a “N”.
    • Marked as answer by Alex MSTN Thursday, May 16, 2019 1:43 PM
    Wednesday, May 15, 2019 11:07 PM

All replies

  • Hi Alex,

    There are some obscure (for me) parts in your code. First of all, I guess, there are missing values in the last column (Activity target...) of your sample data:

    For this reason the values in Distance column are only nulls or errors.

    Besides, this part of #"Added Conditional Column2" step is unclear:

    [Grouping Table.Activity target EOM row reference value] = [Grouping Table.Activity target EOM row reference value]

    I guess, you should specify the result you try to figure out. It could speed up finding the solution. I'm pretty sure, there is more simple way to solve the problem.

    Wednesday, May 15, 2019 9:52 PM
  • Thanks Aleksei. Sorry the data was unclear. For the last five rows (the ones of “Row Type” = Activity, the numbers in the penultimate column actually belong in the last column. So you have different “Row Type” data rows in the flat file, and depending on which type it is, different columns will be populated. Looking at the raw data for a moment and trying to envision what we are trying to do with it, the “Index” column denotes the different customer accounts I have in my data. There are only four: 8, 9, 10, and 14. You can also see that each one of these has one or more rows, and these rows may be of various different types, as indicated in “Row Type”. We are trying to add a column that provides another data point for the Activity type rows (the new column’s values can just be null for all of the other row types). Each new data point will come from the “EOM row reference column”, and we will only need to focus on the EOM balance type rows in order to find it. As for the exact row that will contain the needed data, I will try to explain it logically and then give you examples: (1) if there are one or more EOM balance type rows that contains in the “EOM row reference value” the exact same value as the current Activity type row has in the “Activity target EOM row reference value”, then we want that value in our new column (for instance, row 11 would have 9.333 in the new column, because there is an exact match for this value); otherwise (2) if there are any EOM balance type rows that contain in the “EOM row reference value” values that are less than the current Activity type row has in the “Activity target EOM row reference value”, but that have the same value in the “Index” column that the current Activity type row has in the “Index” column, then we want the largest of those values in the our new column (for instance, row 12 would also have 9.333 in the new column, because although there is no exact match for 9.444, there are smaller values than this on applicable rows, and the largest of those is 9.333); otherwise (3) we want the smallest value in the “EOM row reference value” column of any EOM balance type rows that have the same “Index” column value as the current Activity type row (for instance, row 16 would have 14.222 in the new column, because although there is no exact match for 14.111 and there are no smaller values on applicable rows, there are nonetheless some EOM balance type rows with the same index as this row, and the smallest of these is 14.222). (For now, I’m going to ignore dealing with what I want to do with rows of “Row Type” = Activity where there are no rows of “Row Type” = EOM balance that bear the same Index.) In order to get there, if you walk through some of the transformations, you’ll see that I ‘double grouped’ the data and then expanded the resulting table columns, essentially “squaring” my number of rows so that I could get those data points discussed above together on the same rows; after deleting rows and columns that are just in the way, I proceeded to address the question of which row in a given block should be kept—I.e, which one or ones fits the criteria above. To do that, I reduced the entire criteria set down to one measurement I put in a new column called “Distance”. You can eyeball the distance column for each set of rows to see which is the smallest and therefore should be kept; but I need to do more than eyeball, I need the “Keep” column to evaluate each set of related rows and make that determination for me, flagging it with a “Y”. But it seems like the way I’ve written the code, the “Keep” column seems to be evaluating all of the rows together instead of in separate blocks, so all but the ones with the single lowest “Distance” value (zero) are receiving a “N”.
    • Marked as answer by Alex MSTN Thursday, May 16, 2019 1:43 PM
    Wednesday, May 15, 2019 11:07 PM