Power Query grouping based on criteria

  • General discussion

  • Hi,

    Thank you in advance.

    I need output data based on criteria cells by using Power Query.

    Input: 6602

    Available Criteria (5 columns): 1600;2705;5155;11970;3870

    Required Output (5 columns): 1600;2705;2297;0;0

    Remarks:

    1. The output total equals the input.

    2. Since I could not paste an image, I have used a text explanation; consider ";" as the column separator.

    3. We have a very big table and need to achieve these results with Power Query logic. I have explained one record above.
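
    In other words (a sketch of the intended logic in Power Query M; Input and Criteria are just illustrative names): the input is consumed column by column, each result takes at most that column's criteria value, and any column reached after the input is exhausted gets 0.

    let
        Input = 6602,
        Criteria = {1600, 2705, 5155, 11970, 3870},
        // Walk the criteria left to right, taking as much of the remaining
        // input as the current column allows
        Allocation = List.Accumulate(
            Criteria,
            [Remaining = Input, Result = {}],
            (state, current) =>
                [
                    Remaining = List.Max({state[Remaining] - current, 0}),
                    Result = state[Result] & {List.Min({state[Remaining], current})}
                ])[Result]
    in
        Allocation  // returns {1600, 2705, 2297, 0, 0}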

    Regards

    sps_dani

    Tuesday, April 10, 2018 6:02 AM

All replies

  • Hey,

    Would you mind posting a sample dataset? Around 5 to 10 rows of data should be enough. What I have on top of my head is simply a new column that adds the values from all 5 columns; the table is then filtered on that new column based on the "user input value" so it only shows the records that match that condition.
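
    For example, a minimal sketch of that idea (the table name "Table1" and the column names are placeholders, since I don't know your actual field names):

    let
        UserInput = 6602,
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        // Add a column holding the sum of the five criteria columns
        AddedTotal = Table.AddColumn(Source, "Total",
            each List.Sum({[Coln.A], [Coln.B], [Coln.C], [Coln.D], [Coln.E]}), type number),
        // Keep only the rows whose total matches the user input value
        Filtered = Table.SelectRows(AddedTotal, each [Total] = UserInput)
    in
        Filtered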

    Let me know if this is what you're aiming at.

    Tuesday, April 10, 2018 6:55 AM
  • Hi

    Thanks for your immediate response.

    Please find three sample records below. All the inputs are available in the database; we only need the output derived from the data.

    Input

    Part no|InputQty|Coln.A|Coln.B|Coln.C|Coln.D|Coln.E|ResultA|ResultB|ResultC|ResultD|ResultE

    A1|6602|1600|2705|5155|11970|3870|1600|2705|2297|0|0

    A2|4300|1600|2705|5155|11970|3870|1600|2700|0|0|0

    A3|500|1600|2705|5155|11970|3870|500|0|0|0|0

    Remark:

    1. For the input value 6602, we need the output value 6602 spread across the 5 result columns as 1600;2705;2297;0;0 (1600 + 2705 + 2297 + 0 + 0 = 6602).

    2. Coln.A to Coln.E are the criteria.

    The system does not allow me to paste an image of my table. Sorry!

    Tuesday, April 10, 2018 7:33 AM
    You could also paste tables as HTML:

    Part no InputQty Coln.A Coln.B Coln.C Coln.D Coln.E ResultA ResultB ResultC ResultD ResultE
    A1 6602 1600 2705 5155 11970 3870 1600 2705 2297 0 0
    A2 4300 1600 2705 5155 11970 3870 1600 2700 0 0 0
    A3 500 1600 2705 5155 11970 3870 500 0 0 0 0

    When you say "output value 6602 by result in 5 columns", do you mean that the sum of columns A through E should be equal to 6602? Or what type of operation or logic are you trying to use?

    Based on the data provided, none of the 3 records has a case where the InputQty is equal to the sum of columns A through E.

    Perhaps you're trying to solve an optimization or min/max scenario? In that case, Solver would be a better option for this. You could make things happen in Power Query, but I'm not sure that would be the most efficient way to do it.

    I guess what I'm trying to say is that I'm not entirely sure what you're after. Could you please submit a sample of how your output table should look?

    Tuesday, April 10, 2018 8:28 AM
  • @sps_dani

    Not sure why you need to do this with Power Query (a very easy solution exists with Excel...).

    The blue Table is named InputTable in the workbook. Query:

    let
        fnSetMaxValue = (recList as list, colNum as number) as number =>
            let
                // recList{0} contains the Part Num
                qty = recList{1},
                // First colNum criteria values (Val1..Val{colNum})
                valList = List.Range(recList, 2, colNum),
                valSum = List.Sum(valList),
                // Sum of the Result columns already added to the record
                prevSum = if colNum = 1
                          then 0
                          else List.Sum(List.Range(recList, List.Count(recList)-colNum+1, colNum-1)),
                // Allocate whatever quantity is left, otherwise the full criteria value
                Result = if valSum > qty
                         then qty - prevSum
                         else valList{colNum-1}
            in
                Result,
    
    
        Source = Excel.CurrentWorkbook(){[Name="InputTable"]}[Content],
        SourceChanged = Table.TransformColumnTypes(Source,{{"Part Num", type text}, {"InputQty", Int64.Type},
            {"Val1", Int64.Type}, {"Val2", Int64.Type}, {"Val3", Int64.Type}, {"Val4", Int64.Type}, {"Val5", Int64.Type}}),
    
        Result1 = Table.AddColumn(SourceChanged, "Result 1", each fnSetMaxValue(Record.ToList(_), 1), Int64.Type),
        Result2 = Table.AddColumn(Result1, "Result 2", each fnSetMaxValue(Record.ToList(_), 2), Int64.Type),
        Result3 = Table.AddColumn(Result2, "Result 3", each fnSetMaxValue(Record.ToList(_), 3), Int64.Type),
        Result4 = Table.AddColumn(Result3, "Result 4", each fnSetMaxValue(Record.ToList(_), 4), Int64.Type),
        Result5 = Table.AddColumn(Result4, "Result 5", each fnSetMaxValue(Record.ToList(_), 5), Int64.Type),
    
        OutputTable = Table.SelectColumns(Result5,{"Part Num", "InputQty", "Result 1", "Result 2", "Result 3", "Result 4", "Result 5"})
    in
        OutputTable


    Tuesday, April 10, 2018 3:24 PM
  • Thank you, Thunderlight! 

    I'm quite slow when it comes to understanding new scenarios just by reading them, so your post is extremely helpful.

    I think that I got it now? (hopefully!)

    Here's my take in the event that you might have more columns and not just those 5:

    let
        sps_dani = (MaxValue as number, TotalSteps as number, Values as list) as list =>
            List.Generate(
                () => [Counter = 1,
                       RunningTotal = List.Sum(List.FirstN(Values, Counter)),
                       Value = if RunningTotal > MaxValue then MaxValue else Values{Counter-1}],
                each [Counter] <= TotalSteps,
                each [Counter = [Counter] + 1,
                      RunningTotal = List.Sum(List.FirstN(Values, Counter)),
                      Value = if Temporal < 0 then 0 else Temporal,
                      Temporal = if RunningTotal > MaxValue
                                 then MaxValue - List.Sum(List.FirstN(Values, Counter-1))
                                 else Values{Counter-1}],
                each [Column = [Counter], Output = [Value]]),
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY+5EcAgDAR7ISZAQuIJXQdD/21YJ7A944iE4zS7PGOEi0IMrJqSJRUPrkktlBRB1CumuXmIITOayCj52NPHy7ucaa1tTV5NegfnqpPrWZjxYnGOdyAgAMBfijPlu9I+4fQ/bC2+lTDnDQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Part no" = _t, InputQty = _t, Coln.A = _t, Coln.B = _t, Coln.C = _t, Coln.D = _t, Coln.E = _t, Coln.F = _t]),
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Part no", "InputQty"}, "Attribute", "Value"),
        #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"InputQty", Int64.Type}, {"Value", Int64.Type}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"Part no", "InputQty"}, {{"Count", each Table.RowCount(_), type number}, {"Grouped", each _, type table}}),
        #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Values", each Table.Column([Grouped],"Value")),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Grouped"}),
        #"Invoked Custom Function" = Table.AddColumn(#"Removed Columns", "Query1", each sps_dani([InputQty], [Count], [Values])),
        #"Removed Columns1" = Table.RemoveColumns(#"Invoked Custom Function",{"Values", "Count"}),
        #"Expanded Query1" = Table.ExpandListColumn(#"Removed Columns1", "Query1"),
        #"Expanded Query2" = Table.ExpandRecordColumn(#"Expanded Query1", "Query1", {"Column", "Output"}, {"Column", "Output"}),
        #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Expanded Query2", {{"Column", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Expanded Query2", {{"Column", type text}}, "en-US")[Column]), "Column", "Output"),
        #"Changed Type1" = Table.TransformColumnTypes(#"Pivoted Column",{{"5", Int64.Type}, {"4", Int64.Type}, {"3", Int64.Type}, {"2", Int64.Type}, {"1", Int64.Type}})
    in
        #"Changed Type1"

    The best function for this specific case is List.Generate, since you need not only to work recursively on the results of a prior step, but also to store each result after it is calculated, which is exactly what List.Generate is for.
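
    As a rough, standalone illustration of that pattern (not tied to the query above), List.Generate keeps a state record per step and lets you project only the fields you want to keep; here it just builds running totals:

    let
        Values = {1600, 2705, 5155, 11970, 3870},
        Steps = List.Generate(
            // initial state: first position and its value as the running total
            () => [Counter = 1, RunningTotal = Values{0}],
            // keep generating while there are positions left
            each [Counter] <= List.Count(Values),
            // next state: advance the counter and extend the running total
            each [Counter = [Counter] + 1,
                  RunningTotal = [RunningTotal] + Values{Counter - 1}],
            // projection: the fields returned for each generated step
            each [Column = [Counter], Output = [RunningTotal]])
    in
        Steps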

    The List.Generate function might seem scary at first, and I agree that it kind of is, but it's actually quite straightforward. Here are a few videos that explain that function in case you need them:

    Video 1

    Video 2

    Wednesday, April 11, 2018 9:14 AM
  • Hey Miguel

    I was sure there was a clever solution with List.Generate (thanks for the videos) but I'm not at that stage yet...
    I figured out how to get the output column names as "Blabla 1", "Blabla 2"... instead of 1, 2, ... and transformed the Output column to Int64.Type after Query 2 has been expanded (to avoid the Changed Type step).

    What I couldn't figure out, inside the function, is how to handle the case where the InputQty/MaxValue is null. If it is, this outputs an error (I understand why). If you have time, I'm curious to understand how to return a null value (or 0) instead.

    Thanks again

    Wednesday, April 11, 2018 1:25 PM
  • Another option, based on iteratively adding columns to the table.

    let
        //Custom function - transforms input list into subtotals of totalQty
        List.CalculateSubtotals = (list as list, totalQty as number) =>
            let
                replacedNulls = List.ReplaceValue(list, null, 0, Replacer.ReplaceValue),
                runningTotal = List.Skip(List.Accumulate(replacedNulls, {0}, (state, curr) => state & {List.Last(state) + curr})),
                transformedInputList = List.Transform(List.Positions(list), (curr) => try if runningTotal{curr} <= totalQty then list{curr} else totalQty - runningTotal{curr - 1} otherwise totalQty),
                result = List.Transform(transformedInputList, (curr) => if curr < 0 or curr = null then 0 else curr)
            in
                result,
    
        //Main code
        Source = Table.Buffer(Excel.CurrentWorkbook(){[Name="Table1"]}[Content]),
        ColumnNames = {"Coln.A", "Coln.B", "Coln.C", "Coln.D", "Coln.E"},
        AddedColumnNames = {"ResultA", "ResultB", "ResultC", "ResultD", "ResultE"},
        //Add a new column for each item in AddedColumnNames with corresponding value calculated in List.CalculateSubtotals function
        AddedColumns = List.Accumulate(List.Positions(ColumnNames), Source, (state, current) => Table.AddColumn(state, AddedColumnNames{current}, 
                            (i) => try List.CalculateSubtotals(Record.FieldValues(Record.SelectFields(i, ColumnNames)), Record.Field(i, "InputQty")){current} otherwise null)
                       ),
        RemovedColumns = Table.RemoveColumns(AddedColumns, ColumnNames)
    in
        RemovedColumns



    Wednesday, April 11, 2018 7:43 PM
  • Hey Miguel

    I was sure there was a clever solution with List.Generate (thanks for the videos) but I'm not at that stage yet...
    I figured out how to get the output column names as "Blabla 1", "Blabla 2"... instead of 1, 2, ... and transformed the Output column to Int64.Type after Query 2 has been expanded (to avoid the Changed Type step).

    What I couldn't figure out, inside the function, is how to handle the case where the InputQty/MaxValue is null. If it is, this outputs an error (I understand why). If you have time, I'm curious to understand how to return a null value (or 0) instead.

    Thanks again

    Hey!

    If the input of a function is null, then it'll give an error for missing arguments. I wouldn't recommend making your function accept nulls, but if you want it to accept nulls, then you can change the first part to be something like this:

    (MaxValue as nullable number, TotalSteps as number, optional Values as list ) as list =>

    You can define your arguments as "nullable" or "optional", and then your function will be able to accept nulls. Of course, you'd need to arrange your code for the cases where null is the input for one of the arguments, and usually "try otherwise" is something that comes in handy.

    Again, I wouldn't recommend making your function accept nulls. Instead, I'd create a step that replaces nulls with the number 0.
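
    Just to illustrate (a standalone sketch; the function name and the choice of returning nulls are my own assumptions, not part of any of the queries above):

    let
        // Hypothetical helper: returns all nulls when MaxValue is null,
        // otherwise allocates MaxValue across the values from left to right
        AllocateOrNull = (MaxValue as nullable number, Values as list) as list =>
            if MaxValue = null
            then List.Transform(Values, each null)
            else List.Accumulate(Values, [Rem = MaxValue, Out = {}],
                     (s, c) => [Rem = List.Max({s[Rem] - c, 0}),
                                Out = s[Out] & {List.Min({s[Rem], c})}])[Out],
        WithValue = AllocateOrNull(6602, {1600, 2705, 5155, 11970, 3870}),   // {1600, 2705, 2297, 0, 0}
        WithNull = AllocateOrNull(null, {1600, 2705, 5155, 11970, 3870})     // {null, null, null, null, null}
    in
        {WithValue, WithNull}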

    Wednesday, April 11, 2018 10:59 PM
    If the input of a function is null, then it'll give an error for missing arguments. I wouldn't recommend making your function accept nulls, but if you want it to accept nulls, then you can change the first part to be something like this:

    (MaxValue as nullable number, TotalSteps as number, optional Values as list ) as list =>

    You can define your arguments as "nullable" or "optional", and then your function will be able to accept nulls. Of course, you'd need to arrange your code for the cases where null is the input for one of the arguments, and usually "try otherwise" is something that comes in handy.

    Again, I wouldn't recommend making your function accept nulls. Instead, I'd create a step that replaces nulls with the number 0.

    Hi Miguel,

    I get the impression that Thunderlight was referring to nulls with respect to the specific solution. A global statement is too general to be useful in addressing the issue at hand.

    Specifically, the InputQty value drives the entire calculation, so it should never be null. If it is null, then converting to zero is not helpful because calculating the result columns based on zero input qty is meaningless. In this case, all of the result columns should be null or display errors.

    In the case where any value in Coln.A, Coln.B, etc., is null, it makes sense to convert the null values to zeros.
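
    A minimal sketch of that preprocessing step (assuming the source table is named "Table1" and has the Coln.A to Coln.E columns shown earlier):

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        // Replace nulls with 0 in the criteria columns only, leaving InputQty untouched
        NoNullCriteria = Table.ReplaceValue(Source, null, 0, Replacer.ReplaceValue,
            {"Coln.A", "Coln.B", "Coln.C", "Coln.D", "Coln.E"})
    in
        NoNullCriteria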

    Thursday, April 12, 2018 3:36 AM
    My 5 cents on this task (not sure if it is optimal; I just tried another solution). Based on the table provided:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        BatchesOnly = Table.ColumnNames(Table.RemoveColumns(Source, {"Part no","InputQty"})),
        AddedBatches = Table.AddColumn(
            Source, 
            "batches", 
            each Record.FieldValues(Record.SelectFields(_,BatchesOnly))
            ),
        AddedResults = Table.AddColumn(
            AddedBatches, 
            "Custom", 
            each List.Accumulate(
                [batches], 
                [
                    rem=[InputQty],
                    flag=false,
                    result ={}
                    ], 
                (s,c) => 
                    if s[flag] then 
                        [rem=0, flag = true, result = s[result] & {0}] 
                    else 
                        if s[rem]>c then 
                            [rem=s[rem]-c, flag = false, result=s[result] & {c}] 
                        else 
                            [rem=s[rem], flag = true, result=s[result] & {s[rem]}]
                )[result]
            ),
        RemovedOtherColumns = Table.SelectColumns(AddedResults,{"Part no", "InputQty", "Custom"}),
        BatchesToRecords = Table.TransformColumns(RemovedOtherColumns,{"Custom", each Record.FromList(_, BatchesOnly)}),
        Expanded = Table.ExpandRecordColumn(BatchesToRecords, "Custom", BatchesOnly, BatchesOnly)
    in
        Expanded


    Maxim Zelensky Excel Inside

    Thursday, April 12, 2018 11:12 AM
  • Hey Dani, if any of the suggested solutions work for you, please mark them as the answer for this question. If we missed the mark, let us know.
    Saturday, April 14, 2018 3:51 PM
  • I've compared all of the provided solutions for 1000 rows, and these are my preliminary observations:

    - Refresh times are similar for all solutions (Miguel's solution is only slightly slower than the others)
    - Miguel's solution works only if values in the Part no column are unique. However, when the Part numbers are unique, they are ordered incorrectly. Therefore, this solution would have to be considered the least compliant, unless the issue can be addressed.
    - My solution outputs the correct result column names (observing the extra detail), but this issue can be fixed easily in the other solutions.
    - My solution handles nulls in the Coln columns and the InputQty column. However, it is not clear if these columns can have empty (null) values. At any rate, I used just one step to accommodate this possibility.
    - Thunderlight's solution can be improved by collapsing the five result steps into a single step using an iterative process, thus avoiding unnecessary code duplication (see the sketch after this list).
    - I've simplified the custom function in my solution somewhat. The original code was too laborious for my liking.
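
    For example, the five Result steps in Thunderlight's query could be collapsed into something like this (a sketch only, reusing the fnSetMaxValue function and the SourceChanged step from that earlier post):

    AllResults = List.Accumulate({1..5}, SourceChanged,
        (tbl, n) => Table.AddColumn(tbl, "Result " & Number.ToText(n),
            each fnSetMaxValue(Record.ToList(_), n), Int64.Type)),
    OutputTable = Table.SelectColumns(AllResults,
        {"Part Num", "InputQty"} & List.Transform({1..5}, each "Result " & Number.ToText(_)))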

    I'm impressed by the efforts of all contributors. :)

    Sunday, April 15, 2018 7:27 PM
  • - Miguel's solution works only if values in the Part no. column are unique. However, when the Part numbers are unique, they are ordered incorrectly. Therefore, this solution would have to be considered as the least compliant, unless the issue can be addressed

    - My solution outputs the correct result column names (observing the extra detail), but this issue can be fixed easily in the other solutions.

    I feel like the way you've written this reads more like a sales pitch for your solution than just pure feedback, lol. Especially since the OP hasn't provided any feedback to any of us regarding whether we did it right or not.

    My code and Maxim's are the only ones that take into consideration any number of columns and not just the ones provided, so one could argue that they are more dynamic or flexible in that sense.

    When you say that they are ordered incorrectly, what exactly do you mean? I'm not able to replicate any issues.

    EDIT: Wait, were you trying to say that WHEN there are duplicates in Part no it'll cause trouble? It might! It depends on the combination of Part no and InputQty: if you have dupes on that combo, then for sure it'll throw some errors. But I really wanted to drive most of the solution via the UI and just use that sps_dani function as the only "advanced coding" technique.
    Sunday, April 15, 2018 8:46 PM
    I feel like the way you've written this reads more like a sales pitch for your solution than just pure feedback, lol.

    It certainly wasn't my intention, but I see how it could be perceived as such. For the record, I do really like the other solutions. They are clever, and the OP should choose a solution that best meets the requirements.

    Especially since the OP hasn't provided any feedback to any of us regarding whether we did it right or not.

    Well, except for your solution, everyone else's solution worked with my 1000 row dataset. After copying the original 3-row table down to row 1001 in Excel, the 'Part no' column values are repeated every three rows. Your solution is the only one that returned only 3 rows in the result instead of 1000. After changing the 'Part no' values to be unique, only your solution resulted in the part number values being out of order. Somehow, I do not believe that the OP would consider these results to be "right".

    My code and Maxim's are the only ones that take into consideration any number of columns and not just the ones provided, so one could argue that they are more dynamic or flexible in that sense.

    In my work, I follow my clients' requirements precisely. Therefore, given that the OP indicated "Available Criteria (5 columns)" and "Required Output (5 columns)", I have no reason to exceed that requirement. However, I am obliged to output the required column names. It is not possible to use a variable number of columns with very specific result column names. In essence, what you've done is give the OP something that wasn't asked for (a variable number of columns with numbered column names), instead of something that was requested (specific column names).

    "...but I really wanted to drive most of the solution via the UI and just use that sps_dani function as the only "advanced coding" technique."

    And that's fine, if the solution outputs the correct results. It's much easier to test this type of stuff in Excel than in Power BI Desktop.

    Sunday, April 15, 2018 10:47 PM
    I did imagine that you tried copying the data and that it resulted in that issue. I actually ran a test with unique Part no values prior to posting this and didn't find any issues in terms of sorting. Furthermore, the group by function shouldn't try to order your results and should use the original order of the table for the grouped values, so order issues shouldn't happen, or there might be a bigger issue here with Power Query.

    I can understand the reasoning behind your approach. It's completely valid, but I guess that my approach is different: usually my customers rely on me to look at their situation, study it, and provide a solution based on changing or dynamic variables, or simply a solution that goes beyond their original expectation, usually in terms of how dynamic or scalable it is, since we live in a world where new requirements pop up every single day. Edit: in English, my clients call this future proofing or future ready.

    We could exchange many replies here, but until we get any info from the OP we're basically just guessing. I just hope that the OP sees the responses and that he/she can have an easy time adjusting the solutions to his/her specific dataset with the correct field names and everything.

    Sunday, April 15, 2018 11:34 PM
    Peace, brother.

    I could easily change my solution to accommodate variable columns with minimal modification. My point is simply that you cannot meet the condition of having specific column names and a variable number of columns simultaneously. For example, in your solution, you could add a rename-columns step to replace 1, 2, 3, etc. with a specific list like {"ResultA", "ResultB"..., "ResultE"}. But there you are replacing the "variable" names with a specific fixed list. Anytime there are new columns, you have to manually add a new name to the fixed list of replacement names, thus defeating the point of the variable columns.

    An important point is that column names exposed to a user must be meaningful, otherwise there is no context for the meaning of the values in a column. For example, what does "1" represent as a column name? What does it mean if the table is loaded to the data model? Usually, solutions that are based on variable columns have actual, meaningful column names (e.g. when using Table.UnpivotOtherColumns or Table.Combine).
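
    For illustration, such a step (a sketch only) appended after the #"Changed Type1" step of Miguel's query could look like the following; the hard-coded list of new names is exactly what ties the query back to five columns:

    Renamed = Table.RenameColumns(#"Changed Type1",
        List.Zip({{"1", "2", "3", "4", "5"}, {"ResultA", "ResultB", "ResultC", "ResultD", "ResultE"}}))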

    As for the problems I discovered when attempting to use your solution, there are a few things I observed:

    1) The Expanded Query1 step expands the table rows 5-fold, which would explain why the query runs a bit slower than the other solutions. As the original table gets larger and larger, the additional rows will cause the solution to run slower and slower. The other solutions maintain the original row count throughout.

    2) The Pivoted Column step has the effect of sorting the 'Part no' column in ascending order. The values I'm using in the column are A1-A1000. The Pivoted Column step orders this column as A1, A10, A100, A1000, A101, A102, etc. The other solutions do not suffer from this problem.

    3) If I change the 'Part no' values to a repeating list of A1, A2, and A3, the sort order introduced by the 'Pivoted Column' step puts all A1 at the top, followed by all A2, and all A3. More insidiously, the number of result rows is less than 1000, which means that some of the rows being grouped are combined (due to duplicate combinations of 'Part no' and 'InputQty' in some rows). This is definitely a bad result. The other solutions do not have this problem.

    Monday, April 16, 2018 1:30 AM