none
SUMIFS in Power Query RRS feed

  • Question

  • I have a Base table like this (about 35000 rows)

    Employee ID Schedule ID StartDate (in 2017 year) EndDate  (in 2017 year) Month Monthly Salary
    2719 №3 01.01.2017 31.01.2017 01.01.2017 18000
    2719 №3 01.02.2017 28.02.2017 01.02.2017 18000
    2719 №3 01.03.2017 31.03.2017 01.03.2017 18000
    2719 №3 01.04.2017 30.04.2017 01.04.2017 18000
    2719 №3 01.05.2017 24.05.2017 01.05.2017 18000
    2719 №3 25.05.2017 31.05.2017 01.05.2017 20000
    2719 №3 01.06.2017 30.06.2017 01.06.2017 20000
    2719 №3 01.07.2017 31.07.2017 01.07.2017 20000
    2719 №3 01.08.2017 07.08.2017 01.08.2017 20000
    2719 №2 08.08.2017 31.08.2017 01.08.2017 25000
    2719 №2 01.09.2017 30.09.2017 01.09.2017 25000
    2719 №2 01.10.2017 31.10.2017 01.10.2017 25000
    2719 №2 01.11.2017 30.11.2017 01.11.2017 25000
    2719 №2 01.12.2017 31.12.2017 01.12.2017 25000

    And Schedule table like this (showing for each day and schedule ID the daily proportion of working hours on schedule (in months)) (about 48000 rows (365 days*131 Schedule ID))

    Schedule ID Month Date The daily proportion of working hours on schedule (in month)
    №2 1 01.01.2017 0
    №3 1 01.01.2017 0
    №2 1 02.01.2017 0
    №3 1 02.01.2017 0
    №2 1 03.01.2017 0
    №3 1 03.01.2017 0
    №2 1 04.01.2017 0
    №3 1 04.01.2017 0
    №2 1 05.01.2017 0
    №3 1 05.01.2017 0
    №2 1 06.01.2017 0
    №3 1 06.01.2017 0
    №2 1 07.01.2017 0
    №3 1 07.01.2017 0
    №2 1 08.01.2017 0
    №3 1 08.01.2017 0
    №2 1 09.01.2017 0,058823529
    №3 1 09.01.2017 0
    №2 1 10.01.2017 0,058823529
    №3 1 10.01.2017 0,082089552
    №2 1 11.01.2017 0,058823529
    №3 1 11.01.2017 0,082089552
    №2 1 12.01.2017 0,058823529
    №3 1 12.01.2017 0


    I need to calculate for each employee the weighted average salary for the each month, taking into account its changes and work schedules.

    This I try to make in power query (get data from Schedule table) This I need to get
    Sum of the daily proportion for the period in row key (Employee Number & Month) Sum by key Monthly Salary for row corrected
    1 2719;42736 1 18000
    1 2719;42767 1 18000
    1 2719;42795 1 18000
    1 2719;42826 1 18000
    0,75 2719;42856 1 13500
    0,25 2719;42856 1 5000
    1 2719;42887 1 20000
    1 2719;42917 1 20000
    0,220430108 2719;42948 1,0018509 4400,457465
    0,781420765 2719;42948 1,0018509 19499,42817
    1 2719;42979 1 25000
    1 2719;43009 1 25000
    1 2719;43040 1 25000
    1 2719;43070 1 25000

    In Excel I can do this by making the key (Employee Number & Month) and aggregate sum of the daily proportion from Schedule table by this key. Then i need product Salary with  sum of the daily proportion and devide by sum by key in previous step.

    Sum of the daily proportion for the period in row = SUMIFS(Schedule[The daily proportion of working hours on schedule (in month)];Schedule[Schedule ID];[@[Schedule ID]];Schedule[Date];">="&[@[StartDate (in 2017 year)]];Schedule[Date];"<="&[@[EndDate  (in 2017 year)]])

    key (Employee ID & Month) = [@[Employee ID]]&";"&[@Month]

    Sum by key = SUMIF([key (Employee ID & Month)];[@[key (Employee ID & Month)]];[Sum of the daily proportion for the period in row])

    Monthly Salary for row corrected =[@[Monthly Salary]]/[@[Sum by key]]*[@[Sum of the daily proportion for the period in row]]

    After importing the specified 2 tables, I used the following code:

    let
        Source = Excel.CurrentWorkbook(){[Name="Base"]}[Content],
        #"Deleted columns" = Table.RemoveColumns(Source,{"Sum of the daily proportion for the period in row", "Sum by key", "Monthly Salary for row corrected"}),
        #"Changed type" = Table.TransformColumnTypes(#"Deleted columns",{{"Employee ID", Int64.Type}, {"Schedule ID", type text}, {"StartDate (in 2017 year)", type date}, {"EndDate  (in 2017 year)", type date}, {"Month", type date}, {"Monthly Salary", Int64.Type}, {"key (Employee ID & Month)", type text}}),
        #"Renamed Columns" = Table.RenameColumns(#"Changed type",{{"key (Employee ID & Month)", "key"}}),
        #"Index added" = Table.AddIndexColumn(#"Renamed Columns", "Index", 1, 1),
        #"Index added next" = Table.AddIndexColumn(#"Index added", "Index.1", 2, 1),
        #"Index added prev" = Table.AddIndexColumn(#"Index added next", "Index.2", 0, 1),
        #"Merged queries" = Table.NestedJoin(#"Index added prev",{"Index"},#"Index added prev",{"Index.2"},"Index added prev",JoinKind.LeftOuter),
        #"Expanded index next" = Table.ExpandTableColumn(#"Merged queries", "Index added prev", {"key"}, {"Next key"}),
        #"Merged queries1" = Table.NestedJoin(#"Expanded index next",{"Index.2"},#"Expanded index next",{"Index"},"Expanded index next",JoinKind.LeftOuter),
        #"Expanded index prev" = Table.ExpandTableColumn(#"Merged queries1", "Expanded index next", {"key"}, {"Prev key"}),
        #"Sorted rows" = Table.Sort(#"Expanded index prev",{{"Index", Order.Ascending}}),
        #"Sum by key"= Table.AddColumn(#"Sorted rows", 
                   "Sum by key", 
    			   each 
    			   if ([key]=[Next key] or [key]=[Prev key]) then 
    			   let SchedID=[Schedule ID], StartDate=[#"StartDate (in 2017 year)"], EndDate=[#"EndDate  (in 2017 year)"] in List.Sum(Table.SelectRows(Table.Buffer(Schedule), each [Date]>=StartDate and [Date]<=EndDate and [Schedule ID]=SchedID)[#"The daily proportion of working hours on schedule (in month)"]) else 1
    			   , Int64.Type),
        #"Deleted columns1" = Table.RemoveColumns(#"Sum by key",{"Index", "Index.1", "Index.2", "Next key", "Prev key", "Schedule ID", "StartDate (in 2017 year)", "EndDate  (in 2017 year)"}),
        #"Sum by key Month" = Table.Group(
                              Table.SelectColumns(Table.Buffer(#"Deleted columns1"),
                                    {"key","Sum by key"}),
                              "key", {"Sum DP", each List.Sum([#"Sum by key"]), 
                          Int64.Type}),
        #"Merged queries2" = Table.NestedJoin(#"Deleted columns1",{"key"},#"Sum by key Month",{"key"},"Deleted columns1",JoinKind.LeftOuter),
        #"Expanded Sum DP" = Table.ExpandTableColumn(#"Merged queries2", "Deleted columns1", {"Sum DP"}, {"Sum DP"}),
        #"User column added" = Table.AddColumn(#"Expanded Sum DP", "Proportion of Monthly Salary", each [Monthly Salary]*[Sum by key]/[Sum DP]),
        #"Grouped rows" = Table.Group(#"User column added", {"Employee ID", "Month"}, {{"Weighted monthly salary", each List.Sum([Proportion of Monthly Salary]), type number}})
    in
       #"Grouped rows"
     

    On small amounts of data it works, but on my 35,000*48,000 lines execution takes more than 30 minutes.
    Problem with the block:

        #"Sum by key"= Table.AddColumn(#"Sorted rows", 
                   "Sum by key", 
    			   each 
    			   if ([key]=[Next key] or [key]=[Prev key]) then 
    			   let SchedID=[Schedule ID], StartDate=[#"StartDate (in 2017 year)"], EndDate=[#"EndDate  (in 2017 year)"] in List.Sum(Table.SelectRows(Table.Buffer(Schedule), each [Date]>=StartDate and [Date]<=EndDate and [Schedule ID]=SchedID)[#"The daily proportion of working hours on schedule (in month)"]) else 1
    			   , Int64.Type)

    How can I speed up the derivation of the sum of the coefficients from the Schedule table for the period specified in each row of the Base table?

    I ineffectually tried to use the instructions from the page:

    sumifs-in-power-query-rolling-12-months





    • Edited by inetguru Friday, January 12, 2018 9:08 AM
    Thursday, January 11, 2018 2:36 PM

Answers

  • Please check this different approach. It will join with the Schedule-table instead of referencing and filtering it in every row. I believe that makes the huge performance difference here:

    let
        Source = Excel.CurrentWorkbook(){[Name="Base"]}[Content],
        #"Deleted columns" = Table.RemoveColumns(Source,{"Sum of the daily proportion for the period in row", "Sum by key", "Monthly Salary for row corrected"}),
        #"Changed type" = Table.TransformColumnTypes(#"Deleted columns",{{"Employee ID", Int64.Type}, {"Schedule ID", type text}, {"StartDate (in 2017 year)", type date}, {"EndDate  (in 2017 year)", type date}, {"Month", type date}, {"Monthly Salary", Int64.Type}, {"key (Employee ID & Month)", type text}}),
        #"Renamed Columns" = Table.RenameColumns(#"Changed type",{{"key (Employee ID & Month)", "key"}}),
        #"Added Custom" = Table.AddColumn(#"Renamed Columns", "Dates", each {Number.From([#"StartDate (in 2017 year)"])..Number.From([#"EndDate  (in 2017 year)"])}),
        #"Expanded Dates" = Table.ExpandListColumn(#"Added Custom", "Dates"),
        #"Changed Type" = Table.TransformColumnTypes(#"Expanded Dates",{{"Dates", type date}}),
        #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Schedule ID", "Dates"},Schedule,{"Schedule ID", "Date"},"Schedule",JoinKind.LeftOuter),
        #"Expanded Schedule1" = Table.ExpandTableColumn(#"Merged Queries", "Schedule", {"The daily proportion of working hours on schedule (in month)"}, {"Sum of Daily"}),
        #"Grouped rows" = Table.Group(#"Expanded Schedule1", {"key", "StartDate (in 2017 year)", "Schedule ID", "Month", "Employee ID"}, {{"MontlySalary", each List.Min([Monthly Salary]), type number}, {"SumOfDaily", each List.Sum([Sum of Daily]), type number}}),
        #"Inserted Multiplication" = Table.AddColumn(#"Grouped rows", "SalaryShare", each [MontlySalary] * [SumOfDaily], type number),
        #"Grouped Rows" = Table.Group(#"Inserted Multiplication", {"Month", "key"}, {{"Sum DP", each List.Sum([SumOfDaily]), type number}, {"All", each _, type table}}),
        #"Expanded All" = Table.ExpandTableColumn(#"Grouped Rows", "All", {"Employee ID", "SalaryShare", "Schedule ID", "StartDate (in 2017 year)", "SumOfDaily"}, {"Employee ID", "SalaryShare", "Schedule ID", "StartDate (in 2017 year)", "SumOfDaily"}),
        #"Inserted Multiplication1" = Table.AddColumn(#"Expanded All", "Salary", each [Sum DP] * [SalaryShare], type number),
        #"Grouped Rows1" = Table.Group(#"Inserted Multiplication1", {"Month", "Employee ID"}, {{"Salary", each List.Sum([Salary]), type number}})
    in
        #"Grouped Rows1"
    
    


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!


    Saturday, January 13, 2018 7:58 AM
    Moderator

All replies

  • I tried to change the Shedule table - in the first column only unique dates (01.01.2017-31.12.2017), the remaining columns with the name = ShedID  and the values in the rows for this Shedule.

    The code for #"Sum by key" changed to this

     #"Sum by key"= Table.AddColumn(#"Sorted rows", 
                   "Sum by key", 
    			   each 
    			   if ([key]=[Next key] or [key]=[Prev key]) then 
    			   let SchedID=[Schedule ID], StartDate=[#"StartDate (in 2017 year)"], EndDate=[#"EndDate  (in 2017 year)"] in List.Sum(Table.Column(
                                                      Table.SelectRows(Table.Buffer(Schedule), each [Date]>=StartDate and [Date]<=EndDate),SchedID))
                                           else 1
    			   , Int64.Type),
    Unfortunately this did not speed up the calculation (

    Friday, January 12, 2018 8:59 AM
  • Hi there,

    any chance you can provide a file with some sample data, as I'm having difficulties to follow the desired logic here.

    Thanks,


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Friday, January 12, 2018 10:18 AM
    Moderator
  • https:// docviewer.yandex. ru/view/2338717/?*=G%2F%2FNYYOHqlpsrjknfvZXY87ft%2FJ7InVybCI6InlhLW1haWw6Ly8xNjQzODEzODYzOTkwNDQ3MDcvMS4yIiwidGl0bGUiOiJzYW1wbGUgc3VtaWYgcG93ZXIgcXVlcnkucmFyIiwidWlkIjoiMjMzODcxNyIsInl1IjoiMzkxMzA2ODEwMTQ4MzQ1MzcxNiIsIm5vaWZyYW1lIjpmYWxzZSwidHMiOjE1MTU3NTgxMDExODd9

    Friday, January 12, 2018 12:53 PM
  • Friday, January 12, 2018 1:02 PM
  • Please check you link, this is what I get:


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Friday, January 12, 2018 3:55 PM
    Moderator
  • Sorry, try this link please SumifPQSample
    Friday, January 12, 2018 8:46 PM
  • Please check this different approach. It will join with the Schedule-table instead of referencing and filtering it in every row. I believe that makes the huge performance difference here:

    let
        Source = Excel.CurrentWorkbook(){[Name="Base"]}[Content],
        #"Deleted columns" = Table.RemoveColumns(Source,{"Sum of the daily proportion for the period in row", "Sum by key", "Monthly Salary for row corrected"}),
        #"Changed type" = Table.TransformColumnTypes(#"Deleted columns",{{"Employee ID", Int64.Type}, {"Schedule ID", type text}, {"StartDate (in 2017 year)", type date}, {"EndDate  (in 2017 year)", type date}, {"Month", type date}, {"Monthly Salary", Int64.Type}, {"key (Employee ID & Month)", type text}}),
        #"Renamed Columns" = Table.RenameColumns(#"Changed type",{{"key (Employee ID & Month)", "key"}}),
        #"Added Custom" = Table.AddColumn(#"Renamed Columns", "Dates", each {Number.From([#"StartDate (in 2017 year)"])..Number.From([#"EndDate  (in 2017 year)"])}),
        #"Expanded Dates" = Table.ExpandListColumn(#"Added Custom", "Dates"),
        #"Changed Type" = Table.TransformColumnTypes(#"Expanded Dates",{{"Dates", type date}}),
        #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Schedule ID", "Dates"},Schedule,{"Schedule ID", "Date"},"Schedule",JoinKind.LeftOuter),
        #"Expanded Schedule1" = Table.ExpandTableColumn(#"Merged Queries", "Schedule", {"The daily proportion of working hours on schedule (in month)"}, {"Sum of Daily"}),
        #"Grouped rows" = Table.Group(#"Expanded Schedule1", {"key", "StartDate (in 2017 year)", "Schedule ID", "Month", "Employee ID"}, {{"MontlySalary", each List.Min([Monthly Salary]), type number}, {"SumOfDaily", each List.Sum([Sum of Daily]), type number}}),
        #"Inserted Multiplication" = Table.AddColumn(#"Grouped rows", "SalaryShare", each [MontlySalary] * [SumOfDaily], type number),
        #"Grouped Rows" = Table.Group(#"Inserted Multiplication", {"Month", "key"}, {{"Sum DP", each List.Sum([SumOfDaily]), type number}, {"All", each _, type table}}),
        #"Expanded All" = Table.ExpandTableColumn(#"Grouped Rows", "All", {"Employee ID", "SalaryShare", "Schedule ID", "StartDate (in 2017 year)", "SumOfDaily"}, {"Employee ID", "SalaryShare", "Schedule ID", "StartDate (in 2017 year)", "SumOfDaily"}),
        #"Inserted Multiplication1" = Table.AddColumn(#"Expanded All", "Salary", each [Sum DP] * [SalaryShare], type number),
        #"Grouped Rows1" = Table.Group(#"Inserted Multiplication1", {"Month", "Employee ID"}, {{"Salary", each List.Sum([Salary]), type number}})
    in
        #"Grouped Rows1"
    
    


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!


    Saturday, January 13, 2018 7:58 AM
    Moderator
  • Thank you very much for your help. Your decision is very interesting and I hope that it will help to speed up the processing of my data set (I can check it on Monday). I learned a few new opportunities for myself. In particular, 1) assigning dates through the numbers {Number.From ([# "StartDate (in 2017 year)"]) .. Number.From ([# "EndDate (in 2017 year)"])}
    2) join tables into several fields Table.NestedJoin (# "Changed Type", {"Schedule ID", "Dates"}, Schedule, {"Schedule ID", "Date"}, "Schedule", JoinKind.LeftOuter)
    3) use grouping of all fields {"All", each _, type table}
    Every time I learn from you something new, many thanks, Imke.
    Saturday, January 13, 2018 9:58 PM