none
Power Query - Sum_If_Column_Name_Contains <string> RRS feed

  • Question

  • Hi,

    I have a case with reading "Files from folders" where each file's content might differ a bit, say:

    File 1 columns:

    - Store Name

    - Date

    - No of claims

    - Sales Orange

    - Sales Black

    - Sales Blue

    and file 2 columns:

    - Store Name

    - Date

    - No of claims

    - Sales Orange

    - Sales Green    <----

    - Sales Blue

    I have the "file consolidation" (append) part in place; but I never know what new "Sales" columns might appear in future files.

    Therefore I need a function that can be used in creating a new column that sums the numbers from the sales columns (ie., the columns with names containing the string "Sales") whatever other string content that column name might have. (Maybe even matching a string pattern, allowing for wildcards)

    (I guess that as a prep for that I might also need et step that sets/forces the content type of these columns to, say, decimal).

    Help anyone ?





    Wednesday, December 7, 2016 12:09 PM

Answers

  • See the below modifications to my previous script. In the new script, "Sales" is the containing string. Step additions/changes shown in bold

    let
        PreviousStepName = ...
        PivotedColumnList = List.Select(Table.ColumnNames(PreviousStepName), each Text.Contains(_, "Sales")),
        AddedIndex = Table.AddIndexColumn(PreviousStepName, "Index", 0, 1),
        UnpivotedColumns = Table.Unpivot(AddedIndex, PivotedColumnList, "Attribute", "Value"),
        GroupedRows = Table.Group(UnpivotedColumns, {"Index"}, {{"Sales", each List.Sum([Value]), type number}}),
        MergedTotal = Table.Join(AddedIndex,"Index",GroupedRows,"Index"),
        RemovedIndex = Table.RemoveColumns(MergedTotal,{"Index"})
    in
        RemovedIndex

    Friday, December 9, 2016 4:15 AM
  • Yet another possibility, which doesn't involve unpivoting and subsequently joining tables. If you have a fairly large appended set of tables, you should try each solution to see if there's any significant performance difference.

    let
        PreviousStepName = ...
        SummedColumnList = List.Select(Table.ColumnNames(PreviousStepName), each Text.Contains(_, "Sales")),
        AddedIndex = Table.AddIndexColumn(PreviousStepName, "Index", 0, 1),
        AddedColumn = Table.AddColumn(
                          AddedIndex,
                          "Sales",
                          each List.Sum(List.Transform(SummedColumnList, (curr)=>Table.Column(AddedIndex, curr){[Index]}))
                      ),
        RemovedIndex = Table.RemoveColumns(AddedColumn,{"Index"})
    in
        RemovedIndex


    Friday, December 9, 2016 2:52 PM

All replies

  • If you already have an appended table, you're halfway there.

    1) In the Query Editor, select the "fixed" columns, i.e. Store Name, Date, and Number of Claims.
    2) Select Transform-->Unpivot Columns-->Unpivot Other Columns. All of the sale color names should be in a column named "Attribute," and all of the corresponding sales values should be in a column named "Value." You can change these default names by modifying the expression in the formula bar.

    How do you want to sum the sale numbers after unpivoting? Sum entire column, by sales color, etc.? 


    Wednesday, December 7, 2016 3:26 PM
  • I'm halfway there, I do have the input files (and future input files) appended together in a table. And there might pop a say 'Sales white' column up in the future, and I will get that if it comes. I want to sum the sales columns per row, ie. a new column that per row takes the values for each column that contain the string 'Sales ' in its name, and add those values together. A kind of sum_if_columnname_contains per record in the table.
    Thursday, December 8, 2016 7:49 AM
  • Would something like the following work for you? From your appended table, you can add all but one of the steps from the Query Editor UI

    1) Add an index column (default name is "Index")
    2) Follow steps 1 & 2 from previous comment
    3) Group by the index column
    4) Join the AddedIndex table with the GroupedRows table. This step was added manually in the Advanced  Editor
    5) Remove the index column

    let .... AddedIndex = Table.AddIndexColumn(PreviousStepName, "Index", 0, 1), UnpivotedOtherColumns = Table.UnpivotOtherColumns(AddedIndex, {"Store Name", "Date", "Number of Claims", "Index"}, "Attribute", "Value"), GroupedRows = Table.Group(UnpivotedOtherColumns, {"Index"}, {{"Sales", each List.Sum([Value]), type number}}), MergedTotal = Table.Join(AddedIndex,"Index",GroupedRows,"Index"), RemovedIndex = Table.RemoveColumns(MergedTotal,{"Index"}) in RemovedIndex



    Thursday, December 8, 2016 5:51 PM
  • Yes, thanks, but I would like to avoid any hardcoded column names, allowing for flexible data inputs.

    In the real case, the loading of data does not have hardcoded column names but in the simplified case below (where I have come to) there are three relevant columns (Brand A, Brand B, and Brand C).

    let
        Source = Excel.Workbook(File.Contents("...\ttt.xlsx"), null, true),
        Sheet2_Sheet = Source{[Item="Sheet2",Kind="Sheet"]}[Data],
        #"Promoted Headers" = Table.PromoteHeaders(Sheet2_Sheet),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"T", type text}, {"U", type text}, {"Claim", Int64.Type}, {"Brand A", Int64.Type}, {"Brand B", Int64.Type}, {"Brand C", Int64.Type}}),

        //Create list of column names containing string "Brand "
        tbl = Table.FromList(Table.ColumnNames(#"Changed Type"), Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        Filteredtbl = Table.SelectRows(tbl, each Text.Contains([Column1], "Brand ")),
        mList = Filteredtbl[Column1],

        //Now I just need something like:

        AddCust = Table.AddColumn(#"Changed Type", "SumBrands", SUMOFCOLUMNS(mList))

    in

    AddCust


    Thursday, December 8, 2016 9:55 PM
  • See the below modifications to my previous script. In the new script, "Sales" is the containing string. Step additions/changes shown in bold

    let
        PreviousStepName = ...
        PivotedColumnList = List.Select(Table.ColumnNames(PreviousStepName), each Text.Contains(_, "Sales")),
        AddedIndex = Table.AddIndexColumn(PreviousStepName, "Index", 0, 1),
        UnpivotedColumns = Table.Unpivot(AddedIndex, PivotedColumnList, "Attribute", "Value"),
        GroupedRows = Table.Group(UnpivotedColumns, {"Index"}, {{"Sales", each List.Sum([Value]), type number}}),
        MergedTotal = Table.Join(AddedIndex,"Index",GroupedRows,"Index"),
        RemovedIndex = Table.RemoveColumns(MergedTotal,{"Index"})
    in
        RemovedIndex

    Friday, December 9, 2016 4:15 AM
  • Yet another possibility, which doesn't involve unpivoting and subsequently joining tables. If you have a fairly large appended set of tables, you should try each solution to see if there's any significant performance difference.

    let
        PreviousStepName = ...
        SummedColumnList = List.Select(Table.ColumnNames(PreviousStepName), each Text.Contains(_, "Sales")),
        AddedIndex = Table.AddIndexColumn(PreviousStepName, "Index", 0, 1),
        AddedColumn = Table.AddColumn(
                          AddedIndex,
                          "Sales",
                          each List.Sum(List.Transform(SummedColumnList, (curr)=>Table.Column(AddedIndex, curr){[Index]}))
                      ),
        RemovedIndex = Table.RemoveColumns(AddedColumn,{"Index"})
    in
        RemovedIndex


    Friday, December 9, 2016 2:52 PM