none
Power Query / Weekly growth - Import from folder RRS feed

  • Question

  • Dear all,

    I'm new in using Power Query as a tool. Here for you I have an issue that would like to have help for.

    I'm now trying to pull out data from different data sources to get an optimal overview about weekly performances. Data are being pulled out from the same folders, where I add more files on a weekly basis, effectively; but I want to do more. 

    Here below  a simplified example of what I'm trying to do. I would like to add in-between columns to - for example - calculate the weekly growth by simply subtracting CW02 to CW03 values. Given that this can be easily done for the data that I have already with me, do you have any ideas how I can automatically produce this when new data - and so columns - will be added to my folder and updated on the query?

    

    Thank You!


    Lorenzo

    Wednesday, January 22, 2020 2:46 PM

Answers

  • Hi Lorenzo

    Hope you're going well. With no news, following proposal assumes CSVs stored in a folder looking like:

    let
        Source = Folder.Files("PathToCsvFolder"),
        BinariesOnly = Table.SelectColumns(Source,{"Content"}),
        BinsToTables = Table.TransformColumns(BinariesOnly,
            {"Content", each
                let
                    getData = Csv.Document(_,[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.None])
                in
                    Table.PromoteHeaders(getData, [PromoteAllScalars=true]),
                type table
            }
        ),
        CombinedTables = Table.Combine(BinsToTables[Content]),
        UnpivotedNonParts = Table.UnpivotOtherColumns(CombinedTables, {"Part#"}, "Week", "Value"),
        ValueAsNumber = Table.TransformColumnTypes(UnpivotedNonParts,{{"Value", type number}}),
            weekColumns = List.Buffer(List.Skip(Table.ColumnNames(PivotedWeek))),
        PivotedWeek = Table.Pivot(ValueAsNumber, List.Distinct(ValueAsNumber[Week]), "Week", "Value", List.Sum),
        AddedDiffColumns = List.Accumulate(List.Skip(weekColumns), PivotedWeek,
            (tblState,curWeek)=>
                let
                    prevWeek = weekColumns{List.PositionOf(weekColumns,curWeek)-1}
                in
                    Table.AddColumn(tblState, (prevWeek & "-" & curWeek), each
                        Record.Field(_,curWeek) - Record.Field(_,prevWeek), type number
                    )
        ),
            outColumnsOrder = {"Part#"} & List.Sort(List.Skip(Table.ColumnNames(AddedDiffColumns))),
        ReorderedColumns = Table.SelectColumns(AddedDiffColumns, outColumnsOrder)
    in
        ReorderedColumns

    Result after 4 weeks where Part3 did not exist before CW04:

    Feel free to adapt at your convenience… Corresponding workbook avail. here


    Thursday, February 13, 2020 12:57 PM

All replies

  • Hi Lorenzo

    In principle this sounds doable. However, additional information is required:

    1. What kind (.xls, .csv...) of source file do you use? If Excel files do you get the data from a Worksheet or a Table?
    2. How do you already identify, within your folder, which file is Week1, which file is Week2... (i.e. per there filename, per date last modified…)?
    3. Apart from the [CWxx] column in each file, what other column(s) do you have?
    4. Is the following query output representative of what you would expect (Part3 did not exist before CW04)? If not please upload a picture of the expected output

    Wednesday, January 22, 2020 5:43 PM
  • Hi Lorenzo

    Hope you're going well. With no news, following proposal assumes CSVs stored in a folder looking like:

    let
        Source = Folder.Files("PathToCsvFolder"),
        BinariesOnly = Table.SelectColumns(Source,{"Content"}),
        BinsToTables = Table.TransformColumns(BinariesOnly,
            {"Content", each
                let
                    getData = Csv.Document(_,[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.None])
                in
                    Table.PromoteHeaders(getData, [PromoteAllScalars=true]),
                type table
            }
        ),
        CombinedTables = Table.Combine(BinsToTables[Content]),
        UnpivotedNonParts = Table.UnpivotOtherColumns(CombinedTables, {"Part#"}, "Week", "Value"),
        ValueAsNumber = Table.TransformColumnTypes(UnpivotedNonParts,{{"Value", type number}}),
            weekColumns = List.Buffer(List.Skip(Table.ColumnNames(PivotedWeek))),
        PivotedWeek = Table.Pivot(ValueAsNumber, List.Distinct(ValueAsNumber[Week]), "Week", "Value", List.Sum),
        AddedDiffColumns = List.Accumulate(List.Skip(weekColumns), PivotedWeek,
            (tblState,curWeek)=>
                let
                    prevWeek = weekColumns{List.PositionOf(weekColumns,curWeek)-1}
                in
                    Table.AddColumn(tblState, (prevWeek & "-" & curWeek), each
                        Record.Field(_,curWeek) - Record.Field(_,prevWeek), type number
                    )
        ),
            outColumnsOrder = {"Part#"} & List.Sort(List.Skip(Table.ColumnNames(AddedDiffColumns))),
        ReorderedColumns = Table.SelectColumns(AddedDiffColumns, outColumnsOrder)
    in
        ReorderedColumns

    Result after 4 weeks where Part3 did not exist before CW04:

    Feel free to adapt at your convenience… Corresponding workbook avail. here


    Thursday, February 13, 2020 12:57 PM
  • Hi folks. Could anyone please explain what exactly the following code does (or how it works)? 

            weekColumns = List.Buffer(List.Skip(Table.ColumnNames(PivotedWeek))),
        PivotedWeek = Table.Pivot(ValueAsNumber, List.Distinct(ValueAsNumber[Week]), "Week", "Value", List.Sum),
        AddedDiffColumns = List.Accumulate(List.Skip(weekColumns), PivotedWeek,
            (tblState,curWeek)=>
                let
                    prevWeek = weekColumns{List.PositionOf(weekColumns,curWeek)-1}
                in
                    Table.AddColumn(tblState, (prevWeek & "-" & curWeek), each
                        Record.Field(_,curWeek) - Record.Field(_,prevWeek), type number
                    )
        ),
            outColumnsOrder = {"Part#"} & List.Sort(List.Skip(Table.ColumnNames(AddedDiffColumns))),

    And is there an special way/steps to build that piece of code easyly? 

    And when exactly I shoud use List.Buffer?

    If I understand how It works I could adapt it to some tasks I do. 

    Thanks you very much.

    Monday, February 24, 2020 4:02 AM
  • Hi Anthony

    • IMHO it's not realistic to explain exactly everything, you have to do your home work :) and refer to the PQ documentation to understand what each function does, at least in principal
    • No way to easily build these steps, the only one - in the part you highlighted - that can be done with the UI is the 2nd step (Table.Pivot), everything else must be coded in the Advanced Editor
    • There's no real guidelines on when to use or not Table/List.Buffer. In principal this is done to speed up things as the Table or List is loaded in memory. However, there are situations where loading a large Table in memory can be counterproductive. Search this site about this and you'll find some scenarios & explain.
      In the above case I buffered the list of column names as it shouldn't be large (we're talking about yearly weeks so say max. 54 + the Part# column) and because (that's the most important point) we refer to that List n times => List.Accumulate(...), otherwise this wouldn't have made sense to Buffer it

    Will be glad to try answering your specific questions if you have any

    Nice day...

    Monday, February 24, 2020 6:28 AM
  • Hello Lz. Thanks for your reply and for your feedback.

    Will read the documentation and try to understand how the functions works.

    On this piece of code I have a couple of questions.

            (tblState,curWeek)=>
                let
                    prevWeek = weekColumns{List.PositionOf(weekColumns,curWeek)-1}
                in
                    Table.AddColumn(tblState, (prevWeek & "-" & curWeek), each
                        Record.Field(_,curWeek) - Record.Field(_,prevWeek), type number
                    )


    What exactly is (tblState,curWeek)=> or what is used for: tblSate and curWeek?

    I understand the value of "prevWeek" comes from

    "= weekColumns{List.PositionOf(weekColumns,curWeek)-1}"
    But, where comes the value of "curWeek" or how is determinated?

    Thanks for your attention and the patience with me.

    Cheers,

    Thursday, February 27, 2020 1:56 AM
  • Hi Anthony

    RE: (tblState,curWeek)=>
    You need to take in this in the context of the List.Accumulate function. In the example they provide they use state and current. state is the accumulator of the function while current contains/is the current value in the list. 
    tblState & curWeek are names I've chosen, I could have used state and current or abcd and def... it's really up to you but you must adjust later. Shortened version of the doc. example:

    List.Accumulate({4, 6}, 0, (state, current) => state + current)0 is the "seed", the initial value if you prefer
    On the 1st iteration, current=4, state=0 (the seed/initial value) +4 (the current list value)
    On the 2nd iteration, current=6, state=4 (result of previous 0+4) +6 (the current list value)
    Result = 10

    Once you've understood this you will have the answer to your 2nd question (But, where comes the value of "curWeek" or how is determinated?)

    More to come in a next post...

    Thursday, February 27, 2020 4:44 AM
  • Anthony

    You will quickly discover that the Power Query doc. isn't enough if you want to understand things a bit. I would also recommend:

    Back on your previous question re. List.Accumulate. The seed (initial "value") isn't necessarily a Primitive value, it can also be i.e. a Structured data values like a Table. Take the following example:

    let
        myTable = Table.FromRecords({[A=10],[A=20]}, type table [A=number]),
        AddTwoColumns = List.Accumulate({"Anthony","Lz."}, myTable,
            (myTableState,currentListItem) =>
                Table.AddColumn(myTableState, currentListItem, each [A] + Number.Random(), type number)
        )
    in
        AddTwoColumns

    myTable (the seed) is a Structured data values, a Table:

    On the 1st iteration List.Accumulate takes the 1st item in the list ("Anthony"), adds a column (to the seed) named with that value, and performs a calculation to "fill" that column, giving something like:

    the key thing to understand is that myTableState (the function accumulator) accumulated the seed (myTable) and what's been asked (add a column) so it "contains" exactly what's shown in the above picture

    On the 2nd iteration it adds another column (name = 2nd item in the list => "Lz.") to the accumulator (myTableState). Once complete the accumulator (myTableState) looks like:

    Hope this clarifies & helps. Nice day….

    Thursday, February 27, 2020 5:53 AM
  • Hello Lz._ Thank you very much for your replies, those are clear and really helpful. Will read more about Power Query in the sites linked by you, and also will the play with the examples that you posted, think this will clear the little doubts that still have.

    Have a nice weekend.
    Friday, February 28, 2020 10:22 PM