none
Newest file with a specific name fragment as source RRS feed

  • Question

  • Hello everyone, i have a question. It might be simple for You. but not for me. I would like to create a power query, which source is newest file with specific name fragment from a specific folder. Let me explain: i have a folder with diffrent data dumps called for ex:

    inventory_dump_01-03-17

    inventory_dump_03-03-17

    memo_dump_02-17

    memo_dump_03-17

    etc.

    I would like to get as source newest inventory_dump and newest memo_dump. How to do this? :) I will be grateful for any help.

    Wednesday, August 30, 2017 7:04 PM

Answers

  • Here's one way to do it:

    let
        Source = {"inventory_dump_01-03-17",
    "inventory_dump_03-03-17",
    "memo_dump_02-17",
    "memo_dump_03-17"},
        #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "FileName"}}),
        #"Inserted Last Characters" = Table.AddColumn(#"Renamed Columns", "Inventory Dump", each if Text.Contains([FileName], "inventory_dump") then Text.End([FileName], 8) else null , type text),
        #"Inserted Last Characters1" = Table.AddColumn(#"Inserted Last Characters", "Memo Dump", each if Text.Contains([FileName], "memo_dump") then Text.End([FileName], 4) else null, type text),
        #"Added Custom" = Table.AddColumn(#"Inserted Last Characters1", "Date", each if [Inventory Dump] = null then Date.From("01-"&[Memo Dump]) else Date.From([Inventory Dump])),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Inventory Dump", "Memo Dump"}),
        #"Added Custom1" = Table.AddColumn(#"Removed Columns", "Category", each if Text.Contains([FileName], "memo_dump") then "memo_dump" else "inventory_dump"),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Custom1",{"FileName"}),
        #"Grouped Rows" = Table.Group(#"Removed Columns1", {"Category"}, {{"AllRows", each _, type table}}),
        #"Added Custom2" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Max([AllRows], "Date")),
        #"Removed Columns2" = Table.RemoveColumns(#"Added Custom2",{"AllRows"})
    in
        #"Removed Columns2"
    The important part about this code is the use of a custom column to create a value that will help me use the Group By feature. After that, the Table.Max is your friend and will grab the record with the newest date.


    Wednesday, August 30, 2017 8:55 PM
  • You can use the following procedure:

    1) Use the Folder.Files function to import the file binaries and attributes (similar to the attributes you see in a file manager)
    2) Extract the text "inventory_dump" and "memo_dump" from the "Name" column into a new column
    3) Group by the new column so that "inventory_dump" rows and "memo_dump" rows are separated into "sub-tables"
    4) For each of the "sub-tables," get the record that represents the max row, based on the "Date modified" column
    5) Expand the resulting records, selecting only the columns you need (the "Content" column is mandatory)
    6) Add a new column to get the binary data from the "Content" column. The function you use will depend on the type of binary data in the "Content" column. If the data is delimited text, you will use Csv.Document([Content]...). If the data is text with no delimiter, you will use Lines.FromBinary([Content]).

    See the code below (which excludes step 6).

    let
        Source = Folder.Files("<Insert Folder Path Here>"),
        AddedCustom = Table.AddColumn(Source, "Custom", each Text.BeforeDelimiter([Name]], "_", 1)),
        GroupedRows = Table.Group(AddedCustom, {"Custom"}, {{"Table", each _, type table}}),
        FilteredRows = Table.TransformColumns(GroupedRows, {"Table", each Table.Max(_, "Date modified")}),
        ExpandedTable = Table.ExpandRecordColumn(FilteredRows, "Table", {"Content", "Name"})
    in
        ExpandedTable


    Thursday, August 31, 2017 5:53 PM

All replies

  • Hey,

    How do you define which one is the newest inventory_dump and newest memo_dump file? is it by the date modified or perhaps the date accessed field? or does it have to do with the actual file name? in that case, how can you tell which one is the newest?

    Wednesday, August 30, 2017 8:24 PM
  • Newest files are files with newest date modified; actual file name can be helpful but also could sometimes have a typos. In this case newest files are filest with newest date modified, which is consistent with names.
    Wednesday, August 30, 2017 8:30 PM
  • Here's one way to do it:

    let
        Source = {"inventory_dump_01-03-17",
    "inventory_dump_03-03-17",
    "memo_dump_02-17",
    "memo_dump_03-17"},
        #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "FileName"}}),
        #"Inserted Last Characters" = Table.AddColumn(#"Renamed Columns", "Inventory Dump", each if Text.Contains([FileName], "inventory_dump") then Text.End([FileName], 8) else null , type text),
        #"Inserted Last Characters1" = Table.AddColumn(#"Inserted Last Characters", "Memo Dump", each if Text.Contains([FileName], "memo_dump") then Text.End([FileName], 4) else null, type text),
        #"Added Custom" = Table.AddColumn(#"Inserted Last Characters1", "Date", each if [Inventory Dump] = null then Date.From("01-"&[Memo Dump]) else Date.From([Inventory Dump])),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Inventory Dump", "Memo Dump"}),
        #"Added Custom1" = Table.AddColumn(#"Removed Columns", "Category", each if Text.Contains([FileName], "memo_dump") then "memo_dump" else "inventory_dump"),
        #"Removed Columns1" = Table.RemoveColumns(#"Added Custom1",{"FileName"}),
        #"Grouped Rows" = Table.Group(#"Removed Columns1", {"Category"}, {{"AllRows", each _, type table}}),
        #"Added Custom2" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Max([AllRows], "Date")),
        #"Removed Columns2" = Table.RemoveColumns(#"Added Custom2",{"AllRows"})
    in
        #"Removed Columns2"
    The important part about this code is the use of a custom column to create a value that will help me use the Group By feature. After that, the Table.Max is your friend and will grab the record with the newest date.


    Wednesday, August 30, 2017 8:55 PM
  • Seems ok, but what if I have a lot more files? Can i use wildcards or sth in source names?
    Wednesday, August 30, 2017 9:56 PM
  • You can use the following procedure:

    1) Use the Folder.Files function to import the file binaries and attributes (similar to the attributes you see in a file manager)
    2) Extract the text "inventory_dump" and "memo_dump" from the "Name" column into a new column
    3) Group by the new column so that "inventory_dump" rows and "memo_dump" rows are separated into "sub-tables"
    4) For each of the "sub-tables," get the record that represents the max row, based on the "Date modified" column
    5) Expand the resulting records, selecting only the columns you need (the "Content" column is mandatory)
    6) Add a new column to get the binary data from the "Content" column. The function you use will depend on the type of binary data in the "Content" column. If the data is delimited text, you will use Csv.Document([Content]...). If the data is text with no delimiter, you will use Lines.FromBinary([Content]).

    See the code below (which excludes step 6).

    let
        Source = Folder.Files("<Insert Folder Path Here>"),
        AddedCustom = Table.AddColumn(Source, "Custom", each Text.BeforeDelimiter([Name]], "_", 1)),
        GroupedRows = Table.Group(AddedCustom, {"Custom"}, {{"Table", each _, type table}}),
        FilteredRows = Table.TransformColumns(GroupedRows, {"Table", each Table.Max(_, "Date modified")}),
        ExpandedTable = Table.ExpandRecordColumn(FilteredRows, "Table", {"Content", "Name"})
    in
        ExpandedTable


    Thursday, August 31, 2017 5:53 PM
  • Seems ok, but what if I have a lot more files? Can i use wildcards or sth in source names?

    The source step was just some dummy data to play with, but you can change that source step. Is completely up to you. You can change the Source step for any Data Source function of your choice like SharePoint.Files, Folder.Files or others. 

    You could come up with several ways to get to your solution, but I'd say that the Group By feature and the Table.Max functions are quite handy for your case.

    Thursday, August 31, 2017 10:14 PM
  • Hi Szmotsu,

    did you make it work? If so, please mark the answer that helped you.

    Re you question on wildcards: There is no proper regex-support in Power Query, but for your case you can apply a Text-filter on the column "name" and use Text.Contains and fill in a fragment of the full string to cover misspelling.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Sunday, September 10, 2017 5:05 AM
    Moderator