none
Import From Folder: How to Extract the File Name in a Custom Column. RRS feed

  • Question

  • Hello All

    Here´s what we´re trying to do:

    We have a folder with csv files named like this:

    Sales_2013-02-05.csv
    Sales_2013-02-04.csv
    Sales_2013-02-03.csv
    Sales_2013-02-02.csv
    Sales_2013-02-01.csv

    And in the csv files there are the sales columns but not the date column.
    So we want to extract the date from the file name.

    I´ve tried entering = Source[Name] in a custom column, but it adds a "LIST" link, and on a click on expand, it adds ALL file names from the folder in each row, instead of just the needed one.

    If we could get the proper file name in each row (from where they got extracted), we could split the column and get the date from there. But I don´t know how put the filename there properly.

    Can you help?


    Tuesday, July 15, 2014 10:15 PM

Answers

  • This isn't entirely straightforward, but it's definitely possible. What you need to do is to apply all of your transforms to each individual file instead of the combined files. I do that as follows:

    1) Use Folder.Files as generated by the GUI to look at the list of my files.
    2) Pick one file and do all the transformations to it that I want to apply to all of the files. Sometimes, this just amounts to letting the autodetection figure out the column names and types.
    3) Go into the advanced editor and edit my code so that the transformations from step 2 are applied to all files. This involves creating a new function and then applying that function to the content in each row.
    4) Expand the tables created in step 3.

    As an example, I have some files with names that match the ones you suggested. After steps 1 + 2, my query looks like the following:

    let
        Source = Folder.Files("d:\testdata\files"),
        #"d:\testdata\files\_Sales_2013-02-01 csv" = Source{[#"Folder Path"="d:\testdata\files\",Name="Sales_2013-02-01.csv"]}[Content],
        #"Imported CSV" = Csv.Document(#"d:\testdata\files\_Sales_2013-02-01 csv",null,",",null,1252),
        #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
        #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
    in
        #"Changed Type"

    For step 3, I need to take steps 3-5 of my query and convert them into a function. As a check, I can apply that function to the same file that I chose in step 2. The result looks like this:

    let
        Source = Folder.Files("d:\testdata\files"),
        Loader = (file) =>
            let
                #"Imported CSV" = Csv.Document(file,null,",",null,1252),
                #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
                #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
            in
                #"Changed Type",
        #"d:\testdata\files\_Sales_2013-02-01 csv" = Source{[#"Folder Path"="d:\testdata\files\",Name="Sales_2013-02-01.csv"]}[Content],
        Loaded = Loader(#"d:\testdata\files\_Sales_2013-02-01 csv")
    in
        Loaded

    Now I apply the same function to all of the rows, transforming the existing "Content" column into a new value:

    let
        Source = Folder.Files("d:\testdata\files"),
        Loader = (file) =>
            let
                #"Imported CSV" = Csv.Document(file,null,",",null,1252),
                #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
                #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
            in
                #"Changed Type",
        Transformed = Table.TransformColumns(Source, {"Content", Loader})
    in
        Transformed

    Finally, I need to expand out the columns in the table, which I can do by clicking on the expand icon next to the Content column header. The resulting query looks like this:

    let
        Source = Folder.Files("d:\testdata\files"),
        Loader = (file) =>
            let
                #"Imported CSV" = Csv.Document(file,null,",",null,1252),
                #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
                #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
            in
                #"Changed Type",
        Transformed = Table.TransformColumns(Source, {"Content", Loader}),
        #"Expand Content" = Table.ExpandTableColumn(Transformed, "Content", {"One", "Two", "Three"}, {"Content.One", "Content.Two", "Content.Three"})
    in
        #"Expand Content"

    From here, you should be able to get to what you want.

    Wednesday, July 16, 2014 2:56 AM

All replies

  • This isn't entirely straightforward, but it's definitely possible. What you need to do is to apply all of your transforms to each individual file instead of the combined files. I do that as follows:

    1) Use Folder.Files as generated by the GUI to look at the list of my files.
    2) Pick one file and do all the transformations to it that I want to apply to all of the files. Sometimes, this just amounts to letting the autodetection figure out the column names and types.
    3) Go into the advanced editor and edit my code so that the transformations from step 2 are applied to all files. This involves creating a new function and then applying that function to the content in each row.
    4) Expand the tables created in step 3.

    As an example, I have some files with names that match the ones you suggested. After steps 1 + 2, my query looks like the following:

    let
        Source = Folder.Files("d:\testdata\files"),
        #"d:\testdata\files\_Sales_2013-02-01 csv" = Source{[#"Folder Path"="d:\testdata\files\",Name="Sales_2013-02-01.csv"]}[Content],
        #"Imported CSV" = Csv.Document(#"d:\testdata\files\_Sales_2013-02-01 csv",null,",",null,1252),
        #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
        #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
    in
        #"Changed Type"

    For step 3, I need to take steps 3-5 of my query and convert them into a function. As a check, I can apply that function to the same file that I chose in step 2. The result looks like this:

    let
        Source = Folder.Files("d:\testdata\files"),
        Loader = (file) =>
            let
                #"Imported CSV" = Csv.Document(file,null,",",null,1252),
                #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
                #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
            in
                #"Changed Type",
        #"d:\testdata\files\_Sales_2013-02-01 csv" = Source{[#"Folder Path"="d:\testdata\files\",Name="Sales_2013-02-01.csv"]}[Content],
        Loaded = Loader(#"d:\testdata\files\_Sales_2013-02-01 csv")
    in
        Loaded

    Now I apply the same function to all of the rows, transforming the existing "Content" column into a new value:

    let
        Source = Folder.Files("d:\testdata\files"),
        Loader = (file) =>
            let
                #"Imported CSV" = Csv.Document(file,null,",",null,1252),
                #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
                #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
            in
                #"Changed Type",
        Transformed = Table.TransformColumns(Source, {"Content", Loader})
    in
        Transformed

    Finally, I need to expand out the columns in the table, which I can do by clicking on the expand icon next to the Content column header. The resulting query looks like this:

    let
        Source = Folder.Files("d:\testdata\files"),
        Loader = (file) =>
            let
                #"Imported CSV" = Csv.Document(file,null,",",null,1252),
                #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV"),
                #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"One", Int64.Type}, {"Two", type text}, {"Three", type text}})
            in
                #"Changed Type",
        Transformed = Table.TransformColumns(Source, {"Content", Loader}),
        #"Expand Content" = Table.ExpandTableColumn(Transformed, "Content", {"One", "Two", "Three"}, {"Content.One", "Content.Two", "Content.Three"})
    in
        #"Expand Content"

    From here, you should be able to get to what you want.

    Wednesday, July 16, 2014 2:56 AM
  • Hi Curt

    Thanks for your reply.

    Although I´m not a coder, I´m trying to make it work since your reply with no luck.
    I got stuck in step 3.

    Here´s my code so far (steps 1 & 2):

    let
         Source = Folder.Files("C:\1Powerquery\files"),
        #"C:\1Powerquery\files\_all_earnings_2014-05-01 csv" = Source{[#"Folder Path"="C:\1Powerquery\files\",Name="all_earnings_2014-05-01.csv"]}[Content],
        ImportedCSV = Csv.Document(#"C:\1Powerquery\files\_all_earnings_2014-05-01 csv",null,"#(tab)",null,1252),
        ChangedType = Table.TransformColumnTypes(ImportedCSV,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}}),
        FirstRowAsHeader = Table.PromoteHeaders(ChangedType),
        RemovedColumns = Table.RemoveColumns(FirstRowAsHeader,{"Company", "Report Time", "Current Price", "Price % Change", ""})
    in
        RemovedColumns

    But got completely stuck from there, not knowing what to do. Everything I try, I get a different error.
    Is it too much to ask for a video or a bit more detailed instructions?

    Thanks in advance.


    • Edited by NicoPer Wednesday, July 16, 2014 10:38 PM
    Wednesday, July 16, 2014 10:37 PM
  • I obviously can't test it, but here's what I end up with. You'll still need to expand your "Content" column.

    let
        Source = Folder.Files("C:\1Powerquery\files"),
        Loader = (content) =>
            let
                ImportedCSV = Csv.Document(content,null,"#(tab)",null,1252),
                ChangedType = Table.TransformColumnTypes(ImportedCSV,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}}),
                FirstRowAsHeader = Table.PromoteHeaders(ChangedType),
                RemovedColumns = Table.RemoveColumns(FirstRowAsHeader,{"Company", "Report Time", "Current Price", "Price % Change", ""})
            in
                RemovedColumns,
        Transformed = Table.TransformColumns(Source, {"Content", Loader})
    in
        Transformed

    Wednesday, July 16, 2014 10:54 PM
  • * Perfect!!*

    Thanks a million.

    :)

    Wednesday, July 16, 2014 11:56 PM
  • Curt,

    Thanks for the detailed response. What could I do, if I wanted to  combine all sheets named "Sheet5" in every workbook in a folder, and pull the file name for each of them in a custom column. Is this possible?

    Tuesday, May 12, 2015 3:11 PM
  • I have been using Chris Webb's solution for a while. I made the custom function to put file names with something he referenced for me.

    What I am seeking is kind of a combination of the two though. I am looking into the Miguel Escobar's work now. I don't think Have seen what he has done yet, thanks!

    Thursday, May 14, 2015 7:23 PM