none
Combining Files From Folder in Power Query RRS feed

  • Question

  • I'll try to be as short and concise as possible.

    I have a series of 10 files with sales data segmented by store branch (each branch having its own column and column header). The basic skeleton is the same for each file in the series, but as old stores close and new ones open during the year, the number of columns may increase or decrease, bumping the column address for specific stores up or down over the course of the file series.

    Now, currently I combine files using a script (for some reason combine binaries does not work in Excel 2016 which I'm using). 

    When I combine them, it uses the first file as a static template, which requires subsequent files to have the exact same number of columns and in the same order.

    Is there a script or power query feature I can use to map each unique column header in a file with its exact matches in subsequent files, regardless if the column address changes, and also ensures that all unique column headers that appear in the file series are accounted for?


    Monday, November 18, 2019 12:14 PM

Answers

  • Hi

    If your files to combine consist of sheet within workbooks where the 1st row contains the column headers, replace the bolded+italic part with your info. and try:

    let
        Source = Folder.Files("PathToFilesToCombine"),
        TargetName = "SheetName",
        
        FilteredOutHiddenFiles = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
        BinariesOnly = Table.SelectColumns(FilteredOutHiddenFiles, {"Content"}),
        ExtractedData = Table.TransformColumns(BinariesOnly,
            {"Content", each
                let
                    ExcelContent = Excel.Workbook(_, null, true),
                    SelectedTarget = Table.SelectRows(ExcelContent,
                        each ([Kind] = "Sheet") and ([Item] = TargetName)),
                    DataOnly = Table.SelectColumns(SelectedTarget, {"Data"}),
                    ExpandedData = Table.ExpandTableColumn(DataOnly, "Data", Table.ColumnNames(DataOnly[Data]{0}))
                in
                    Table.PromoteHeaders(ExpandedData, [PromoteAllScalars = true]),
                Table.Type
            }
        ),
        CombinedTables = Table.Combine(ExtractedData[Content])
    in
        CombinedTables

    If they consist of Table within workbooks, same as above after replacing steps TargetName & ExtractedData:

    …
        TargetName = "TableName",
    …
        ExtractedData = Table.TransformColumns(BinariesOnly,
            {"Content", each
                let
                    ExcelContent = Excel.Workbook(_, null, true),
                    SelectedTarget = Table.SelectRows(ExcelContent,
                        each ([Kind] = "Table") and ([Item] = TargetName)),
                    DataOnly = Table.SelectColumns(SelectedTarget, {"Data"})
                in
                    Table.ExpandTableColumn(DataOnly, "Data", Table.ColumnNames(DataOnly[Data]{0})),
                Table.Type
            }
        ),

    • Marked as answer by khizer.daar Wednesday, November 20, 2019 6:12 AM
    Monday, November 18, 2019 10:30 PM

All replies

  • Try to look in your "Transform File from XX" function that's automatically created, the static number of column should be hardcoded there from your example file as parameter "Columns=###". You can delete it.

    Also if this transform function is missing "Promote header" or some other kind of column namer, you can add it. 


    Monday, November 18, 2019 2:43 PM
  • Hello

    So the goal is to get a unique list of column names in a specific worksheet?

    If this is the case you can

    - Read files from folder with power query

    - in case filter for the files you want to extract

    - create function that reads from the worksheet in question with Table.ColumnNames the column headers. Combine the result with List.Union

    Jimmy


    Query it

    Monday, November 18, 2019 4:27 PM
  • Hi,

    You should create table structure using Power query and then append tables as mentioned in below snapshot.

    Append Query

    Monday, November 18, 2019 5:02 PM
  • Hi

    If your files to combine consist of sheet within workbooks where the 1st row contains the column headers, replace the bolded+italic part with your info. and try:

    let
        Source = Folder.Files("PathToFilesToCombine"),
        TargetName = "SheetName",
        
        FilteredOutHiddenFiles = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
        BinariesOnly = Table.SelectColumns(FilteredOutHiddenFiles, {"Content"}),
        ExtractedData = Table.TransformColumns(BinariesOnly,
            {"Content", each
                let
                    ExcelContent = Excel.Workbook(_, null, true),
                    SelectedTarget = Table.SelectRows(ExcelContent,
                        each ([Kind] = "Sheet") and ([Item] = TargetName)),
                    DataOnly = Table.SelectColumns(SelectedTarget, {"Data"}),
                    ExpandedData = Table.ExpandTableColumn(DataOnly, "Data", Table.ColumnNames(DataOnly[Data]{0}))
                in
                    Table.PromoteHeaders(ExpandedData, [PromoteAllScalars = true]),
                Table.Type
            }
        ),
        CombinedTables = Table.Combine(ExtractedData[Content])
    in
        CombinedTables

    If they consist of Table within workbooks, same as above after replacing steps TargetName & ExtractedData:

    …
        TargetName = "TableName",
    …
        ExtractedData = Table.TransformColumns(BinariesOnly,
            {"Content", each
                let
                    ExcelContent = Excel.Workbook(_, null, true),
                    SelectedTarget = Table.SelectRows(ExcelContent,
                        each ([Kind] = "Table") and ([Item] = TargetName)),
                    DataOnly = Table.SelectColumns(SelectedTarget, {"Data"})
                in
                    Table.ExpandTableColumn(DataOnly, "Data", Table.ColumnNames(DataOnly[Data]{0})),
                Table.Type
            }
        ),

    • Marked as answer by khizer.daar Wednesday, November 20, 2019 6:12 AM
    Monday, November 18, 2019 10:30 PM
  • Lz replied for external excel files. Here the code for external CSV files:

    let
    	Source = Folder.Files("PathToFilesToCombine"),
    	Filter = Table.SelectRows(Source, each
    		try Text.Lower([Extension])=".csv" // [OPTIONAL] and Text.Lower([Name])="YOUR FILTER PARAMETERS"
    		otherwise false
    	)
    	fn_ImportContent = (myBinary as binary) as table => Table.PromoteHeaders(Csv.Document(myBinary, [Delimiter=",", /* [OPTIONAL}Columns=00,*/ Encoding=1252, QuoteStyle=QuoteStyle.Csv])),
    	Content = Table.Combine(List.Transform(Filter [Content], each fn_ImportContent(_)))
    	in Content


    • Proposed as answer by anthony34 Wednesday, November 20, 2019 10:41 AM
    • Edited by anthony34 Monday, November 25, 2019 8:22 PM
    Tuesday, November 19, 2019 7:21 AM
  • This is the solution I was looking for thanks! Just to confirm, whenever I plan to combine a series of workbook tables, I need to ensure all tables are named exactly the same right?

    Wednesday, November 20, 2019 6:20 AM
  • You can combine all the files that you want, whatever their filenames. The "Filter"step let you select the list of files that you want to combine. I marked it [OPTIONAL]  so that you understand you can personalize this part to your own needs.


    • Proposed as answer by Lz._ Wednesday, November 20, 2019 10:30 AM
    • Edited by anthony34 Wednesday, November 20, 2019 10:41 AM
    Wednesday, November 20, 2019 7:04 AM
  • Just to confirm, whenever I plan to combine a series of workbook Tables, I need to ensure all Tables are Named exactly the same right?

    Hi

    (Anthony already replied for CSVs)

    If you have a single Sheet or Table to consolidate in each workbook, ensuring consistent names make your life easier as you can use a generic query as the one posted above

    If Sheet/Table names are different in each workbook - or if you have to consolidate more than 1 Sheet/Table per workbook - then you need to implement a different logic within the query

    Example: a folder has 2 Excel workbooks (January.xlsx & March.xlsx). 1st one has a Table named January, 2nd one has a Table named March. To combine these 2 Tables you could do:

    let
        Source = Folder.Files("PathToFilesToCombine"),
        FilteredOutHiddenFiles = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
        ContentAndName = Table.SelectColumns(FilteredOutHiddenFiles, {"Content","Name"}),
        ExtractedData = List.Transform(List.Positions(ContentAndName[Content]), each
            let
                TableNameToCombine = Text.BeforeDelimiter(ContentAndName[Name]{_},"."),
                ExcelContent = Excel.Workbook(ContentAndName[Content]{_}, null, true),
                SelectedTarget = Table.SelectRows(ExcelContent, each
                    ([Kind] = "Table") and ([Item] = TableNameToCombine)
                ),
                DataOnly = Table.SelectColumns(SelectedTarget, {"Data"})
            in
                Table.ExpandTableColumn(DataOnly, "Data", Table.ColumnNames(DataOnly[Data]{0}))
        )
    in
        Table.Combine(ExtractedData)



    Wednesday, November 20, 2019 12:11 PM