none
Consolidate data from multiple CSV files

    Question

  • Hi,

    I have multiple CSV files stored in a certain folder.  I have been able to append data from these multiple CSV files into a single worksheet of another workbook by using the following script in Power Query

    let
        Source = Folder.Files("C:\Users\Ashish\Desktop\Monthly Data"),
        CombinedBinaries = Binary.Combine(Source[Content]),
        ImportedCSV = Csv.Document(CombinedBinaries),
        FirstRowAsHeader = Table.PromoteHeaders(ImportedCSV),
        FilteredRows = Table.SelectRows(FirstRowAsHeader, each ([Invoice_number] <> "Invoice_number")),
        RemovedColumns = Table.RemoveColumns(FilteredRows,{"_1", "_2", "_3", ""})
    in
        RemovedColumns

    The result is absolutely correct.  Here are my queries

    1. Is there a way to add another column to this appended table which will show me the name of the workbook of each row of data. I can obviously add a column to each CSV file and then the script above will show me that additional column but I want to avoid doing using a spare column in my base CSV files. This is mainly because it would be a manual exercise and would also be time consuming.

    2. Similar to the question above, is there a way to add yet another column to the appended table above, which will show me the name of the worksheet of each row of data

    3. If there are multiple worksheets in each CSV file, then can the script above be modified to specify the name of the worksheet to be used for appending data.  So, is there a way to say that pick data from the worksheet titled "Data" (there will be many other worksheets in each CSV file but the Data worksheet will always be there) from all CSV files.

    Thank you.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Saturday, August 24, 2013 1:47 AM

Answers

All replies

  • Only replying to the parts of the question  about the name of the sheet in the CSV and about having multiple sheets in one CSV.

    The answer is that a CSV cannot have multiple sheets and also there is no way to rename the sheet name .

    Saturday, August 24, 2013 4:17 PM
  • Hi,

    Sure.  Thank you.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Saturday, August 24, 2013 11:18 PM
  • Hi Ashish,

    Replying to the first question since Dany took care of the other two (thanks Dany!):

    There is no direct support for adding a new column with "lineage" information via Binary.Combine, however, one way you could go about it is parameterizing your transformation steps to take a file path as input by turning your query into a function, and invoke this function over a table containing the list of files in this folder.

    At a high level, this is what it would look like:

    a) Function: (call it ReshapeOneFile)

    = (filePath) =>

        ImportedCSV = Csv.Document(filePath),
        FirstRowAsHeader = Table.PromoteHeaders(ImportedCSV),
        FilteredRows = Table.SelectRows(FirstRowAsHeader, each ([Invoice_number] <> "Invoice_number")),
        RemovedColumns = Table.RemoveColumns(FilteredRows,{"_1", "_2", "_3", ""})
    in
        RemovedColumns

    b) List of files + Invoke:

    let
        Source = Folder.Files("C:\Users\Ashish\Desktop\Monthly Data"),
        MergedColumns = Table.CombineColumns(Source,{"Folder Path", "Name"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
        RemovedOtherColumns = Table.SelectColumns(MergedColumns,{"Merged"}),
        InsertedCustom = Table.AddColumn(RemovedOtherColumns, "", each ReshapeOneFile([Merged]))

    in

       InsertedCustom

    At this point, you will have an expandable column with the contents of each file, one per row. If you expand all columns, you will end up with a UNION of all files and the first column on each row ("Merged") contains the filepath from which each row comes.

    Hope this helps.

    M.

    Monday, August 26, 2013 3:27 PM
  • Hi,

    Thank you for replying.  I do not know where to add the a) portion of your answer.  Could you be descriptive.

    Thank you.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Monday, August 26, 2013 11:53 PM
  • Hi,

    Enable "Advanced Query Editing" in the Options dialog (if you haven't done it yet). Then, go to "From Other Sources" and the last option will be "Blank Query". Once in the Query Editor dialog, you can click the little script icon in the right end of the fx bar. In the new dialog, you can paste in the first query, starting from the "= (filepath)" part.

    Related help resources:

    Thanks,
    M.

    Tuesday, August 27, 2013 12:25 AM
  • Hi,

    I just cannot get it to work - Is it possible for you to upload your file to SkyDrive.  I just need to see how the function has to be written.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Tuesday, August 27, 2013 10:17 AM
  • Hello Ashish, your post helped me create a Power Query to combine CSV files. Do you know if there a way to do the same type of importing using Excel files instead of CSV files ? 

    Thanks,

    Nick

    Monday, September 02, 2013 6:50 AM
  • Hi,

    No.  I do not.  Perhaps someone from the Power Query team will shed some light on this.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Monday, September 02, 2013 6:52 AM
  • Hi Ashish,

    I've uploaded the workbook to this location: http://sdrv.ms/15x5h8q

    Please note that it currently gives an error on Query1 given that I don't have that file path on my machine. Opening it in your machine and refreshing the query should work.

    Thanks,
    M.

    Wednesday, September 04, 2013 2:31 AM
  • Thank you.  I see that this is fairly complicated - one has to first manually write a function 9ReshapeOnFile) and then use that to get the file path.

    I will try to learn this and revert with further queries.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Thursday, September 05, 2013 11:18 PM
  • Hi Ashish,

    Did you get the multiple import of CSV files working for you ? I tried to setup a scenario very similar to yourself but was unable to make it work. If you did get it to work , can you send me your CSV files plus Power Query Excel s/sheet (I use Excel 2010 with PQ add-on).

    Thank you,

    Nick Aubrey

    <object height="1" id="plugin0" style=";z-index:1000;" type="application/x-dgnria" width="1"><param name="tabId" value="{C29392BA-0D80-4E13-B262-9FD121A4E9EC}" /></object>
    Sunday, October 20, 2013 5:40 AM
  • Hi,

    I do not quite remember - it's been a while now.  Since that was not a live project I was working on, I did not save any files.


    Regards, Ashish Mathur Microsoft Excel MVP www.ashishmathur.com

    Tuesday, October 22, 2013 1:59 AM