none
Multiple CSVs from Folder result in multiple header rows

    Question

  • When importing multiple CSV files "From Folder" you can tell PQ to "Use First Row As Headers" which usually works just fine for the first file

    the problem is that the header row of all other CSV files that get imported afterwards do not get removed but result in a new row with the CSV headers as values

    from my point of view the idea of using "From Folder" is to import several files with the same format - e.g. with headers - so i would expect the headers to get removed from every file and not just from the first?!?

    i know there are workarounds like filtering out records where [MyColumn] = "MyColumn" but thats not really convenient and also does not guarantee correct results as "MyColumn" may also be a real value and not only be used in the header

    any feedback on this behaviour?

    -gerhard


    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Thursday, January 30, 2014 9:21 AM

Answers

  • You can achieve this by writing M code:

    let
        Source = Folder.Files("F:\csvs"),
        Tables = List.Transform(Source[Content], each Table.PromoteHeaders(Csv.Document(_,null,null,null,1252))),
        SingleTable = Table.Combine(Tables)
    in
        SingleTable

    This specific code will only work if the header rows are all the same, but the same principle can be used to perform just about any processing on a batch of files. The key is to keep the individual file contents in a list until they're sufficiently compatible to do a Table.Combine on them.

    Saturday, February 01, 2014 2:05 PM

All replies

  • Good feedback. That seems like something we should be able to handle in a simpler manner. I've logged this so that we can try to come up with a better solution. Thanks!
    Saturday, February 01, 2014 12:12 AM
  • You can achieve this by writing M code:

    let
        Source = Folder.Files("F:\csvs"),
        Tables = List.Transform(Source[Content], each Table.PromoteHeaders(Csv.Document(_,null,null,null,1252))),
        SingleTable = Table.Combine(Tables)
    in
        SingleTable

    This specific code will only work if the header rows are all the same, but the same principle can be used to perform just about any processing on a batch of files. The key is to keep the individual file contents in a list until they're sufficiently compatible to do a Table.Combine on them.

    Saturday, February 01, 2014 2:05 PM

  • thanks for the feedback - i know that i can write custom M code to do those kind of stuff
    point is, that it is not enduser friendly anymore and makes Power Query loose most of its charm

    and i think it is a very basic requirement and should be integrated into the wizard of loading multiple files "From Folder"

    if the wizard would incorporate your code that would be enough for me :)

    -gerhard


    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Saturday, February 01, 2014 4:35 PM
  • @Ben: could you please update this posts if/when there was something done in the source-code to address this issue

    thank you!

    -gerhard


    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Sunday, February 02, 2014 10:46 AM
  • +1 to Gerhard's suggestion - I've run into this problem too and thought it wasn't very user-friendly.

    Chris


    Check out my MS BI blog I also do SSAS, PowerPivot, MDX and DAX consultancy and run public SQL Server and BI training courses in the UK

    Monday, February 03, 2014 5:00 PM
  • "i know there are workarounds like filtering out records where [MyColumn] = "MyColumn" but thats not really convenient and also does not guarantee correct results as "MyColumn" may also be a real value and not only be used in the header"

    Surely, one can tell from the contents of the column, the likelihood that a value will be the same as the header? Plus, what is the likelihood that all columns will contain a value that is the same as each column's respective header? You need only to filter one column for all those extra headers to disappear.

    A bigger issue is having to change the column types after filtering out the headers because it involves additional work. 

    I don't know what kind of solution the developers can come up with to make the process simpler though. How do you account for the possibility that some files might have no header row, and what do you do if some files have different names for headers? I think that the dev. team decided to take a safe route and leave the choices for user to work out.

    Tuesday, February 11, 2014 3:20 PM
  • On further reflection, I think that the scenario with files having different headers or having missing headers should be the exception rather than the rule. Therefore, an automated script along the lines of Curt's suggestion is better than the script currently generated. Furthermore, the new script would (hopefully) also permit better inference of the column types.
    Tuesday, February 11, 2014 4:45 PM