none
PowerQuery and Data from Folder RRS feed

  • Question

  • I have been testing the data from folder with the expectation that we'll be adding data files of exported data from our time mgt program.

    I have a report that exports the data I want into .csv format but it does include some headers and footers, eg

    Row1:  PTD From 1/1/2018 To 12/31/2018 : YTD From 1/1/2018 To 12/31/2018
    Row2: blank
    Row3: column headers
    Row4: first row data
    .
    .

    Row100: last row of data
    Row101: Grand Total: (footer)
    Row102: blank
    Row103: blank
    Row104: [Date/Time] of report


    Now I have thought for ease of use I'd leave the data in its raw format to avoid manually formatting issues/errors and I have successfully used PowerQuery features to clean up the top/bottom rows, set the actual headers, delete blank columns. This all works fine but when I add a new data file with the exact format I get errors. When I open the error it is listing Row Number 1183 and the data is the column header row which looks like data type errors on some of the data columns which makes sense if its putting text in a number column.

    What I have noticed is if I delete the column header row from the 2nd data file the errors go away.

    Question

    Shouldn't each new file be exactly in the same format or do all subsequent data files only need to have raw data columns in the same order?


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.

    • Moved by Perry-Pan Thursday, July 4, 2019 8:19 AM
    Wednesday, July 3, 2019 9:09 PM

Answers

  • HI Stunpals. Based on the M you shared above, you need to promote the headers inside the "Transform Sample File From Billings (2)" query, not the query you shared above.

    Ehren

    Thursday, July 18, 2019 4:56 PM
    Owner

All replies

  • Hi,

    Since this issue is more related to PowerQuery, I'm moving it to a PowerQuery forum:

    https://social.technet.microsoft.com/Forums/en-US/home?forum=powerquery

    The reason why we recommend posting appropriately is you will get the most qualified pool of respondents, and other partners who read the forums regularly can either share their knowledge or learn from your interaction with us. Thank you for your understanding.

    If you have questions about Office, you can post to our forum in the future, we will be glad to help you solve it.

    Regards,

    Bella Wu


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.

    Click here to learn more. Visit the dedicated forum to shareexplore and talk to experts about Microsoft Office 2019. 

    Thursday, July 4, 2019 8:18 AM
  • I have confirmed that when adding the 2nd file of data if I don't remove the column header rows it will error out. When I do a cleanup of the 2nd file and just have raw data in the same columns and order it adds the new records fine.

    This seems strange to me, my reading suggests you need to add the subsequent files in the exact same format as the original which would be easier for staff to run the export from our billing software and just drop the new file in folder.


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.

    Thursday, July 4, 2019 4:51 PM
  • Your expectations are correct.

    Are you able to share sample data that can reproduce the behaviour you're seeing? Please make sure to remove any sensitive data.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!


    Sunday, July 7, 2019 5:20 AM
    Moderator
  • Testing

    Getting an "Unexpected Error" when trying to submit my response.


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.

    Thursday, July 11, 2019 8:41 PM
  • I'll have to look and probably sanitize the data, its nothing complicated, here are the columns, most are just numbers (decimal for $ amounts). The export does have several empty columns but this is exactly the same in the 2nd added data file.

    The first "sample" file I load into the folder and PQ with it's columns and I clean-up the empty columns and add a column or 2 for subtracting dates, etc.

    If I leave the header columns in the 2nd data file that is placed in the same folder I get an error and when I edit the PQ I see the data listed in each column is the same as the header. If I open the 2nd file and remove the headers and refresh its works fine.

    Client No.  		Work Type  	Service  	WIP Date  	Inv Post Date  		PTD Adjust  	PTD Amounts  		PTD Billed WIP  	PTD Cost  	PTD Exp  		PTD Fees  	PTD Firm Adj  	PTD Effect Rate  	PTD Hrs  	PTD Invoiced Amt  	PTD Progress Bills  		PTD Real %  	PTD Real % w/o Expenses  	PTD Surchg  	PTD Total Cost  	PTD Amts  
    

    If I am able to provide a sample of data, how do you do that?


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.

    Thursday, July 11, 2019 8:41 PM
  • Hi Stunpals. My guess is that you're combining the files via some mechanism that causes the header rows to repeat. If you apply a function to each file which promotes headers and does the other necessary operations, and then append the results, you shouldn't have this issue. If you combine files from a Folder via our built-in "Combine" flow, we do this for you.

    Ehren

    Wednesday, July 17, 2019 11:41 PM
    Owner
  • Ehren, there is only one header row per file. There are 2 individual files in the folder. As I mentioned above the raw export from our billing software does add 2 rows at the top for dates and report names and 4 rows at the bottom with some totals and page numbers.

    I decided to manually remove these extra columns leaving only the header row after many attempts to use PQ actions to clean these up but continued to have issues. But even when I add a second file with header rows it gets the error.

    Sample files

    Refresh Error

    Now as soon as I remove the header row from the 2nd file it refreshes fine with no errors.

    Another thing I noticed is the tab name which must be set on the export, if I change it at all on the 2nd data file, I will get an error;

    Since this is just a test I an not worried about the tab at this point and may change to csv exports vs the excel.

    So as you can it is not working as everyone is saying but the bright side if I remove the headers manually in the second file it works fine. I have not tried a 3rd file at this point.

    Here is the PQ advanced code:

    let
        Source = Folder.Files("[Mapped_Drive_Path]\Billings"),
        #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
        #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Billings (2)", each #"Transform File from Billings (2)"([Content])),
        #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
        #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Billings (2)"}),
        #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Billings (2)", Table.ColumnNames(#"Transform File from Billings (2)"(#"Sample File (3)"))),
        #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type any}, {"Column3", type text}, {"Column4", type any}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}, {"Column10", type any}, {"Column11", type any}, {"Column12", type any}, {"Column13", type any}, {"Column14", type any}, {"Column15", type any}, {"Column16", type any}, {"Column17", type any}, {"Column18", type any}, {"Column19", type any}, {"Column20", type any}, {"Column21", type any}, {"Column22", type any}, {"Column23", type any}, {"Column24", type any}, {"Column25", type any}, {"Column26", type any}}),
        #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Source.Name"}),
        #"Promoted Headers" = Table.PromoteHeaders(#"Removed Columns", [PromoteAllScalars=true]),
        #"Removed Columns1-Blanks" = Table.RemoveColumns(#"Promoted Headers",{"Column2", "Column7", "Column10", "Column14", "Column21"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns1-Blanks",{{"PTD Amts  ", type number}, {"PTD Total Cost  ", type number}, {"PTD Surchg  ", type number}, {"PTD Real % w/o Expenses  ", Percentage.Type}, {"PTD Real %  ", Percentage.Type}, {"PTD Progress Bills  ", type number}, {"PTD Invoiced Amt  ", type number}, {"PTD Hrs  ", type number}, {"PTD Effect Rate  ", type number}, {"PTD Firm Adj  ", type number}, {"PTD Fees  ", type number}, {"PTD Exp  ", type number}, {"PTD Cost  ", type number}, {"PTD Billed WIP  ", type number}, {"PTD Amounts  ", type number}, {"PTD Adjust  ", type number}, {"Inv Post Date  ", type date}, {"WIP Date  ", type date}, {"Service  ", type text}}),
        #"Inserted Date Subtraction" = Table.AddColumn(#"Changed Type1", "Subtraction", each Duration.Days([#"Inv Post Date  "] - [#"WIP Date  "]), Int64.Type),
        #"Renamed Columns" = Table.RenameColumns(#"Inserted Date Subtraction",{{"Subtraction", "Days to Invoice"}}),
        #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Client No.  ", "Work Type  ", "Service  ", "WIP Date  ", "Inv Post Date  ", "Days to Invoice", "PTD Adjust  ", "PTD Amounts  ", "PTD Billed WIP  ", "PTD Cost  ", "PTD Exp  ", "PTD Fees  ", "PTD Firm Adj  ", "PTD Effect Rate  ", "PTD Hrs  ", "PTD Invoiced Amt  ", "PTD Progress Bills  ", "PTD Real %  ", "PTD Real % w/o Expenses  ", "PTD Surchg  ", "PTD Total Cost  ", "PTD Amts  "})
    in
        #"Reordered Columns"
    NOTE: I did reduce the row count and replace the data rows with some fake data for the test above. But the exact thing occurs with the real data that is over 120,000 rows of data between the 2 files.


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.



    • Edited by Stunpals Thursday, July 18, 2019 2:41 PM correction
    Thursday, July 18, 2019 2:36 PM
  • Also regarding the link, these are the same steps I have been taking with the exception that I am using Combine & Edit, which I believe only provides a better view in to the data files allowing you to set filters on file types, names, etc. 

    I may plan on using that to filter extension types later.

    Also when I started there is only one data file in the folder and I perform the get data from folder, go through the PQ cleanup of empty columns, add a few columns, etc.

    I am noticing if I start all over with 2 files initially in the folder, both with header rows, and then do the Get Data from Folder, it loads into PQ fine. And I can add a 3rd file without errors on the headers.

    To me this seems like a bug that you must start with 2 data files before you setup your PQ?


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.

    Thursday, July 18, 2019 3:02 PM
  • HI Stunpals. Based on the M you shared above, you need to promote the headers inside the "Transform Sample File From Billings (2)" query, not the query you shared above.

    Ehren

    Thursday, July 18, 2019 4:56 PM
    Owner
  • To clarify, the screenshots are from some example files I made just for the screenshots. Same columns and layout just bogus data.

    Now that I am looking at that I am confused (can't recall) why I even had to promote the headers. I know on my first attempts with the raw data exports where there were additional top rows, I had to delete them and then promote the first row as headers. But after doing a manual clean-up on the exported data BEFORE using PQ that should not have been necessary.

    I did a quick test with a blank Excel starting from scratch again and the first row is automatically the header and doesn't need to be promoted. And when I add the 2nd data file (with header rows included) it updates as expected.

    There must be something messed with my Excel file that is linking to this data and I need to start from scratch.


    Stunpals - Disclaimer: This posting is provided "AS IS" with no warranties.

    Thursday, July 18, 2019 5:57 PM