none
PQ dropped column on import RRS feed

  • General discussion

  • It seems I’ve found a critical bug with PowerQuery. PQ reads an .xls file from one of our banks for which it generates accounting journal entries. This file uses sixty-one columns (A through BI). Column A is entirely blank, as are a few of the other columns.

    Reading the file initially works ok—column A maps to Column1, column B maps to Column B, etc.

    However, if this .xls file is opened/saved with NO MODIFICATIONS (either as an .xls or an .xlsx), then PQ will entirely drop the blank column A and begin reading with Column B mapping to Column1, column C mapping to Column2, etc. PQ will not acknowledge the existence of the blank column A, thereby rendering the remaining M code incorrect.

    I’ve searched for all variants of this issue and haven’t yet found a resolution.
    Monday, January 22, 2018 9:55 PM

All replies

  • Hey,

    Would you mind sharing your query and the source file so we can replicate the issue described?

    Thanks!

    Monday, January 22, 2018 11:32 PM
  • Absolutely I can! What's the best way of sharing/uploading these two (rather small) spreadsheet files? I'd also like to add a couple of JPG screen prints. If you send me your email, then I can respond with the docs.

    01/22/2018  03:39 PM            34,304 PAR DailySettle_External 20171130NEWorig.xls
    01/19/2018  03:16 PM            38,400 PAR DailySettle_External 20171130NEWsaveas.xls

    This PQ anomaly is the same on either Excel 2013 or Excel 2016.

    Tuesday, January 23, 2018 2:26 PM
  • Absolutely I can! What's the best way of sharing/uploading these two (rather small) spreadsheet files? I'd also like to add a couple of JPG screen prints. If you send me your email, then I can respond with the docs.

    01/22/2018  03:39 PM            34,304 PAR DailySettle_External 20171130NEWorig.xls
    01/19/2018  03:16 PM            38,400 PAR DailySettle_External 20171130NEWsaveas.xls

    This PQ anomaly is the same on either Excel 2013 or Excel 2016.

    Hey!

    Would you mind uploading them to a public place like Dropbox, Google Drive, OneDrive or any of those? You can later post the url to that file here. That way everyone who's looking at this thread can benefit from it and take a look at the issue and possible solution.

    Tuesday, January 23, 2018 7:21 PM
  • Waiting to get my account verified before I can post images or links...

    Wednesday, January 24, 2018 4:11 PM
  • Waiting to get my account verified before I can post images or links...

    I believe that you could post the link as clear text here. No need to wait for that verification.

    Often people write the url like:

    htt ps  social  dot technet dot microsoft dot com /Forums/en-US/d8d80d19-d85a-4468-bead-caa9c3002ba6/pq-dropped-column-on-import?forum=powerquery

    which usually doesn't get picked up by the system as an url.

    Friday, January 26, 2018 2:46 AM
  • We receive thousands of banking spreadsheets monthly that must be converted into general ledger accounting entries. A sample screen shot is included below for the file “DailySettle20171130-orig.xls”:


    Please note the following:

    1.       Column A is entirely blank of values or formulas.
    2.       The “actionable” data begins in Column B (row 17) under the column heading “Branch Code”.
    3.       The Excel formatting is hierarchical, ugly and sub-optimal—that’s the kind of external document we receive from banks for which we have no control. Cells are merged and sometimes misaligned.

    Opening up a new Excel spreadsheet (either v2013 or v2016) and opening this file in PowerQuery results in this spreadsheet. Everything looks good. Column A correctly maps to Column1, etc.


    However, when the original spreadsheet is opened and saved (with NO MODIFICATIONS whatsoever)—either using its original name, SaveAs a different name or SaveAs type=xlsx, the PowerQuery behavior is entirely different.

    The second file “DailySettle20171130-saveas.xls” is simply the original file (as received from the bank) opened and resaved. This file has an identical look-and-feel to the original file, i.e. Column A is blank, and the data begins in Column B, etc.

    PowerQuery opens the SaveAs file as follows:


    Please note the following:

    1.       Column A from the .xls file is no longer available for processing—with any amount of manipulation of M code.
    2.       Column B is imported as Column1 in PowerQuery.
    3.       All of the exacting transformations and formulas that were based on the original file, and that were column-specific are now incorrect.

    In further research I’ve found other spreadsheets that PowerQuery has handled with the same inconsistency. And I’ve not yet found other online narrative or any resolution for this issue.

    These two spreadsheets may be downloaded from:

    https://houseloan.sharefile.com/i/i212a04b8338403e9

    Friday, January 26, 2018 3:13 PM
  • These two spreadsheets may be downloaded from:

    https://houseloan.sharefile.com/i/i212a04b8338403e9

    Hey! When I click on that link it takes me to a form that asks me for my personal information. Do you think that you could share the file publicly? (I've found that Google Drive and Dropbox work best for publicly sharing the files)

    I'm unable to download the files because of this form.

    Friday, January 26, 2018 11:02 PM
  • Please try this link:

    https://www.dropbox.com/sh/qlc4osm5xa5svs0/AABIS9J9PNfnypyu1KeGp7c5a

    Monday, January 29, 2018 1:05 AM
  • Hey,

    Apologies for the delay. I downloaded the file, but I was unable to replicate the situation. I never got the Column A to load and it appears that in this case Power Query only takes in consideration the columns where it finds data (starting from Column B).

    Maybe the first time that you connected to the file the column A had some values? that's my only guess at this time.

    No matter what I try, I always end up with the same situation where the values from Column B appear as Column 1 and this to me appear to be working correctly.

    Here's a screenshot of what I see:

    Sunday, February 18, 2018 11:46 AM
  • A possible workaround for the problem is to delete any trailing empty first column as the very first steps:

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WUtJRSlSK1QEzymCMdKXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
        ToColumns = Table.ToColumns(Source),
        Select = if List.Distinct(ToColumns{0})={""} then List.Skip(ToColumns,1) else ToColumns,
        Reassemble = Table.FromColumns(Select)
    in
        Reassemble

    The problem might be the same than described here: http://excel-inside.pro/blog/2017/05/23/excel-sheet-as-a-source-to-power-query-and-power-bi-a-pitfall-of-usedrange/


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!


    Sunday, February 18, 2018 7:51 PM
    Moderator