none
Missed column from XLSX worksheet on import

    Question

  • Hi All

    Meet a strange behavior on imoprt data from XLSX file.

    I've got a sample file from my client, this file contains a lot of sheets, some of them (say two) with needed data. These two sheets have the same structure (checked, made by copying sheet and fill with other data). Say I need data starting from column 12 and row 2, from each of these two sheets. First two columns on each of these sheets is empty.

    I wrote a custom function that extracts needed columns and rows from sheets.

    But then I realised that for one sheet PQ imported all columns correctly, but for other it missed first column, therefrore my custom function also missed needed data. I didn't used any of column enumerations in code and cant accuse some of the functions that they limited allowed number of column names in code.

    I've checked on preview and there is the same problem: After invoking just Excel.Workbook I can see that "Data" column has the next preview for the first sheet: 

    but for the second sheet it looks like this:

    somehow PQ ignores 1st (or 2nd? both empty) columns on one of the sheets, therefore all other data on this sheet shifts to the left.

    It is first time I meet this behavior. Is it a bug or some unrecognized feature?

    I solved somehow this problem with much more expensive code then planned, but would like to undrstand, where is the catch

    Thanks!


    Maxim Zelensky Excel Inside

    Saturday, May 13, 2017 11:40 PM

Answers

  • Hi Maxim,

    It's likely because Excel is storing the empty cell values in the first case, but not in the second. Excel only stores cell values where the user has entered or modified data (to avoid storing millions of empty cells for each sheet). My guess is that the blank cells in the first sheet were modified at some point in their history, while the missing cells in the second sheet were not.

    Ehren


    Friday, May 19, 2017 11:19 PM
    Owner

All replies

  • Hi Maxim,

    It's likely because Excel is storing the empty cell values in the first case, but not in the second. Excel only stores cell values where the user has entered or modified data (to avoid storing millions of empty cells for each sheet). My guess is that the blank cells in the first sheet were modified at some point in their history, while the missing cells in the second sheet were not.

    Ehren


    Friday, May 19, 2017 11:19 PM
    Owner
  • Hi Ehren!

    Thank you for the answer.

    Yes, you are right, PQ takes UsedRange from sheet as a table. Checked it, found that in one sheet UsedRange sarts from B1. Cleaned these empty columns again (via "Clean All") to be sure there no other formats or "cell history", then UsedRange became equal and PQ start to take equal tables.

    But this is not good. It looks like the best solution for PQ is to take cells starting at A1 and then look for the last filled (i.e. with values in it) row/column on the sheet. 

    What is the sense to look after UsedRange, which also can include formatted or "historic" cells, if PQ don't see the formats, notes, etc?

    With UsedRange we can then miss actual sheet structure and cannot perform equal transformations for sheets with the same data location, like in my example.

    Maxim


    Maxim Zelensky Excel Inside

    Saturday, May 20, 2017 11:31 AM
  • Hi Maxim. Your suggestion has some advantages to be sure. One disadvantage, however, is that in the common case (i.e. sheets with leading blank columns that aren't part of UsedRange), most users expect to just see their data when importing, without all the blank columns.

    Ehren


    Tuesday, May 23, 2017 6:45 PM
    Owner
  • Hi Ehren.

     most users expect to just see their data when importing, without all the blank columns.

    yes, when it is about one file - no problem. But how one can predict, whether these blank columns or rows will be imported or not?

    If there is a batch load, then it is more complex/impossible to check. I described this problem there: http://excel-inside.pro/blog/2017/05/23/excel-sheet-as-a-source-to-power-query-and-power-bi-a-pitfall-of-usedrange/

    and post the idea on Power BI community site: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/19374004-allow-to-select-custom-data-range-on-excel-sheet-o

    May be import starting from A1 is not the best solution, but a possibility to optionally select a range address or top-left cell could be implemented. May be a little bit less performance at connection but it will definitely save performance and redice risk of errors on transformations.

    Thank you for answers and support!


    Maxim Zelensky Excel Inside

    Thursday, May 25, 2017 4:36 PM