none
Bug in Excel.Workbook(Web.Contents()) PowerQuery RRS feed

  • Question

  • Hi,

    I'm loading data from the following workbook. 

    let    Source = Excel.Workbook(Web.Contents("https://www.deutsche-boerse-cash-market.com/resource/blob/1948736/27dd521f2eb21d7df05c18ab5e2b9fb6/data/Order_Book_Statistics_FWB_XetraClose.20200421.xls"), null, true),    Cover = Source{[Name="Cover"]}[Data],    #"Filtered Rows" = Table.SelectRows(Cover, each Text.Contains([Column1], "Reported Date")),    AsofDate= #"Filtered Rows"{0}[Column2],    #"Order Book TO - DAX Equities1" = Source{[Name="Order Book TO - DAX Equities"]}[Data],    #"Removed Top Rows" = Table.Skip(#"Order Book TO - DAX Equities1",12),    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),    #"Removed Bottom Rows" = Table.RemoveLastN(#"Promoted Headers",1),    #"Removed Columns" = Table.RemoveColumns(#"Removed Bottom Rows",{"Open", "High", "Low", "Mixed Rate", "Order Book Turnover in Units", "Order Book Turnover in Euro ", "VWAP", "Open_1", "High_2", "Low_3", "Mixed Rate_5", "Order Book Turnover in Units_6", "Order Book Turnover in Euro _7", "VWAP_8"}),    #"Added Custom" =  Table.AddColumn(#"Removed Columns", "Date", each AsofDate),    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Date", "Reporting Instrument", "Column2", "Close", "Close_4"}),    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"Close_4", Number.Type}}),    #"Added Custom1" = Table.AddColumn(#"Changed Type", "Custom", each Number.Round([Close_4],3))in    #"Added Custom1"

    The last cell in the fourth row should show 48.505, but is rounded to 48.51. I've tried changing the type, cycling through locales, toggling 'Preserve cell formatting', but nothing seems to work. The cell in question in the original .xls file is in the 'Order Book TO - Dax Equities' worksheet in cell N17. 

    The original worksheet has about 12 blank cells before the data starts for that column, which might be affecting the way Excel transforms the data in the Excel.Workbook(Web.Contents()) call. 

    How can I import this data without losing decimal places?

    Thanks!



    • Edited by Tom2019_200 Wednesday, April 22, 2020 9:36 AM
    Tuesday, April 21, 2020 11:36 PM

Answers

  • Hi Tom. When reading from legacy xls files, we use a library called ACE. Unfortunately, ACE only returns the formatted values, not the underlying values. So if a cell is formatted to only show two decimal places (as cell N17 on your sheet), ACE will return it to PQ that way.

    As Aleksey said, you can work around this by changing the formatting of the cells in the xls file. You can also switch to using xlsx, which is not imported using ACE and which allows PQ to access the full underlying value of each cell.

    Ehren

    Thursday, April 23, 2020 9:16 PM
    Owner

All replies

  • Hi, Tom

    Problem in cell format and in file format ".xls".

    But you can download file and change format of cells to 3 decimal digits instead of 2. Save the file and make query to the edited file on the disk.

    In that case PQ'll read values from cells correctly.

    Wednesday, April 22, 2020 12:42 PM
  • Hi Aleksey,

    Thanks for the reply. 

    I have to spin off this query for multiple workbooks. Is there any way to pick up the raw 3 decimals from the original workbook? 

    The only other way I can think of doing this is running VB on Workbook.open to download the files and parse them that way, but I'm reluctant to start doing that. . 

    Thanks again.

    Wednesday, April 22, 2020 1:08 PM
  • No, it's a feature (not a bug ;) ) of Excel.Workbook() connector. When it reads data from binary files, such as .xls/.xlsb there's such a problem with cell formatting. And there's a lot of other "features" with that types of file, try to not use them as a source for Power Query.
    Wednesday, April 22, 2020 1:28 PM
  • Hi Tom. When reading from legacy xls files, we use a library called ACE. Unfortunately, ACE only returns the formatted values, not the underlying values. So if a cell is formatted to only show two decimal places (as cell N17 on your sheet), ACE will return it to PQ that way.

    As Aleksey said, you can work around this by changing the formatting of the cells in the xls file. You can also switch to using xlsx, which is not imported using ACE and which allows PQ to access the full underlying value of each cell.

    Ehren

    Thursday, April 23, 2020 9:16 PM
    Owner
  • Hi Ehren,

    Thanks for the detailed reply.

    Unfortunately, I've no control over the formatting or file type of the Excel workbook, so I'll just have to code up some kind of workaround. I'll try to download the file in VB, save it to a temporary file as a different type and proceed with Power Query, or at worst, parse and paste the values into a worksheet. 

    If you can think of any other way of doing this, please let me know!

    Thanks again. 

    Thursday, April 23, 2020 9:44 PM
  • The solution is to just not use XLS files or remove all formatting. The same things happens with negative numbers that are formatted in red, but have no parentheses nor negative sign. https://www.ehansalytics.com/blog/2020/2/15/avoid-using-excel-xls-files-as-data-source
    Sunday, June 14, 2020 9:44 PM
  • There's a handy PowerShell-script here to convert (a bunch of) xls files into xlsx: https://www.fourmoo.com/2019/03/27/how-to-convert-xls-files-to-xlsx-so-that-they-can-be-used-in-powerbi/

    Won't help for the current problem though, but could help others who come across this thread.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, June 15, 2020 8:58 AM
    Moderator