none
How to work around Excel Get & Transform Data (Hang up) Performance (Speed) Issue? RRS feed

  • Question

  • I posted this earlier in the wrong forum, being a newbie.  Anyway here it is again in a place where I might get to people who know what I'm talking about.

    Have a chain of 6-queries to cleanup, combine and merge data from two-files of similar configuration and one file with data to merge.  First four queries are connection only (I experience crashes when loading to a sheet). When another query was added to create a result on a sheet, according to task manager, the CPU usage bounced around between 80 and 90%, memory jumped to ~15 GB (of 16) and control of cursor became very erratic (I assume because there wasn't any CPU processing left to manage the mouse).  The Microsoft Mashup Evaluation Containers (3 of them) are consuming the CPU and memory.  Perhaps all of the queries are being refreshed and the refresh implementation is not well connected to Windows 10 so the operating system can't throttle the Mashup. It may take hours for this condition to clear.  May also require hard computer reset and restart.  Like other threads, have experienced crashes and excel shut downs and auto recovery when using Get & Transform. 

    All queries refresh control boxes are unchecked.  Is this a bad thing or will enable background refresh impact this problem.  The dat files are less than 8,000 records of 20-fields.  Have read on other blogs that this "performance issue" and lack of monitoring tools is an broad issue.  Any suggestions on how to avoid or work around these hangups would be greatly appreciated.

    Additionally, have the condition where Excel file is open, Query Editor is open, have just deleted the final step, and performance issue re appears.  I can add a snippet of Task Manager later (Forum won't let me insert picture) .  Notice that Excel does not appear in the list of Applications yet it is still running.  

    My system is MS Office 365 subscription, Excel Version 1808 (Build 1073020053 Click-to-Run) Monthly Channel (Targeted); HP Envy 17 Notebook; Processor Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz; 16 GB; Windows 10, 64-bit 

    Wednesday, August 29, 2018 8:07 PM

Answers

  • Hi Gene. Just to recap the perf-related suggestions we discussed offline:

    • Disable the "Allow data preview to download in the background" option
    • Avoid operations that require a full scan of the dataset before returning any results. For example, Table.Sort, List.Max over all the values in a column, etc. If you must use these, try to defer them to the end of your processing.
    • Another recommendation we didn't discuss, but which is referenced in the article I linked to above, is to experiment with Table.Buffer to see if it speeds things up for you.

    I hope these suggestions prove helpful!

    Ehren


    Thursday, September 6, 2018 9:37 PM
    Owner
  • Hi Gene. Regarding which operations require a full scan of the data, the simplest way to determine this is to ask, "If I wanted to calculate the result of this operation by hand (at least the first few rows of it), would I have to manually reference/search/scan all the data?"

    Hope this helps.

    Ehren

    • Marked as answer by GeneQuery2 Wednesday, September 12, 2018 12:19 PM
    Friday, September 7, 2018 6:23 PM
    Owner

All replies

  • Hi, you are experiencing one of the major issues with Power Query, performance, particularly when performing Query merges and more complex manipulations. As you indicated there are numerous blogs however little answers. Not sure on how you are using the output as Power Query is more efficient within Power BI as opposed to Excel due to a different caching mechanism. I have also seen significant improvement by breaking my workflow into 2 stages, the first is to load your files into PQ and do any basic manipulation first (no Merging …..) and then save as CSV files through some minor VBA (available commonly on many sites) . Loading CSV files back into PQ will greatly increase the speed of operation when it comes to performing the more complex activities. A little clunky I know but in my case it took processing time down from 2 hours to 2 minutes.

    Mike

     
    Wednesday, August 29, 2018 10:18 PM
  • Thanks Mike for your reply.  I have tried similar things like saving the file, copy sheet and save as CSV, etc.  I haven't tried your suggestion explicitly but will.  

    I don't have Power BI nor PQ, only Excel so my options are limited.  I'm doing this as a volunteer for a non-profit (and as a hobby?).

    Have you seen an occasion where Excel no longer appears in the Windows 10 Task Manger list of Applications that are running but Excel is still functioning?  It appears that Task Manger still knows who is using CPU-cycle and Memory, but isn't aware of or at least is not reporting Excel as a running application.  I'll try and post a snip-it if forum will allow it.

    If the Query system can somehow bypass Task Mangers ability to report operations, wouldn't that be a major problem?  Also, how else can Queries force a reboot of Windows, and/or an Excel Failure and restore of Excel unless it is messing around in Windows Operating space?

    I'm a newbie at this Get and Transform Data (GATD?) stuff but I am fascinated by this whole system and concept.  I think the answer to my original post is a Microsoft issue.

    Thursday, August 30, 2018 6:14 PM
  • When I try to insert an image I get this message:

    Body text cannot contain images or links until we are able to verify your account.

    So when/how does my account get verified?

    Thursday, August 30, 2018 6:18 PM
  • Hi GeneQuery2 and Mikeinoz62. I'd like to better understand what you're seeing. When you load the raw data to an Excel sheet and export it to a CSV, does this part of the process still take a long time? Or does it only take a long time and consume memory/CPU when you add steps on top of the raw queries?

    Ehren

    Thursday, August 30, 2018 11:19 PM
    Owner
  • Thanks for responding, Ehren.

    It is hard to give a succinct answer, because I have tried many things and they kind of blur together.

    I'm also concerned that I am doing things wrong, since I'm fairly new to GATD.  I have many years experience with Excel and VBA and have only recently started self learning.  Bought the M is for Data Monkey's book by Puls and Lopez(?).  So with the understanding I may be doing dumb things....

    I would do a query to load a file do some cleanup then load to a sheet.  No problem.  Then I would Load the second file, do similar cleanup then load to a sheet.  Then I would combine the two queries by appending, and do some housekeeping. Then save to a sheet.  Then load a third files clean and prepare to merge with the output of the append of the other two files.  The merge would go fine. At the time I said to my self this is really fast. 

    Here my memory may be not accurate, but I would load the merge to a sheet.  This load might take hours.  I did discover that sometime, if I re-edited the query, then did the load to a sheet the second time, that would go more rapidly.  Sometime in this process, I trying out functions and learning.  On more than one occasion Excel crashed.  Sometime it would auto reboot and sometimes it wouldn't.

    I finally started avoiding load to sheet and instead would do connection only.  The current project uses relatively small files. Previously I had tried this on large >600,000 records and merges of >10,000 with hundreds of thousands.

    Hope this helps.  I would be happy to try different things and share the results with you.  

    Thursday, August 30, 2018 11:51 PM
  • Hi GeneQuery2. I'd like to see what you're encountering. Please send me a private message on the Power BI Community. Here's a link to my user profile.

    Ehren

    Tuesday, September 4, 2018 11:40 PM
    Owner
  • I followed the link but am not sure how to use it to communicate.

    Wednesday, September 5, 2018 5:12 PM
  • You'll need to create a Power BI Community account, then private message me.

    Or paste a link to your profile here once it's created, and I'll send you a private message.

    Ehren

    Wednesday, September 5, 2018 5:33 PM
    Owner
  • https://community.powerbi.com/t5/user/viewprofilepage/user-id/80442

    Wednesday, September 5, 2018 7:28 PM
  • Hi Gene. Just to recap the perf-related suggestions we discussed offline:

    • Disable the "Allow data preview to download in the background" option
    • Avoid operations that require a full scan of the dataset before returning any results. For example, Table.Sort, List.Max over all the values in a column, etc. If you must use these, try to defer them to the end of your processing.
    • Another recommendation we didn't discuss, but which is referenced in the article I linked to above, is to experiment with Table.Buffer to see if it speeds things up for you.

    I hope these suggestions prove helpful!

    Ehren


    Thursday, September 6, 2018 9:37 PM
    Owner
  • Thank you very much for your inputs.  It would be very helpful if there is a list of operations that require a full scan of the dataset.  Sort and list.max, .min, are now obvious.  Any discussion about how to achieve efficiency in Power Query would be useful also.

    Thank you.


    elc

    Thursday, September 6, 2018 11:05 PM
  • Hi Gene. Regarding which operations require a full scan of the data, the simplest way to determine this is to ask, "If I wanted to calculate the result of this operation by hand (at least the first few rows of it), would I have to manually reference/search/scan all the data?"

    Hope this helps.

    Ehren

    • Marked as answer by GeneQuery2 Wednesday, September 12, 2018 12:19 PM
    Friday, September 7, 2018 6:23 PM
    Owner