none
Power Query's order of refresh seems illogical RRS feed

  • Question

  • Hi all,

    I'm running into a problem with refreshing some Power Query tables that I can't find a satisfactory solution for - in the sense that I have tested out several alternative approaches to setting up the queries and their connection settings. Using one set of settings fixes some issues but causes others, so it's really a no-win situation. If anyone can share a better solution it would be much appreciated.

    [Context]

    I have a main data query to extract sales transactions from the sales history table - let's call that one [SalesQuery].

    I then have another query which is a derived query referencing the [SalesQuery] (i.e. the Source of it is Source = SalesQuery), which then aggregates the sales transactions up to the product category level then sorts it from the top selling category down - let's call this [SalesSummary].

    Both queries are explicitly loaded into the spreadsheet on 2 separate tabs. The [SalesQuery] table serves as the detailed data for end users to reference to while the [SalesSummary] serves as the front sheet of the report, giving a quick snapshot of sales performance for all product categories.

    [Solution 1]

    The 2 queries are constructed as described and both have "Background Refresh" enabled.

    When I click "Refresh all", Power Query pulls down the data in the background as expected. And even though I didn't trace through the refresh process line by line live as it happens, but looking at the little status line under each of the queries in the "Queries and Connections" pane, it looks like one feed of data was being pulled and shared between the queries - which is the expected behaviour, given that the [SalesSummary] query is a derived query, referencing the [SalesQuery] one, so a user would reasonably expect that data would be pulled down based on the underlying query generated from the constructs of [SalesQuery] only then share that result with [SalesSummary] for aggregation. In other words, you would expect your spreadsheet to only hit the server once, not twice. And on the face of it, that seems to be the case with this solution, but if anyone can either confirm or disprove it with tracing the queries in the background then that would be great information (because I don't know how to trace it in the background).

    The problem with Solution 1:

    The thing is, I also have to automate this report via VBA, and in the VBA, if I tell it to run "Refresh All" then "Save as ..." the refreshed report as an output file to be sent to users then it wouldn't wait for the refresh to finish, so that defeat the whole purpose of automating the report in the first place.

    Yes, it is possible to programmatically set a timer within VBA with Application.Wait, but its very concept is unreliable and inefficient (how can you be sure that your query will always finish within 10 seconds - or any other arbitrary wait period that you set? Some days there are more transactions in the retrieved dataset than other days, after all. You could set it to be a very long wait time to make sure that all the refreshes definitely finish, but that is just inefficient, and that is a guess still).

    The better solution is to turn off "Background Refresh" which turns the queries into active processes that will need to be completed before other actions can be taken.

    (If you are interested to know more about enabling and disabling Background Refresh, please refer to this article, it has a lot of great information: https://www.excelcampus.com/library/enable-background-refresh-on-all-power-query-connections/)

    And that's how I set up my Solution 2

    [Solution 2]

    The 2 queries are constructed as described and both have "Background Refresh" disabled.

    When a "Refresh All" command is run, each of the queries will then be evaluated separately, one after another. However, as a user, you would still expect that the main query [SalesQuery] to be refreshed first then its result is then used to produce the aggregation in [SalesSummary] because that one is referencing [SalesQuery]. In other words, the expectation would still be that the execution of these queries would result in the data being read only once from the server, not twice.

    However, that was not what was observed. What actually happened was that the [SalesSummary] query was evaluated first - reading more than 40,000 lines of data on this particular day. Once that finished, the [SalesQuery] was evaluated, again reading those same 40,000 lines of data.

    This behaviour just doesn't make any sense.

    Did I miss something?

    Is there a better solution to this problem?

    Friday, December 14, 2018 3:28 AM

Answers

  • Hi there,

    Can you please validate the number of calls to the server that you see in both solutions? The best way to do it is to use a tool similar to SQL Server Profiler, which shows you what your server is working on.

    In addition, if you are looking for a way to wait until your query is being refreshed without setting BackgroundRefresh property to false, then you can try using the QueryTable.Refreshing property as described here. Your code should look something like this:

    Do
    Loop While ActiveSheet.QueryTables("NameOfQuery").Refreshing

    Hope this helps.

    Guy

    - Excel Team



    Guy Hunkin - Excel Team

    Friday, January 4, 2019 4:52 PM