none
Queries evaluation chain

    Question

  • Hi All!

    There are two different scenarios I am working with:

    1) Query1 connects to the data source (flat file) and make basic cleaning and transformations.

    Then Query2 and Query3 reference to Query1, performing other transformations needed.

    Query3 also take some data from Query2.

    Query2 and Query3 then exported to sheet, Query1 - connection only.

    As far as I can understand, PQ can define refresh chain like this: evaluate Query1, then evaluate Query2, then evaluate Query3 (as it need the results from Query2). The question is: When PQ performs calculation of Query3, will it recalculate Query1? Or, as it was evaluated in the chain before, Query3 will use cahced results of repvious Query1 calculation (performed when Query2 was evaluated)?

    2) I have a set of flat files, and I take data from them with one Query1. Query1 also performs some transformations. Then I have two independent Query2 and Query3, both connected to Query1 performing different transformations of source data. Results of Query2 and Query3 evaluations then used in Query4, which exports its results to the sheets, Query1, Query2 and Query3 - connection only

    The second question is: performing "Refresh" on Query4, how much times will be Query1 evaluated - one, two or three ? Or there also will be chain: calculate Q1, caching, then Q2 or Q3, calculate next using cached results of Q1 evaluation, and then - Q4?

    3) Is there is a difference with connection to database?

    4) Is there any rules of evaluation chain (like each expression/query will be calculated once in the evaluation chain)?

    Thanks in advance!


    Maxim Zelensky Excel Inside

    Friday, November 11, 2016 10:03 AM

Answers

  • EDIT: I'm updating this answer for clarity and comprehensiveness. Please re-read it if you saw a previous version!

    Hi Maxim,

    There's a lot involved in answering your question, so let me back up and explain a few things first.

    Caching
    Power Query (in both Excel and Power BI Desktop) utilizes a "persistent cache", stored on disk, when refreshing queries. But what exactly does that mean?

    First, let's look at what gets cached. The persistent cache does not store the results of your M Queries (Q1, Q2, etc. in your example). Instead, it stores the results of the behind-the-scenes requests sent to data sources.

    So if Q1 queries a SQL Server database called "MyServer/MyDatabase" and returns a single unfiltered table called "MyTable", the query sent to the server might be "select [Col1] from [MyTable]". In this case, the persistent cache will now know the result of sending "select [Col1] from [MyTable]" to "MyServer/MyDatabase". If another M query (whether through referencing Q1, or by querying the same table directly) needs the same result, the persistent cache can provide it, and the result won't have to be fetched a second time from the SQL Server.

    "Great," you might say. "So if I'm pulling from a flat file in Q1, and in a few places in Q2 I need to do Table.RowCount(Q1), the file should only be read from disk once, right?" And the answer would be...no. This is because not all data sources are cached. Specifically, the results of calls to File.Contents are not stored in the persistent cache. Why not? Well, the cache is stored on disk, and caching local files (which are already on disk) elsewhere on disk doesn't really make sense. (Using Table.Buffer in this context may help...see more on Table.Buffer below.)

    "Okay", you might say. "But if Q1 is pulling from a SQL table, and in a few places in Q2 I reference Q1, that should hit the persistent cache, right?" Maybe. It depends on how Q2 is using Q1, since doing additional operations on Q1 (such as filtering or merging) might cause the M engine to compute a different SQL query, resulting in the server being hit again.

    Next, let's look at the scope of caching. The scope of caching differs depending on what you're doing, as well as what tool you're using.

    Previewing
    If you've opened the Power Query editor in Excel or Power BI Desktop, you might have seen warnings like "This preview may be up to 3 days old". This is because there is a persistent cache used for interactive previewing of query results. As you can imagine from the fact that we have warnings about preview results being days old, this cache is long-lived and is intended to make the experience of working in the editor faster.

    Loading to Excel
    If you load/refresh three queries in Excel, each of them gets their own persistent cache. So the fact that a SQL result is cached during the load of Q2 won't benefit the loading of Q3, even if it needs the same result.

    Loading to Power BI Desktop
    If you load/refresh three queries in PBI Desktop, they all share a single persistent cache. When you refresh multiple times, each refresh operation gets its own cache (shared by all the queries being refreshed at that particular time). This means that if SQL result is cached during the load of Q2, it will still be cached during the loading of Q3 (assuming they're both being loaded at the same time).

    What about Table.Buffer?
    Table.Buffer can be useful if you want to store an intermediate result in memory and avoid pulling content from disk, a remote file share, a SQL Server, a website, or any other data source multiple times during an evaluation.  Think of Table.Buffer as, "load this table into memory, and stop folding subsequent operations back to the data source".

    However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. If Q1 is buffered when Q2 is loaded, Q1 will be re-buffered when Q3 is loaded.

    And now to answer your question...
    Now let's take a look at your example (Q4 references Q2 and Q3, and Q2 and Q3 both reference Q1).

    Since you're pulling from a flat file, and File.Contents results aren't cached, the flat file will be read each time Q1 is referenced (twice in Q4, once in Q3, Q2, and Q1). If you buffered the result of Q1, then Q4 would only read the file once. But when Q1, Q2, and Q3 are loaded (even in PBI Desktop), they will still each also read the file.

    What about immutability?
    You asked about the fact that M values are supposed to be immutable. This is true for the "pure" parts of the language, but breaks down when you introduce external data sources and folding. (In fact, you could think of Table.Buffer as transferring a table from the fuzzy unpredictable world of folding to the immutable world of pure M values.) You can see this in action by doing the following test, using a query called "MyTextFileQuery" that pulls from a local file on disk.

    Reads the file five times
    = Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery)

    Reads the file once
    = let rowCount = Table.RowCount(MyTextFileQuery) in rowCount + rowCount + rowCount + rowCount + rowCount

    I know this is a lot to take in, but I hope it helps clarify some less-than-obvious but oh-so-important parts of our system. I apologize for any confusion caused by my previous answers.

    Ehren



    Wednesday, November 16, 2016 5:46 PM
    Owner

All replies

  • Hi Ehren,

    thank you for this information, that's very helpful.

    I'd really like to understand what role the Table.Buffer could play here: "Buffers a table into memory, isolating it from external changes during evaluation."

    I always thought that a Table.Buffer would stop re-evaluation of previous steps, but in practice this often also doesn't seem to happen.


    Imke Feldmann TheBIccountant.com

    Wednesday, November 16, 2016 7:16 AM
    Moderator
  • Hello Ehren

    Thank you for comment. It is a pity, seems like performance leak.

    Do I understand you correctly that in my second case there will be the next chain:

    Q4 call Q2 and Q3, so Q2 and Q3 both call to Q1, and Q1 will be evaluated twice?

    Is this behaviour different in PowerBI? It looks different when I am refreshing a report with multiple queries built on top of single source Query - as if there is only one refresh of Query1.

    I based my suggestion on 'M' immutability concept, assuming that all queries, defined by user, is a values evaluated, and this values is the filelds in the Section1 record. And, if I understood correctly, it is what this part of docs tell us about:

    "Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression.
    For instance,a record field is only computed when needed. However, once computed, it remains fixed for the lifetime of the record."

    Оr queries is not a filed values that should be immutable? So then all queries made by me could be mutable during one evaluation process?


    Maxim Zelensky Excel Inside



    Wednesday, November 16, 2016 3:27 PM
  • EDIT: I'm updating this answer for clarity and comprehensiveness. Please re-read it if you saw a previous version!

    Hi Maxim,

    There's a lot involved in answering your question, so let me back up and explain a few things first.

    Caching
    Power Query (in both Excel and Power BI Desktop) utilizes a "persistent cache", stored on disk, when refreshing queries. But what exactly does that mean?

    First, let's look at what gets cached. The persistent cache does not store the results of your M Queries (Q1, Q2, etc. in your example). Instead, it stores the results of the behind-the-scenes requests sent to data sources.

    So if Q1 queries a SQL Server database called "MyServer/MyDatabase" and returns a single unfiltered table called "MyTable", the query sent to the server might be "select [Col1] from [MyTable]". In this case, the persistent cache will now know the result of sending "select [Col1] from [MyTable]" to "MyServer/MyDatabase". If another M query (whether through referencing Q1, or by querying the same table directly) needs the same result, the persistent cache can provide it, and the result won't have to be fetched a second time from the SQL Server.

    "Great," you might say. "So if I'm pulling from a flat file in Q1, and in a few places in Q2 I need to do Table.RowCount(Q1), the file should only be read from disk once, right?" And the answer would be...no. This is because not all data sources are cached. Specifically, the results of calls to File.Contents are not stored in the persistent cache. Why not? Well, the cache is stored on disk, and caching local files (which are already on disk) elsewhere on disk doesn't really make sense. (Using Table.Buffer in this context may help...see more on Table.Buffer below.)

    "Okay", you might say. "But if Q1 is pulling from a SQL table, and in a few places in Q2 I reference Q1, that should hit the persistent cache, right?" Maybe. It depends on how Q2 is using Q1, since doing additional operations on Q1 (such as filtering or merging) might cause the M engine to compute a different SQL query, resulting in the server being hit again.

    Next, let's look at the scope of caching. The scope of caching differs depending on what you're doing, as well as what tool you're using.

    Previewing
    If you've opened the Power Query editor in Excel or Power BI Desktop, you might have seen warnings like "This preview may be up to 3 days old". This is because there is a persistent cache used for interactive previewing of query results. As you can imagine from the fact that we have warnings about preview results being days old, this cache is long-lived and is intended to make the experience of working in the editor faster.

    Loading to Excel
    If you load/refresh three queries in Excel, each of them gets their own persistent cache. So the fact that a SQL result is cached during the load of Q2 won't benefit the loading of Q3, even if it needs the same result.

    Loading to Power BI Desktop
    If you load/refresh three queries in PBI Desktop, they all share a single persistent cache. When you refresh multiple times, each refresh operation gets its own cache (shared by all the queries being refreshed at that particular time). This means that if SQL result is cached during the load of Q2, it will still be cached during the loading of Q3 (assuming they're both being loaded at the same time).

    What about Table.Buffer?
    Table.Buffer can be useful if you want to store an intermediate result in memory and avoid pulling content from disk, a remote file share, a SQL Server, a website, or any other data source multiple times during an evaluation.  Think of Table.Buffer as, "load this table into memory, and stop folding subsequent operations back to the data source".

    However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. If Q1 is buffered when Q2 is loaded, Q1 will be re-buffered when Q3 is loaded.

    And now to answer your question...
    Now let's take a look at your example (Q4 references Q2 and Q3, and Q2 and Q3 both reference Q1).

    Since you're pulling from a flat file, and File.Contents results aren't cached, the flat file will be read each time Q1 is referenced (twice in Q4, once in Q3, Q2, and Q1). If you buffered the result of Q1, then Q4 would only read the file once. But when Q1, Q2, and Q3 are loaded (even in PBI Desktop), they will still each also read the file.

    What about immutability?
    You asked about the fact that M values are supposed to be immutable. This is true for the "pure" parts of the language, but breaks down when you introduce external data sources and folding. (In fact, you could think of Table.Buffer as transferring a table from the fuzzy unpredictable world of folding to the immutable world of pure M values.) You can see this in action by doing the following test, using a query called "MyTextFileQuery" that pulls from a local file on disk.

    Reads the file five times
    = Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery)

    Reads the file once
    = let rowCount = Table.RowCount(MyTextFileQuery) in rowCount + rowCount + rowCount + rowCount + rowCount

    I know this is a lot to take in, but I hope it helps clarify some less-than-obvious but oh-so-important parts of our system. I apologize for any confusion caused by my previous answers.

    Ehren



    Wednesday, November 16, 2016 5:46 PM
    Owner
  • Thank you a lot, Ehren! This is a most full answer I expected.

    to clarify, when working with files (not databases, and there are no query folding then), behavior is the same?


    Maxim Zelensky Excel Inside

    Wednesday, November 16, 2016 6:44 PM
  • Very interesting thread.

    Could you please explain what is the limit of the buffer in Table.Buffer, and what happens when the buffer is full ?

    Thursday, November 17, 2016 11:45 AM
  • Thanks, Ehren, got it.

    Maxim Zelensky Excel Inside

    Thursday, November 17, 2016 12:11 PM
  • Hi Anthony. There isn't a single "buffer" shared by all Table.Buffer operations. Think of Table.Buffer as, "load this table into memory, and stop folding subsequent operations back to the data source". Size-wise, Table.Buffer is limited by the amount of memory available in the mashup evaluation process in which it runs. If you exceed that, you'll get an out of memory error.

    Ehren



    Thursday, November 17, 2016 4:39 PM
    Owner
  • Thank you Ehren
    Friday, November 18, 2016 1:27 PM
  • Hi Ehren,

    In re-reading your excellent response, a couple of questions came to mind:

    1) Is there some technical reason why Excel and Power BI do caching differently? The power BI method appears to be more efficient.

    2) At the end of the response, how is it that the file is read only once, even though there is no Table.Buffer function used in the rowCount variable?

    Thx

    Tuesday, November 29, 2016 4:55 PM
  • Hi Colin,

    Regarding #2, the reason the file is read only once is that scalar values (such as text or numbers), once computed, are immutable. (All M values are nominally immutable, but as I mentioned above, this isn't always true for complex values that are enumerable and for which each enumeration can force the underlying source to be revisited. This includes lists, tables and binary values.)

    Ehren


    Tuesday, November 29, 2016 8:37 PM
    Owner
  • My bad. Just realized that you're calculating a row count of the table, which is stored as an immutable variable (as you indicate), and not referencing the table directly. :) 
    Tuesday, November 29, 2016 9:14 PM
  • Hi Colin,

    >> Is there some technical reason why Excel and Power BI do caching differently? The power BI method appears to be more efficient.

    Yes, the difference between the two is pretty "technical" indeed and lies in the fact that Excel doesn't have a notion of "Refresh All Queries". Hence every query is refreshed independently without sharing any cache with each other.

    We are well aware of this limitation, and would definitely like to address it going forward.

    Guy

    - Excel Team


    Guy Hunkin

    Monday, December 05, 2016 6:39 PM
  • ...lies in the fact that Excel doesn't have a notion of "Refresh All Queries".


    Guy Hunkin

    Hi Guy,

    What about the Refresh All command on the Data tab? Are you referring to some kind of refresh dependency that Refresh All doesn't support?

    Tuesday, December 06, 2016 9:33 PM
  • Hi Colin,

    This is correct. We do have a Refresh All command in Excel, but the command simply refreshes the workbook queries one by one. It doesn't provide any real context or scope where the refreshed queries can share a cache with each other. Every query refresh happens independently.


    Guy Hunkin

    - Excel Team

    • Edited by Guy Hunkin Wednesday, December 07, 2016 12:44 PM
    Wednesday, December 07, 2016 12:42 PM
  • Thanks Guy, I suspected as much. I just thought that the blanket phrase "...Excel doesn't have a notion of Refresh All Queries" needed some qualification. :)
    Wednesday, December 07, 2016 2:58 PM
  • Hi Ehren,

    I have a related question:

    I'm using DateTimeZone.LocalNow function for some date filtering in when querying an SQL server (in DirectQuery mode).

    The query looks like:

    let
        Source = Sql.Database("servername", "dbname"),
        dbo_ReportUsersView = Source{[Schema="dbo",Item="ReportUsersView"]}[Data],
        #"Filtered Rows1" = Table.SelectRows(dbo_ReportUsersView, each ([LastChangedPswd] < DateTimeZone.LocalNow()))

    in
        #"Filtered Rows1"

    The issue is that value of DateTimeZone.LocalNow() is calculated once and not changing afterwards. Probably this is somehow related to lazy nature of the language and caching.

    Is there a way to reevaluate DateTimeZone.LocalNow() each time this query executes?

    Best regards,

    Sergey

    Friday, December 09, 2016 5:09 PM
  • Hi Sergey,

    Your question is not related to the topic of this discussion, but here are a couple of links that will you out:

    https://blog.crossjoin.co.uk/2014/11/17/timing-power-query-queries/
    https://blog.crossjoin.co.uk/2016/04/04/timing-m-query-execution-in-power-query-and-power-bi-revisited/
    https://social.technet.microsoft.com/Forums/en-US/fcf37fe0-144d-4a69-8d4b-7f2188cfaff3/questions-about-timing-queries?forum=powerquery

    I don't know how it's possible to retain information from a previous query execution in a subsequent query execution, since the new execution evaluates everything from scratch. Lazy evaluation applies only during the course of the current execution.


    Friday, December 09, 2016 5:39 PM
  • Thanks Colin,

    I've already seen these links and also thought that  "Lazy evaluation applies only during the course of the current execution.". But for some unknown for me reasons Power BI DirectQuery report continues to use exactly the same cached time when querying the database. This issue is reproduced both when using Power BI service and Embedded Power BI service.

    Sergey

    Saturday, December 10, 2016 12:18 PM
  • Hi Sergiy. The behavior you're observing when using Direct Query against SQL is a known issue with the legacy Direct Query architecture. We will be moving SQL Direct Query to a new architecture over the next few months, but until then there's unfortunately no workaround besides manually refreshing.

    Ehren


    Monday, December 12, 2016 8:30 PM
    Owner