Table.Buffer for caching intermediate query results, or how to work around the Unnecessary Queries Issue

    Question

  • Hi All!

    I'd like to use Table.Buffer to work around the issue of multiple unnecessary evaluations of
    A. steps of a Query and
    B. Queries in a query chain in Power Query.

    The issue is that PQ makes a lot of unnecessary calls (at least I see them through tracing). If you change a step in a Query (in this text, a Query is a field of #sections[Section1] of the global PQ environment), then:
    A. if we have N steps in a Query, all N steps will be evaluated in this manner: the first step will be called N times, the second N-1 times, and so on (the last step will be called once). Let's write this as N×Qs1, (N-1)×Qs2 ... 1×QsN.
    B. all the previous AND following queries in the Query chain will be called in this manner: every previous query in the chain N times, and every following query once.

    So if you have a chain of queries Q1 -> Q2 -> Q3 -> Q4 -> Q5 (Q2 uses Q1, and so on), Q3 has 5 steps, and you change any step in Q3, you'll get 5 calls to Q1, 5 to Q2, 5 to Q3s1, 4 to Q3s2 ... 1 to Q3s5, 1 to Q4 and 1 to Q5. Seems like cryptocurrency mining just to see the result of a small change, doesn't it?
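
    For concreteness, a chain like this (just a sketch as a section document; the CSV path and the Amount column are made up, only the shape matters) reproduces the pattern in a trace when you edit a step of Q3:

    section Section1;

    shared Q1 = Table.PromoteHeaders(Csv.Document(File.Contents("C:\Data\source.csv")));
    shared Q2 = Table.SelectRows(Q1, each [Amount] <> null);
    shared Q3 = let
        s1 = Table.TransformColumnTypes(Q2, {{"Amount", type number}}), // Q3s1
        s2 = Table.Sort(s1, {{"Amount", Order.Descending}}),            // Q3s2
        s3 = Table.FirstN(s2, 100)                                      // Q3s3: edit this and watch the trace
    in
        s3;
    shared Q4 = Table.AddColumn(Q3, "Flag", each [Amount] > 0);
    shared Q5 = Table.RowCount(Q4);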

    So if you make a small change in a Query step, PQ does a lot of work, re-calling the same queries, while I as a user know perfectly well that there weren't ANY changes anywhere except the small change I just made in one particular step, and I would be happy to see the results ASAP. In the case of simple queries it's okay, and you've worked around the calls-to-sources issue through caching. But in the case of a more or less complex ETL (which means a lot of work AFTER the data is downloaded from the source), it simply makes such a workload impossible to develop in the UI.

    SO - I'd like to buffer complex calculations, whether inside a Query for its steps or between Queries, but it seems that Table.Buffer doesn't provide any of the service you might intuitively expect from this function (the lack of official documentation lets us use our imagination at full force ;-)).

    So I have 3 questions:
    1. What exactly does Table.Buffer do?
    2. Is it possible to work around the Unnecessary Queries Issue somehow? It seems that I cannot even use laziness for that - all the expressions get calculated, whether they are directly shown in the UI of steps (e.g. in cells of a table) or hidden in e.g. lists and never used directly anywhere!
    3. Why do you run SO MANY of all these Queries?
    Tuesday, December 11, 2018 1:26 PM

All replies

  • Hi Andrey,

    First of all, this thread has some answers to your questions:

    https://social.technet.microsoft.com/Forums/en-US/8d5ee632-fdff-4ba2-b150-bb3591f955fb/queries-evaluation-chain?forum=powerquery

    Also, have you turned off the "allow data preview to download in the background" option?

    https://blog.crossjoin.co.uk/2016/12/05/power-query-power-bi-and-the-allow-data-preview-to-download-in-the-background-option/

    Chris


    Check out my MS BI blog I also do SSAS, Power BI, MDX, DAX and M consultancy

    Tuesday, December 11, 2018 1:38 PM
  • 1. Chris, thanks a lot - the "allow data preview to download in the background" option was really helpful. The behavior of PQ changed dramatically, and now, if I change Q3, I get just 1 call for every step of Q1 .. Q3, which is perfect.

    The behavior during a Refresh of Q3 hasn't changed, but, even though this is not logical, it is not a _stop_ factor for a somewhat complex ETL.

    I don't understand why the behavior changed, though; actually, the behavior with "allow data preview to download in the background" turned ON seems like a bug in PQ. If not, it would be great to understand its logic.

    2. As for Buffer - I'm still not sure I understand its definition. As far as I can tell from all the explanations and my experience, Buffer creates an immutable copy of a variable (even one created from a data source), but only for the duration of one evaluation, i.e. only during the execution of an expression like Q2 = some(Q1) where Q1 = Table.Buffer(datasource). Is that correct?

    So for the code below we'll have 2 calls to the data source - one to scan the rows and one to fill the buffer; without Buffer, the Table.RowCount(Q1) computed for each row would trigger its own call to the data source:

    Q1 = Table.Buffer(datasource),
    Q2 = Table.AddColumn(datasource, "Q2", each Table.RowCount(Q1))

    And still, it would be great if we could cache the results of the previous queries instead of calling the full stack of previous queries (even if only once each) for every change we make in a Query.
    Tuesday, December 11, 2018 4:29 PM
  • Hi Andrey,

    To answer your questions/points:

    1) It's one of those situations where something that is a benefit on small, simple queries suddenly becomes a massive problem on complex queries. It's not a bug, but it's something MS really should deal with better, I think.

    2) I'm not quite clear from your explanation whether Q1 and Q2 are queries or steps within the same query - I guess you mean they are queries? If so, then yes, I think you have understood the same thing from Ehren's explanation that I have. As he says:

    However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. 

    Chris


    Check out my MS BI blog I also do SSAS, Power BI, MDX, DAX and M consultancy

    Tuesday, December 11, 2018 10:16 PM
  • Chris, thanks a lot for your answers!

    Meanwhile, I investigated the current Targeted Excel version with regard to this issue - Table.Buffer across different Queries (a Query being a field of #sections[Section1]). And I found out that in the Targeted version of PQ, Table.Buffer has an effect even across different Queries, at least for files on a local disk.

    So if you have Q1s1 = Table.Buffer(datasource) and Q1 -> Q2 -> Q4 and at the same time Q1 -> Q3 -> Q4, and you refresh Q4, PQ makes just 1 call to the data source. That's an important improvement (provided, of course, that I properly understood Ehren's message about the logic of previous versions of PQ).
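
    In M terms, the shape I mean looks something like this (just a sketch; the file path and the Amount column are made up):

    section Section1;

    Q1 = Table.Buffer(Table.PromoteHeaders(Csv.Document(File.Contents("C:\Data\source.csv"))));
    Q2 = Table.SelectRows(Q1, each [Amount] > 0);  // first branch over Q1
    Q3 = Table.SelectRows(Q1, each [Amount] <= 0); // second branch over Q1
    Q4 = Table.Combine({Q2, Q3});                  // refreshing Q4 reads the file only once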


    Wednesday, December 12, 2018 11:06 PM
  • And - just so as not to mislead anybody who may read this discussion - I was wrong when I wrote that laziness doesn't work in the PQ UI; it seems I did something wrong when I checked laziness while writing the first message in this discussion.

    Laziness does work in the PQ UI: if you refresh a Query step that has lists in the cells of a table, the lists' items are not calculated. Sorry for my mistake.

    Wednesday, December 12, 2018 11:30 PM
  • It would be great if Ehren (if you're reading!) could comment on whether Table.Buffer now works across queries.

    There have been other caching improvements released in Excel recently which could be helping here:

    https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/16852819-power-query-cache-shared-nodes

    Chris


    Check out my MS BI blog I also do SSAS, Power BI, MDX, DAX and M consultancy

    Thursday, December 13, 2018 9:14 AM
  • I'm still optimizing my task, and meanwhile it now seems that I fully understand what Ehren meant by the phrase "you could think of Table.Buffer as transferring a table from the fuzzy unpredictable world of folding to the immutable world of pure M values". And this is a very important, I would say conceptual, thing :-).

    What I understand at this moment is that Buffer plus the immutability concept is a very cool thing. As soon as you tell PQ (using Table.Buffer, Binary.Buffer and so on) that all the sources used to calculate a variable are immutable, the whole value becomes immutable for PQ, and that means it won't be recalculated _in the same evaluation process_. I mean that if you have Q0 = mutable stuff -> Q1s1 = Table.Buffer(Q0) .. Q1sN -> Q2s1(Q1) .. Q2sN(Q1), then Q1 will be calculated only once during the whole evaluation process, despite the fact that it is referenced twice (the usual thing for M, but not at all usual for variables with mutable sources). And that's really cool. And it seems that it doesn't matter at which step of Q you use Buffer (which is logical).
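
    As a sketch of that shape (the names and the workbook path are illustrative only):

    section Section1;

    Q0 = Excel.Workbook(File.Contents("C:\Data\Book1.xlsx")){[Name = "Sheet1"]}[Data]; // mutable stuff
    Q1 = let
        s1 = Table.Buffer(Q0),        // Buffer at any step of Q1
        s2 = Table.PromoteHeaders(s1)
    in
        s2;
    Q2 = let
        s1 = Table.RowCount(Q1),      // first reference to Q1
        s2 = Table.FirstN(Q1, 10)     // second reference: Q1 is not recalculated
    in
        [Count = s1, Top10 = s2];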

    This is cool because, from my experience, PQ sometimes "loses" the state of immutability for not-so-obvious reasons. And thanks to immutability + Buffer, you can (and should) make the whole chain immutable by using Buffer exactly at the Q which will be reused elsewhere, not at the point where you open a mutable data source. Then none of the calculations before and inside that Q will be recalculated.

    But anyway, from my experience, you should re-check whether immutability works in your case - through tracing or Process Monitor (thanks a lot to Chris for his very helpful blog!).

    And of course this concept won't help while working in the UI, because if you launch the same evaluation chain again, the whole chain will be calculated again. So it would be great if, while working in the UI, PQ considered Buffered variables immutable between explicit refreshes by the user (via Refresh/Refresh All on the ribbon).
    Friday, December 14, 2018 10:26 AM
  • Table.Buffer still doesn't help across different top-level query refreshes, which is what I meant by my original statement. Each refresh occurs in an isolated Microsoft.Mashup.*exe process, and Table.Buffer buffers the data in memory, so it's not really possible for buffering in one query's refresh to benefit another query.

    As Andrey points out, however, if a query Q4 references a buffered query Q1 via two different dependent queries, Q1 will only buffer its data once. The fact that there are multiple dependent queries involved doesn't really matter. At the top level, this is just a refresh of a single query (Q4). If Q1 happened to be refreshing at the same time, the buffering would happen twice.

    Hope this clarifies things a bit.
    Ehren


    Saturday, December 15, 2018 12:21 AM
    Owner
  • Thank you Ehren, this is extremely useful information. I have a few follow-up questions:

    1) Is the information that Curt gives in this thread about memory usage:

    https://social.technet.microsoft.com/Forums/en-US/471b18e6-9086-489d-9445-12b3a6205120/pq-query-optimization?forum=powerquery

    ...still correct? For example, is there still a 256MB cap on the amount of memory used by a single top-level refresh?

    2) If so, is there any way apart from using Task Manager to know whether this limit is being hit and which step is causing the problem (for example is there something in the diagnostic logs)?

    3) Even if you don't use Table.Buffer(), am I right in thinking that some caching still takes place? Can you explain what the "cache shared nodes" optimisation (which was in Power BI Desktop and has only recently moved into Excel) does?
    https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/16852819-power-query-cache-shared-nodes

    I think there are a lot of advanced Power Query/Power BI users out there writing complicated queries that are slow, and a better understanding of the internals of Power Query could help us a lot with performance tuning.

    Thanks!

    Chris


    Check out my MS BI blog I also do SSAS, Power BI, MDX, DAX and M consultancy

    Saturday, December 15, 2018 10:25 AM
  • Ehren, thanks a lot for your very helpful information.

    And I have a question - why, if I do a refresh of a top-level Query in the PQ UI (which is what you wrote about), does PQ perform it through as many independent refresh processes (spread over several separate Mashup processes) as there are steps in the top-level Query? I think these refresh processes are totally independent from each other (even when several of them run in one Mashup process), because Buffer doesn't work across them. So I'll get N queries against a data source, plus all the preceding calculations, for every one of the N steps in the Query, regardless of whether I use Buffer or not. I understand that I use Buffer properly, because in this situation PQ makes 1 request to the data source if I refresh #"Q2'":

    section Section1;

    Q1 = Table.Buffer(datasource);

    Q2 = let s1 = Q1, .. sN in sN;

    #"Q2'" = Q2;

    So I think that when you talk about "refreshing a top-level query", let's say a refresh of Q2, you mean "independently refreshing every step of the Query", not just one refresh for the Query as a whole - at least that's what I see via tracing. And that happens even if "Allow data preview to download in the background" is turned off. The latter has an effect only when changing the Query, not when refreshing it.

    Taking into account that refreshing all these N steps is done through at most 4 separate Mashup processes (in my experience), and that inside those processes the refresh of every step is performed one by one, I can definitely expect things to slow down for big data sizes or complex ETL.

    But the situation is even more inconvenient. In the case of a Query with a number of steps, if I click (right after a refresh of the Query) on some step in the UI, then instead of showing me the results of the step instantly, a refresh of that particular step is launched again, and I cannot see its results until that refresh is finished. Again - the full refresh of the Query has just finished and I haven't done anything else apart from clicking on its steps in the UI. And the larger a Query is, the greater the proportion of steps that behave this way.

    And that happens on a PC with 64-bit Excel, 70% free RAM out of a total of 40 GB, and peak RAM usage by a Mashup process of 90-223 K (according to Task Manager). And for data sources with a total size of just several MB. And that's for an ETL with just 4 source-reading Queries plus 5 top-level Queries in a chain and 6 functions called from tables at different steps, 4 of which were created from top-level Queries. That doesn't seem overwhelming.

    The latter experience (refreshes of particular steps on the fly right after a full Query refresh) is particularly frustrating. Of course, I can hide all the steps in lists in tables' cells in order to exploit laziness. But, apart from my own inconvenience, PQ is created (as far as I understand) for self-service use by clients. And if I provide such a cryptic table to my clients, maintaining it in any way will be very inconvenient for them. In the end, you did an amazing job to let us perform _a very complex ETL_ VISUALLY - I never tire of being fascinated by that - so it is a great pity to throw away much of your UI achievements, go back to plain programming, AND provide clients with just a program, instead of a nice self-service solution that can be (at least partly) maintained by clients themselves.

    So what can be done to work around the issue of unnecessary calls during a Refresh of a Query while still using UI ETL at full scale? Maybe there are some configuration parameters for that, or some other techniques?

    All my questions are for the sake of my clients and of having a great product at hand, so I'm sorry for any inconvenience, and thank you very much in advance for your answer!
    Saturday, December 15, 2018 7:30 PM
  • Hi All! I found out that the actual behavior of PQ is slightly different from what I described in my previous comment.

    Actually, all the N steps are called only when this exception happens (the text is from trace messages):
    Exception:
    ExceptionType: System.IO.EndOfStreamException, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
    Message: Reading after the end of stream impossible.
    StackTrace:
       at System.IO.__Error.EndOfFile()
       at System.IO.BinaryReader.FillBuffer(Int32 numBytes)
       at System.IO.BinaryReader.ReadInt64()
       at Microsoft.Mashup.EngineHost.Services.DiskPersistentCache.PersistentDictionary.<>c__DisplayClass2c.<ReadBucket>b__2b()
       at Microsoft.Mashup.EngineHost.Services.FileSystemAccessHelper.<>c__DisplayClass4.<IgnoringAccessExceptions>b__3()
       at Microsoft.Mashup.EngineHost.Services.FileSystemAccessHelper.TryIgnoringAccessExceptions[T](Func`1 func, IEvaluationConstants evaluationConstants, T& result, Exception& accessException)

    I translated "Reading after the end of stream impossible." from Russian, because the original message was in Russian, so the wording is probably slightly different in English locales (the standard English message for this exception is "Unable to read beyond the end of the stream.").

    I can see this message only in the trace; in the PQ UI everything goes fine, except that it takes longer than without the error. And I see good data in the UI, so if I hadn't looked at the trace, I wouldn't have known the error exists.

    If you don't get an error during a Query refresh, PQ launches the evaluation of several steps (usually some of the last ones, in my experience) and stops refreshing. After that, if you click on steps of the Query that were evaluated during the refresh, you get results immediately. If a step was not evaluated, the evaluation of that particular step is launched after the click, and you have to wait for the results.

    So the situation is not as bad as I described in the previous comment (but still not good).
    It is not good because every step in a Query is a separate query for PQ. If you have a number of steps, usually not all of them get refreshed (as long as PQ doesn't encounter problems refreshing them). And if a step was refreshed during the Query refresh, you see its results immediately when you click on the step after the Query refresh finishes.

    The problem here is that during a Query refresh every step is launched very :-) separately. Virtually, every step of any Query is a separate query for PQ. They are combined into something common only in the UI.

    The current Targeted version of Excel shares the cache between different queries, and while refreshing every step separately, PQ still uses the cache after the first request for the whole Query refresh. But it does so just as it would for different Queries called from the refreshed Query. So I cannot say that the refreshes of a Query's steps are totally separate. But you won't benefit from Buffer across the steps of a refreshed Query, even when the refresh happens in the same Mashup process.

    Of course, it would be much better if PQ used the previous step (or even Query) for both purposes - for providing results to the next step of a Query AND for showing the results in the UI. In particular, I don't understand why _steps_ are calculated so separately.

    SO I have 2 questions:

    1. Can we expect any improvement in the calculation of steps (or even Queries), in order to avoid this unnecessary multi-evaluation?

    2. Why does the error I described happen? It's definitely because of some trouble in PQ. It starts to happen after quite a long session in the PQ UI. And as soon as it starts to happen, it happens more and more often. But if you close the PQ window (without closing Excel) and then open the PQ window again, the error disappears and everything works fine. For some time ;-).

    Sunday, December 16, 2018 5:30 PM
  • Hi All!
    I found out why the error described in my previous comment happens:

    "2. Why the error, I described, happens? it's defenitely because of some troubles in PQ. It starts to happen after of quite long work in PQ UI. And as soon as it starts to happen - it happens more and more often. But if you close PQ window (but not close Excel), and then launch PQ window again - the error disappears and all works fine. For some time ;-)."

    It happens because there is an error in reading the contents of Excel sheets/tables. Reading the metadata of an Excel file works fine, but the error occurs as soon as the data themselves are read. You can see it only through tracing; from the end user's point of view the data are read successfully.

    BUT - the data are read with a noticeable delay. That happens because:
    1. When the error happens, a great number of exceptions appear in the trace.
    2. As I described earlier, in case of the error every step of the refreshed Query is refreshed. But it seems to me that this is true not only for the refreshed query, but for all the previous ones as well! Quite often even caching doesn't work, and PQ makes a lot of requests to the external data source.

    None of this is good, especially calling an external data source a number of times. Not all providers are tolerant of that.

    The error happens from time to time, not always.

    And if you read data from the same Excel file, all works fine - no exceptions in tracing.
    Friday, December 21, 2018 1:00 AM
  • Hi Andrey and Chris. Apologies for the delay. I've been trying to wrap my head around the details of how evaluation and caching work, since your questions are delving into aspects of the system I've not had to understand before.

    Chris, regarding your questions about memory limits: I'm not sure. I'll need more time to answer this. And yes, caching still takes place independent of Table.Buffer. The Excel change was to have each refresh of N queries share a temporary persistent cache, instead of each query getting its own temporary cache.

    Andrey, the error you listed is not surprising. It's essentially a cache miss. We encountered an IO error trying to read something from the persistent cache. That said, it's possible it's indicative of a deeper issue (for example, maybe mashup container A is accessing the cache while container B is updating it, resulting in the container A redoing the work B already did). I haven't had time to dig in and find out.

    However, what I have had time to do is create a gargantuan diagram showing how refresh and caching and buffering work in a simple-ish sample scenario. I haven't yet vetted it with my teammates, so take it with a grain of salt. (I'll post an updated version once I'm able to gather feedback.) But it does distill what I understand about the innards of PQ, based on debugging through various parts of our code. You may consider it an early Christmas gift or a confusing lump of coal, depending on your view, but I thought I would share it now. It's not pretty, but I hope you find it helpful. Then again, it might just raise more questions than it answers. :)

    Here's the sample M:

    shared Query1 = let
        Source = Sql.Database("SomeSQLServer", "MyDb"),
        MyTable = Source{[Item="MyTable"]}[Data],
        BufferMyTable = Table.Buffer(MyTable)
    in
        BufferMyTable;

    shared Query2 = let
        Source = Query1
    in
        Source;

    And here's the diagram (you'll likely need to zoom in to read everything):

    PQ Caching (Excel & PBIDesktop)

    As you might notice, there are a number of extra evaluations even in this simple scenario that will be eliminated if you disable background analysis (they will still happen on demand, but won't happen automatically). There can also be extra evaluations performed by the firewall in its attempt to determine how to safely partition a given set of queries (there's one such evaluation in this scenario, but there can be more depending on the complexity of your queries). If you disable the firewall, these will also no longer occur. Finally, I hope it's clear that Table.Buffer is limited in scope. It only affects a single evaluation of a top-level query, and in some scenarios (such as the one above) may actually degrade performance due to the extra unnecessary work it performs (such as reading through all the data when only zero or 1k rows are being requested).

    EDIT: Updated the diagram on Jan 9th.

    Ehren


    Friday, December 21, 2018 9:54 PM
    Owner
  • Wow, this is really great - just the gift I wanted! Thanks for taking the time to write this up, I really appreciate it. Now to take some time to understand this all properly...

    Chris


    Check out my MS BI blog I also do SSAS, Power BI, MDX, DAX and M consultancy

    Saturday, December 22, 2018 10:34 AM
  • Quoting Chris:

        Wow, this is really great - just the gift I wanted! Thanks for taking the time to write this up, I really appreciate it. Now to take some time to understand this all properly...

    My thoughts and feelings exactly! Thanks Ehren. A Christmas gift that will endure beyond the season. :)
    Saturday, December 22, 2018 5:50 PM
  • Hi Ehren and Chris, and All!

    Ehren, this is extremely helpful information - it's really appreciated! IMHO the earlier we have such info, the better, regardless of possible problems in the details, because it helps us see the whole picture anyway, and gives us a lot of info we need time to think about before you make any changes. Thanks again for the information!

    And yes, I still have questions, and still about Buffer :-).
    1. Is Buffering "flat" or "deep"? I mean - if a table contains cells with not only scalar values (lists, tables, records), and some of the non-scalars have already been evaluated - will the non-scalar values be stored in the Buffer? And will the non-evaluated non-scalars not be? It seems to me that Buffer is "flat", and that I have to Buffer the non-scalar cells in a table and the table itself separately; otherwise I get a Buffered table and non-Buffered non-scalar cells :-).

    1.2. By the way (regardless of Buffer) - how are cells of a table with scalar expressions calculated - immediately, as they appear in an expression, or upon access (obviously the question is not about the situation when you see the table in the PQ UI)? I read about streaming semantics and lazy evaluation of Records, but from my point of view there is no clear answer there.

    If they are lazily evaluated, what happens when we use Buffer on a table with some not-yet-evaluated scalar expressions? Are they calculated at the point we Buffer the table? The logs make me think yes, but that is interpreted, not direct, information.

    2. Immutability of variables and their recalculation. May PQ, under some circumstances, calculate really _immutable_ variables again? Theoretically, in case of a memory shortage, PQ might remove a variable from memory in order to free it, and recalculate the variable upon the next access. When I stress _immutable_, I mean exactly that no mutable operands were used in calculating the variable, whether by the nature of the operands or through Buffer. I understand that _I_ cannot change the value of the variable, but that doesn't mean the value may not be destroyed (and recalculated again) by PQ during evaluation.
    Theoretically, PQ might flush the values of immutable variables to disk, as it does for Buffered values, and then restore them when needed, instead of just dropping the variable. Or it might just throw an error. It seems to me (via logs) that PQ sometimes recalculates, but you are never sure whether you interpret such info correctly.

    3. Is it possible that PQ releases Buffered memory (by removing a buffered table from memory) BEFORE the end of the evaluation? In general it could be possible, because, while evaluating, the M analyzer may find that a Buffered variable is not used in the remaining steps of the evaluation plan. If there is little memory remaining, or the data tables are big, that could be a very effective approach. We could break a big data table into pieces, calculate them separately, using Buffer for every part as we like, and hope that PQ will free the memory of the previous part before calculating the next part of the table, as in the sketch below.
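
    Purely to illustrate the idea (Table.Split is a real library function; BigTable and the Amount column are hypothetical, and the whole thing assumes PQ would actually release earlier parts - which is exactly my question):

    ChunkedSum = let
        pages = Table.Split(BigTable, 10000), // list of 10000-row pages
        sums  = List.Transform(pages, each List.Sum(Table.Column(Table.Buffer(_), "Amount"))),
        total = List.Sum(sums)
    in
        total;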
    Monday, December 24, 2018 11:38 PM
  • Regarding my question 1 - just now I ran into a situation showing me clearly that Buffer has an effect for tables with tables in cells (not only for scalars in cells, as I wrote earlier). Sorry, to anyone reading this discussion, for the conflicting information. But I (as an ordinary PQ user) have access only to interpretations, not to the facts (which are the program code)...

    And the overall feeling is that for deeper tables, Buffer has no effect on the deeply nested structures. So in the case of cells of type Record containing tables, Buffer didn't help me (but again, this may be a misinterpretation).

    Thursday, January 3, 2019 10:01 PM
  • Hi Andrey. Regarding your questions:

    1. Buffering discards any nested non-scalar values (tables, lists, records, etc.) and converts them into text values such as "[Table]", "[List]", "[Record]", etc. CORRECTED ANSWER: Buffering is shallow. It will force the evaluation of any scalar cell values, but will leave non-scalar values (records, lists, tables, etc.) as-is.

    1.2. The way cells of a table are calculated depends on the table. But buffering a table will definitely force any "lazy" cells or rows to be calculated immediately, resulting in non-lazy values in the buffered version of the table.

    2. The behavior you describe (automatically storing buffered values on disk and removing them from memory) is not part of how our system works, AFAIK. If you buffer a bunch of large tables that are all being actively referenced, you can end up getting an out-of-memory error.

    3. Once a buffered value is no longer referenced, it will be garbage collected by .NET. But this depends on what you're doing with the buffered value, and on whether the references to it treat it as a lazy value to be accessed as needed, or as something to be accessed immediately and then discarded.
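
    As a small illustration of the shallow behavior described in #1 (a sketch with made-up values):

    ShallowDemo = let
        t = Table.FromRecords({
                [Key = 1, Nested = {1, 2, 3}],
                [Key = 2, Nested = {4, 5}]
            }),
        b = Table.Buffer(t) // the scalar Key values are forced; the Nested lists are left as-is
    in
        b;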

    EDIT: Curt pointed out my previous answer to #1 was incorrect. I've corrected it above. (The previous answer describes what happens when the Data Privacy Firewall buffers data, not what Table.Buffer does all by itself.)

    Ehren


    Monday, January 7, 2019 11:28 PM
    Owner
  • Hi Ehren!

    Thanks a lot for your answers!

    Tuesday, January 8, 2019 11:50 PM
  • You're welcome. In addition to updating my answer to your last questions, I just updated the diagram. Lots of text changes that hopefully make things clearer (and more accurate). It's also now color-coded to show PBIDesktop-specific things in yellow, and Excel-specific things in green.

    Ehren

    Wednesday, January 9, 2019 7:54 PM
    Owner