When to use Table.Buffer

  • Question

  • Hi,

    I know Table.Buffer can be very helpful in some cases. However, it can also slow down a query.

    https://social.technet.microsoft.com/Forums/en-US/5601e968-b52e-40ef-81bf-c9f9aa43613b/difference-between-binarycombine-and-tablecombine?forum=powerquery#7777681f-4083-4671-8eba-3269e4cc82ec

    The link above is my earlier post on this.

    If I change 

        Result1 = Table.Combine({CSV1, CSV2}),
        Result2 = Table.Combine({Result1, CSV3}),
        Result3 = Table.Combine({Result2, CSV4})

    to 

        Result1 = Table.Buffer(Table.Combine({CSV1, CSV2})),
        Result2 = Table.Buffer(Table.Combine({Result1, CSV3})),
        Result3 = Table.Buffer(Table.Combine({Result2, CSV4}))

    the refresh becomes much slower (15 s without buffering vs. 47 s with).

    Does anyone know why?

    Thanks

    Tuesday, November 3, 2015 7:57 PM


All replies

  • Is the time you listed for previewing the data in the Query Editor, or filling to Excel/loading to the data model?

    Ehren

    Wednesday, November 4, 2015 8:23 PM
    Owner
  • Filling to Excel only (right-click the query and click Refresh).
    Thursday, November 5, 2015 3:58 PM
  • Here's the answer I received from someone on the M engine team:

    My guess is that this is due to reading all of the files into memory.  Without buffering, all of the rows would stream through to Excel and only use a small amount of memory.  With buffering, it’s going to read all of the rows into memory.  Additionally, it’s going to make copies of Result1 and Result2 during the process as well.  Since we run with a limited working set, I might imagine that we’ve exceeded that size and are thrashing to disk.
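
    One way to act on that: if buffering is wanted at all in this query, buffer at most once, on the final step, so the intermediate combines keep streaming instead of being materialized and copied along the way. A rough sketch, reusing the CSV1 through CSV4 step names from your query:

        let
            // No Table.Buffer on the intermediate steps, so rows keep streaming
            // through to Excel without being materialized or copied.
            Result1 = Table.Combine({CSV1, CSV2}),
            Result2 = Table.Combine({Result1, CSV3}),
            // Buffer (only if it is needed at all) just the final table.
            Result3 = Table.Buffer(Table.Combine({Result2, CSV4}))
        in
            Result3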

    Ehren


    Thursday, November 5, 2015 9:38 PM
    Owner
  • BTW, did you know that you can combine more than two tables in a single step? The UI won't generate this for you, but you can write it yourself:
    Result1 = Table.Combine({CSV1, CSV2, CSV3, CSV4})
    That said, it may not have a perf impact.

    Friday, November 6, 2015 10:48 AM
  • Hi Ehren,

    Sorry for the late reply; I've been busy with other things recently.

    I am not following the "Since we run with a limited working set, I might imagine that we’ve exceeded that size and are thrashing to disk" part. My background is not in IT, so I hope the question below doesn't sound dumb.

    Do you mean that reading Result1 and Result2 into memory and making copies of them would run out of memory? How come? Result1 and Result2 are not big enough to exhaust 8 GB of RAM.

    If "thrashing to disk" is the potential risk of using table.buffer, when would be the good time to use table.buffer?

    Thanks 



    Thursday, November 12, 2015 7:40 PM
  • Thanks for the suggestion. The real question here is when to use Table.Buffer: used in the right place it speeds things up; otherwise it slows things down.
    Thursday, November 12, 2015 7:42 PM
  • A "limited working set" means we limit the amount of stuff we store in RAM. Anything over that limit gets temporarily stored on your hard drive, which is much slower to access than RAM. (This can lead to "thrashing", where things on the hard disk are loaded into RAM and things in RAM written to the disk, then more things on disk are loaded into RAM which requires more things in RAM to be written to the disk, and so on). This working set limit is much less than 8GB, which means you could be hitting it even if your computer has plenty of RAM.

    I would suggest using Table.Buffer when you know for sure that reading all the data into memory is faster than just letting the M engine handle things the way it usually does.
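
    A typical case where buffering does pay off is a small table that would otherwise be re-evaluated many times, for example a lookup consulted once per row. A rough, self-contained sketch (the table names and values below are only illustrative):

        let
            // Small lookup table, pinned in memory once so it isn't re-evaluated per row.
            Rates  = Table.Buffer(#table({"Currency", "Rate"}, {{"USD", 1.0}, {"EUR", 1.1}})),
            Orders = #table({"Currency", "Amount"}, {{"EUR", 100}, {"USD", 250}}),
            // Each added value scans the buffered Rates table in memory
            // instead of going back to the source.
            Priced = Table.AddColumn(Orders, "AmountUSD", (row) =>
                         row[Amount] * Table.SelectRows(Rates, (r) => r[Currency] = row[Currency]){0}[Rate])
        in
            Priced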

    Ehren

    • Proposed as answer by Maxim Zelensky (MVP) Friday, November 13, 2015 9:06 PM
    • Marked as answer by bjzk Monday, November 16, 2015 7:09 PM
    Thursday, November 12, 2015 10:29 PM
    Owner
  • Hi Ehren,

    Thanks for the explanation. I suppose I'll have to experiment first every time I want to use Table.Buffer.


    Friday, November 13, 2015 5:33 PM
  • Table.Buffer appears to have solved a problem I was having.

    I couldn't figure out how to get a deduplication step to respect my sort order and retain the top row for each value in my selected column. With limited QA, it looks like using Table.Buffer directly after the sort and before deduplicating does the trick.

    I'm using small datasets (<3,000 rows...that's "less than three thousand", not heart thousand), so I haven't had any processing issues.
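
    In case it helps anyone, a minimal, self-contained sketch of the pattern (the data and column names are only illustrative, not my real dataset):

        let
            Source   = #table({"Key", "Version"}, {{"A", 1}, {"A", 2}, {"B", 1}}),
            // Sort so the row to keep for each Key comes first.
            Sorted   = Table.Sort(Source, {{"Version", Order.Descending}}),
            // Buffering pins the sorted order so the distinct step honors it.
            Buffered = Table.Buffer(Sorted),
            // Table.Distinct keeps the first row it encounters for each Key.
            Deduped  = Table.Distinct(Buffered, {"Key"})
        in
            Deduped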

    Wednesday, March 13, 2019 5:06 AM