none
Question about necessity of buffering tables when referencing tables twice but with one of them just for the column names RRS feed

  • Question

  • Sometimes you buffer a table if you reference the same step/table twice. Quite often enough, one of these functions that reference said table will be Table.ColumnNames. It seems quite expensive to buffer the whole table when only columns names may be needed again.  

    So question is: In this scenario is it necessary to buffer the table? 

    (Do/Can column names automatically get buffered when table is evaluated?)

    I also give an example to help with this question (a subset of above scenario):

    Let's say I want to add a column which is a simple function of another other column, but I also want to put those columns next to each other. I also do not want to hard code the other columns names in the table. 

    I could write:

    let
        //Should I buffer table in step below?? - table gets referenced twice in steps: AddZ and ColNames
        Source = #table(
            {"A".."Y"},
            {{1..25}}),
        AddZ = Table.AddColumn(Source,"Z",each [C]+100), 
        ColNames = List.Buffer(Table.ColumnNames(Source)),
        ReorderedList  = List.InsertRange(
            ColNames,
            List.PositionOf(
                ColNames,
                "C",
                Occurrence.First)+1,
            {"Z"}),
        ReorderedTable =  Table.ReorderColumns(AddZ, ReorderedList)   
    in
        ReorderedTable

    but I could also write:

    let
        Source = #table(
            {"A".."Y"},
            {{1..25}}),
        #"Split Column by Delimiter" = Table.SplitColumn(Source, "C", each {_,_ + 100}, {"C","Z"})
    in
        #"Split Column by Delimiter"

    The second is smaller and I think has much better performance (but I don't actually know if this is the case). 



    • Edited by CamWally Monday, August 26, 2019 12:08 PM
    Monday, August 26, 2019 12:06 PM

Answers

  • Hi Cam. It likely depends on the data source. For example, when pulling from SQL or another database, it's possible that we fold the Table.ColumnNames call to something that doesn't require reading any of the data. And even with a text file, Table.ColumnNames will likely be smart enough to only read the first line of the file. But if the file is coming from a web source, even just reading the first line will likely involve requesting the entire file via HTTP. So it can be complicated and depend on multiple factors. Testing various approaches is always the best way to determine which is the most performant.

    Ehren

    Monday, September 9, 2019 6:00 PM
    Owner

All replies

  • Hi Cam!

    From my experience, and according to my "feeling" from answers from MS team, you always pay the full price if you access a data source for any reason :-(.

    100% sure that when you access web and file resources you'll have full-scale query (if you don't buffer them).

    All you can hope to is Query Folding in case of SQL, but I'm not sure that PQ does special SQL query just for getting the table structure, which would be very useful.


    Monday, September 2, 2019 12:37 PM
  • Hi Cam. It likely depends on the data source. For example, when pulling from SQL or another database, it's possible that we fold the Table.ColumnNames call to something that doesn't require reading any of the data. And even with a text file, Table.ColumnNames will likely be smart enough to only read the first line of the file. But if the file is coming from a web source, even just reading the first line will likely involve requesting the entire file via HTTP. So it can be complicated and depend on multiple factors. Testing various approaches is always the best way to determine which is the most performant.

    Ehren

    Monday, September 9, 2019 6:00 PM
    Owner