What is better from internal presentation point of view - Group & Expand, or Group & Combine?

  • Question

  • Hi All!

    What is better from internal presentation point of view - Group & Expand, or Group & Combine?

    I mean

    s1 = Table.Group(t, …, {{"Data", each _}}),
    s2 = ...
    s3 = Table.ExpandTableColumn(s1, "Data", ...)
    vs
    s1 = Table.Group(t, …, {{"Data", each _}}),
    s2 = ...
    s3 = Table.Combine(s1[Data])


    Friday, July 19, 2019 8:02 AM

Answers

  • As Colin suggests, the primary difference is that Table.ExpandTableColumn requires you to supply a list of column names. This can be a drawback or an advantage, depending on what you're trying to do. The main advantage -- maybe even the sole advantage -- is that we don't have to calculate the table type until all of the previous tables have had their rows exhausted. By contrast, for Table.Combine we have to calculate all of the input table types before we can compute the result. If, for instance, the input tables are the result of a REST request it would mean that we have to make all the requests up front.
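    The two patterns in question can be sketched in M like this (a toy example; the table and column names are made up):

    ```m
    let
        t = #table({"Key", "Value"}, {{"a", 1}, {"a", 2}, {"b", 3}}),
        grouped = Table.Group(t, {"Key"}, {{"Data", each _, type table}}),

        // Expand: the output column names are supplied up front, so the
        // result's type is known without evaluating every nested table.
        expanded = Table.ExpandTableColumn(grouped, "Data", {"Value"}),

        // Combine: the type of every table in grouped[Data] must be
        // computed before the type of the union is known.
        combined = Table.Combine(grouped[Data])
    in
        combined
    ```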
    Monday, July 22, 2019 3:59 PM

All replies

  • Hi Andrey,

    All things being equal, Table.Combine is better, because you do not have to worry about column names or column count changing in the future. The exception is when the changed columns participate in your table evaluation before combining or expanding, in which case you are stuck either way.

    If you happen to add new columns to the 'outer' table before combining or expanding the column of tables, Table.Combine obviously will not contain those extra columns after combining. The solution would be either to use Table.ExpandTableColumn instead, or to perform a join between the combined table and an earlier step that has the extra columns. I prefer to avoid hard-coded column names wherever possible.
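    A toy sketch of that situation (names are illustrative): the "Extra" column added after grouping is absent from the combined result, and a join restores it.

    ```m
    let
        t = #table({"Key", "Value"}, {{"a", 1}, {"b", 2}}),
        grouped = Table.Group(t, {"Key"}, {{"Data", each _, type table}}),
        // Add a column to the 'outer' (grouped) table:
        withExtra = Table.AddColumn(grouped, "Extra", each Text.Upper([Key])),

        // The combined result only contains what is inside the nested
        // [Data] tables, so "Extra" is gone here:
        combined = Table.Combine(withExtra[Data]),

        // One fix: join back against the outer table to recover "Extra".
        restored = Table.Join(combined, "Key",
            Table.SelectColumns(withExtra, {"Key", "Extra"}), "Key")
    in
        restored
    ```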

    Sunday, July 21, 2019 6:37 PM
  • Curt, thanks for your feedback regarding the internal logic of the function.

    Regarding "list of column names", I use Table.ColumnNames for that (actually I have a function for that).
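    For example, something along these lines (a sketch; the helper function is reduced here to a single Table.ColumnNames call on the first nested table):

    ```m
    let
        t = #table({"Key", "Value"}, {{"a", 1}, {"b", 2}}),
        grouped = Table.Group(t, {"Key"}, {{"Data", each _, type table}}),
        // Derive the names dynamically instead of hard-coding them.
        // Note: this may force an extra evaluation of the source.
        dataCols = List.RemoveItems(Table.ColumnNames(grouped{0}[Data]), {"Key"}),
        expanded = Table.ExpandTableColumn(grouped, "Data", dataCols)
    in
        expanded
    ```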

    BUT - I see that this leads M to query the file one more time. For me, easier-to-maintain code matters more than the performance hit (I access local files), but it would be great to avoid the extra query.

    Do you guys have an idea how to avoid hard-coding column names while also avoiding a double query of the data source, using the appropriate operators? The question is NOT about alternatives to these operators, but about avoiding hard-coded field / column names in them without making M query the source twice.

    By the way - one way would be for M not to query the source a second time when I ask for the list of columns / fields :-). M is lazily evaluated, after all, and I use the data and the information about their structure in the same query step, but that doesn't help.
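    One common workaround, for what it's worth, is to pin the source with Table.Buffer so that later operations (including the name lookup) read the buffered copy. Whether this actually prevents a second hit on the source depends on the connector; the file path below is purely illustrative.

    ```m
    let
        src = Table.Buffer(
            Csv.Document(File.Contents("C:\data\example.csv"), [Delimiter = ","])),
        // Both the schema lookup and the row read now hit the buffered copy:
        cols = Table.ColumnNames(src),
        rows = Table.RowCount(src)
    in
        [Columns = cols, Rows = rows]
    ```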


    Monday, July 22, 2019 4:53 PM
  • You can't have it both ways :). Either we get the list of column names from the source of the query or we get it from the source of the data.
    Monday, July 22, 2019 5:14 PM
  • "For Table.Combine we have to calculate all of the input table types before we can compute the result. If, for instance, the input tables are the result of a REST request it would mean that we have to make all the requests up front."

    That's good to know, and keep in mind.

    My response was based on Andrey's example of using Table.Combine on a previously grouped table (vs. using Table.ExpandTableColumn).

    When you say "calculate all of the input table types", do you mean the column types of each table?

    Monday, July 22, 2019 7:35 PM
  • I think it's about everything included in the (record of the) table type definition, i.e. the list of field names together with their types.
    Monday, July 22, 2019 8:40 PM
  • There is a major issue with the Table.Group and Table.Combine method, however. If the original table has zero rows, then the resulting table (after grouping and combining) is not only empty but also has no columns. This loss of table schema can cause a lot of problems: errors in later steps, or breakage in Power BI Desktop (PBID).

    I have a question though, regarding what Andre Minakov said ("avoid double querying data source"): could/should there be a way of buffering just the type of each step (including complex table types), so that functions like Table.ColumnNames don't cause the step to be recomputed?

    This could be useful in many scenarios.

    Tuesday, July 23, 2019 1:24 AM
  • Does Table.Combine lead to two queries against every table's source - one for the structure, and one for the data itself?

    And, just in case - Curt, could you please say: is table1 & table2 strictly the same as Table.Combine({table1, table2}), from an "internal" point of view?


    Monday, July 29, 2019 12:25 PM
  • Yes, "table1 & table2" is strictly the same as Table.Combine({table1, table2}) -- the same is true for lists and records.
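    A quick illustration of the equivalence (toy tables):

    ```m
    let
        a = #table({"X"}, {{1}}),
        b = #table({"X"}, {{2}}),
        viaOperator = a & b,                   // combination operator
        viaFunction = Table.Combine({a, b})    // produces the same table
    in
        viaOperator
    ```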

    I don't think "query" is really the right word, but that notwithstanding I'd say the answer to your question is "yes". To calculate a table value, we have to calculate its type -- but we don't have to fetch its rows until an operation which requires them is done against the value. For something like a relational database, calculating the type is done by getting the metadata from the server and is a different kind of call than fetching the data. For a lot of REST APIs, though, there's only one API which returns both the data and an implicit structure. That's where you end up making what looks like duplicate requests.

    Monday, July 29, 2019 4:42 PM