none
Pivot Column (from GUI). Double evaluation??? RRS feed

  • Question

  • Hi

    Despite reading and searching I'm still not sure what happens, from an evaluation perspective, when i.e. one pivot a column from the PQ GUI. As this might help other newbies, basic scenario is:

    Pivoting the [date] column with a Sum of column2 values generates the following code (a bit reorg. for clarity):

    step = Table.Pivot(Table.TransformColumnTypes(PreviousStep, {{"date", type text}}, "xx-YY"),
         List.Distinct(Table.TransformColumnTypes(PreviousStep, {{"date", type text}}, "xx-YY")[date]),
         "date", "column2", List.Sum)

    No problem to understand this. Question: is the Table.TransformColumnTypes really evaluated twice (I would not expect) or is the evaluation model good enough to figure-out, while evaluating the List.Distinct, that the "information" is already "available"?

    Thanks all for any info. re. the above idiot question + Any pointer (other than this one) that could help better understand this will be highly appreciated

    Off topic (to MSFT): Interesting to note that the last arg. (culture) of the Table.TransformColumnTypes function is optional but enforced when pivoting from the GUI. I guess there must be a good reason for this. If one knows, I'm interested as well

    Saturday, February 2, 2019 2:50 PM

Answers

  • "However, between a non-optimized code and one that does twice the same thing I see a real gap… that I didn't expect"

    Actually, in terms of optimized code, I was only addressing your question about culture. As far as the pivot code is concerned, there is no other way to generate it, if the column type that you're pivoting is not text. The code would be exactly the same if written manually.

    The first parameter in Table.Pivot uses the previous step as the first parameter. To generate a distinct list of pivot values for the second parameter, we get the list of dates from the previous step (where else?), and then derive a distinct set of date values. The two parameters are independent of each other, so duplication of the TransformColumnTypes function is inevitable. The fact that we end up with two identical TransformColumnTypes functions simply suggests that we should create a separate step with the transform function, so that we can refer to the step name in the pivot function parameters. This is exactly what you're done by adding the TransformedDateAsText step.

    Now both the date column in the table passed to Table.Pivot and list of dates in the second parameter must be of the same type. If the types are different, you will get nulls everywhere, since values of type date and type text are different (therefore no association can be made to match the pivot values). 

    Lastly, as you would have observed, after you add the TransformedDateAsText step, you can do the pivoting from the UI, and the resulting code has no TransformColumnTypes functions, since the dates are of the proper text type.



    • Marked as answer by Lz._ Monday, February 4, 2019 7:51 PM
    Monday, February 4, 2019 6:18 PM

All replies

  • Hi Lz_

    No problem to understand this. Question: is the Table.TransformColumnTypesreally evaluatedtwice (I would not expect) or is the evaluation model good enough to figure-out, while evaluating the List.Distinct, that the "information" is already "available"?

    It should be clear that the evaluation is done twice. This is so because both calls to Table.TransformColumnTypes use PreviousStep as the first parameter, and PreviousStep doesn't have the required text type for the date column.

    Table names must be text values, so the conversion of date values to text values is necessary. When pivoting a column that is non-text, the column type should be changed to text before pivoting. This approach simplifies the pivot step, since you don't have to make multiple calls to Table.TransformColumnTypes.

     Off topic (to MSFT): Interesting to note that the last arg. (culture) of the Table.TransformColumnTypes function is optional but enforced when pivoting from the GUI. I guess there must be a good reason for this. If one knows, I'm interested as well

    When code is being auto-generated, the code generator has to assume that the generated code may not be modified manually, and therefore it uses appropriate defaults for optional parameters.

    This is the same approach used when recording a macro in Excel. In this case, the recorder includes default values for all of the properties of an object, even if you don't need to specify 95% of these properties. Excel Pros know how to simplify recorded macros to the bare essentials.


    Sunday, February 3, 2019 4:14 PM
  • Hi Professor Colin ;-)

    It should be clear that the evaluation is done twice
    That's what I was afraid to here :-(

    This is so because both calls to Table.TransformColumnTypes use PreviousStep as the first parameter, and PreviousStep doesn't have the required text type for the date column. Table names must be text values, so the conversion of date values to text values is necessary. When pivoting a column that is non-text, the column type should be changed to text before pivoting. This approach simplifies the pivot step, since you don't have to make multiple calls to Table.TransformColumnTypes
    All of that was well understood before posting, hence my question (real double evaluation?)

    Moving forward I'll instead do (open to other suggestions/recommendations):

        ChangedType = Table.TransformColumnTypes(Source,
            {{"column1", type text}, {"date", type date}, {"column2", Int64.Type}}),
        TransformedDateAsText = Table.TransformColumnTypes(ChangedType,
            {{"date", type text}}
        ),
        DistinctDates = List.Distinct(TransformedDateAsText[date]),
        PivotedDate = Table.Pivot(TransformedDateAsText, DistinctDates, "date", "column2", List.Sum)  
    

    According to the last para. of the Evaluation model Lists aren't eagerly evaluated so this should cause no problem in such case

    Sure I know and understand that generated code (like as you say when recording an Excel macro) isn't fully optimized. However, between a non-optimized code and one that does twice the same thing I see a real gap… that I didn't expect

    Anyway that's it and good to know. One reason could be that the APPLIED STEPS panel should display a single step (vs. 3 above). From a marketing perspective that make sense…

    Once again it's really appreciated Colin and recommendations are more than welcome
    If you don't mind I'll keep this thread open for a few more days in case someone would want to add something

    Monday, February 4, 2019 5:04 PM
  • "However, between a non-optimized code and one that does twice the same thing I see a real gap… that I didn't expect"

    Actually, in terms of optimized code, I was only addressing your question about culture. As far as the pivot code is concerned, there is no other way to generate it, if the column type that you're pivoting is not text. The code would be exactly the same if written manually.

    The first parameter in Table.Pivot uses the previous step as the first parameter. To generate a distinct list of pivot values for the second parameter, we get the list of dates from the previous step (where else?), and then derive a distinct set of date values. The two parameters are independent of each other, so duplication of the TransformColumnTypes function is inevitable. The fact that we end up with two identical TransformColumnTypes functions simply suggests that we should create a separate step with the transform function, so that we can refer to the step name in the pivot function parameters. This is exactly what you're done by adding the TransformedDateAsText step.

    Now both the date column in the table passed to Table.Pivot and list of dates in the second parameter must be of the same type. If the types are different, you will get nulls everywhere, since values of type date and type text are different (therefore no association can be made to match the pivot values). 

    Lastly, as you would have observed, after you add the TransformedDateAsText step, you can do the pivoting from the UI, and the resulting code has no TransformColumnTypes functions, since the dates are of the proper text type.



    • Marked as answer by Lz._ Monday, February 4, 2019 7:51 PM
    Monday, February 4, 2019 6:18 PM
  • If the types are different, you will get nulls everywhere, since values of type date and type text are different (therefore no association can be made to match the pivot values)
    That was understood

    Lastly, as you would have observed, after you add the TransformedDateAsText step, you can do the pivoting from the UI, and the resulting code has no TransformColumnTypes functions, since the dates are of the proper text type
    I did not observed it as I did not try (from the UI after creating the 
    TransformedDateAsText step). So tried…

    The fact that we end up with two identical TransformColumnTypes functions simply suggests that we should create a separate step with the transform function, so that we can refer to the step name in the pivot function parameters
    And that's exactly my point: if the UI 'Pivot Column' feature/function detects that the column to pivot is not of the correct type - IMHO - 2 steps should be generated to prevent a double Transformation

    Anyway, that's the way it is. Topic closed & Thanks very much again

    Monday, February 4, 2019 7:50 PM
  • And that's exactly my point: if the UI 'Pivot Column' feature/function detects that the column to pivot is not of the correct type - IMHO - 2 steps should be generated to prevent a double Transformation

    I see what you mean. I think that the only time multiple steps are generated is during the initial phase, when a text or Excel file is first loaded (ChangedType, PromoteHeaders...).

    Monday, February 4, 2019 9:41 PM