none
Wrong behavior of FullOuter and RightOuter Join mode followed by Table.SelectRows, and 2 other questions RRS feed

  • Question

  • Hi All!

    Question 1.

    If you launch the code:

    let
        Source = Table.NestedJoin(Table.Buffer(#table(type table [k = number, w = text], {{1, "w"}, {2, "w"}})), {"k"}, Table.Buffer(#table(type table [k = number, w = text], {{2, "k"}, {4, "w"}})), {"k"}, "ww", JoinKind.FullOuter),
        #"1.Zombie Values if filter Null" = Table.SelectRows(Source, each ([w] = null)), 
        #"2.Zombie Null Rows when filter not null" = Table.SelectRows(Source, each ([w] <> null))
    in 
        #"2.Zombie Null Rows when filter not null"

    you'll see:

    In step 1 - 2 values in ww Field of the Table instead of 1. 

    In step 2 - you can still see row with null values, despite the filter for not null values.

    I understand that this is due to Streaming semantics of tables evaluation, or lazy evaluation, or errors of translating M code in the internal (because Source step is OK, the errors takes place with the next steps), but this is wrong behavior anyway.

    Replacing null values doesn't work either.

    But Table.Buffer and Table.AddIndexColumn solves then problem.

    Please could you comment on that? Isn't that a bug? If yes - this is quite a fundamental problem for ETL tool, isn't it? When can we expect the fix?

    Question 2.

    This code doesn't solve the problem:

        Source = Table.NestedJoin(#table(type table [k = number, w = text], {{1, "w"}, {2, "w"}}), {"k"}, #table(type table [k = number, w = text], {{2, "k"}, {4, "w"}}), {"k"}, "ww", JoinKind.FullOuter),
        Custom4 = let k = (r) => r in k(Source),
        #"1.Zombie Values if filter Null" = Table.SelectRows(Custom4, each ([w] = null)),
        #"2.Zombie Null Rows when filter not null" = Table.SelectRows(Custom4, each ([w] <> null)),

    Key difference is that line:

    let k = (r) => r in k(Source)

    I read in the Internet that parameters, passed to functions, are eagerly evaluated. In case of a table that means (as far as I understand) the same effect as Table.Buffer. But, as I wrote above, Table.Buffer solves the problem, while passing the table as function argument doesn't.

    The question is - are parameters of function eagerly evaluated?

    UPDATE - SCALAR values are definitely eagerly evaluated, e.g. this code helps with the problem here:

    Table.TransformColumns(.... each let f = (x) => x...

    But if I transfer table value, virtually nothing happens. So are non scalar values not evaluated at all when they are passed as a function parameter?

    Question 3.

    Why Table.AddIndexColumn solves the problem? As far as I understand, Table.Buffer helps with the problem due to making the table immutable, so errors in streaming semantics algorithm / lazy evaluation cannot affect the rest of steps. But why Table.AddIndexColumn may force any of the two? Or Table.AddIndexColumn influences the PQ behavior in the different way?



    Sunday, June 23, 2019 8:53 PM

Answers

All replies

  • Hi Andrey. This definitely seems like a bug. What happens if you make the types of the columns in the Source step nullable instead of non-nullable?

    Ehren

    Tuesday, June 25, 2019 7:36 PM
    Owner
  • Works with nullable types if an index column is added after the join step. Both conditions must be there for the query to work. Too many minefields for my liking. Buffering the tables isn't necessary.

    Edit: nullable types with buffering of joined table (but not source tables) also works (without the need for an index column).

    Tuesday, June 25, 2019 10:32 PM
  • I do not see any difference in results between nullable text and text
    The index column or Table.Buffer seem to be the only change maker

    let
        table1 = #table(type table [k = number, w = /*nullable*/ text], {{1, "w"}, {2, "w"}}),
        table2 = #table(type table [k = number, w = /*nullable*/ text], {{2, "k"}, {4, "w"}}),
        Source = Table.NestedJoin(table1, {"k"}, table2, {"k"}, "TOTO", JoinKind.FullOuter),
    
        #"1.Zombie Values if filter Null" = Table.SelectRows(#"Source", each [w]=null),
        #"2.Zombie Null Rows when filter not null" = Table.SelectRows(#"Source", each [w]<>null),
    
        #"1.with Buffer" = Table.SelectRows(Table.Buffer(#"Source"), each [w]=null),
        #"2.with Buffer" = Table.SelectRows(Table.Buffer(#"Source"), each [w]<>null),
    
        Index = Table.AddIndexColumn(Source, "Index", 0, 1),
        #"1.with Index" = Table.SelectRows(#"Index", each [w]=null),
        #"2.with Index" = Table.SelectRows(#"Index", each [w]<>null)
    
    in #"2.with Index
    • Edited by anthony34 Wednesday, June 26, 2019 7:14 AM
    Wednesday, June 26, 2019 7:11 AM
  • Ok, thanks for the additional info. What version of PQ are you all seeing this in?

    Ehren

    Wednesday, June 26, 2019 4:02 PM
    Owner
  • Version 2.70.5494.701 64-bit

    Thursday, June 27, 2019 6:13 AM
  • Hi Ehren! I didn’t try to make column types to nonnullable. I have to say that nulls happen almost at every join, but only Full/RightOuter works with the error. But anyway in my case both buffering and indexing helped with the issue like I wrote above. Making Table.TransformColumns(.... each let f = (x) => x... also helped _in this case_.

    By the way - replacing the null values doesn't work either.

    But I also see an error with Table.Partition , if I perform it after a FullJoin. I cannot reproduce it on a simple code snippet, by in my complex code it happens. And buffering and indexing help me with Partition, but not the function I described here. I have to say that indexing helps with Partition even if I just add the indexing step and on the next step after that delete it, thus not using the values at all, so as far as I understand it has to have no influence on anything (except query folding, which definitely doesn't takes place here in any way). While it’s good and relatively cheap for tackling the error, I’m very curious what is the magic with indexing in PQ?

    My version in PBID - 2.70.5494.761 64-bit (June 2019)

    And - surprise - in Excel all works perfectly! Excel 2016 MSO (16.0.4849.1000) 64 bit.

    Thursday, June 27, 2019 11:26 AM
  • Excel 365 - 16.0.11727.20222 64 bit
    Thursday, June 27, 2019 3:03 PM
  • Thanks everyone. I am able to repro the issue with step "2." (<> null) but not step "1." (= null). I've filed a bug and we'll look into it.

    Ehren

    Thursday, June 27, 2019 6:55 PM
    Owner