none
Same query sometimes folds and sometimes not

    Question

  • Hi there,

    I have a query that folds nicely back to the Server - mostly. In case I increase the number of attributes in the filter it will stop folding.

    Is there a limit for this somewhere? How can I predict under which conditions the query will fold or not?


    Imke Feldmann TheBIccountant.com

    Tuesday, August 18, 2015 2:48 PM
    Moderator

Answers

  • This is a bit hacky...but could you do the joins in chunks of 200, then merge the results together?

    Ehren

    Monday, August 24, 2015 10:14 PM
    Owner
  • Hi Imke.

    When you're joining non-SQL and SQL data, the way it gets folded currently depends on the size of the non-SQL data, as well as how well it matches the first 200 rows of the SQL data.

    We'll fold individual lookup batches if:

    • There are between zero and 100 non-SQL rows
    • OR there are <= 200 non-SQL rows, and the data sparsely matches the first 200 rows of the SQL side (< 2%).

    Otherwise, we'll pull the data down and join locally.

    It’s a heuristic meant to balance performance (latency per lookup query vs. time to download the entire dataset).

    It sounds like this heuristic isn't working as intended for your scenario. Any additional info you could provide that might help us improve it?

    Ehren

    Wednesday, August 19, 2015 5:27 PM
    Owner

All replies

  • Hi Imke. Can you share the M query script you're using, as well as the change which causes it not to fold?

    Ehren

    Tuesday, August 18, 2015 6:13 PM
    Owner
  • Hi Ehren, thank's for you quick supply.

    This is the M-code:

    let
        Quelle = Sql.Databases("..."),
        DBName= Quelle{[...,
        dbo_Daten = ...,
        Zusammenführen = Table.NestedJoin(dbo_Daten,{"ZRId"},ZRIdFilter,{"Filter"},"NewColumn",JoinKind.Inner),
        Zusammenführen1 = Table.NestedJoin(Zusammenführen,{"Datum"},Datumsfilter,{"Zeilenbeschriftungen"},"NewColumn.1",JoinKind.Inner)
    in
        Zusammenführen1

    Below comes the profiler query. So when merging against the list that contains the filter attributes the code gets quite long because it shows every single ID. So when I add additional lines to this filter-table, at a certain lenght the query will stop folding. I thought that this might have sth do with a Maximum length of the SQLstatements or similar.

    As I'm evaluating this for a customer project where this technique would be used on different data, I need to know which rule I can follow to be sure that if once a query folds it will also do with different data input. The performance difference between folding and not folding is so huge on this customers data that not folding would make the solution not useable.

    select [$Ordered].[DBTabId],
        [$Ordered].[ZRId],
        [$Ordered].[Datum],
        [$Ordered].[Index],
        [$Ordered].[IsValid],
        [$Ordered].[Wert],
        [$Ordered].[Qualitaet],
        [$Ordered].[VWert]
    from
    (
        select [_].[DBTabId],
            [_].[ZRId],
            [_].[Datum],
            [_].[Index],
            [_].[IsValid],
            [_].[Wert],
            [_].[Qualitaet],
            [_].[VWert]
        from [dbo].[Daten] as [_]
        where ((((((((((((((((((((((([_].[ZRId] = 1069176 or [_].[ZRId] = 1069182) or [_].[ZRId] = 1069191) or [_].[ZRId] = 1069197) or [_].[ZRId] = 1069202) or [_].[ZRId] = 1075509) or [_].[ZRId] = 1075510) or [_].[ZRId] = 1075511) or [_].[ZRId] = 1075512) or [_].[ZRId] = 1075515) or [_].[ZRId] = 1075516) or [_].[ZRId] = 1075517) or [_].[ZRId] = 1075518) or [_].[ZRId] = 1243781) or [_].[ZRId] = 1243799) or [_].[ZRId] = 1243841) or [_].[ZRId] = 1649145) or [_].[ZRId] = 1649146) or [_].[ZRId] = 1649152) or [_].[ZRId] = 1652117) or [_].[ZRId] = 1652119) or [_].[ZRId] = 1652121) or [_].[ZRId] = 1652123) or [_].[ZRId] = 1652125) or [_].[ZRId] = 1652127
    ) as [$Ordered]
    order by [$Ordered].[DBTabId],
            [$Ordered].[ZRId],
            [$Ordered].[Datum],
            [$Ordered].[Index]


    Imke Feldmann TheBIccountant.com

    Wednesday, August 19, 2015 9:23 AM
    Moderator
  • Hi Imke.

    When you're joining non-SQL and SQL data, the way it gets folded currently depends on the size of the non-SQL data, as well as how well it matches the first 200 rows of the SQL data.

    We'll fold individual lookup batches if:

    • There are between zero and 100 non-SQL rows
    • OR there are <= 200 non-SQL rows, and the data sparsely matches the first 200 rows of the SQL side (< 2%).

    Otherwise, we'll pull the data down and join locally.

    It’s a heuristic meant to balance performance (latency per lookup query vs. time to download the entire dataset).

    It sounds like this heuristic isn't working as intended for your scenario. Any additional info you could provide that might help us improve it?

    Ehren

    Wednesday, August 19, 2015 5:27 PM
    Owner
  • Thanks, I try, but there's only 1 database table involved here:

    • Table has a about 60 Mio rows and the columns you see above
    • 700k distinct values in column ZRId,
    • Dates ranging from 1940 up to today (31k)
    • not many distinct values in the other columns

    Although the number of rows per ZRId varies a lot, the query stopped folding at exactly 200 filter attributes/distinct ZRIds (target was to select 2,5 k distinct IDs). Tried this with 3 random selections - so this might be sth to look into. Will try to get a count per distinct ZRId tomorrow in order to relate it to the actual number of rows returned here.

    Just to better understand the engine: Would it make a difference if I split up the filter list into different queries with "Only create Connection" and then append them all?


    Imke Feldmann TheBIccountant.com

    Wednesday, August 19, 2015 8:44 PM
    Moderator
  • Ok. What's happening in ZRIdFilter and DatumsFilter?

    Ehren

    Thursday, August 20, 2015 12:03 AM
    Owner
  • They are 1 column tables coming from namged ranges; Using an Offset formula in order to dynamically retrieve the slicer selections from a Pivot table (as these cannot be passed directly into PQ).

    No load: "Only create Connection".


    Imke Feldmann TheBIccountant.com


    Thursday, August 20, 2015 4:06 AM
    Moderator
  • If they're named ranges, then you're joining non-SQL and SQL data, and the above heuristic is coming into play.

    Ehren

    Thursday, August 20, 2015 5:00 PM
    Owner
  • Hi Ehren,

    thanks for your answer but I'm having difficulties to understand it:

    I thought we were looking into why the heuristic didn't work as advertised in my example: Stopped query folding well below the 2% threshold (actually well below 0,5% in my case)

    ?


    Imke Feldmann TheBIccountant.com

    Thursday, August 20, 2015 7:28 PM
    Moderator
  • Hi Imke. I was thrown off by your statement "there's only 1 database table involved here", which I initially took to mean you weren't joining SQL and non-SQL data.

    Now that we're on the same page... :)

    I reached out to the M devs again. There's one other piece to the heuristic that is likely affecting your results: We only use the first 200 rows from each side to compute the match rate. How many of the first 200 rows from the SQL side match keys in the non-SQL side?

    (I've also updated my answer above to include this info.)

    Ehren

    Thursday, August 20, 2015 8:10 PM
    Owner
  • Hi Ehren,None of the first 200 rows of the SQL fact table are matching.

    There are around 700 thousand distinct keys in the table where the first couple of thousands ones are so old that they haven't been in any selection at all.

    But isn't it a strange coincidence that my tests identified 200 as the threshold on the non-SQL-side where the query stopped folding? I repeated it again today with more random selections that deliver quite different return amounts and always ended up at 200 being the Maximum amount on the non-SQL-side to be returned.

    Buffled about this I took the same query to a different test-database with the same data structure and created the selection list directly from the SQL-table. Thereby unintendedly selecting the first 200 rows from the SQL-side as my initial test selection for the non-SQL-side: And it didn't fold. Returned to 100: Didn't fold, so maybe the 200 threshold was just "valid" for the original database.

    Then I returned the selection further in order to identify the specific threshold for this database. And ended up at 0! This query wouldn't fold at all - same query like in the other database. Closed the workbook, closed Excel opended again - same: no folding.

    Restarted my Computer: No folding.

    Created another query that should return the same data but by a different filter method:

    = Table.SelectRows(#"Entfernte Spalten", each [ZRId] = 10001 or [ZRId] = 20020)

    This time the query folded & values were returned. So basically asking for the same result in a different way made the query fold or not.

    ?


    Imke Feldmann TheBIccountant.com

    Thursday, August 20, 2015 8:58 PM
    Moderator
  • Ok, so it turns out there is a 200 row limit for the non-SQL data, at which point we fall back to a local join rather than folding. There are also some other bits of logic which come into play. I've updated the answer above to reflect the current code.

    According to the M devs, the 200 row limit is a bug. We'll be addressing it in an upcoming release.

    Thanks for your patience, and hopefully the above answer will help you make sense of the behavior you're seeing. Let me know if there's anything that's still unclear.

    Ehren

    Thursday, August 20, 2015 10:45 PM
    Owner
  • Thanks a lot for your answer – so this wouldn’t work in my case here, very sad. How about an option to turn off the heuristic manually?

    An alternative for me would be to go via the cube (tabular). Any idea how to pass parameters from a non-sql-environment (excel sheet) to the query there? I’ve tried a couple of things but as long as the query is in “cube-mode” (which seems to be prerequisite for query folding to happen), the usual solutions for this (merge with filter table or pass in as filter attributes) don’t seem to work.

    Thx!


    Imke Feldmann TheBIccountant.com


    Monday, August 24, 2015 5:54 AM
    Moderator
  • This is a bit hacky...but could you do the joins in chunks of 200, then merge the results together?

    Ehren

    Monday, August 24, 2015 10:14 PM
    Owner
  • Thanks, this worked: Managed to create a procedure that automatically splits up the list & consolidates the result, no matter how long it is so the client can use it wherever he wants. Query folding works in the function and thanks to a buffer at the right place it’s also pretty fast.

    Thanks again for your help!


    Imke Feldmann TheBIccountant.com

    Wednesday, August 26, 2015 2:30 PM
    Moderator
  • Hi Ehren,

    this bug is still there. Is it possible to tell when we can this to be fixed?

    Thx - Imke


    Imke Feldmann - www.TheBIccountant.com - MVP Data Platform (PowerBI) If you have posted a question, please mark the answer(s) that helped you solve your problem as answer. If you found an answer helpful, please vote it as helpful.

    Thursday, June 29, 2017 6:22 AM
    Moderator
  • Hi Imke. Unfortunately this bug was deemed as not high-pri enough to warrant a fix. :(

    Ehren

    Wednesday, July 19, 2017 8:00 PM
    Owner
  • Thanks Ehren,

    that's indeed very unfortunate.

    I've improved my solution for the bug-fix here: http://www.thebiccountant.com/2017/07/20/sql-query-folding-bug-still-alive-sucking-powerbi-powerquery-excel/


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Wednesday, July 19, 2017 10:55 PM
    Moderator
  • Unfortunately this bug was deemed as not high-pri enough to warrant a fix. :(

    I think that your marketing folks would argue that its been a feature all along...with a 'well considered' limitation. :)

    Thursday, July 20, 2017 12:46 AM
  • Hi Imke and Colin. I hear you. I was also surprised and disappointed by the decision. Imke, it would help me make the case to reactivate the bug if you could create an entry for this on the Power BI Ideas site, and ask the folks reading your article to vote for it.

    Roughly how many hits a month does your article get? That number might help as well.

    Ehren

    Thursday, July 20, 2017 4:19 PM
    Owner
  • Thx Ehren, I know you're on our side ;-)

    Short version: Campaigning for bug-fixes is not my thing and I'm not good at it. Will rather spend my time on creating sth useful instead.

    (My article has been read nearly 2300 times since published)


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Thursday, July 20, 2017 8:18 PM
    Moderator
  • Have finally filed a bug-report: http://community.powerbi.com/t5/Issues/Bug-Query-folding-not-working-with-non-SQL-datasource/idi-p/225100#M11611

    Please upvote!! Thx.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Thursday, August 03, 2017 6:39 PM
    Moderator