none
Merging tables in Power Query severely reduces performance RRS feed

  • Question

  • I am denormalizing tables in Power Query and loading to data model for Power Pivot.  Most of the data is in a SQL ERP.  I try to maximize the number of steps that use Query Folding.  I frequently can Expand columns to get the additional fields and put those steps in the beginning of the query.  Those steps seem to process quickly.  Some columns that can be expanded show Table, indicating I am on the one side of a one-to-many relationship.  Some columns show Value, indicating I am on the many side of a relationship.  All is fine.

    Other times I have to merge queries.  That step breaks Query Folding so I put them closer to the end of the query.  The merge queries step adds a new column that can be expanded.  The new column always seems to have Table on each line, rather than Value.  Apparently Power Query defaults to Table as it is not capable of detecting if there is a one-to-one or many-to-one relationship to the merged second table.  

    The issue is Power Query takes a long time and a lot of RAM and CPU computing power to complete the merge and expanding steps.  I can see this by viewing Windows Task Manager.  I also notice it because manual refreshes take a long time.  Also, when editing the query, it takes a long time for steps to complete.  A query can be working fine and fast.  As soon as I add a merge query step, performance is terrible.

    I played around with an idea posted by Chris Webb to Remove Duplicates on the tables as a last applied step.  I also tried adding an Index column.  Maybe some improvement but not significant.

    I do not think the issue is with the amount of data.  No table has more than 400,000 rows and no more than 15 columns.  Worse case I am merging 400,000 rows with left outer join to 400,000 rows with a one-to-one relationship.  Or merging 400,000 rows with left outer join to 100,000 rows with a many-to-one relationship.  The problem even occurs when merging a small imported Excel table (20 rows and 6 columns) with a table from the ERP.  With nearly every merge, there is not a match for every record so null values are often present.

    I do not think the issue is with my computer.  It is newer.  Intel Xeon 3.70 GHz.  Ram 32 GB.  64 bit operating system.  64 bit based processor.  Software is Windows 10 Pro for Workstations.  Using Office 365 ProPlus Version 1908 (Build 11929.20254) with automatic download of updates.

    I do not have the ability to create a SQL view on the server.

    Please help as I have struggled with this for a long time.  Thank you!!!

    Tuesday, September 10, 2019 11:16 PM

Answers

  • Query folding is designed to break if you merge with non-SQL tables that are larger than 200 rows. 

    But you can use this workaround: https://www.thebiccountant.com/2017/07/20/sql-query-folding-bug-still-alive-sucking-powerbi-powerquery-excel/ 


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, September 16, 2019 5:55 AM
    Moderator

All replies

  • Using Table.Buffer() on the table you're adding right before the merge might save some time. It usually does for me when using custom joins with Table.AddColumn(), but perhaps it will improve the performance of your regular join.

    Alternatively, if your queries manage to query fold up to the join you could copy the folded query code (by right-clicking the steps and viewing native query), and use it as an input to the database connection directly. After that you could write your join in SQL.

    EDIT: It is possible that using Table.Join() instead of Table.NestedJoin() won't break your query folding, though I can't find the button for it so you may have to write it in M

    • Edited by S.Risemann Wednesday, September 11, 2019 6:26 AM
    Wednesday, September 11, 2019 6:24 AM
  • Can you try saving your small excel table as a csv and use the csv file in your join instead using the excel file.

    Performance with excel file can be very low, while very good with csv file.

    check this: Comparing The Performance Of CSV And Excel Data Sources In Power Query

    Wednesday, September 11, 2019 9:21 AM
  • Merge (join) is a basic operation for folding, so it's surprising that the folding is not occurring. However, I've had issues where two steps would fold independently, but not when put together. In my case, changing the order of the steps allowed both steps to fold. If possible in your scenario, reordering the steps is worth a try.
    Wednesday, September 11, 2019 1:21 PM
  • Query folding is designed to break if you merge with non-SQL tables that are larger than 200 rows. 

    But you can use this workaround: https://www.thebiccountant.com/2017/07/20/sql-query-folding-bug-still-alive-sucking-powerbi-powerquery-excel/ 


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, September 16, 2019 5:55 AM
    Moderator