none
Using Table.Join formula takes extremly long time to get results.(Why Query Folding doesn't work?) RRS feed

  • Question

  • Hi,

    I built a query with 4 tables inside (load from Oracle DB and two of them are quite big, more than millions of rows). After filtering, I tried to build relationships between tables using Table.Join formula. However, the process took extremly long time to bring out results (I ended the process after 15 mins' processing). There's a status bar kept updating while the query was processing, which is showed as  . I suppose this is because the query folding didn't working, so PQ had to load all the data to local memory first then do the opertion, instead of doing all the work on the source system side. Am I right? If yes, is there any ways to solve this issue?

    Thanks.

    Regards,

    Qilong 

    Wednesday, March 18, 2015 3:00 AM

Answers

  • After an initial look, I'd guess that this

       #"Added Custom" = Table.AddColumn(Filter_AlotHists, "Custom", each Table.SelectRows(Filter_WorkWeek, (table2Row) => [PROCESS_END_TIME] >= table2Row[WRWK_START_DATE] and [PROCESS_END_TIME] <= table2Row[WRWK_END_DATE])),

    is the biggest problem. We don't have well-defined patterns around theta joins, but I'm pretty sure that Table.AddColumn does not expect to be doing a foldable join. The best reference for this today is probably Chris Webb's blog entry at https://cwebbbi.wordpress.com/2014/06/04/join-conditions-in-power-query-part-2-events-in-progress-performance-and-query-folding/

    You might also find it helpful to turn on tracing and then look at the traces; I believe we'll write the actual SQL query into the trace log, and that should indicate where we gave up trying to fold.
    Thursday, April 2, 2015 4:50 PM

All replies

  • It's not possible to answer your question without seeing the full query you're running.

    Wednesday, March 18, 2015 10:42 AM
  • Hi Curt,

    Here's the query that I'm refering,

    let
        Source = Oracle.Database("reporting"),
        AOLOT_HISTS = Source{[Schema="GEN",Item="MVIEW$_AOLOT_HISTS"]}[Data],
        WORK_WEEK = Source{[Schema="GEN",Item="WORK_WEEK"]}[Data],
        DEVICES = Source{[Schema="GEN",Item="MVIEW$_DEVICES"]}[Data],
        AO_LOTS = Source{[Schema="GEN",Item="MVIEW$_AO_LOTS"]}[Data],

        Filter_WorkWeek = Table.SelectRows(WORK_WEEK, each ([WRWK_YEAR] = 2015) and (([WORK_WEEK] = 1) or ([WORK_WEEK] = 2) or ([WORK_WEEK] = 3))), 
        Filter_AlotHists = Table.SelectRows(AOLOT_HISTS, each ([STEP_NAME] = "BAKE" or [STEP_NAME] = "COLD TEST-IFLEX" or [STEP_NAME] = "COLD TEST-MFLEX") and ([OUT_QUANTITY] <> 0)),
        #"Added Custom" = Table.AddColumn(Filter_AlotHists, "Custom", each Table.SelectRows(Filter_WorkWeek, (table2Row) => [PROCESS_END_TIME] >= table2Row[WRWK_START_DATE] and [PROCESS_END_TIME] <= table2Row[WRWK_END_DATE])),
        #"Expand Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"WRWK_YEAR", "WORK_WEEK", "WRWK_START_DATE", "WRWK_END_DATE"}, {"WRWK_YEAR", "WORK_WEEK", "WRWK_START_DATE", "WRWK_END_DATE"}),
        Filter_AolotHists_byWeek = Table.SelectRows(#"Expand Custom", each ([WORK_WEEK] <> null)),
        SelectColumns_AolotHists = Table.SelectColumns(Filter_AolotHists_byWeek,{"ALOT_NUMBER", "STEP_NAME", "PROCESS_START_TIME", "PROCESS_END_TIME", "START_QUANTITY", "OUT_QUANTITY", "REJECT_QUANTITY", "WRWK_FISCAL_YEAR", "WRWK_WORK_WEEK_NO"}),
        Filter_Devices= Table.SelectRows(DEVICES, each ([DEPARTMENT] = "TEST1")),
        SelectColumns_Devices = Table.SelectColumns(Filter_Devices,{"DEVC_NUMBER", "PCKG_CODE"}),
        Filter_AoLots = Table.SelectRows(AO_LOTS, each Text.Contains([DEVC_NUMBER], "MC09XS3400AFK") or Text.Contains([DEVC_NUMBER], "MC09XS3400AFKR2") or Text.Contains([DEVC_NUMBER], "MC10XS3412CHFK") or Text.Contains([DEVC_NUMBER], "MC10XS3412CHFKR2")),
        SelectColumns_AoLots = Table.SelectColumns(Filter_AoLots,{"ALOT_NUMBER", "DEVC_NUMBER", "TRACECODE", "WAFERLOTNUMBER"}),

        TableJoin = Table.Join(SelectColumns_AolotHists, "ALOT_NUMBER", Table.PrefixColumns(SelectColumns_AoLots, "AoLots"), "AoLots.ALOT_NUMBER"),
        TableJoin1 = Table.Join(TableJoin, "AoLots.DEVC_NUMBER", Table.PrefixColumns(SelectColumns_Devices, "Devices"), "Devices.DEVC_NUMBER")
    in
        TableJoin1

    Could you please give me some hints why it needs so long to process?

    Thanks.

    Wednesday, March 18, 2015 3:11 PM
  • After an initial look, I'd guess that this

       #"Added Custom" = Table.AddColumn(Filter_AlotHists, "Custom", each Table.SelectRows(Filter_WorkWeek, (table2Row) => [PROCESS_END_TIME] >= table2Row[WRWK_START_DATE] and [PROCESS_END_TIME] <= table2Row[WRWK_END_DATE])),

    is the biggest problem. We don't have well-defined patterns around theta joins, but I'm pretty sure that Table.AddColumn does not expect to be doing a foldable join. The best reference for this today is probably Chris Webb's blog entry at https://cwebbbi.wordpress.com/2014/06/04/join-conditions-in-power-query-part-2-events-in-progress-performance-and-query-folding/

    You might also find it helpful to turn on tracing and then look at the traces; I believe we'll write the actual SQL query into the trace log, and that should indicate where we gave up trying to fold.
    Thursday, April 2, 2015 4:50 PM