none
Load first then filter? RRS feed

  • Question

  • Hi,

    Each time when I load the table that is sourced from the Table.Join operation of two tables that have been filtered in the query editor, it seems that the power query doesn't load the filtered section directly, but load all the data first then do the filtering. For instance, I get a 20,0000 rows table and filter it to 6000+ rows. The other table is also about 6000 rows. I do a join for these two tables. It's supposed to get a 6000+ rows table directly, but the fact is Power Query will ignore the filtering and load the data first then filter the data at last. The result is correct but the process takes too much time.  I don't know whether I understand this correctly or not.

    This matters because the data that I normally deal with can be more than ten thousands' rows. It will take too much time if the data are loaded first.

    Could anyone help me out here?

    Thanks.

    Regards,  

    Tuesday, January 6, 2015 1:05 AM

Answers

  • Hi Qilong,

    If you use a SQL statement directly, then PQ will not do any further optimizations over that query, meaning that all additional transformations will be done locally, including join operations against other tables in the same database. We could do some optimizations in theory, but typically people use this feature when they just want to do most operations directly in SQL. So if you want efficiently, I would suggest that you include all the major filtering that you need to do in the SQL statement or do the operations through the PQ UI or language, but not both.

    Hopefully this helps,
    Tristan

    • Marked as answer by Qilong L Friday, January 9, 2015 12:25 AM
    Thursday, January 8, 2015 2:42 AM
    Moderator

All replies

  • Hi Qilong,

    Not all expressible filters in the Power Query Formula Language will or can be optimized to the target data source's language. In those cases, PQ will run the filter locally.

    What data source are you using?
    What does your filter look like?

    Thanks!
    Tristan

    Tuesday, January 6, 2015 2:26 AM
    Moderator
  • Hi Tristan,

    I'm using the Teradata as the data source.

    Normally, I do two kinds of filtering. One is using the SQL statement while I was fetching the data from the DB. The other is using the pull-down button right next to the column name in the query editor to do the filtering. Both of these two approachs have the load problem I described.

    What kind of filters should I use to avoid the issue?

    Thanks.

    Regards,

    Qilong 

    Tuesday, January 6, 2015 3:26 AM
  • Could anybody please give me some hints to solve this?

    Thanks in advance.

    Regards,

    Thursday, January 8, 2015 12:53 AM
  • Hi Qilong,

    If you use a SQL statement directly, then PQ will not do any further optimizations over that query, meaning that all additional transformations will be done locally, including join operations against other tables in the same database. We could do some optimizations in theory, but typically people use this feature when they just want to do most operations directly in SQL. So if you want efficiently, I would suggest that you include all the major filtering that you need to do in the SQL statement or do the operations through the PQ UI or language, but not both.

    Hopefully this helps,
    Tristan

    • Marked as answer by Qilong L Friday, January 9, 2015 12:25 AM
    Thursday, January 8, 2015 2:42 AM
    Moderator
  • Hi Tristan,

    Thanks so much for giving me such a great help. Now I know what's the root cause for this problem.  

    Regards,

    Qilong

    Friday, January 9, 2015 12:25 AM