Performance/Speed : Merge vs Custom Function

  • Question

  • Hi PowerBI Users,

    Context/Scenario : I import Salesforce opportunities, and some customers have several opportunities for the same product. In order to obtain only one "stage" per customer-product combination, I need to aggregate the stages of the different opportunities for one customer on one specific product.

    The steps of my query are the following :

    1. Convert each stage (text) to a number (i.e. Closed/Lost => 0 {...} Closed/Won => 10)
    2. Group by customer-product with a min/max function on the number which represents the opportunity stage
    3. Convert the new number back to the stage (text)

    Example : a customer which is both Closed/Won and Closed/Lost will appear as Closed/Won
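
    For illustration, here is roughly what the merge approach looks like in M for these three steps (table names, column names and the intermediate "Negotiation" stage are simplified examples, not my actual model):

        let
            // Hypothetical mapping table; in practice this is a separate query
            StageMap = #table(
                {"Stage", "StageRank"},
                {{"Closed/Lost", 0}, {"Negotiation", 5}, {"Closed/Won", 10}}
            ),
            Source = Opportunities,  // the imported Salesforce opportunities

            // Step 1: merge with the mapping table to turn stage text into a number
            Merged = Table.NestedJoin(Source, {"Stage"}, StageMap, {"Stage"}, "Map", JoinKind.LeftOuter),
            Ranked = Table.ExpandTableColumn(Merged, "Map", {"StageRank"}),

            // Step 2: group by customer-product, keeping the highest stage number
            Grouped = Table.Group(Ranked, {"Customer", "Product"},
                {{"StageRank", each List.Max([StageRank]), type number}}),

            // Step 3: merge again to convert the number back to stage text
            Back = Table.NestedJoin(Grouped, {"StageRank"}, StageMap, {"StageRank"}, "Map", JoinKind.LeftOuter),
            Result = Table.ExpandTableColumn(Back, "Map", {"Stage"})
        in
            Result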

    My question is about the best way to do steps 1 and 3, i.e. a "mapping". I use two approaches in my models: (1) merging my data set with a mapping table, and (2) using a custom function which executes the conversion. My concern is to know which one is the fastest.

    Note that this concerns mapping tables whose cardinality is low (5 to 20 rows max).

    Note that the function executes "easy" operations such as returning an element of a list ( List.PositionOf() or List{parameter} ); see the sketch after these notes.

    Note : I can't do query folding
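
    For illustration, the custom-function approach looks roughly like this (same simplified names, and the ordered stage list is just an example):

        let
            // Ordered list of stages: the position of a stage is its number
            Stages = {"Closed/Lost", "Negotiation", "Closed/Won"},

            // Step 1 helper: stage text -> number
            StageToRank = (stage as text) as number => List.PositionOf(Stages, stage),

            // Step 3 helper: number -> stage text
            RankToStage = (rank as number) as text => Stages{rank},

            Source = Opportunities,
            Ranked = Table.AddColumn(Source, "StageRank", each StageToRank([Stage]), type number),
            Grouped = Table.Group(Ranked, {"Customer", "Product"},
                {{"StageRank", each List.Max([StageRank]), type number}}),
            Result = Table.AddColumn(Grouped, "Stage", each RankToStage([StageRank]), type text)
        in
            Result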


    My only concern here is performance (as it is for everybody, I guess), so my question is the following : which of the two approaches is the fastest? I have done some tests, and it seems that using a function is only slightly faster, if at all.

    Second question : is there a better solution you could think of to perform this kind of task?

    Extra question : how do you test query performance in Power Query (PowerBI) ?


    Thank you in advance for your answers !


    Tuesday, September 4, 2018 6:05 PM

Answers

  • In general terms, merging (joins) is faster than any other approach.
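
    One refinement worth trying with such a small mapping table (a sketch only, reusing the hypothetical query names from the question) is to buffer it before merging, so it is held in memory once instead of being re-evaluated during the join:

        let
            // Hold the 5-20 row mapping table in memory once
            BufferedMap = Table.Buffer(StageMap),
            Merged = Table.NestedJoin(Opportunities, {"Stage"}, BufferedMap, {"Stage"}, "Map", JoinKind.LeftOuter)
        in
            Merged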

    Sadly, there is no way to test query performance in Power Query. It's all trial and error - you'd need to time your queries (somehow) and then figure out whether the performance is bad because of your connection to the data source, your local resources, or because your query is somehow not created in the most optimal fashion. It's more an art than anything else.
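
    One crude way to "time your queries (somehow)" is M's built-in Diagnostics.Trace, which writes a timestamped entry to the Power Query trace logs when the wrapped value is evaluated (a sketch only; the step being wrapped here is hypothetical):

        let
            Source = Opportunities,
            // With delayed = true, the third argument is a zero-argument
            // function that is evaluated lazily; the trace entry records
            // when evaluation actually happens
            Timed = Diagnostics.Trace(
                TraceLevel.Information,
                "Evaluating grouped result",
                () => Table.Group(Source, {"Customer", "Product"},
                    {{"StageRank", each List.Max([StageRank]), type number}}),
                true
            )
        in
            Timed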

    Wednesday, September 5, 2018 10:45 PM
  • Actually, Chris Webb found a tool to test query performance. It really helps a lot to know what is read and how much time it takes. Check Chris Webb's blog:

    Using Process Monitor To Troubleshoot Power Query Performance Issues With File Data Sources
    Using Process Monitor To Find Out How Much Data Power Query Reads From A File
    Comparing The Performance Of CSV And Excel Data Sources In Power Query


    Thursday, September 6, 2018 9:33 AM
  • Usually - and this is just based on my personal experience with REST APIs and custom connectors - it's best to rely on Fiddler to see what gets sent to the REST API and whether PQ is making multiple duplicate calls that are totally unnecessary. That's usually the culprit of poor performance when connecting to a REST API, and there are some trade-offs you need to make between M and DAX, as I talk about here. Looking at just the local resources might not give you the full picture and might make you work on areas you shouldn't, but it sounds like a good start for local files. SaaS and databases should use their own monitoring tools, like SQL Server Profiler or the API developer monitor of something like Salesforce.
    Thursday, September 6, 2018 12:48 PM

All replies

  • Sadly, in the context of the scenario described by the OP, there are a few variables that you can't really monitor using Process Monitor, as they depend on network-related factors and more (related to Salesforce).

    You also don't have tools to monitor the performance or lag time of each step in your query. Even though Task Manager can show you some of the local resources being used, some of the processing sometimes happens at the origin rather than locally; and when things do run locally, performance also depends on factors like the type of hard drive you're using (SSD or hybrid). It's always trial and error - other tools provide a full waterfall view of the query time for each step/task, and that's what we should be aiming for.

    Thursday, September 6, 2018 11:11 AM