Incremental or scheduled Power Query refresh

  • Question

  • I have a Power Pivot model now linked to 3 largish Power Query queries. This is transactional data going back about a year. The main table is around 10 million rows, with 2 supplemental tables of 1 million and 4 million rows respectively. The 2 other tables don't have a 1-to-1 relationship with the main table. However, hitting refresh to get a new day's data reloads the whole thing and takes about 45 minutes to complete, recalculating all the pivots. Loading the data seems to take most of the time. So my questions are:

    1. Is it possible to do an incremental data load, since the existing data is not changing and a full reload is a huge waste of processing power? I have read about a method for doing this, but only by loading the PQ output to the worksheet, which would obviously be impossible for 10 million rows.
    2. If this isn't possible, can the workbook be put on a timer to refresh at night? I have read something about doing this within Power BI, but I am struggling to make use of Power BI as it stands. Although I can load my data in Power BI, the lack of pivot tables makes it pretty useless for the reports I'm producing. I would be able to use it if I could "connect" the workbook and see the pivot tables in Excel Online, but for some reason, despite my workbook meeting the size criteria, the slicers throw an error. I have logged this as an error with MS tech support. So I assume the Power BI refresh only refreshes what is loaded up and not the actual workbook?

    Thanks for any advice

    Mike 

    Thursday, December 3, 2015 4:17 PM

Answers

  • Hi Mike,

    Thanks for your feedback.

    Currently, there is no support for incremental refresh in Power Query or Power BI Desktop (besides the workaround that you mention based on loading to the worksheet). Please vote for this feature in our Feature Suggestions forum if you consider it important. This will help us prioritize this feature based on customer demand.

    Feature Suggestions Forum: https://ideas.powerbi.com/forums/265200-power-bi/category/91861-desktop

    Regarding Scheduled Refresh options - if Power BI is not an option for you due to presentation limitations (such as lack of PivotTable support) you could explore other solutions. For instance, you could use the Excel Object Model (from VBA, PowerShell, etc.) to refresh connections within a workbook on a given schedule. You can either build this yourself or leverage other 3rd party solutions such as Power Update: http://www.powerpivotpro.com/2015/02/introducing-power-update/
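
    For example, here is a minimal PowerShell sketch of that approach that you could run from Windows Task Scheduler overnight (the workbook path is just a placeholder, and any connections set to refresh in the background may need that option turned off so the script waits for them):

        # Refresh every connection in a workbook via the Excel Object Model
        $excel = New-Object -ComObject Excel.Application
        $excel.Visible = $false
        $excel.DisplayAlerts = $false

        $wb = $excel.Workbooks.Open("C:\Reports\SalesModel.xlsx")   # placeholder path
        $wb.RefreshAll()                                            # refresh all connections/queries
        $excel.CalculateUntilAsyncQueriesDone()                     # wait for asynchronous queries to finish
        $wb.Save()
        $wb.Close($false)

        $excel.Quit()
        [System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel) | Out-Null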

    Hope this helps.

    Friday, December 4, 2015 6:13 PM

All replies

  • Hi Mike,

    in addition to what Miguel suggested, you could check whether you can make Power Query's life easier (thereby improving transformation speed, which is probably the bottleneck here) by reducing the number of columns that have to be dragged through:

    1. Normalization, and/or if this isn't an option:
    2. Vertical split of tables: only take those columns through PQ that are needed for the transformation process (incl. the unique identifier) and connect them in the Power Pivot data model 1:1 with the other columns that you've imported there directly. With average tables, a direct load of 10 million rows into PP shouldn't take longer than 2 minutes, so a huge part of the time must be PQ transforming. And this can be reduced if you reduce the amount of data being passed through.

    Also, if there are transformations that address the same table multiple times (like in functions or references in added custom columns), a Table.Buffer could improve speed dramatically.
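
    A minimal sketch of that in M, using the query names from your model as they appear later in this thread (the RealTime/Time columns and the row-wise lookup itself are just illustrative):

        let
            // Existing queries in the workbook (names from this thread)
            TimesSource   = Reporting_Times,
            FactSource    = Reporting_Plus_Fact,

            // Evaluate the lookup table once and keep it in memory,
            // so the repeated lookups below don't re-query the source
            BufferedTimes = Table.Buffer(TimesSource),

            // Illustrative row-wise lookup that addresses the same table once per row
            WithRealTime  = Table.AddColumn(
                FactSource,
                "RealTime",
                (row) => Table.SelectRows(BufferedTimes, each [TimeID] = row[TimeID]){0}[Time]
            )
        in
            WithRealTime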

    Imke Feldmann TheBIccountant.com

    Saturday, December 5, 2015 7:33 AM
  • Hi Imke

    So I don't have any funky functions. Basically my main table is structured like this:

    Transaction table referenced from Reporting_Plus_Fact, which has most of the relevant data. However, this table has:

    • BusDateID instead of a date, so I merged it with Reporting_BusDate to convert the ID to a real date
    • TimeID instead of a time, so I merged it with Reporting_Times to convert it to a real time
    • LocationID, which is in another table, Reporting_Trans_Fact, related by TransID, so I also merged that

    In each of the above tables I removed all of the unnecessary columns before merging, plus any rows that are not relevant. You might recall you helped me with parameterising the Date and Location, which I apply to Reporting_BusDate and Reporting_Trans_Fact, i.e. before doing the merge.
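
    In M terms, each of these merges boils down to something like the following (simplified; the expanded column name is just illustrative):

        let
            Fact     = Reporting_Plus_Fact,
            BusDates = Reporting_BusDate,

            // Merge on BusDateID and pull the real date column out of the joined table
            Merged   = Table.NestedJoin(Fact, {"BusDateID"}, BusDates, {"BusDateID"}, "BusDate", JoinKind.LeftOuter),
            Expanded = Table.ExpandTableColumn(Merged, "BusDate", {"BusinessDate"})   // "BusinessDate" is an illustrative column name
        in
            Expanded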

    Apart from that, I renamed some columns and finally changed some data types before loading into PP.

    So the question is: what is a vertical split? I think I have to do all these merges in order to get a single table with all the meaningful data. I could leave out the time merge and use a related table within PP, if that is what you are suggesting? I have to do the date merge in order to parameterise, but I guess I could do the conversion of dates into BusDateID in Excel, so again I could drop that merge and do the conversion in a related table. The time table never changes, but I think the BusDate table adds a new row every day, so I would have to refresh it each time anyway.

    I didn't understand what you meant by "connect it in the Power Pivot data model 1:1 with the other columns that you've quickly imported there directly". Did you mean using the RELATED function instead of merging in PQ?

    Thanks

    Mike

    Saturday, December 5, 2015 11:00 AM
  • Hi Mike,

    sorry, missed your question. What I meant by a vertical split was to split 1 table into 2: drag the first part through PQ and load the other part into PP directly. Both parts have to contain the unique key field that you use in PP to create a relationship between these tables.

    This will cost query performance but could possibly gain loading performance. But if all the PQ exercise is about filtering the rows in order to reduce the number of rows in PP, this will actually not help you here. Sorry, didn't get that aspect before.

    But from what I now understand about your model, your problem might lie in query folding not happening: https://www.mssqltips.com/sqlservertip/3635/query-folding-in-power-query-to-improve-performance/

    This is probably due to the multi-step merges. So the question is why you need all your "relevant" data in one table. The queries would probably be quite fast if you just applied the filters that reduce the number of rows, loaded the tables to PP and then "merged" them in PP like you probably did before.
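
    A filter-only query per table would look roughly like this (server, database and the filter value are placeholders) - a plain Table.SelectRows right on the SQL source can usually be folded back into a WHERE clause, whereas multi-step merges often break folding:

        let
            Source   = Sql.Database("MyServer", "MyDatabase"),                        // placeholder connection
            Fact     = Source{[Schema = "dbo", Item = "Reporting_Plus_Fact"]}[Data],
            Filtered = Table.SelectRows(Fact, each [BusDateID] >= 20150101)           // folds to a WHERE clause
        in
            Filtered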


    Imke Feldmann TheBIccountant.com

    Tuesday, December 8, 2015 8:30 PM
  • Hi Imke

    As part of this process I'm also trying to improve my Power Pivot model building as, being self-taught, I have made lots of errors in my earlier models. The 2 things I'm trying to avoid are high cardinality and a snowflake structure of relationships, as I understand these are the worst things for PP performance.

    In this particular case I was trying to create a single fact table that achieved the above, hence all the merging. I guess it all depends on where you want the bottleneck: in the data loading or in the actual PP model. If you load data frequently then, as you suggest, it is better to build relationships within PP than to merge in PQ.

    In this case the desire to parameterise the data to choose certain date ranges and locations means I have to merge 2 out of the 3 possible tables, since they don't exist in the main fact table. The 3rd table is a time ID table with 30,000 relevant entries (one for every h:m:s), so I merged it to create an hour column of 8 values and a minute column of 60 values.

    One thing that seems to be slowing things down is that I have 3 queries to run. As far as I can tell there are 3 steps in the process: "Retrieving", "Processing" and "Reading". I'm not sure what the middle one is, but the last one is recalculating the whole workbook. It seems that it runs the first query, processes it and reads it, then runs the 2nd query, etc. This seems a massive waste of time, as surely it should retrieve all the tables first and then process and read just once. I assume this is related to the awful feature in 2013 where it recalculates even if you change a date format. I use the OLAP Extension add-in to disable all the pivot refreshes, so maybe that would speed up the data loading.

    Thanks again for your advice

    Mike

    Wednesday, December 9, 2015 11:35 AM
  • A last idea is to create the view that does the merge in your SQL DB itself - again, use PQ only to apply the dynamic date filter.
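
    If they can create such a view for you (vw_ReportingMerged below is just a placeholder name), the PQ side shrinks to a single foldable filter step:

        let
            StartDateID = 20151101,                                                   // would normally come from your parameter table
            Source      = Sql.Database("MyServer", "MyDatabase"),                     // placeholder connection
            MergedView  = Source{[Schema = "dbo", Item = "vw_ReportingMerged"]}[Data],
            Filtered    = Table.SelectRows(MergedView, each [BusDateID] >= StartDateID)
        in
            Filtered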


    Imke Feldmann TheBIccountant.com

    Wednesday, December 9, 2015 5:45 PM
  • Hi Imke

    Sadly the DB is run by the 3rd party software provider for the company, and I can't even get them to tell me what the fields mean!!! No chance I will get any help from them. Anyway, as always, an education, as I need to keep in mind the issues around loading time vs workbook size and speed.

    Thanks

    Mike 

    Thursday, December 10, 2015 12:59 PM