none
How to only load fresh records, and leave previous rows unchanged RRS feed

  • Question

  • Hi guys,

       I have a scenario that might be a bit odd, but bear with me here.

    In my job, I have to provide forecasts for call volume per day at least 6 weeks in to the future. These forecasts are recorded in a system that then stores to an SQL table.  I can retrieve the forecasts from the table via PQ, and that's all fine. Where it gets tricky is that I can forecast as far out as I like, and I can adjust forecasts even in the current week, however I am measured on my performance for how the 6 week out forecast matches actual performance on the day.

    So, what I want to do is retrieve all the data, but if the date is <6 weeks from today, don't refresh those rows

    that way everything 5 weeks away or less (or in the past) will be my "locked down" 6 week forecast, but everything from week 6 onwards will still be editable.

    I know how to only retrieve the data if the date is >6 weeks away, but not how to store that and not update it..

    Any help would be greatly appreciated.

    Thanks,


    Adam

    Wednesday, May 17, 2017 2:50 AM

Answers

  • Hi Adam,

    I reviewed my queries and overlooked something:
    -Query1 is to start the history file and create an excel table.
    -You need to create a query1b that has as source the table that results from query1 (history file).
    -In the append in query2 you need to append query1b (the history file).
    -This way you only refresh query2 (getting data from sql) and query1b (getting data from the excel table).

    I don't understand your second question. You keep a history file and you can update this with new records from the SQL database. In the case you described you want some kind of moving history file (freeze data that is >X-weeks); this must be possible with some filtering? I need to see the case in more detail how to solve this.

    Keyfield : the unique identifier of each record. We first append the data from the SQL database to the history file. To filter out the duplicate data we need to sort the data on the column that is the keyfield for that record and sort on the query date.

    I hope you can apply this in you case.

    Greetings,

    Dirk.

    Thursday, May 18, 2017 6:33 AM
  • I've done something very similar, with a cool twist: I read in the history table one time and output it to a table. I then changed the source of that query to the table it outputs. That way I have a history table that can update itself.
    Friday, June 9, 2017 4:28 AM

All replies

  • Hi Adam,

    We don't have any built-in way to do this today. Each refresh pulls (and replaces) all the data.

    You could, however, copy the loaded results into a separate worksheet so that you can compare them with the newly refreshed results at a later time.

    Ehren

    Wednesday, May 17, 2017 6:53 PM
    Owner
  • Hi Adam,

    We don't have any built-in way to do this today. Each refresh pulls (and replaces) all the data.

    You could, however, copy the loaded results into a separate worksheet so that you can compare them with the newly refreshed results at a later time.

    Ehren


    Thanks Ehren - I'm trying to avoid the need to manually archive data.. :/
    Wednesday, May 17, 2017 10:38 PM
  • That sounds as if once you make a change, then the existing value in the database will be overwritten? Is that really the case?


    Imke Feldmann
    MVP Data Platform
    TheBIccountant.com

    Thursday, May 18, 2017 5:21 AM
    Moderator
  • That sounds as if once you make a change, then the existing value in the database will be overwritten? Is that really the case?


    Imke Feldmann
    MVP Data Platform
    TheBIccountant.com


    That's 100% correct. any changes I make reflect in the database overnight
    Thursday, May 18, 2017 5:29 AM
  • Then this approach could actually be sth for you: http://www.thebiccountant.com/2016/02/09/how-to-create-a-load-history-or-load-log-in-power-query-or-power-bi/

    You take that to create your "data to be stored" and create a 2nd query that pull only the data from the SQL-source that is >6 weeks. Append both to have a full view for your projection.

    In a 3rd query you could pull the full data from the server again and compare them to the original ones stored in your xls (don't forget backups - you have your original values only in there ;-) )


    Imke Feldmann
    MVP Data Platform
    TheBIccountant.com

    Thursday, May 18, 2017 5:40 AM
    Moderator
  • Hi Adam,

    I had encountered a simular issue and solved it like this:

    1/ Query1 retrieves the columns from your SQL

    - add a column to this table with the date & time of the query  - use DateTime.LocalNow()

    2/ Query2 is to update your table in Query1:

    - same query as query1 to get all the data from SQL

    - add a column to this table with the date & time of the query - use DateTime.LocalNow()

    - append query2 to query1

    - sort on the keyfield + querydate column ascending

    - remove duplicates (power query will keep first line - oldest in this case)

    - now you have updated the table without changing the lines that were downloaded previously

    BEWARE that a bug in power query may require you to buffer the sorted rows before you remove the duplicates: simply add the line in M :

       Buffered = Table.Buffer(#"Sorted Rows"),

    #"Removed Duplicates" = Table.Distinct(Buffered, {"Recordkey"}),

    I hope this works for you.

    Dirk.

    Thursday, May 18, 2017 5:45 AM
  • Hey Dirk,

      Thanks for this. 3 questions:

    Won't both queries just refresh when I refresh one of them?

    If the intent is to never refresh one of the queried (the history) then won't that mean that it only works for 1 week, then next week i'll have a gap in the data?

    What do you mean by keyfield?

    Thanks,


    Adam

    Thursday, May 18, 2017 5:56 AM
  • Hi Adam,

    I reviewed my queries and overlooked something:
    -Query1 is to start the history file and create an excel table.
    -You need to create a query1b that has as source the table that results from query1 (history file).
    -In the append in query2 you need to append query1b (the history file).
    -This way you only refresh query2 (getting data from sql) and query1b (getting data from the excel table).

    I don't understand your second question. You keep a history file and you can update this with new records from the SQL database. In the case you described you want some kind of moving history file (freeze data that is >X-weeks); this must be possible with some filtering? I need to see the case in more detail how to solve this.

    Keyfield : the unique identifier of each record. We first append the data from the SQL database to the history file. To filter out the duplicate data we need to sort the data on the column that is the keyfield for that record and sort on the query date.

    I hope you can apply this in you case.

    Greetings,

    Dirk.

    Thursday, May 18, 2017 6:33 AM
  • I've done something very similar, with a cool twist: I read in the history table one time and output it to a table. I then changed the source of that query to the table it outputs. That way I have a history table that can update itself.
    Friday, June 9, 2017 4:28 AM
  • Hi Adam,

    have you been able to get this going? If yes, please mark the answers that helped you to help other people with the same problem.

    Thx!


    Imke Feldmann
    MVP Data Platform
    TheBIccountant.com

    Sunday, June 25, 2017 5:44 AM
    Moderator
  • Hey Imke - None of the solutions really met my needs.

    Thanks,

    Adam

    Sunday, June 25, 2017 10:13 PM
  • Then we might not have understood your request correctly.

    If you would tell us how the results from our suggestions differ from what you actually need, we might be able to help you further.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Sunday, July 2, 2017 6:02 AM
    Moderator
  • Did this bug get resolved?
    Thursday, February 1, 2018 12:14 PM
  • Hi Jerry,

    there is no bug in this thread so far: By design, Power Query queries will overwrite their previous results. So if you want to "keep" results who will not be returned by the next refresh, you have to store them somewhere else (and append them to the next refresh using any of the techniques described here).

    Microsoft plans to offer incremental load as a feature within Power BI premium shortly (but not in Excel to my knowledge).


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Saturday, February 3, 2018 9:24 PM
    Moderator
  • Hello,

    I am trying to follow your instructions but not succeeding - sadly. I very much need to follow this through.In creating Query 1, Query1b and Query2, I am getting the same data all over. The question is where do these individual queries load to? To the actual source or no connection?

    I am trying to link a Google spreadsheet to Excel. I can open the link via the Web and add the DateTtime column - but after that I am not getting anywhere with the queries.

    Please help.

    Sunday, March 4, 2018 7:33 PM
  • I replied to Dirk's post above - I hope you will be able to help.

    Many thanks.

    Sunday, March 4, 2018 7:34 PM
  • I have taken the ideas presented in this thread and documented a working solution.

    https://blog.jamesbayley.com/2018/04/23/power-query-how-to-load-only-fresh-rows-and-create-an-excel-history-table/

    • Proposed as answer by LlamaLlamaLamp Wednesday, November 6, 2019 11:47 AM
    Monday, April 23, 2018 6:27 PM
  • Awesome, thanks James. Thanks all for the discussion and ideas, but I was having trouble getting it up and running. James' step-by-step was at my level!
    Saturday, December 15, 2018 7:18 AM