none
CSV Parsing repeat RRS feed

  • Question

  • I'm using Excel version of Power Query to parse big CSV-files properly, combine them and load columns that I need and skip everything else.

    Almost any action with it cause PQ to parse CSV-file again and again and this slow work a lot. How can I set PQ to Parse CSV-File once?

    Tuesday, September 19, 2017 2:53 PM

Answers

  • My suggestion won't prevent re-parsing the file, but it should cause PQ to only pull the first N rows instead of the entire file (unless you're doing an upstream operation that involves scanning the entire file). Sharing your formulas would allow us to see if there's something else possibly causing the slowness.

    Ehren

    Monday, October 23, 2017 7:20 PM
    Owner

All replies

  • Hi there. While there's no way currently to cache a query result, you may be able to improve performance by using Keep Rows to keep just the top N rows (10, 100, whatever makes sense). Do your transformations on top of that, and then remove the Keep Rows step when you're done to apply the steps to the full data set.

    Does that help?

    Ehren

    Friday, September 22, 2017 10:01 PM
    Owner
  • Sometimes also deselecting "Enable background refresh" helps:


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Sunday, October 8, 2017 6:50 AM
    Moderator
  • It didn't work in most scenarios I'm working when I need to filter some part of data and work with it.
    Monday, October 9, 2017 8:38 PM
  • I'll try but so far it doesn't. It still parsing each time I making some action with filtered data
    Monday, October 9, 2017 8:39 PM
  • My suggestion won't prevent re-parsing the file, but it should cause PQ to only pull the first N rows instead of the entire file (unless you're doing an upstream operation that involves scanning the entire file). Sharing your formulas would allow us to see if there's something else possibly causing the slowness.

    Ehren

    Monday, October 23, 2017 7:20 PM
    Owner
  • The one way to work with a big file is to cut it's tail. I just open it in Notepad++ and cut it after 20k lines. In most cases it's enought to work with but doesn't take much time.

    After you debug your scripts on demo-file, change the source to real data.

    I used this approach for 380Mb files on last PowerBI hackathon two days ago. It really works )


    ХЗ

    Monday, October 30, 2017 5:15 AM
  • But I need to be able to filter some of the data and then working with that part without parsing it. And your trick doesn't work in that case
    Monday, October 30, 2017 4:31 PM
  • It's not a solution to the problem. It often when I'm need to filter data by some parameters  and then work with them without each step cause to parse entire file that can be up to 1GB
    Monday, October 30, 2017 4:33 PM
  • If you need to see all data in Power Query editor, actually the only way to do it fast is to put your big CSV in some Database, where you can use query folding as an option.

    When you add a new step or edit existed one, Power Query need to generate a new preview for this step (even if you set off backgroud data load for preview), and, probably, for the next ones. There are no ways to say Power Query to not load preview at all, except the only one: you wrote your code in editor without manipulating with steps interface.

    Also it depends on how much queries, refreshed by preview generation, refers to the source. There was a good posts about it (by Ken Puls, for example), and also the topic there at Technet


    Maxim Zelensky Excel Inside

    Tuesday, October 31, 2017 1:49 PM
  • If you need to see all data in Power Query editor, actually the only way to do it fast is to put your big CSV in some Database, where you can use query folding as an option.

    When you add a new step or edit existed one, Power Query need to generate a new preview for this step (even if you set off backgroud data load for preview), and, probably, for the next ones. There are no ways to say Power Query to not load preview at all, except the only one: you wrote your code in editor without manipulating with steps interface.

    Also it depends on how much queries, refreshed by preview generation, refers to the source. There was a good posts about it (by Ken Puls, for example), and also the topic there at Technet


    Maxim Zelensky Excel Inside

    Other than Ehren's first response (i.e. working with a subset of the data), something along the lines of what Maxim has mentioned is the only other option; Power Query will still need to update the data preview by running through the applied steps. 

    Regards,

    Michael Amadi

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful :)

    Website: http://www.nimblelearn.com, Twitter: @nimblelearn

    Tuesday, February 20, 2018 12:44 AM
    Moderator