Processing 50 million records and beyond

  • Question

  • I have multiple CSV files with a total of 50 million records, and I cannot get them into Power BI.
    (I've checked whether something is wrong with the files, and that's not the case, as I can load the smallest one with 4.5 million records - it's structured identically to the other four, bigger files.)

    Is there a limit to how many records Power BI can handle?

    If so, is there a way around it?

    Thanks.


    Rafael Knuth


    Sunday, May 13, 2018 7:32 AM

All replies

  • Hey,

    Are you receiving any sort of error message? If so, could you please share it with us?

    Since you're using Power BI Desktop, when you load your data you're loading it directly to the Data Model which uses your RAM. Monitor your RAM usage to see if you're hitting the limits of your computer.

    Also, are those CSV files located on a shared network or on your local hard drive? If they're on your hard drive, I'd highly recommend that you use an SSD. Beyond that, it might be a good idea to buffer your binaries before combining them.

    • Marked as answer by Rafael Knuth Monday, May 14, 2018 8:42 PM
    Monday, May 14, 2018 6:25 PM
    Could you please detail what you mean by "it might be a good idea to buffer your binaries before combining them"?

    • Edited by anthony34 Tuesday, May 15, 2018 5:55 AM
    Monday, May 14, 2018 7:42 PM
    I was getting a performance-related error message - the machine I am currently using has 8GB of RAM and a SATA disk. My files are actually stored on my hard drive. Any recommendations performance-wise for RAM? GPU? Also, can you please elaborate on buffering binaries? Thank you.

    Rafael Knuth

    Monday, May 14, 2018 8:54 PM
    Here's the function that I was referring to: Binary.Buffer. Try it out. It should make things faster when you're dealing with binaries.

    Any file is basically just a binary that gets passed to a "Translation" function such as Excel.Workbook, Csv.Document, Json.Document and others.
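
    As a rough sketch of what I mean (the file path and CSV options here are just placeholders - adjust them to your own files), buffering a binary before translating it could look like this:

        let
            // Read the raw binary from disk once into memory (hypothetical local path)
            RawBinary = Binary.Buffer(File.Contents("C:\Data\sample.csv")),
            // Hand the buffered binary to the "Translation" function for CSV
            Csv = Csv.Document(RawBinary, [Delimiter = ",", Encoding = 65001]),
            // Use the first row as column headers
            Promoted = Table.PromoteHeaders(Csv, [PromoteAllScalars = true])
        in
            Promoted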

    • Marked as answer by Rafael Knuth Tuesday, May 15, 2018 7:29 AM
    Monday, May 14, 2018 10:18 PM
  • It really depends on your dataset, but if you're having issues with 8GB then you'll need to upgrade to more RAM until it satisfies your requirements. The more RAM the better - I personally have 32GB on my desktop, 16GB on my work laptop and 8GB on my tablet.

    The GPU shouldn't be used by Power BI Desktop - are you getting any readings on your GPU when using Power BI Desktop? If so, that might be a bug.

    For the buffering part, please check the reply I posted to anthony34.

    • Marked as answer by Rafael Knuth Tuesday, May 15, 2018 7:30 AM
    Monday, May 14, 2018 10:21 PM
    Unfortunately, the official documentation is very frustrating. The details I would like to know are when and how to use it.

    It would be very kind of you if you could share an example of M code using Binary.Buffer where using it really makes a speed difference.

    • Edited by anthony34 Tuesday, May 15, 2018 6:03 AM
    Tuesday, May 15, 2018 5:54 AM
    The "when" is very subjective, as it depends on your available resources, but the situation that Rafael is facing is one example.

    The "how" is simply connecting to the folder that contains all of the files and creating a new custom column with a formula like =Binary.Buffer([Content]).
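
    For example, with the "From Folder" approach, the query could look roughly like this (the folder path is just a placeholder, and you may still need to promote headers per file):

        let
            // Hypothetical folder that contains all of the CSV files
            Source = Folder.Files("C:\Data\CsvFiles"),
            // Buffer each file's binary before it gets translated and combined
            Buffered = Table.AddColumn(Source, "BufferedContent", each Binary.Buffer([Content])),
            // Translate each buffered binary into a table
            Parsed = Table.AddColumn(Buffered, "Data", each Csv.Document([BufferedContent])),
            // Combine all of the per-file tables into one
            Combined = Table.Combine(Parsed[Data])
        in
            Combined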

    • Marked as answer by Rafael Knuth Tuesday, May 15, 2018 7:11 PM
    Tuesday, May 15, 2018 3:26 PM
    Honestly, I still do not understand in which situations to use it.
    • Marked as answer by Rafael Knuth Tuesday, May 15, 2018 7:11 PM
    Tuesday, May 15, 2018 4:00 PM
    Miguel, the solution I am presently working on is the following: set up a SQL Server on Azure, access the data via Power BI (preview mode), and then create measures in DAX. Do you have any experience with that type of solution? I am currently discussing multi-billion-record scenarios with a customer. Just trying to understand how far I can push the envelope with Power BI in conjunction with adjacent technologies.

    Rafael Knuth



    Tuesday, May 15, 2018 7:14 PM
    Honestly, I still do not understand in which situations to use it.
    The best way would be to test a scenario similar to the one Rafael provided and do a simple comparison of "without Binary.Buffer" vs "with Binary.Buffer". I highly encourage you to give that a test and let us know your findings.
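
    As a rough way to run that comparison (the folder path is just a placeholder), you could keep a single query and flip one flag between refreshes, timing each run:

        let
            // Flip to false for the "without Binary.Buffer" run and compare refresh times
            UseBuffer = true,
            Source = Folder.Files("C:\Data\CsvFiles"),
            // Translate each file, buffering the binary first only when UseBuffer is true
            Parsed = Table.AddColumn(
                Source,
                "Data",
                each Csv.Document(if UseBuffer then Binary.Buffer([Content]) else [Content])
            ),
            Combined = Table.Combine(Parsed[Data])
        in
            Combined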
    • Marked as answer by Rafael Knuth Wednesday, May 16, 2018 9:26 AM
    Wednesday, May 16, 2018 4:13 AM
    Miguel, the solution I am presently working on is the following: set up a SQL Server on Azure, access the data via Power BI (preview mode), and then create measures in DAX. Do you have any experience with that type of solution? I am currently discussing multi-billion-record scenarios with a customer. Just trying to understand how far I can push the envelope with Power BI in conjunction with adjacent technologies.

    Rafael Knuth



    I'd probably create a new thread on the Power BI forum to get into the details of the Data Model, how much data it can hold, and what the best architecture for your scenario would be. Include as many details as possible so that more people can leave their feedback on a good architecture for a solution built on Power BI.
    • Marked as answer by Rafael Knuth Wednesday, May 16, 2018 9:26 AM
    Wednesday, May 16, 2018 4:19 AM
  • Thanks, will do!

    Rafael Knuth

    Wednesday, May 16, 2018 9:26 AM