Best way to use R for 8 GB dataset

  • Question

  • Hi,

What is the most efficient way to use R to analyse data, do EDA, and fit ML models on an 8 GB dataset? I'm currently using a system with 8 GB of RAM, but it becomes a problem when I try to use around 80% of the data to fit a model. Would MRS help here?

If it would help, what kind of IT infrastructure would I need for this?



    Tuesday, April 4, 2017 9:53 AM

All replies

Yes, MRS would definitely help here. You will always need considerably more memory than the raw size of the data file on disk to analyze and process your data, because R makes copies of the data in memory.

If you are using MRS, you can work with the XDF file format and the RevoScaleR package and its R functions. For an 8 GB file on disk, you can expect to need between 25 and 50 GB of memory if you are working with data frames and analyzing the data in R with standard techniques, depending on the functions you are using.

    XDF files are processed in chunks row-wise, so you won't need as much memory.
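    As a minimal sketch of that workflow (assuming a source file named `data.csv` and columns `y`, `x1`, `x2`, all hypothetical names for illustration), you would convert the data to XDF once and then point RevoScaleR's `rx*` functions at the XDF file:

    ```r
    library(RevoScaleR)

    # Convert the CSV to XDF once. rxImport reads and writes in chunks,
    # so peak memory use stays near the chunk size, not the file size.
    # "data.csv" / "data.xdf" are placeholder paths for this example.
    rxImport(inData = "data.csv", outFile = "data.xdf", overwrite = TRUE)

    # Fit a linear model directly against the XDF file. RevoScaleR's
    # analysis functions stream over the chunks row-wise rather than
    # loading the whole dataset into an in-memory data frame.
    fit <- rxLinMod(y ~ x1 + x2, data = "data.xdf")
    summary(fit)
    ```

    The same pattern applies to the other `rx*` analysis functions, such as `rxLogit` or `rxSummary`.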

    Stephen Weller

    Microsoft R Product Team

    Wednesday, April 5, 2017 3:27 PM