Practical Limits for Excel Get Data and Transform Queries

  • General discussion

  • What machine would you pick (and what are its characteristics) to run queries on data sets of ~850,000 records of 4 fields each?

    Is Excel the best tool for this kind of problem?  If not, what would you use?

    I have an HP Envy with 16 GB of memory, a Core i7 processor, a 1 TB internal hard drive, and a 1 TB external USB drive, running Windows 10 with a Microsoft Office subscription.

    The progress indicator in the lower-right corner of the Power Query editor shows that the two files underlying the query have each read more than 1.2 GB, and the process has been running for more than an hour.  The current operation is a conditional, indexed operation.

    I appreciate your thoughts and advice.


    Wednesday, November 28, 2018 1:29 PM

All replies

  • You might want to give more details on exactly what kind of operations you are doing and where your data sets are coming from. Just yesterday I set up a query doing a merge, adding a conditional column, and then aggregating between a 1-million-record and a 200k-record data set coming from .csv files, and it takes about 15 seconds to run, so I don't think the number of records itself is necessarily the issue.
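
    Not Power Query M, but here's a rough Python sketch of that same merge / conditional-column / aggregate pipeline, with made-up field names and tiny in-memory "files" standing in for the two .csv sources, just to show how little work is actually involved per record:

    ```python
    import csv
    import io
    from collections import defaultdict

    # Made-up sample data standing in for the large and small .csv sources.
    big_csv = io.StringIO("id,amount\n1,10\n1,5\n2,7\n3,2\n")
    small_csv = io.StringIO("id,region\n1,East\n2,West\n")

    # Index the small table once so the merge is a hash lookup, not a scan.
    lookup = {row["id"]: row["region"] for row in csv.DictReader(small_csv)}

    # Merge, add a conditional column, then group/aggregate in one pass.
    totals = defaultdict(int)
    for row in csv.DictReader(big_csv):
        region = lookup.get(row["id"], "Unknown")            # merge (left join)
        size = "big" if int(row["amount"]) > 5 else "small"  # conditional column
        totals[(region, size)] += int(row["amount"])         # aggregate

    print(dict(totals))
    # → {('East', 'big'): 10, ('East', 'small'): 5, ('West', 'big'): 7, ('Unknown', 'small'): 2}
    ```

    The point is that a merge plus a grouped sum is a single streaming pass over the large file once the small one is indexed, which is why record count alone shouldn't take an hour.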
    Wednesday, November 28, 2018 5:12 PM
  • I'm working with voter data in delimited text file format.  A new set of data arrives every month.  The data consists of two files: registration information, with about 850,000 records of 36 fields, and voting history, with more than a million records of 5 fields.

    My initial problem is to compare registration data sets pair-wise to determine the changes in party affiliation from one month to the next.  Two files are selected from the folder containing registration data using the folder approach and then appended.

    I carry the file ID onto the records and then sort. From what Ihren told me a while back, the sorting is probably the killer.

    I'll remove the sort and group/aggregate first.
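
    For what it's worth, here's a hedged Python sketch of the pair-wise comparison itself, with invented voter IDs and party codes: if the earlier month is loaded into a keyed lookup, each affiliation change can be detected by ID without sorting either file at all.

    ```python
    from collections import Counter

    # Invented monthly snapshots: voter ID -> party affiliation.
    october = {"V001": "DEM", "V002": "REP", "V003": "UNA"}
    november = {"V001": "DEM", "V002": "UNA", "V003": "REP", "V004": "DEM"}

    # Compare pair-wise by voter ID; the dict lookup replaces the sort.
    changes = Counter()
    for vid, new_party in november.items():
        old_party = october.get(vid)  # None for newly registered voters
        if old_party is not None and old_party != new_party:
            changes[(old_party, new_party)] += 1

    print(dict(changes))
    # → {('REP', 'UNA'): 1, ('UNA', 'REP'): 1}
    ```

    In Power Query terms this corresponds to a merge on the voter ID rather than append-then-sort, which sidesteps the operation suspected above of being the killer.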


    Wednesday, November 28, 2018 6:14 PM