I don't like the default data load options in Power Query. By default, loading to the worksheet is turned on, and loading to the data model is turned off. To my mind, this is the reverse of best practice. My main argument is here, but in a nutshell, I have two reasons for this.
1. The maximum file size for Excel Services continues to be 10 MB. Yes, the model can be 250 MB, but only if the data is stored in the model. Any data stored in the worksheet counts against that 10 MB limit. Unaware users could easily hit that limit, become frustrated, and abandon the product. A change to the defaults, or even some warning messages, would go a long way toward preventing this.
2. Excel limits the total number of rows in a worksheet to just over 1 million. By loading the data into the worksheet, you are limiting what Power Query and the data model are capable of. Again, the perception could easily be that the limitation is Power Query's, when it is really Excel's.
We need to get away from thinking of Excel as a data repository, and start thinking of it as a BI client. The current default load options do not promote this.
- Edited by John P. White MVP Tuesday, November 12, 2013 3:15 AM
Thanks for your feedback and for the great blog post (this one and many others you've written about PQ!).
I agree with your comments that a different combination of Load Settings would be a better default for the scenarios you mention. However, the reason the default Load Settings in Power Query are what they are is that we try to make simple things simple. For a user who imports fewer than a million rows of data into the workbook, has no plans to upload the workbook to Excel Services, and doesn't intend to use any tool beyond basic Excel capabilities (such as charts), the current settings provide everything they need.
For a more advanced user, it is possible to modify these Load Settings on a per-query basis, and we are also looking at ways to let users override the default for all their queries, so they don't need to change it on each query.
Thanks, Miguel! I understand what you mean about keeping simple things simple. What's the point of messing with data models if you only have 20 rows of data? It's unnecessary complexity. Saved defaults will go a long way for power users (yay!), but my concern is one of awareness.
When we have more complex, or more accurately, larger amounts of data, and we treat them simply, we run into trouble. What I think would be ideal is that when a query result exceeds a certain size, the user is presented with an instructional dialog that informs them of the issue and, ideally, offers to load the data into the model instead. I have no idea what the level of effort would be, but I think it would help with adoption.
In the meantime, I'll be going on a crusade about data loading best practices....