none
How to paginate requests for large data sets in OData from PowerQuery? RRS feed

  • Question

  • I have an OData feed that contains a number of large tables (tens of millions of rows). I need to configure PowerQuery (or PowerPivot, whichever is the best tool for the job) to access this OData feed, but to do so in a paginated way so that a single request doesn't try to return 10 million rows all at once, but instead builds up the complete result of tens of millions of rows with multiple paginated queries. I don't want to have to manually submit many different URLs with different values of $top and $skip each time to do my own manual pagination, instead I need PowerQuery or PowerPivot to handle the pagination for me.

    I was hoping that PQ/PP would just be smart enough to do pagination, perhaps by first issuing a "count" query to determine how many rows are present, but this appears not to be the case. When I give PQ/PP a URL to a large OData table, it just issues a query to retrieve all rows (actually, it issues 2 such identical queries, which seems odd), which crashes the DB on the server.

    In searching for an answer, I've seen hints that PQ/PP can do pagination, but no clue as to how to enable this behavior. So is there a way to tell PQ/PP to use some kind of pagination to access large data sets? If so, can I set the page size?

    Wednesday, January 7, 2015 5:02 PM

Answers

  • Our assumption is that the server has already picked a paging strategy which is optimal for it. A server which crashes instead of doing server-side paging for large requests is arguably a poorly-implemented server. That said, we have previously considered allowing some kind of manual control over client-side paging, so it's not impossible that this will be implemented in the future. But we currently have no specific plans to do so.

    The reason for the two requests is complicated to explain, and I won't try right now. If you navigate to the feed from the service document inside of Power Query instead of using the URL for the entity set directly, you'll only get a single request. And even if we did implement client-side paging control, putting in the URL for the entity set might well continue to result in a single unpaged request before any of the paged requests happen.

    Wednesday, January 7, 2015 5:11 PM
  • Power Query looks for the next link. That's required by the OData spec; we wouldn't be compliant with the spec if we didn't follow the "next" links.
    Thursday, January 8, 2015 5:38 PM

All replies

  • Our assumption is that the server has already picked a paging strategy which is optimal for it. A server which crashes instead of doing server-side paging for large requests is arguably a poorly-implemented server. That said, we have previously considered allowing some kind of manual control over client-side paging, so it's not impossible that this will be implemented in the future. But we currently have no specific plans to do so.

    The reason for the two requests is complicated to explain, and I won't try right now. If you navigate to the feed from the service document inside of Power Query instead of using the URL for the entity set directly, you'll only get a single request. And even if we did implement client-side paging control, putting in the URL for the entity set might well continue to result in a single unpaged request before any of the paged requests happen.

    Wednesday, January 7, 2015 5:11 PM
  • Thanks for the reply Curt.  I guess then what I need to know is this:

    I can easily implement paging in the OData feed, but if I only return say 1000 rows for a request that should return millions of rows, how does Power Query know to issue more queries to retrieve the remaining rows?  Does the reply from the OData feed have to contain some flag that says "Issue more queries to retrieve the remaining rows?"

    Some background: my OData server implementation is done in PHP using the POData library (which was a fork of MS's original OData PHP Producer Library). 

    Wednesday, January 7, 2015 5:18 PM
  • To make the question more focussed: Does PowerQuery/PowerPivot/Power BI look for the "odata.nextLink" element as the cue that there are more records to be retrieved, or does it look for the "odata.count" element that gives the total number of records for the query as the cue that there may be more records to be retrieved?
    Thursday, January 8, 2015 4:02 PM
  • Power Query looks for the next link. That's required by the OData spec; we wouldn't be compliant with the spec if we didn't follow the "next" links.
    Thursday, January 8, 2015 5:38 PM