Architectural Questions - Excel Power Query using SharePoint Online Lists as a data source

  • Question

  • I need to better understand the communications and query architecture of Excel Power Query with SharePoint Online lists as a data source. NOTE: Some of the source lists have 100,000+ rows. There are indexes on the lists to allow indexed access to subsets of the rows.

    1. Based on the Power Query to SharePoint interface, it seems that I can select List objects but not the particular SharePoint queries (views) associated with those lists. Is this true?
    2. I am allowed to filter the columns and rows after I have selected the list. Question: are these filters part of the query against the source SharePoint list, or are they applied after the list is downloaded into Excel?
    3. Assuming the row and column filters are part of some sort of OData query against the list, how can I control the order of the statements to ensure that indexed columns are used to find a subset of the data first?
    4. The row filter only allows me to add two filters. This is very restrictive. For example, if the column is a date column I may need to exclude nulls and then add both an after-date and a before-date condition. [Note: when I was not excluding nulls first, I was getting errors.] A solution to this nightmare would be to let us pick a list AND then pick a VIEW.
    5. If the Excel spreadsheet is opened online, is the data update done server side or client side? In other words, is the Power Query data updated before the spreadsheet is opened, or after? I'm hoping before, so that the update is done once rather than once per user.

    NOTE: An alternative is using OData connections. BUT that requires me to build the query string myself, and I can't find a good example of what it should look like. Of course, if the OData interface allowed me to pick a VIEW on a SharePoint list, that would make things a lot easier.
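    For what it's worth, a hand-built listdata.svc query is just an OData URL with query-string options such as $select, $filter and $top, not an XML document. A minimal Power Query sketch, using a hypothetical site URL and list name:

```
let
    // Hypothetical site and list names; the clauses after "?" are
    // ordinary OData v2 query-string options, concatenated by hand here.
    Source = OData.Feed(
        "https://contoso.sharepoint.com/sites/MySite/_vti_bin/listdata.svc/MyTable"
            & "?$select=Id,Title,Modified"
            & "&$filter=Title eq 'Example'"
            & "&$top=100")
in
    Source
```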

    HELP!!!  


    Cheers, Savin Smith


    Friday, June 19, 2015 1:54 PM

Answers

  • Hi Christopher,

    For the null value problem, can you please send a frown with a repro for us to investigate further? :)

    The investigation you've done on listdata looks great! We don't track issues with listdata, though. As I mentioned before, we push operations to the server as much as we can, and if the server can't handle them we do them locally. I won't be able to tell you exactly what's happening in your case, because each server behaves differently. However, we are always looking to improve performance; we will be introducing support for the SP2013 APIs soon, which I believe will be a good start.

    If you have specific scenarios that you would like to see become faster, please send a frown with your suggestions; examples with queries are much appreciated. That will allow us to prioritize accordingly.


    Thanks, Hadeel

    Tuesday, July 7, 2015 10:45 PM

All replies

  • Regarding your #4: the Power Query graphical interface only allows two filters, but if you look at the formula bar above the grid you can see the code generated by the interface. Just copy whatever filter clause you need, as many times as you need it, directly in the formula.
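    For example, a hand-edited filter step might look like the following sketch (the step and column names are hypothetical); note that the null check comes first, so the date comparisons never see a null:

```
// One Table.SelectRows step can carry as many conditions as needed,
// even though the filter dialog only exposes two.
FilteredRows = Table.SelectRows(
    PreviousStep,
    each [VisitDate] <> null
        and [VisitDate] >= #datetime(2015, 1, 1, 0, 0, 0)
        and [VisitDate] <  #datetime(2015, 7, 1, 0, 0, 0))
```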


    Friday, June 19, 2015 3:10 PM
  • Hi Christopher,

    Thanks for the details :) A couple of notes before I answer your questions:

    - With the SharePoint connector, we call into the OData APIs and then apply our own view on top to give users a better experience than the raw feeds.

    - Depending on the operation performed on the data, we decide whether we can do it on the server side or not. Unfortunately we are still using the old 2010 APIs of SharePoint, which puts some limitations on what we can do. However, we should be moving to newer APIs soon. If you see an operation that took a while because we did it locally, it means we couldn't push it to the server.

    - A UI limitation is not an M query language limitation; you can edit the query as much as you like.

    To answer your questions:

    1. As I mentioned, we talk to the OData APIs of SharePoint to return data; if those queries are not exposed by the SharePoint APIs, you won't be able to access them from PQ. If, however, you see them listed in your table but get an error when you click on one, please send a frown for us to investigate.

    2. Filters that we can push to the service will be pushed; the rest will be done locally. It mostly depends on the service capabilities and the order of events.

    3. The operations are executed in the order you choose. If you filter on a column "Name" and then order by it, we filter first and order second. There are some exceptions to this rule for optimization purposes, but nothing you should worry about. As a general practice, always perform the operation that involves the indexed column first; that minimizes the amount of data that has to be processed further.

    4. As Tim said below, you can always edit the query itself to specify all the filter criteria you want. However, if you see errors because of null values, please send a frown for us to investigate.

    5. Power Query won't update the sheet unless asked to do so. It has to be per user, because of the different credentials needed for the refresh. If you would like to schedule a refresh for your workbook and have an automatically updated view of it that you can share, you can use powerbi.com: upload your workbook, set it to refresh every hour (for example), and then share your dashboard :)

    One last thing, regarding having to build the query string for OData: that's not true. If you start from the API endpoint of your site, you will get the raw responses, but we will build the string for you as you perform operations; in other words, we push the operations to the server.
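    A sketch of that endpoint-first approach (site URL and column names are hypothetical): connect to the raw feed and apply steps, and the engine translates the foldable steps into the OData query string for you:

```
let
    // Connect to the raw listdata.svc feed; no hand-written query string.
    Source = OData.Feed("https://contoso.sharepoint.com/sites/MySite/_vti_bin/listdata.svc"),
    // Navigate to one of the lists exposed by the feed.
    MyTable = Source{[Name = "MyTable"]}[Data],
    // Where the service supports it, these steps are folded into
    // $select and $filter options on the request.
    Selected = Table.SelectColumns(MyTable, {"Id", "FirstName", "Modified"}),
    Filtered = Table.SelectRows(Selected, each [FirstName] = "John")
in
    Filtered
```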

    Hope that helps :)


    Thanks, Hadeel

    Tuesday, June 30, 2015 10:55 PM
  • Hadeel,

    Thank you for the response. My testing so far suggests that all the filters are being performed client-side, but I will do some more testing.

    I also understand that filters against indexed columns need to be applied first.

    Question: should I select the columns I want first, or after I filter using an indexed column?

    I will test today on a table with 20,000 rows and 120 columns. My initial test will select just 10 columns and filter on the (indexed) Modified date with a value on or after 6/30/2015. That should reduce the result to about 2,000 rows.

    In theory, this should reduce the download from its current 350 MB to about 3-5 MB.

    Later,


    Cheers, Savin Smith

    Wednesday, July 1, 2015 12:25 PM
  • Hadeel,

    I can confirm the null value problem with a date field.  The date field is also indexed.

    Steps:

    1. First filter: after or equal to 7/1/2015. [Rule: = Table.SelectRows(myTableName, each [ReceptionVisitDateTime] >= #datetime(2015, 7, 1, 0, 0, 0))]
    2. Error message:

    Expression.Error: We cannot convert the value null to type Logical.
    Details:
        Value=
        Type=Type


    Cheers, Savin Smith

    Wednesday, July 1, 2015 12:46 PM
  • Hadeel,

    I don't think the filter is happening server side, at least when I filter on Modified date.

    1. Test list has 120 columns and 38,000 rows. Modified is an indexed column.
    2. First filter on Modified, after or equal to 7/1/2015.
    3. Download progress indicated at least 500 MB. Only about 25 rows met the condition.

    Notes: The preview data comes down very quickly; I'm guessing it is using a top filter. It brought down all 120 columns for 15 rows in 3-4 seconds.

    SECOND TEST:

    1. First filter: select 8 columns. NOTE: didn't see interaction with the server, but the preview now has over 1,000 rows.
    2. Second filter: on the indexed Modified column, after or equal to 7/1/2015.
    3. Only 30 rows were returned, but it took as long as downloading 500 MB. Thirty rows of 8 columns, done server side, should be much quicker.

    THIRD TEST:

    1. First filter: select 8 columns.
    2. Second filter: on the indexed FirstName column, equals 'John'.
    3. This happened in about 1-3 seconds. SUCCESS.

    THOUGHTS

    Perhaps there is a bug in how Power Query handles a date filter, or there is a problem with listdata.svc and date filters (I assume Power Query is using listdata.svc).

    FINAL TESTS on listdata.svc direct.

    1. listdata.svc/MyTable?$filter=FirstName+eq+'John' - works as expected
    2. listdata.svc/MyTable?$top=100 - works as expected
    3. listdata.svc/MyTable?$filter=Modified+ge+datetime'2015-07-01'

    The last one fails with a "webpage cannot be found" error, but I suspect the issue is inside the listdata.svc service engine.

    ADDITIONAL TESTS

    1. listdata.svc/MyTable?$select=Id,CaseNumber,Modified,FirstName&$filter=FirstName eq 'John' and Modified ge datetime'2015-06-20'
    2. listdata.svc/MyTable?$select=Id,CaseNumber,Modified,FirstName&$filter=Modified ge datetime'2015-06-20' and FirstName eq 'John'   THIS FAILS

      WEIRD??
    3. NOTE: I have confirmed that the problem only happens with lists of over 5,000 rows. It's as if listdata.svc does not use the indexes when filtering on date columns, even when they are indexed.
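    Given that the first clause ordering above did not fail, one possible workaround (a sketch only, reusing the site path and column names from the tests above) is to issue that working clause order explicitly via a hand-built query, rather than relying on the order the UI generates:

```
let
    // The indexed text filter appears first in $filter, matching the
    // clause order that succeeded above; the date condition follows it.
    Source = OData.Feed(
        "https://contoso.sharepoint.com/sites/MySite/_vti_bin/listdata.svc/MyTable"
            & "?$select=Id,CaseNumber,Modified,FirstName"
            & "&$filter=FirstName eq 'John' and Modified ge datetime'2015-06-20'")
in
    Source
```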

    *************

    I would think that the ability to filter on the Modified date is critical. Any ideas?

    I think the underlying problem is in listdata.svc itself. That said, it's a nasty limitation/performance issue for Power Query as well.

    HELP!!


    Cheers, Savin Smith


    Wednesday, July 1, 2015 1:32 PM