none
How to know when Power Query will not work with a website RRS feed

  • Question

  • Hi! I am very new to Power Query and as such can't tell if I am doing something wrong, or if the website will not allow me to get the data. Here is my predicament...My (non-profit) org has always used a third party site to retrieve donor detail from (i.e. we go to their site, download a spreadsheet of donor names and info so we can upload to our system, thank the donors, etc.). Now we can see the data on the screen, but cannot export it in any way. It is in a table. I can copy and paste on to a spreadsheet manually, but the page only shows a few rows at a time, and therefore some files have hundreds, maybe thousands of pages, so it is not an option to copy and paste the data manually. I have to use my credentials to log in to the site, and once in the site, choose the file I need to look at that day. So, I thought I could use Power Query to grab the data. I logged in to the site, went to the page I wanted, copied the URL. Went to Power Query>From Web, pasted the URL. In the Data Source Settings, I went to Basic and entered the login info (choosing the specific URL of the data I wanted and not the home page which was also an option). Hit Done. All it ever says on the Navigator is "Document" under the URL on the left, and the table it shows isn't a real table (it just says Element, Null, etc.). Unfortunately since this is all private info I cannot provide screenshots which I know means this post may be fruitless. My question is, if the website is somehow blocking Power Query from seeing the table, how would I know? Is there some troubleshooting way of knowing whether I am doing something wrong, or if it will never be able to do it no matter what? I have looked through books, videos, etc. and if it is possible to do this I need to learn how and will, but I don't want to waste more time than I already have if this is never going to be possible from this site. Any help you can give would be greatly appreciated. Thank you for your time.

    Thursday, June 11, 2015 6:18 PM

Answers

  • Hi Spacle,

    Can you share the page you're trying to access?

    It's possible that the page you're visiting requires some user interaction or that the table you're looking for is rendered asynchronously from Javascript. In some of those cases, Power Query believes that the page is done rendering and returns the page without the content you expect.

    Unfortunately, there is no nice workaround for this. One thing you can be able to do is open the page in your browser, open the source and copy-paste the HTML into your query:

    = Web.Page("<html>...</html>")

    If you're not an intermediate user of the Power Query Formula Language, you might find this tricky. You'll need to do things like escaping quotes in the HTML by double quoting them.

    Another workaround is to import the page from Excel's "From Web" feature in the "Data" tab, dump the content into a sheet and then import the sheet into Power Query using "From Table".

    Tristan

    Friday, June 12, 2015 9:59 PM
    Moderator

All replies

  • Hi spacle,

    in many cases, Power Query web queries will be returned in XML-Format. This means multi-level-nesting.

    Chris Webb has developed a very useful code that expands all steps at once, so you can see at a glance if your web-query consists of anything of interest:

    http://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/

    Simply check the whole table that this query produces and shrink all columns to the min - that way you'll be able to get the best overview of what's actually in there.


    Imke

    Thursday, June 11, 2015 7:16 PM
    Moderator
  • Hi! Thanks for the response! If I were actually getting a viable result though from the query, wouldn't it say something that indicated the data was there somewhere? For example, when I run a query using something generic like Wikipedia, I see the table in the Navigator, I can see the table name on the left hand side, etc. When I run the query as I did above, Excel doesn't seem to actually see anything on the webpage, even though the info is there and is in a table (if I look at the source code for the page I can see the table).  Since it gives me no result, I can't tell if I am doing something wrong or if the website is blocking me from doing the query.

    Thanks again for your time!


    • Edited by spacle Thursday, June 11, 2015 9:03 PM
    Thursday, June 11, 2015 7:31 PM
  • Hi Spacle,

    Can you share the page you're trying to access?

    It's possible that the page you're visiting requires some user interaction or that the table you're looking for is rendered asynchronously from Javascript. In some of those cases, Power Query believes that the page is done rendering and returns the page without the content you expect.

    Unfortunately, there is no nice workaround for this. One thing you can be able to do is open the page in your browser, open the source and copy-paste the HTML into your query:

    = Web.Page("<html>...</html>")

    If you're not an intermediate user of the Power Query Formula Language, you might find this tricky. You'll need to do things like escaping quotes in the HTML by double quoting them.

    Another workaround is to import the page from Excel's "From Web" feature in the "Data" tab, dump the content into a sheet and then import the sheet into Power Query using "From Table".

    Tristan

    Friday, June 12, 2015 9:59 PM
    Moderator