Web Scraping

  • Question

  • My query was working previously, but something changed over the last week...

    The page my query scrapes is now loading as the mobile version of the website, which has different formats/tables than the desktop version. Does anyone know how to change this so the query scrapes the desktop version?

    Many thanks.

    Friday, December 29, 2017 2:38 AM

Answers

  • Hi Justin,

    I wish I had other suggestions, but without seeing the webpage or how it behaves, it is nearly impossible to tell how to deal with it. Off the top of my head, I would suggest recreating the solution on a different computer to see if you get the same behavior. That's the last resort. If that fails, you'd need to build a solution against the mobile version of the website.

    Wednesday, January 10, 2018 6:58 AM

All replies

  • A few suggestions:

    • Find out whether the website decides this behavior based on the user-agent, the size of the window, or perhaps a URL query parameter that you can set
    • Power Query uses Internet Explorer to render the webpage when you use the Web.Page function. Check what you get using only the Web.Contents function and viewing the HTML as plain text. That would be an easier workaround.
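
    The user-agent check in the first suggestion can also be run outside Power Query. A minimal sketch in Python (standard library only), assuming the page is reachable with a plain GET. Both user-agent strings are illustrative examples and the helper names are hypothetical, not something from this thread:

```python
import urllib.request

# Assumed example values; neither string comes from the thread.
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/63.0 Safari/537.36"
)
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1"
)


def build_request(url, user_agent):
    """Return a GET request for `url` that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent, timeout=30):
    """Download the page source as text, as served (no rendering, no scripts)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

    Calling fetch_html once with each string against the page in question and comparing the two results should show whether the server switches layouts on the User-Agent header alone. If both responses carry the same markup, the switch more likely happens client-side (script or CSS), which a header change in the query would not affect.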
    Friday, December 29, 2017 6:52 PM
  • Hi Miguel,

    I replicated all of my browser's headers in the query but didn't have any luck. The site might be using JavaScript or CSS to determine mobile/desktop, because there's nothing distinguishable in the URL.

    When I tried Web.Contents only, the query pulled an image of an Internet Explorer icon with the website name.

    Any further ideas would be appreciated!


    • Edited by Justin927 Wednesday, January 10, 2018 11:16 PM Wrong name
    Monday, January 8, 2018 11:03 PM
  • Hi Miguel and Imke :-)

    Power Query could use a bit of a deep dive on getting data from the web. A lot of people think it is just a matter of pasting the link, as with a more "traditional" web page like this one:

    http://www.nationsonline.org/oneworld/country_code_list.htm

    But as soon as they try other webpages like this one:

    http://www.2018-exhibitors-list.vinexpohongkong.com/en/accueil/page1.html

    they get into trouble. I think either a more advanced tutorial for Power Query or some additional functions would help. Otherwise, the best option is to use open source tools such as Scrapy.

     https://docs.scrapy.org/en/latest/intro/tutorial.html#a-shortcut-for-creating-requests

    Cheers, and thank you both for the time and effort you put in for our community!

    / Patricio

    Friday, April 13, 2018 8:34 AM
  • Thanks for the kind words, Patricio!

    From my own experience, I've found that Power Query is not a tool designed for web scraping. You can do wonders if your web source is an API or a web service, but the moment it is a webpage or just some HTML, it is not really a great experience, and other tools are far better suited to those scenarios.

    They might release more features to handle web scraping scenarios, but currently your best bet is that your webpage has some table tags; otherwise it is going to be a tedious task. Not to mention that there are other highly important limitations for more experienced users, like the fact that you can't get past a credentials or login form.
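
    Whether a page has table tags in its raw HTML can be checked before building a query. A rough sketch using Python's standard html.parser, assuming the input is the page source as text. The class and function names are hypothetical, chosen only for illustration:

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return the number of <table> elements opened in `html`."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables
```

    A count of zero on the downloaded source suggests the data is laid out with other elements or injected by script after the page loads, which is the tedious case described above.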

    Friday, April 13, 2018 11:02 AM