none
PQ vs Browser HTML Variance RRS feed

  • Question

  • Hi team

    I'm trying to view the following website in Power Query v2.46.4732.721 64-bit:

    Text.FromBinary(Web.Contents("http://www5.statcan.gc.ca/cansim/a25?id=520005&lang=eng&smonth=7&syear=2017&emonth=7&eyear=2063&stByVal=2&whenConvertingFrequency=USE_CALENDAR_YEAR&manipulationOption=DATA_AS_RETRIEVED&outputFrequency=UNCHANGED&MBR%5B%27GEO%27%5D=6.2&MBR%5B%27AGE%27%5D=1&MBR%5B%27SEX%27%5D=1&MBR%5B%27PROJECT%27%5D=3*4.5&retrLang=eng&exporterType=DATABASE_DATA_QUALITY&fileSeparator=CSV&verificationOption=NORMAL_RETRIEVAL&entireTableFileType=CSV"))

    Unfortunately, PQ returns completely different HTML than Edge, IE or Chrome:

    Does anyone have an idea why this is happening?

    Thanks,
    Simon



    • Edited by Simon Nuss Wednesday, July 19, 2017 3:00 PM
    Wednesday, July 19, 2017 2:56 PM

Answers

  • ....but they're also returning a link to the CSV in the HTML itself, so you could theoretically do this:

    let
        Source = Text.FromBinary(Web.Contents("http://www5.statcan.gc.ca/cansim/a25?id=520005&lang=eng&smonth=7&syear=2017&emonth=7&eyear=2063&stByVal=2&whenConvertingFrequency=USE_CALENDAR_YEAR&manipulationOption=DATA_AS_RETRIEVED&outputFrequency=UNCHANGED&MBR%5B%27GEO%27%5D=6.2&MBR%5B%27AGE%27%5D=1&MBR%5B%27SEX%27%5D=1&MBR%5B%27PROJECT%27%5D=3*4.5&retrLang=eng&exporterType=DATABASE_DATA_QUALITY&fileSeparator=CSV&verificationOption=NORMAL_RETRIEVAL&entireTableFileType=CSV")),
        URLStart = Text.PositionOf(Source, "http%3a"),
        URLEnd = Text.PositionOf(Source, ".csv") + 4,
        URL = Text.Replace(Text.Replace(Text.Range(Source, URLStart, URLEnd - URLStart),"%3a",":"),"%2f","/"),
        Data = Csv.Document(Web.Contents(URL))
    in
        Data
    They're pretty clear that they don't support this, so I'd expect this "loophole" to close at any time.

    Wednesday, July 19, 2017 6:19 PM

All replies

  • When I tried that URL in the browser, I was redirected to:

    http://www5.statcan.gc.ca/access_acces/alternative_alternatif?l=eng&keng=52.154&kfra=52.154&teng=Download%20file%20from%20CANSIM&tfra=Fichier%20extrait%20de%20CANSIM&loc=http://www5.statcan.gc.ca/cansim/results/cansim4261460228734557126.csv

    I briefly looked around the CANSIM website to see if they had a web API, and there was this warning on the front page. What you're seeing is almost certainly designed to prevent what you're trying to do.

    Wednesday, July 19, 2017 4:00 PM
  • ....but they're also returning a link to the CSV in the HTML itself, so you could theoretically do this:

    let
        Source = Text.FromBinary(Web.Contents("http://www5.statcan.gc.ca/cansim/a25?id=520005&lang=eng&smonth=7&syear=2017&emonth=7&eyear=2063&stByVal=2&whenConvertingFrequency=USE_CALENDAR_YEAR&manipulationOption=DATA_AS_RETRIEVED&outputFrequency=UNCHANGED&MBR%5B%27GEO%27%5D=6.2&MBR%5B%27AGE%27%5D=1&MBR%5B%27SEX%27%5D=1&MBR%5B%27PROJECT%27%5D=3*4.5&retrLang=eng&exporterType=DATABASE_DATA_QUALITY&fileSeparator=CSV&verificationOption=NORMAL_RETRIEVAL&entireTableFileType=CSV")),
        URLStart = Text.PositionOf(Source, "http%3a"),
        URLEnd = Text.PositionOf(Source, ".csv") + 4,
        URL = Text.Replace(Text.Replace(Text.Range(Source, URLStart, URLEnd - URLStart),"%3a",":"),"%2f","/"),
        Data = Csv.Document(Web.Contents(URL))
    in
        Data
    They're pretty clear that they don't support this, so I'd expect this "loophole" to close at any time.

    Wednesday, July 19, 2017 6:19 PM