PowerShell - Trouble getting web page content from IE object

  • Question

  • Hi, I am pretty new to PowerShell and only use it for personal projects.  I have been experimenting with pulling specific info from websites to include in emails to family.  By reading the forums I got fairly comfortable with the Invoke-WebRequest cmdlet, but soon hit its main limitation: it cannot see content that is constructed dynamically when the page loads.

    Thanks to these forums, I then discovered the IE object and how to pull the data.  I had luck with one website, but another I tried does not work the same.  Hoping for a little help figuring it out.

    Here is a snippet of the inspected code for the page, with my target of interest highlighted.

    Below is the code where I am trying to extract that text string.  I have tried many iterations and approaches with no success.  Oddly, the $ie.Document object supposedly has a "body" property, but when I try to access it I get a null object error.  I noticed that the Document object itself has a getElementsByTagName method, so I tried that; it does not have a getElementsByClassName method.

    Note that the URL I am loading uses HTTPS, so I am wondering whether that is causing issues.  Suggestions appreciated!  If I can just get a surrounding chunk of the HTML, I am fine doing some string manipulation to get what I want.

    #  Create IE object and load URL
    
    $WeatherURL = "https://weather.com/weather/today/l/77630"
    $ie = New-Object -comobject "InternetExplorer.Application"
    $ie.visible = $true
    $ie.navigate($WeatherURL)
    
    # Wait for the page to load
    
    while ($ie.Busy -eq $true -Or $ie.ReadyState -ne 4) {Start-Sleep 2}
    
    $Doc = $ie.Document
    
    $Weather0 = $Doc.getElementsByTagName('span') `
    | ?{$_.getAttribute('class') -eq "today-wx-description"} | Select-Object -First 1
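
One thing that may be worth trying, offered as a sketch rather than a confirmed fix: in IE's COM document model the class attribute is normally exposed through the className property, and getAttribute('class') can come back empty in older document modes. The helper below filters a collection of elements on className instead. Select-ByClassName is a hypothetical name (not a built-in cmdlet), and the sketch assumes $Doc loaded correctly, so it does not address the null "body" problem.

```powershell
# Hypothetical helper: returns the first element whose className contains
# the requested CSS class. Works on any objects exposing a className property.
function Select-ByClassName {
    param(
        [Parameter(Mandatory)] [object[]] $Elements,
        [Parameter(Mandatory)] [string]   $ClassName
    )
    $Elements |
        Where-Object { $_.className -and (($_.className -split '\s+') -contains $ClassName) } |
        Select-Object -First 1
}

# Intended use against the IE document (assumption, not run here):
# $spans    = @($Doc.getElementsByTagName('span'))
# $Weather0 = (Select-ByClassName -Elements $spans -ClassName 'today-wx-description').innerText
```

Splitting className on whitespace lets the match succeed even when the element carries several classes.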
    

    Tuesday, September 13, 2016 1:48 PM

Answers

  • Screen scraping is never reliable, especially for the Weather site, which changes very frequently.

    Try using the NOAA web service instead; it is more direct and returns objects.


    \_(ツ)_/

    • Marked as answer by tkcas Monday, September 19, 2016 2:26 PM
    Tuesday, September 13, 2016 3:57 PM
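
As a rough illustration of the web-service route, the sketch below calls the National Weather Service JSON API at api.weather.gov with Invoke-RestMethod. Which NOAA service the answer has in mind is not stated, so this choice is an assumption. The function names and the contact string passed as the user agent are placeholders.

```powershell
# Sketch only: assumes the api.weather.gov JSON API. Function names are illustrative.
function Get-NwsForecast {
    param([double] $Latitude, [double] $Longitude)

    # The service asks callers to identify themselves; this value is a placeholder.
    $agent = 'family-weather-mail (you@example.com)'

    # Step 1: resolve the coordinates to the forecast URL for that grid point.
    $point = Invoke-RestMethod -Uri "https://api.weather.gov/points/$Latitude,$Longitude" -UserAgent $agent

    # Step 2: fetch the forecast itself; the JSON is returned as PowerShell objects.
    Invoke-RestMethod -Uri $point.properties.forecast -UserAgent $agent
}

function Get-FirstForecastText {
    param([Parameter(Mandatory)] $Forecast)
    # The first period is the nearest one (for example "Today" or "Tonight").
    $Forecast.properties.periods[0].detailedForecast
}

# Example call (needs network access, so it is left commented out):
# Get-FirstForecastText -Forecast (Get-NwsForecast -Latitude 30.0878 -Longitude -94.1445)
```

Because the response is already parsed into objects, no HTML scraping or browser automation is needed.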

All replies

  • Screen scraping is never reliable, especially for the Weather site, which changes very frequently.

    Try using the NOAA web service instead; it is more direct and returns objects.


    \_(ツ)_/

    At first I did not take your advice and kept banging my head against weather.com.  I finally gave up and found that weather.gov has the same kind of daily weather commentary I was looking for and was able to incorporate this into my morning email pretty quickly.  Below is the code snippet that retrieves the forecast and preps it for inclusion in the email.

    $ieError = $false
    Try
    {
        $WeatherURL = "http://forecast.weather.gov/MapClick.php?CityName=Beaumont&state=TX&site=LCH&textField1=30.0878&textField2=-94.1445&e=1#.V987zfDx5hF"
        $ie = New-Object -com "InternetExplorer.Application"
        $ie.visible = $true
        $ie.navigate($WeatherURL)
    }
    Catch
    {
        #  error condition handled silently - no error message displayed.  Weather will be omitted.
        $ieError = $true
    }
    
    If (-Not $ieError)
    {
        # Wait for the page to load 
    
        while ($ie.Busy -eq $true -Or $ie.ReadyState -ne 4) {Start-Sleep 2}
    
        $Doc = $ie.Document.body
        $Weather0 = $Doc.getElementsByClassName("col-sm-10 forecast-text")[0].innerHTML
        $Weather1 = ($Weather0.Substring(0,1)).tolower()+($Weather0.Substring(1,($Weather0.length -1)))
        # Parentheses are required: unary -join binds tighter than the comma operator.
        $WeatherBlurb = -join ("<p>", "My weather says ", $Weather1, "</p>")
        
        $ie.Quit()
        # Release the COM reference; the returned reference count is not needed.
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($ie)
        Remove-Variable ie
    }
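
The lower-casing and paragraph-wrapping steps can also be pulled into a small function so they can be checked without launching IE. ConvertTo-WeatherBlurb is a made-up name for this sketch, and the sample sentence is invented rather than real forecast text.

```powershell
# Hypothetical helper: turns raw forecast text into the HTML paragraph for the email.
function ConvertTo-WeatherBlurb {
    param([string] $ForecastText)

    $text = $ForecastText.Trim()
    if ($text.Length -eq 0) { return '' }   # nothing to include in the email

    # Lower-case only the first character so the sentence reads naturally
    # after the "My weather says" lead-in.
    $lowered = $text.Substring(0, 1).ToLower() + $text.Substring(1)

    # Parenthesise the list: unary -join binds tighter than the comma operator.
    -join ('<p>', 'My weather says ', $lowered, '</p>')
}

ConvertTo-WeatherBlurb 'Sunny, with a high near 88.'
# -> <p>My weather says sunny, with a high near 88.</p>
```

Returning an empty string for blank input lets the caller omit the weather section quietly, in the same spirit as the silent error handling above.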


    • Edited by tkcas Monday, September 19, 2016 2:30 PM
    Monday, September 19, 2016 2:29 PM