Invoke-WebRequest to download in-browser pdf

  • Question

  • Hello,

    I'm trying to figure out how to download a PDF from a URL. I've tried the following code, but the downloaded file is always corrupt.  When I open the URL in a browser, it shows the PDF inline with the option to save it as a .pdf.  I'm not sure how to mirror the save action, and working with web pages in PowerShell is new to me.  I've noticed that there seems to be a redirect URL for login authorization ($redirectLogin), but I'm not certain how to use it (or whether it's needed).

    $LoginURL = 'https://login.procore.com'
    $redirectLogin = "https://app.procore.com/auth/procore"
    $downloadURL = 'https://app.procore.com/352325/project/checklists/lists/1771201.pdf'
    $outputLoc = "$PSScriptRoot\Test.pdf"
    $WebResponse = Invoke-WebRequest $LoginURL -SessionVariable 'session'

    $LoginForm = $WebResponse.Forms[0]
    $LoginForm.Fields["utf8"] = $LoginForm.Fields["utf8"]
    $LoginForm.Fields["authenticity_token"] = $LoginForm.Fields["authenticity_token"]
    $LoginForm.Fields["session_sso_target_url"] = $LoginForm.Fields["session_sso_target_url"]
    $LoginForm.Fields["session_email"] = $user
    $LoginForm.Fields["session_password"] = $pass
    $LoginForm.Fields["session_remember_me"] = $false
    $Url = $LoginURL+$LoginForm.Action

    #Wasn't sure if I can call the site once with the credentials and then call without

    $WebResponse = Invoke-WebRequest -Uri $url -Method Post -Body $LoginForm -SessionVariable $session

    $WebResponse = Invoke-WebRequest -Uri $downloadURL -Credential Get-Credential -OutFile $outputLoc

    Any pointers?

    Friday, November 1, 2019 8:29 PM

All replies

  • You would use the WebClient class to do this.

    $wc = [System.Net.WebClient]::new()
    # Convert the prompted PSCredential to a NetworkCredential for WebClient
    $wc.Credentials = (Get-Credential).GetNetworkCredential()
    $wc.DownloadFile($url, $localfilepathname)
    

    We don't support or debug screen-scraping methods using WebRequest.  The WebClient does all of this for sites that do not have logins obfuscated to prevent you from doing what you are trying to do.

    If the WebRequest login is successful, then just save the SessionVariable and use it in the web request that downloads the file.

    Invoke-WebRequest -WebSession $session  … other parameters
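
    A rough sketch of that session reuse, under several assumptions: Windows PowerShell 5.1 (where the response exposes .Forms), a login form that is the first form on the page, and the field names from the original post. Save-PdfWithSession and Test-PdfHeader are hypothetical helper names, and whether the site accepts a scripted form login at all is not confirmed here. The header check is included because a login page saved under a .pdf name is one plausible cause of a "corrupt" file.

```powershell
# Hypothetical helper: $true when the file starts with the PDF magic bytes "%PDF-".
# Pass a full path; .NET file APIs do not follow the PowerShell current location.
function Test-PdfHeader {
    param([Parameter(Mandatory)][string]$Path)
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $buffer = New-Object byte[] 5
        $read = $stream.Read($buffer, 0, 5)
    }
    finally {
        $stream.Dispose()
    }
    if ($read -lt 5) { return $false }
    return ([System.Text.Encoding]::ASCII.GetString($buffer) -eq '%PDF-')
}

# Hypothetical helper: log in through the form, then download with the same session.
function Save-PdfWithSession {
    param(
        [Parameter(Mandatory)][string]$LoginUrl,
        [Parameter(Mandatory)][string]$DownloadUrl,
        [Parameter(Mandatory)][pscredential]$Credential,
        [Parameter(Mandatory)][string]$OutFile
    )
    # The first request creates the session (cookies) and returns the login form.
    $page = Invoke-WebRequest -Uri $LoginUrl -SessionVariable session
    $form = $page.Forms[0]   # assumption: the login form is the first form
    $form.Fields['session_email'] = $Credential.UserName
    $form.Fields['session_password'] = $Credential.GetNetworkCredential().Password

    # Resolve the form action against the login URL (handles relative actions).
    $action = [uri]::new([uri]$LoginUrl, $form.Action)

    # Reuse the existing session with -WebSession; post the fields, not the form object.
    $null = Invoke-WebRequest -Uri $action -Method Post -Body $form.Fields -WebSession $session

    # Download with the authenticated session, then check what was actually saved.
    Invoke-WebRequest -Uri $DownloadUrl -WebSession $session -OutFile $OutFile
    Test-PdfHeader -Path $OutFile
}
```

    The point of the sketch is that only the first call uses -SessionVariable; every later call passes the stored session through -WebSession. If Test-PdfHeader returns $false, opening the saved file in a text editor usually shows what the site sent instead of the PDF.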


    \_(ツ)_/

    Friday, November 1, 2019 9:38 PM
  • I appreciate the response. I'm not sure what you mean by screen scraping.  We pull data through API calls and store it in a database. The PDF URL is just a static string constructed with the project id.  We need to get the PDFs for many projects, and the app doesn't provide this (single click). So I was just looking to dynamically download the PDFs by substituting the project ids into the URL string.

    I tried your suggested method of using the WebClient class and got an error: "too many automatic redirections were attempted."
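
    The URL construction described above could be sketched like this. The template with a {0} placeholder and the Get-PdfUrl name are assumptions for illustration; which part of the real URL is the project id is not stated in this thread.

```powershell
# Hypothetical helper: expand a URL template once per project id.
# The template must contain a {0} placeholder where the id belongs.
function Get-PdfUrl {
    param(
        [Parameter(Mandatory)][string]$Template,
        [Parameter(Mandatory)][string[]]$ProjectId
    )
    foreach ($id in $ProjectId) {
        $Template -f $id
    }
}

# Example with a placeholder host (not a real endpoint):
# Get-PdfUrl -Template 'https://example.invalid/{0}/report.pdf' -ProjectId '101', '102'
```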

    Friday, November 1, 2019 10:04 PM
  • One: Have you asked them whether you can actually download files directly?

    Two: Checked for existing stuff?

    https://developers.procore.com/documentation/making-first-call

    https://feedback.procore.com/forums/183340-customer-feedback-for-procore-technologies-inc/suggestions/17409889-pdf-download

    Three: Tried browser automation?  IE COM and/or Selenium?

    https://westerndevs.com/simple-powershell-automation-browser-based-tasks/

    Friday, November 1, 2019 10:21 PM
  • I appreciate the response. I'm not sure what you mean by screen scraping.  We pull data through API calls and store it in a database. The PDF URL is just a static string constructed with the project id.  We need to get the PDFs for many projects, and the app doesn't provide this (single click). So I was just looking to dynamically download the PDFs by substituting the project ids into the URL string.

    I tried your suggested method of using the WebClient class and got an error: "too many automatic redirections were attempted."

    This indicates that the web site has obfuscated downloads or that you need to use their proprietary API. Contact the site owners to find out how to use their API. IE and WebClient cannot work against an API; that would require setting up the JSON or SOAP API calls for use with the Invoke-* cmdlets.
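
    One way to see what is behind the "too many automatic redirections" error is to turn off automatic redirects and read the Location header of the first response. The sketch below does that with System.Net.HttpWebRequest. Get-RedirectTarget and Test-RedirectsToHost are hypothetical helper names, and the request is sent without cookies or credentials, so it only shows where an unauthenticated request is sent.

```powershell
# Hypothetical helper: request a URL without following redirects and
# report the status code and the Location header of that first response.
function Get-RedirectTarget {
    param([Parameter(Mandatory)][string]$Url)
    $request = [System.Net.WebRequest]::Create($Url)
    $request.AllowAutoRedirect = $false
    # GetResponse() still throws a WebException for 4xx/5xx responses.
    $response = $request.GetResponse()
    try {
        [pscustomobject]@{
            StatusCode = [int]$response.StatusCode
            Location   = $response.Headers['Location']
        }
    }
    finally {
        $response.Close()
    }
}

# Hypothetical helper: $true when an absolute Location URL points at the given host.
function Test-RedirectsToHost {
    param(
        [string]$Location,
        [Parameter(Mandatory)][string]$HostName
    )
    if ([string]::IsNullOrEmpty($Location)) { return $false }
    $uri = $null
    if (-not [uri]::TryCreate($Location, [System.UriKind]::Absolute, [ref]$uri)) { return $false }
    return ($uri.Host -eq $HostName)
}

# Example (makes a network call, so it is left commented out):
# $first = Get-RedirectTarget -Url $downloadURL
# Test-RedirectsToHost -Location $first.Location -HostName 'login.procore.com'
```

    If the Location header points back at the login host, the download request was not authenticated, which would be consistent with a redirect loop.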

    We cannot help you with this, as it is proprietary to the site; you will have to get them to give you the info and API docs.  In most cases you will be required to register for API usage and get a special token to use in your calls.
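
    As a sketch of what such a token-based call often looks like: this assumes the API accepts a bearer token in the Authorization header, which is a common pattern but is not taken from this site's documentation. The endpoint URL and the token must come from the site's API docs; New-BearerHeader and Save-ApiFile are hypothetical names.

```powershell
# Hypothetical helper: build the header table for a bearer-token request.
function New-BearerHeader {
    param([Parameter(Mandatory)][string]$Token)
    @{ Authorization = "Bearer $Token" }
}

# Hypothetical helper: download a file from an API endpoint using the token.
# $ApiUrl is a placeholder; the real endpoint has to come from the API docs.
function Save-ApiFile {
    param(
        [Parameter(Mandatory)][string]$ApiUrl,
        [Parameter(Mandatory)][string]$Token,
        [Parameter(Mandatory)][string]$OutFile
    )
    Invoke-WebRequest -Uri $ApiUrl -Headers (New-BearerHeader -Token $Token) -OutFile $OutFile
}
```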

    PS:  What you are doing by trying to fill in the form through the object is known as screen scraping.  It will not work on most modern secure sites.


    \_(ツ)_/


    • Edited by jrv Saturday, November 2, 2019 12:02 AM
    • Marked as answer by Binary Creations Thursday, November 7, 2019 3:59 PM
    Saturday, November 2, 2019 12:01 AM
  • Here is a link to the Procore API: https://developers.procore.com/


    \_(ツ)_/

    Saturday, November 2, 2019 12:04 AM
  • My *guess* here is not that he's trying to modify the PDF...  He's trying to login, then download the PDF that he gains access to after the login.  Just a guess and there's nothing more we can do.  I'm not going to try to get a login to that site (even if somehow free).
    Saturday, November 2, 2019 1:09 PM