Microsoft JScript using the cscript engine in PowerShell: can web pages be retrieved by JScript as text strings only, or also as HTML Document objects or XML objects?

  • Question

  • I am developing a Microsoft JScript application, which I run from PowerShell using the cscript engine.

    The application grabs web pages, which are then analysed further to extract certain information.

    Until now I have used regular expressions for this purpose. However, I would like to analyse the web pages using XPath or HTML DOM methods, which are more robust to page changes.

    However, it seems that I can retrieve the web pages only as text: neither as an HTML document object that can be analysed with HTML DOM methods, nor as an XML document object that can be analysed with XPath.

    Here is part of my code:

    // prepare HTTP request (newXMLHttpRequest is defined further below)

    var httpReq = newXMLHttpRequest();
    alert("Getting page 1...");
    httpReq.open("GET", url, false);   // false = synchronous request
    //httpReq.responseType = "document"; //fails
    httpReq.send();

    if (httpReq.status != 200)
    {
      alert("Failed with status: " + httpReq.status);
      return false;
    }

    I retrieve the result of the http request by:

    var document   = httpReq.responseText; // yields document as a string

    which yields a string object.

    Trying to get an HTML Document object by setting the http request response type:

    //see: https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/HTML_in_XMLHttpRequest

     httpReq.responseType = "document"; //fails

    fails

    Trying to get an XML Document object in the following way:

    var document = httpReq.responseXML

    also fails.
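
    With the MSXML request objects, `responseXML` is typically only usable when the response is well-formed XML, which ordinary HTML pages rarely are. A minimal sketch of the XPath route, assuming the text really is well-formed XML and that MSXML 6.0 is installed: load `responseText` into an `MSXML2.DOMDocument.6.0` object and query it with `selectNodes`. The helper name `parseXml` and its optional `createDom` parameter are hypothetical, introduced only for illustration.

```javascript
// Sketch for Windows Script Host (cscript); ES3 syntax only, as JScript requires.
// parseXml is a hypothetical helper name, not part of MSXML.
function parseXml(text, createDom) {
  // createDom is an optional factory (an assumption of this sketch) so the
  // function can be exercised outside WSH; by default an MSXML 6.0 DOM is used.
  var dom = createDom ? createDom() : new ActiveXObject("MSXML2.DOMDocument.6.0");
  dom.async = false;            // parse synchronously
  if (!dom.loadXML(text)) {     // loadXML returns false on malformed input
    throw new Error("XML parse error: " + dom.parseError.reason);
  }
  return dom;
}

// Usage under cscript (runs only when the WScript host object exists).
if (typeof WScript !== "undefined") {
  var xmlDom = parseXml("<items><item id='1'>a</item><item id='2'>b</item></items>");
  var nodes = xmlDom.selectNodes("//item[@id='2']");   // XPath query
  WScript.Echo(nodes.length + " match(es), first text: " + nodes.item(0).text);
}
```

    Real-world HTML will usually make `loadXML` fail, so this route is only expected to help for XML or strictly XHTML-like input.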

    To create the HTTP request object, I use the following try/catch blocks.

    Trial and error has shown me that only the first two can return successfully.

    function newXMLHttpRequest()
    {
      try //1
      {
          alert('trying new ActiveXObject("Microsoft.XMLHTTP")......')
          return new ActiveXObject("Microsoft.XMLHTTP");
      }
      catch (e) {alert('error trying new ActiveXObject("Microsoft.XMLHTTP")')}

      try //2
      {
        alert('trying new ActiveXObject("Msxml2.XMLHTTP").....')
        return new ActiveXObject("Msxml2.XMLHTTP");
      }
      catch (e) {alert('error trying new ActiveXObject("Msxml2.XMLHTTP")')}

      try
      {
        alert('trying new XMLHttpRequest.....')
        return new XMLHttpRequest();
      }
      catch (e) {alert('error trying new XMLHttpRequest') }

      try
      {
        alert('trying new ActiveXObject("MSXML2.XMLHTTP.3.0").....')
        // "new" was missing here; ActiveXObject must be called as a constructor
        return new ActiveXObject("MSXML2.XMLHTTP.3.0");
      }
      catch(e) {alert('error trying new ActiveXObject("MSXML2.XMLHTTP.3.0")')}

      return false;
    }

    For the HTTP request I have also tried the following (to no avail, as mentioned above):

    httpReq.open("GET", url, false);

    httpReq.responseType = "document";

    httpReq.send();

    I have also tried creating a DocumentFragment to insert my HTML "soup" into,

    see: https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment/DocumentFragment

    var fragment = new DocumentFragment()

    This fails with: DocumentFragment is undefined.
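
    `DocumentFragment` is a browser DOM constructor, and the Windows Script Host does not provide browser globals. One approach sometimes used instead, sketched here under the assumption that the `htmlfile` COM object (the MSHTML parser) is available on the machine, is to write the downloaded string into an `htmlfile` document and then use ordinary DOM methods on it. The helper names `parseHtml` and `textOfTags`, and the optional `createDoc` parameter, are hypothetical.

```javascript
// Sketch for Windows Script Host (cscript); ES3 syntax only, as JScript requires.
// parseHtml and textOfTags are hypothetical helper names.
function parseHtml(html, createDoc) {
  // createDoc is an optional factory (an assumption of this sketch) so the
  // function can be exercised outside WSH; by default MSHTML's htmlfile is used.
  var doc = createDoc ? createDoc() : new ActiveXObject("htmlfile");
  doc.write(html);   // the markup is parsed as it is written
  doc.close();
  return doc;
}

// Collect the text content of every element with the given tag name.
function textOfTags(doc, tagName) {
  var elements = doc.getElementsByTagName(tagName);
  var texts = [];
  for (var i = 0; i < elements.length; i++) {
    texts.push(elements.item(i).innerText);
  }
  return texts;
}

// Usage under cscript (runs only when the WScript host object exists).
if (typeof WScript !== "undefined") {
  var htmlDoc = parseHtml("<html><body><h1>First</h1><p>x</p><h1>Second</h1></body></html>");
  WScript.Echo(textOfTags(htmlDoc, "h1").join(", "));
}
```

    In the application above, `httpReq.responseText` would be passed to `parseHtml` in place of the literal string. This gives HTML DOM methods such as `getElementsByTagName`, not XPath.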

    So I repeat my question:

    Microsoft JScript using the cscript engine in PowerShell: can web pages be retrieved by JScript as text strings only, or also as HTML Document objects or XML objects? If so, can you please show me the code to do so?

    P.S.

    My programming experience is in Python; I am a total newbie to JavaScript and to the capabilities and limitations of Microsoft JScript and the cscript engine with respect to JavaScript.

    • Moved by Bill_Stewart Monday, April 3, 2017 7:36 PM User pointlessly reinventing the wheel
    Thursday, February 16, 2017 5:41 PM

All replies

  • We would use Invoke-WebRequest.  It does everything that your JS code does.

    You can only return strings from cscript.  Invoke-WebRequest returns objects.


    \_(ツ)_/

    Thursday, February 16, 2017 7:07 PM