Pulling HTML elements from site

  • Question

  • Thanks to the forums I have expanded on a basic script that goes out and searches a predefined list of URLs for several keywords, returning those URLs which contain the defined keywords.

    What I cannot figure out how to do (I have mostly been trying to get Invoke-WebRequest to accomplish this) is go out to these same URLs, pull the HTML elements out, and put them into a .csv.  For example, I am including a couple of elements from the target website.  From it, I want to pull out the "CVE-ID's", "Date Released" and "Proximity" values.  This is the source code from an example page:

        <tr class="t">
                <td class="status">CVE-ID's</td>
                <td>
                    <p> CVE-2014-2389 CVE-2014-1956 </p>
                </td>
            </tr>
        <tr class="t">
            <td class="status">Date Released</td>
            <td>
                <p>03 Jul 2014</p>
            </td>
            </tr>
        <tr class="t">
                <td class="status">Proximity</td>
                <td>
                    <p> From adjacent network </p>
                </td>
            </tr>
    Tuesday, July 15, 2014 6:52 PM

All replies

  • This is one way to grab them:

    PS C:\scripts> $string='xxxxxxxxxxxxxxxx CVE-2014-2389 fffffffffff  CVE-2014-1956 ppppppppppppp'
    PS C:\scripts> Select-String '(?<n>CVE-\d+-\d+)' -input $string -AllMatches | Foreach {$_.matches.Value}
    CVE-2014-2389
    CVE-2014-1956


    ¯\_(ツ)_/¯

    • Marked as answer by Sure-man Tuesday, July 15, 2014 10:11 PM
    Tuesday, July 15, 2014 7:56 PM
  • $html=@'
    <tr class="t">
                <td class="status">CVE-ID's</td>
                <td>
                    <p> CVE-2014-2389 CVE-2014-1956 </p>
                </td>
            </tr>
        <tr class="t">
            <td class="status">Date Released</td>
            <td>
                <p>03 Jul 2014</p>
            </td>
            </tr>
        <tr class="t">
                <td class="status">Proximity</td>
                <td>
                    <p> From adjacent network </p>
                </td>
            </tr>
    '@
    
    PS C:\scripts> Select-String '(?<n>CVE-\d+-\d+)' -input $html -AllMatches | Foreach {$_.matches.Value}
    CVE-2014-2389
    CVE-2014-1956
    PS C:\scripts>


    ¯\_(ツ)_/¯

    Tuesday, July 15, 2014 7:58 PM
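
The same idea can be extended to the other two fields from the question. The following is only a minimal sketch: it assumes PowerShell 3.0 or later and that every page uses exactly the row layout shown in the sample (a `td class="status"` label cell followed by a `p` value). The names `$pattern`, `$fields`, `$record` and `$csv` are made up for illustration, and a regex over HTML is fragile if the markup changes.

```powershell
# Sample markup copied from the question; in practice this would be the downloaded page text.
$html = @'
<tr class="t">
    <td class="status">CVE-ID's</td>
    <td>
        <p> CVE-2014-2389 CVE-2014-1956 </p>
    </td>
</tr>
<tr class="t">
    <td class="status">Date Released</td>
    <td>
        <p>03 Jul 2014</p>
    </td>
</tr>
<tr class="t">
    <td class="status">Proximity</td>
    <td>
        <p> From adjacent network </p>
    </td>
</tr>
'@

# (?s) lets '.' span line breaks; the lazy '.*?' stops at the first <p> after each label cell.
$pattern = '(?s)<td class="status">(?<label>[^<]+)</td>.*?<p>(?<value>[^<]*)</p>'

# Collect label/value pairs in page order, trimming the padding around each value.
$fields = [ordered]@{}
foreach ($m in [regex]::Matches($html, $pattern)) {
    $fields[$m.Groups['label'].Value.Trim()] = $m.Groups['value'].Value.Trim()
}

# One object per page, with one property per label; ConvertTo-Csv returns the CSV lines.
$record = [pscustomobject]$fields
$csv = $record | ConvertTo-Csv -NoTypeInformation
```

Swapping `ConvertTo-Csv` for `Export-Csv -Path <file> -NoTypeInformation` would write the same row to disk instead of returning it.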
  • Ok, so here is what I have:

    $output = @()
    $web = New-Object Net.WebClient
    $urls = get-content "C:\Scripts\URLS.txt"
    foreach($url in $urls){
    $results = $web.DownloadString("$url")
    $matches = $results | Select-String '(?<n>CVE-\d+-\d+)' -input $string -AllMatches | Foreach {$_.matches.Value}
    if ($matches.Matches){
    $Object = New-Object PSObject
    $Object | add-member Noteproperty URL           $url
    $Object | add-member Noteproperty CVE           $matches.Matches.value     
    $output+=$Object}
    }
    $output|Out-File "C:\Scripts\urlandcve.txt" -force

    With the following error:

    Select-String : The input object cannot be bound to any parameters for the command either because the command does not take pipeline input or the input and its

    properties do not match any of the parameters that take pipeline input.

    At line:6 char:23

    + $matches = $results | Select-String '(?<n>CVE-\d+-\d+)' -input $string -AllMatch ...

    +                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        + CategoryInfo          : InvalidArgument: (<!DOCTYPE HTML .../body>

    </html>

    :String) [Select-String], ParameterBindingException

        + FullyQualifiedErrorId : InputObjectNotBound,Microsoft.PowerShell.Commands.SelectStringCommand

    Tuesday, July 15, 2014 8:18 PM
  • You can't have two sources on an object.

    $matches = $results | Select-String '(?<n>CVE-\d+-\d+)' -input $string -AllMatches | Foreach {$_.matches.Value}

    You are using $string and $results. Won't work. Use one or the other.


    ¯\_(ツ)_/¯

    Tuesday, July 15, 2014 8:43 PM
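
To illustrate the point about a single input source: Select-String accepts its text either from the pipeline or through -InputObject (abbreviated to -input in the posts above), but not both in the same call. A small sketch with made-up variable names follows; the sample text mirrors the earlier `$string`.

```powershell
$text    = 'xxxx CVE-2014-2389 ffff CVE-2014-1956 pppp'
$pattern = 'CVE-\d+-\d+'

# Option 1: pipe the text in and leave -InputObject out.
$fromPipeline = $text |
    Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

# Option 2: pass the text with -InputObject and do not pipe anything in.
$fromParameter = Select-String -Pattern $pattern -InputObject $text -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }
```

Both variables should end up holding the same two IDs. Combining the two forms is what produces the "input object cannot be bound" error quoted above.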
  • Now, the script runs to completion without error and produces the .txt file, but the output text file is blank.  I have confirmed the targeted URL (from URLS.txt) does contain the CVE.

    $output = @()
    $web = New-Object Net.WebClient
    $urls = get-content "C:\Scripts\URLS.txt"
    #temporary location for this test script
    foreach($url in $urls){
    $results = $web.DownloadString("$url")
    #$matches = $results | 
    Select-String '(?<n>CVE-\d+-\d+)' -input $string -AllMatches | Foreach {$_.matches.Value}
    if ($matches.Matches){
    $Object = New-Object PSObject
    $Object | add-member Noteproperty URL           $url
    $Object | add-member Noteproperty CVE           $matches.Matches.value     
    $output+=$Object}
    }
    $output|Out-File "C:\Scripts\urlandcve.txt" -force
    #temporary location for this test script

    I even tried commenting out the $results = $web.DownloadString("$url") which also worked without error, but produced a blank .txt file.
    Tuesday, July 15, 2014 9:11 PM
  • What is in $string?


    ¯\_(ツ)_/¯

    • Marked as answer by Sure-man Tuesday, July 15, 2014 10:11 PM
    Tuesday, July 15, 2014 9:23 PM
  • Admittedly, I just copied and pasted your first recommendation which contained $string.

    PS C:\scripts> $string='xxxxxxxxxxxxxxxx CVE-2014-2389 fffffffffff  CVE-2014-1956 ppppppppppppp'
    PS C:\scripts> Select-String '(?<n>CVE-\d+-\d+)' -input $string -AllMatches | Foreach {$_.matches.Value}

    Looking back at it now, I see you included that because you had set the string variable as text to search through.

    Here is my updated script which still completes and produces a file, but it's blank.

    $output = @()
    $web = New-Object Net.WebClient
    $urls = get-content "C:\Scripts\URLS.txt"
    #foreach($url in $urls){
    #$results = $web.DownloadString("$url")
    #$matches = $results | 
    Select-String '(?<n>CVE-\d+-\d+)' -input $urls -AllMatches | Foreach {$_.matches.Value}
    if ($matches.Matches){
    $Object = New-Object PSObject
    $Object | add-member Noteproperty URL           $url
    $Object | add-member Noteproperty CVE           $matches.Matches.value     
    $output+=$Object}
    }
    $output|Out-File "C:\Scripts\urlandcve.txt" -force


    • Edited by Sure-man Tuesday, July 15, 2014 9:37 PM Added } to close matches.value on line 7
    Tuesday, July 15, 2014 9:32 PM
  • I think you need to learn the basics of PowerShell.  How can you get the output into a file when you are not writing it to a file?

    You are also just putting the list of URLs into the string search.

    Take some time and think about what you are doing.  How is select-string going to know what is on a web page from its URL?


    ¯\_(ツ)_/¯

    • Marked as answer by Sure-man Tuesday, July 15, 2014 10:11 PM
    Tuesday, July 15, 2014 9:37 PM
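
The difference between searching a URL and searching the page behind it can be seen without touching the network. In this sketch the URL is hypothetical and `$page` is a stand-in for the text that `DownloadString` would return; neither comes from the thread.

```powershell
# Hypothetical address and stand-in page text (assumptions for illustration only).
$url  = 'http://example.com/advisory/123'
$page = '<p> CVE-2014-2389 CVE-2014-1956 </p>'

# Searching the URL string finds nothing: this address contains no CVE ID.
$fromUrl = $url | Select-String -Pattern 'CVE-\d+-\d+' -AllMatches

# Searching the page text finds both IDs. Assigning the result to a variable
# is what makes it available to the later 'if' test and to the output file.
$fromPage = $page | Select-String -Pattern 'CVE-\d+-\d+' -AllMatches
$ids = $fromPage.Matches | ForEach-Object { $_.Value }
```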
  • You are right, I do need considerably more practice with the basics.  I cannot answer your question on how select-string is going to know what is on a web page from its URL.

    I just know I got it working and wouldn't have without your help, so for that thank you.

    Here is the final script:

    $output = @()
    $web = New-Object Net.WebClient
    $urls = get-content "C:\Scripts\TestURLS.txt"
    foreach($url in $urls){
    $results = $web.DownloadString("$url")
    $matches = $results | Select-String '(?<n>CVE-\d+-\d+)' -AllMatches
    if ($matches.Matches){
    $Object = New-Object PSObject
    $Object | add-member Noteproperty URL           $url
    $Object | add-member Noteproperty CVE           $matches.Matches.value     
    $output+=$Object}
    }
    $output|Out-File "C:\Scripts\urlandcve.txt" -force

    Output was written to the file.  Here's what it looks like (minus the extra spaces and carriage returns):

    URL                                                               CVE
    ---                                                               ---                                     
    http ://cvedetails.com/cve-details.php?t=1&cve_id=CVE-2014-3074    {CVE-2014-3074, CVE-2014-3074, CVE-2014-3074, CVE-2014-3074...}

    Now, to figure out how to eliminate the duplicates and truncation.  Thanks again for your help.

    Thanks again.

    Tuesday, July 15, 2014 10:02 PM
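
One possible way to handle the duplicates and the truncation mentioned above, offered as a sketch rather than a definitive fix: de-duplicate the IDs with `Sort-Object -Unique`, join them into a single string so the table formatter has no array to abbreviate, and emit CSV instead of formatted text. The URL and `$results` text here are stand-ins, not real page data, and `$found` is used instead of `$matches` because `$matches` is also an automatic variable set by the `-match` operator.

```powershell
# Stand-in values (assumptions): a hypothetical URL and page text with a repeated ID.
$url     = 'http://example.com/advisory/123'
$results = 'CVE-2014-3074 aaa CVE-2014-3074 bbb CVE-2014-3075 ccc CVE-2014-3074'

# Find every CVE ID, then sort and drop repeats.
$found = $results | Select-String -Pattern 'CVE-\d+-\d+' -AllMatches
$ids   = $found.Matches | ForEach-Object { $_.Value } | Sort-Object -Unique

# Join the IDs into one string so the CVE column holds plain text, not an array.
$row = New-Object PSObject -Property @{ URL = $url; CVE = ($ids -join ' ') }

# Select-Object fixes the column order; ConvertTo-Csv returns the header and data lines.
$csv = $row | Select-Object URL, CVE | ConvertTo-Csv -NoTypeInformation
```

In the full script, each `$row` would be added to `$output` inside the loop, and `$output | Export-Csv -Path <file> -NoTypeInformation` would replace the `Out-File` call.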