How can I export a table row in Internet Explorer?

  • Question

  • I need to export a single table row on a website and I can't figure out how to do it.  The source view for the row I need is:

    <tr class="alt"> <td id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE</td> <td></td> <td></td> <td>0.00065227</td> <td>0.01233629</td> <td>0.00371003</td> </tr>

    I can get the table ID using the code below, but I don't know how to get the rest of the values. The table ID does not change but the numerical values do.

    $ie = New-Object -com InternetExplorer.Application
    $ie.silent = $false
    $ie.navigate2("mywebsite.com")
    $ie.Document.getElementById("16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE")

    Friday, January 10, 2014 3:10 PM

Answers

  • Hi Tom,

    This may not be the perfect solution, but it works for me at least. I'm not using the IE COM object, but rather the .NET WebClient for it:

    # Load downloader function
    function Get-WebContent
    {
    	<#
    		.SYNOPSIS
    			Downloads a file
    	
    		.DESCRIPTION
    			Download any file using a valid weblink and either store it locally or return its content
    		
    		.PARAMETER webLink
    			The full link to the file (Example: "http://www.example.com/files/examplefile.dat"). Adds "http://" if webLink starts with "www".
    	
    		.PARAMETER destination
    			The target path where you want to store the file, including the filename (Example: "C:\Example\examplefile.dat"). The folder need not exist, but the path must be valid. Optional.
    	
    		.PARAMETER getContent
    			Switch that controls whether the function returns the file content.
    	
    		.EXAMPLE
    			Get-WebContent -webLink "http://www.technet.com" -destination "C:\Example\technet.html"
    			This will download the technet website and store it as a html file to the target location
    	
    		.EXAMPLE
    			Get-WebContent -webLink "www.technet.com" -getContent
    			This will download the technet website and return its content (as a string)
    	#>
    	Param(
    	[Parameter(Mandatory=$true,Position="0")]
    	[Alias('from')]
    	[string]
    	$WebLink,
    	
    	[Parameter(Position="1")]
    	[Alias('to')]
    	[string]
    	$Destination,
    	
    	[Alias('grab')]
    	[switch]
    	$GetContent
    	)
    	
    	# Correct WebLink for typical errors
    	if ($webLink.StartsWith("www") -or $webLink.StartsWith("WWW")){$webLink = "http://" + $webLink}
    	
    	$webclient = New-Object Net.WebClient
    	$file = $webclient.DownloadString($webLink)
    	if ($destination -ne "")
    	{
    		try {Set-Content -Path $destination -Value $file -Force}
    		catch {Write-Warning "Could not write to $destination : $_"}
    	}
    	if ($getContent){return $file}
    }
    
    # Download website
    $website = Get-WebContent -WebLink "http://www.mywebsite.com" -GetContent
    
    # Cut away everything before the relevant part
    $string = $website.SubString($website.IndexOf('<td id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">'))
    
    # Cut away everything after the row
    $string = $string.SubString(0,$string.IndexOf('</tr>'))
    
    # Split the string into each individual line
    $lines = $string.Split("`n")
    
    # Preparing result variable
    $results = @()
    
    # For each line, cut away the clutter
    foreach ($line in $lines)
    {
    	$temp = $line.SubString(4,($line.length - 10))
    	
    	# for the first line, the td has an id, which this compensates for
    	if ($temp -like 'id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">*'){$temp = $temp.SubString(($temp.IndexOf(">") + 1))}
    	
    	# Add cleaned line to results
    	$results += $temp
    }
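
    A more compact alternative would be a regex sketch that pulls every cell value from the row in one pass (untested beyond the snippet you posted; it assumes the whole page is held in $website as a single string):

    	# Sketch: find the row, then extract each <td> value with a regex
    	if ($website -match '(?s)<tr class="alt">(.*?)</tr>')
    	{
    		$results = [regex]::Matches($Matches[1], '<td[^>]*>(.*?)</td>') |
    			ForEach-Object { $_.Groups[1].Value }
    	}

    This sidesteps the line-by-line trimming entirely, at the cost of relying on the row keeping its 'class="alt"' attribute.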

    You may need to adapt the string parsing below the function if the page source is not literally identical to the text you posted. It worked on a string block copied and pasted from your post, anyway. :)

    I certainly would be more than happy to read a more elegant version, if someone has one to offer.

    Cheers,
    Fred


    There's no place like 127.0.0.1

    • Marked as answer by mrtom731 Saturday, January 11, 2014 4:20 PM
    Friday, January 10, 2014 3:50 PM

All replies

  • Two issues that I see. The table ID will likely be different every time you download the page.

    If the page is XHTML or HTML5, it can be parsed most easily as XML. You can just find all nodes that are 'table' and pick the one you want. All <tr> elements will be the rows. From that point it is very easy to convert the table into data objects.

    You can also directly import HTML tables into Excel and MS Access.
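
    For example, when the markup is well-formed (a sketch only; real XHTML declares a namespace, which would additionally require a namespace manager, and the URL is a placeholder):

    	# Sketch: parse well-formed (X)HTML as XML and walk the table rows
    	$webclient = New-Object Net.WebClient
    	[xml]$page = $webclient.DownloadString('http://www.mywebsite.com')
    	foreach ($row in $page.SelectNodes('//tr'))
    	{
    		# Each <td> yields one value; convert to objects as needed
    		$cells = $row.SelectNodes('td') | ForEach-Object { $_.InnerText }
    	}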


    ¯\_(ツ)_/¯

    Friday, January 10, 2014 7:45 PM
  • I modified the end of your script. Instead of parsing, I just used:

    $lines = $string.Split("`n")
    $lines = $lines.Replace("<td>", "")
    $lines = $lines.Replace("</td>", "") | Out-File -FilePath c:\test\test.csv -Append
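
    On PowerShell 3.0 or later, the download step can also be done without the helper function (a sketch; the URL is a placeholder):

    	# Sketch: Invoke-WebRequest replaces the WebClient-based helper
    	$string = (Invoke-WebRequest -Uri 'http://www.mywebsite.com').Content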

    -----------------------------------

    Everything after that I removed and it's nice and clean. Thank you very much for your help!!

    Friday, January 10, 2014 8:04 PM