Converting htm (html) to csv

Question
I need to connect two programs: one generates reports in htm, the other imports csv.
I could not work it out on my own.
Sample htm:
<HTML><HEAD><meta http-equiv="refresh" content="150";><TITLE>Current Conditions at , </TITLE></HEAD><BODY background="Clouds.jpg"><P><FONT size=4></FONT></P><P><TABLE border=0 cellSpacing=0 cellPadding=0 width="90%" align=center height=50 > </TABLE></P><P align=center><FONT size=5 color=darkred><STRONG><A NAME = "Current">Current Weather Conditions at , </A></STRONG></FONT></P><P align=center><FONT size=4 color=darkred><STRONG><A NAME = "Current">As of: 10.06.16 16:41</A></STRONG></FONT></P><TABLE cellspacing=1 cellpadding=0 width="85%" align=center border=1> <TR height=20> <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Temperature:</FONT></STRONG></TD> <TD Width="22%" align=left> <b> 22.8°C</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Dewpoint:</FONT></STRONG></TD> <TD Width=100 align=left><b>5.7°C</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Humidity:</FONT></STRONG></TD> <TD Width=100 align=left><b> 33% </b></TD> <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Wind Chill:</FONT></STRONG></TD> <TD Width="11%" align=left><b>22.8°C</b></TD> </TR> <TR height=20> <TD Width="15%"><STRONG><FONT face="Tw Cen MT">Wind:</FONT></STRONG> </TD> <TD Width="18%" align=left><b> SSW at 1.8 m/s</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">THW Index:</FONT></STRONG></TD> <TD Width=100 align=left><b>21.8°C</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Barometer:</STRONG></FONT></TD> <TD Width=100 align=left> <b> 754.2 mm and Falling Slowly</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Heat Index:</FONT></STRONG></TD> <TD Width=100 align=left><b> 21.8°C</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Today's Rain:</FONT></STRONG></TD> <TD Width=100 align=left> <b>0.0 mm</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Monthly Rain:</FONT></STRONG></TD> <TD Width=100 align=left><b>13.5 mm</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Storm 
Total:</FONT></STRONG></TD> <TD Width=100 align=left><b>0.0 mm</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Yearly Rain:</FONT></STRONG></TD> <TD Width=100 align=left><b>101.3 mm</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Current Rain Rate:</FONT></STRONG></TD> <TD Width=100 align=left> <b>0.0 mm/hr</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Solar Radiation:</FONT></STRONG></TD> <TD Width=100 align=left><b>517 W/m?</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">UV:</FONT></STRONG></TD> <TD Width=100 align=left><b>--- index</b></TD> </TR> </TABLE><br><P align=center><FONT size=4 color=darkred><STRONG><A NAME = "Current">Sunrise: 4:41 Sunset: 20:37</A></STRONG></FONT></P> <TABLE align=center> <TR width="100%"><TD> <font size = 4 face="verdana,arial" color=darkred><STRONG>This Site Powered by:</STRONG></font></TD></TR><TR align=center><TD><a href="http://www.davisnet.com" target=HI><img src="Davis Logo.jpg"></a></TD></TR></TABLE></P><TABLE cellspacing=1 cellpadding=0 width="85%" align=center border=1> </BODY></HTML>
The result should be:
"Temperature:"," 22.8°C","Dewpoint:","5.7°C"
"Humidity:"," 33% ","Wind Chill:","22.8°C"
"Wind: "," SSW at 1.8 m/s","THW Index:","21.8°C"
"Barometer:"," 754.2 mm and Falling Slowly","Heat Index:"," 21.8°C"
"Today's Rain:"," 0.0 mm","Monthly Rain:","13.5 mm"
"Storm Total:","0.0 mm","Yearly Rain:","101.3 mm"
"Current Rain Rate:"," 0.0 mm/hr","Solar Radiation:","517 W/m?"
"UV:","--- index","",""
", This Site Powered by:","","",""
", ","","",""
To be precise, the fields I care about are "Solar Radiation:","517 W/m?"
It has to run from the command line, read the htm in a loop, and overwrite the csv at a fixed interval (for example, every 5 seconds).
Thanks for any information.
June 13, 2016, 14:49
Answers
PowerShell:
1) To read from a file, uncomment this line by removing the #:
#$wb = Get-Content C:\html\file.html -Raw
2) To save to a file, replace
$result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1
with
$result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Out-File C:\result.csv
$TableNumber = 1
$result = @()
$wb = @'
<HTML><HEAD><meta http-equiv="refresh" content="150";><TITLE>Current Conditions at , </TITLE></HEAD><BODY background="Clouds.jpg"><P><FONT size=4></FONT></P><P><TABLE border=0 cellSpacing=0 cellPadding=0 width="90%" align=center height=50 > </TABLE></P><P align=center><FONT size=5 color=darkred><STRONG><A NAME = "Current">Current Weather Conditions at , </A></STRONG></FONT></P><P align=center><FONT size=4 color=darkred><STRONG><A NAME = "Current">As of: 10.06.16 16:41</A></STRONG></FONT></P><TABLE cellspacing=1 cellpadding=0 width="85%" align=center border=1> <TR height=20> <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Temperature:</FONT></STRONG></TD> <TD Width="22%" align=left> <b> 22.8°C</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Dewpoint:</FONT></STRONG></TD> <TD Width=100 align=left><b>5.7°C</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Humidity:</FONT></STRONG></TD> <TD Width=100 align=left><b> 33% </b></TD> <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Wind Chill:</FONT></STRONG></TD> <TD Width="11%" align=left><b>22.8°C</b></TD> </TR> <TR height=20> <TD Width="15%"><STRONG><FONT face="Tw Cen MT">Wind:</FONT></STRONG> </TD> <TD Width="18%" align=left><b> SSW at 1.8 m/s</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">THW Index:</FONT></STRONG></TD> <TD Width=100 align=left><b>21.8°C</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Barometer:</STRONG></FONT></TD> <TD Width=100 align=left> <b> 754.2 mm and Falling Slowly</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Heat Index:</FONT></STRONG></TD> <TD Width=100 align=left><b> 21.8°C</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Today's Rain:</FONT></STRONG></TD> <TD Width=100 align=left> <b>0.0 mm</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Monthly Rain:</FONT></STRONG></TD> <TD Width=100 align=left><b>13.5 mm</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Storm Total:</FONT></STRONG></TD> <TD Width=100 align=left><b>0.0 mm</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Yearly Rain:</FONT></STRONG></TD> <TD Width=100 align=left><b>101.3 mm</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">Current Rain Rate:</FONT></STRONG></TD> <TD Width=100 align=left> <b>0.0 mm/hr</b></TD> <TD Width=200><STRONG><FONT face="Tw Cen MT">Solar Radiation:</FONT></STRONG></TD> <TD Width=100 align=left><b>517 W/m?</b></TD> </TR> <TR height=20> <TD Width=200><STRONG><FONT face="Tw Cen MT">UV:</FONT></STRONG></TD> <TD Width=100 align=left><b>--- index</b></TD> </TR> </TABLE><br><P align=center><FONT size=4 color=darkred><STRONG><A NAME = "Current">Sunrise: 4:41 Sunset: 20:37</A></STRONG></FONT></P> <TABLE align=center> <TR width="100%"><TD> <font size = 4 face="verdana,arial" color=darkred><STRONG>This Site Powered by:</STRONG></font></TD></TR><TR align=center><TD><a href="http://www.davisnet.com" target=HI><img src="Davis Logo.jpg"></a></TD></TR></TABLE></P><TABLE cellspacing=1 cellpadding=0 width="85%" align=center border=1> </BODY></HTML>
'@
#$wb = Get-Content C:\html\file.html -Raw
$WebRequest = New-Object -ComObject "HTMLFile"
$WebRequest.IHTMLDocument2_write($wb)
## Extract the tables out of the web request
$tables = @($WebRequest.getElementsByTagName("TABLE"))
$table = $tables[$TableNumber]
$titles = @()
$rows = @($table.Rows)
## Go through all of the rows in the table
foreach($row in $rows)
{
    $cells = @($row.Cells)
    ## If we've found a table header, remember its titles
    if($cells[0].tagName -eq "TH")
    {
        $titles = @($cells | % { ("" + $_.InnerText).Trim() })
        continue
    }
    ## If we haven't found any table headers, make up names "P1", "P2", etc.
    if(-not $titles)
    {
        $titles = @(1..($cells.Count + 2) | % { "P$_" })
    }
    ## Now go through the cells in the row. For each, try to find the
    ## title that represents that column and create a hashtable mapping those
    ## titles to content
    $resultObject = [Ordered] @{}
    for($counter = 0; $counter -lt $cells.Count; $counter++)
    {
        $title = $titles[$counter]
        if(-not $title) { continue }
        $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
    }
    ## And finally cast that hashtable to a PSCustomObject
    $result += [PSCustomObject] $resultObject
}
$result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1
Output:
P.S. If only one line is needed:
(gc C:\html\file.html -Raw) -replace " "," " -match "(Solar Radiation:).+<b>(.+)\B</b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
Solar Radiation: 517 W/m?
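If only the Solar Radiation pair is needed but the table-parsing script is kept, another option is to pick the pair out of the parsed rows instead of running a regex over the raw HTML. The sketch below assumes rows shaped like the $result objects built by the script (properties P1..P4); Get-CellAfterLabel is a hypothetical helper name, and the sample rows are copied from the example page.

```powershell
# Sample rows shaped like the $result objects produced by the script above.
$result = @(
    [PSCustomObject]@{ P1 = 'Storm Total:';       P2 = '0.0 mm';    P3 = 'Yearly Rain:';     P4 = '101.3 mm' }
    [PSCustomObject]@{ P1 = 'Current Rain Rate:'; P2 = '0.0 mm/hr'; P3 = 'Solar Radiation:'; P4 = '517 W/m?' }
    [PSCustomObject]@{ P1 = 'UV:';                P2 = '--- index' }
)

# Hypothetical helper (not part of the original answer): returns the cell
# that follows the cell whose text equals $Label, or nothing if not found.
function Get-CellAfterLabel {
    param([object[]]$Rows, [string]$Label)
    foreach ($row in $Rows) {
        # Flatten the row's property values into an ordered list of cells.
        $values = @($row.PSObject.Properties | ForEach-Object { $_.Value })
        for ($i = 0; $i -lt $values.Count - 1; $i++) {
            if ($values[$i] -eq $Label) {
                return [PSCustomObject]@{ Label = $values[$i]; Value = $values[$i + 1] }
            }
        }
    }
}

$solar = Get-CellAfterLabel -Rows $result -Label 'Solar Radiation:'
if ($solar) {
    # Skip the header line, as the original script does.
    $solar | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
}
```

Because the lookup is done on already-trimmed cell text, it does not depend on how the tags are laid out in the file.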
- Edited by Kazun, June 13, 2016, 16:05
- Proposed as answer by Vector BCO (Moderator), June 13, 2016, 20:09
- Marked as answer by Anton Sashev Ivanov, June 14, 2016, 6:01
June 13, 2016, 15:38
All replies
Thank you.
In order:
1. I have Windows 7 x64. It had PowerShell 2.0; I updated to 3.0 from here:
https://www.microsoft.com/en-us/download/details.aspx?id=34595
2. The original code worked perfectly; I adapted it to my setup:
$TableNumber = 1
$result = @()
$wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw
$WebRequest = New-Object -ComObject "HTMLFile"
$WebRequest.IHTMLDocument2_write($wb)
## Extract the tables out of the web request
$tables = @($WebRequest.getElementsByTagName("TABLE"))
$table = $tables[$TableNumber]
$titles = @()
$rows = @($table.Rows)
## Go through all of the rows in the table
foreach($row in $rows)
{
    $cells = @($row.Cells)
    ## If we've found a table header, remember its titles
    if($cells[0].tagName -eq "TH")
    {
        $titles = @($cells | % { ("" + $_.InnerText).Trim() })
        continue
    }
    ## If we haven't found any table headers, make up names "P1", "P2", etc.
    if(-not $titles)
    {
        $titles = @(1..($cells.Count + 2) | % { "P$_" })
    }
    ## Now go through the cells in the row. For each, try to find the
    ## title that represents that column and create a hashtable mapping those
    ## titles to content
    $resultObject = [Ordered] @{}
    for($counter = 0; $counter -lt $cells.Count; $counter++)
    {
        $title = $titles[$counter]
        if(-not $title) { continue }
        $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
    }
    ## And finally cast that hashtable to a PSCustomObject
    $result += [PSCustomObject] $resultObject
}
$result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Out-File D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm.csv
3. I also made a .cmd file (to run it in a loop):
:start1
set process1=powershell.exe
powershell %~dp02.ps1
goto checker1
:check1
cls
echo Process %process1% is running...
:checker1
tasklist /FI "IMAGENAME eq %process1%" /NH | findstr /i "%process1%">nul
if %errorLevel% == 0 goto :check1
ping -n 60 localhost > Nul
goto :start1
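The same loop can also be kept inside PowerShell, which avoids starting a new powershell.exe on every pass. The sketch below is one possible shape, not what the thread used: Invoke-Every and its parameters are hypothetical names, and the 5-second default comes from the interval mentioned in the question.

```powershell
# Hypothetical helper: runs $Action, waits $Seconds, and repeats.
function Invoke-Every {
    param(
        [scriptblock]$Action,
        [int]$Seconds = 5,
        [int]$MaxRuns = 0   # 0 = keep looping until Ctrl+C or the window is closed
    )
    $runs = 0
    while ($MaxRuns -eq 0 -or $runs -lt $MaxRuns) {
        # A terminating error in one pass (e.g. a thrown exception) is
        # reported as a warning so that the loop keeps going.
        try { & $Action }
        catch { Write-Warning "Run failed: $_" }
        $runs++
        # Sleep only if another pass will follow.
        if ($MaxRuns -eq 0 -or $runs -lt $MaxRuns) {
            Start-Sleep -Seconds $Seconds
        }
    }
}

# Usage: re-run the conversion script from the post above every 5 seconds.
# Invoke-Every -Action { & 'D:\Temp\27\2.ps1' } -Seconds 5
```

The MaxRuns parameter exists only to make the loop easy to try out for a fixed number of passes; leaving it at 0 gives the endless loop that the .cmd file implements.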
4. One question remains, about the Solar Radiation: 517 W/m? line. I replaced
$wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw
with
$wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw -replace " "," " -match "(Solar Radiation:).+<b>(.+)\B</b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
and got an error:
C:\Windows\system32>powershell D:\Temp\27\2.ps1
Get-Content : A parameter cannot be found that matches parameter name 'replace'.
At D:\Temp\27\2.ps1:6 char:72
+ $wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw -replace ...
+ ~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Get-Content], ParameterBindingException
    + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Thanks again. The main task is solved )
June 14, 2016, 10:34
The parentheses are missing here:
$wb = (Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw) -replace " "," " -match "(Solar Radiation:).+<b>(.+)\B</b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
June 14, 2016, 10:56
Error:
Cannot index into a null array.
At D:\Temp\27\Weather_CSV.ps1:6 char:149
+ ... /b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : NullArray
June 14, 2016, 12:30
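The "null array" error is what PowerShell reports when $matches is indexed without ever having been set, which happens when -match returns False. The thread does not say why the pattern failed on the real file. One possible cause (an assumption) is that the label and the value sit on different lines there, since . does not cross line breaks by default. The sketch below checks the match before using it and lets the pattern span lines; Get-SolarRadiation is a hypothetical helper name and the sample text is taken from the example page.

```powershell
# Hypothetical helper (not from the thread): returns one CSV line with the
# label and value, or $null when the pattern is not found.
function Get-SolarRadiation {
    param([string]$Html)
    # (?s) lets . match line breaks; .*? keeps each match as short as possible;
    # \s* drops whitespace around the value inside <b>...</b>.
    $pattern = '(?s)(Solar Radiation:).*?<b>\s*(.*?)\s*</b>'
    $m = [regex]::Match($Html, $pattern)
    if (-not $m.Success) { return $null }   # nothing matched, so nothing to index
    '"{0}","{1}"' -f $m.Groups[1].Value, $m.Groups[2].Value
}

# Sample input with the label and the value on separate lines.
$sample = @'
<TD Width=200><STRONG><FONT face="Tw Cen MT">Solar Radiation:</FONT></STRONG></TD>
<TD Width=100 align=left><b>517 W/m?</b></TD>
'@

Get-SolarRadiation -Html $sample   # "Solar Radiation:","517 W/m?"
```

With a file, the call would look like Get-SolarRadiation -Html (Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw); a $null result then means the page did not contain the expected fragment, rather than producing an indexing error.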