Converting htm (html) to csv

  • Question

  • I need to connect two programs: one creates reports in htm, the other imports csv.

    I could not work it out on my own.

    Sample htm:

    <HTML><HEAD><meta http-equiv="refresh" content="150";><TITLE>Current Conditions at  ,  </TITLE></HEAD><BODY background="Clouds.jpg"><P><FONT size=4></FONT></P><P><TABLE border=0 cellSpacing=0 cellPadding=0 width="90%" align=center  height=50 > </TABLE></P><P align=center><FONT size=5  color=darkred><STRONG><A NAME = "Current">Current Weather Conditions at  ,  </A></STRONG></FONT></P><P align=center><FONT size=4  color=darkred><STRONG><A NAME = "Current">As of:  10.06.16 16:41</A></STRONG></FONT></P><TABLE  cellspacing=1 cellpadding=0 width="85%"  align=center  border=1>    <TR height=20>   <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Temperature:</FONT></STRONG></TD>    <TD Width="22%" align=left> <b> 22.8°C</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Dewpoint:</FONT></STRONG></TD>  <TD Width=100 align=left><b>5.7°C</b></TD>  </TR>   <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Humidity:</FONT></STRONG></TD>    <TD Width=100 align=left><b> 33% </b></TD>    <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Wind Chill:</FONT></STRONG></TD>  <TD Width="11%" align=left><b>22.8°C</b></TD>  </TR>   <TR height=20>   <TD Width="15%"><STRONG><FONT face="Tw Cen MT">Wind:</FONT></STRONG> </TD>    <TD Width="18%" align=left><b> SSW at 1.8&nbsp;m/s</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">THW Index:</FONT></STRONG></TD>  <TD Width=100 align=left><b>21.8°C</b></TD>  </TR>  <TR height=20>   <TD Width=200><STRONG><FONT face="Tw Cen MT">Barometer:</STRONG></FONT></TD>    <TD Width=100 align=left> <b> 754.2&nbsp;mm and Falling Slowly</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Heat Index:</FONT></STRONG></TD>  <TD Width=100 align=left><b> 21.8°C</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Today's Rain:</FONT></STRONG></TD>  <TD Width=100 align=left> <b>0.0&nbsp;mm</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Monthly Rain:</FONT></STRONG></TD>    <TD Width=100 
align=left><b>13.5&nbsp;mm</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Storm Total:</FONT></STRONG></TD>    <TD Width=100 align=left><b>0.0&nbsp;mm</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Yearly Rain:</FONT></STRONG></TD>    <TD Width=100 align=left><b>101.3&nbsp;mm</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Current Rain Rate:</FONT></STRONG></TD>  <TD Width=100 align=left> <b>0.0&nbsp;mm/hr</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Solar Radiation:</FONT></STRONG></TD>    <TD Width=100 align=left><b>517&nbsp;W/m?</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">UV:</FONT></STRONG></TD>    <TD Width=100 align=left><b>---&nbsp;index</b></TD>  </TR> </TABLE><br><P align=center><FONT size=4  color=darkred><STRONG><A NAME = "Current">Sunrise:   4:41&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sunset:  20:37</A></STRONG></FONT></P>  <TABLE align=center>  <TR width="100%"><TD>  <font size = 4 face="verdana,arial" color=darkred><STRONG>This Site Powered by:</STRONG></font></TD></TR><TR align=center><TD><a href="http://www.davisnet.com" target=HI><img src="Davis Logo.jpg"></a></TD></TR></TABLE></P><TABLE  cellspacing=1 cellpadding=0 width="85%"  align=center  border=1>    </BODY></HTML>
    The result should be:
    "Temperature:","  22.8°C","Dewpoint:","5.7°C"
    "Humidity:"," 33% ","Wind Chill:","22.8°C"
    "Wind: "," SSW at 1.8 m/s","THW Index:","21.8°C"
    "Barometer:","  754.2 mm and Falling Slowly","Heat Index:"," 21.8°C"
    "Today's Rain:"," 0.0 mm","Monthly Rain:","13.5 mm"
    "Storm Total:","0.0 mm","Yearly Rain:","101.3 mm"
    "Current Rain Rate:"," 0.0 mm/hr","Solar Radiation:","517 W/m?"
    "UV:","--- index","",""
    ",   This Site Powered by:","","",""
    ", ","","",""
    
    To be precise, I am interested in the fields "Solar Radiation:","517 W/m?"

    It needs to run from the command line, repeatedly reading the htm and overwriting the csv at a fixed interval (for example, every 5 seconds).

    Thanks for any information.

    June 13, 2016, 14:49

Answers

  • PowerShell:

    1) To read from a file, uncomment this line by removing the #:

    #$wb = Get-Content C:\html\file.html -Raw

    2) To save to a file, replace

    $result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1

    with

    $result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Out-File C:\result.csv

    $TableNumber = 1
    $result = @()
    
    $wb = @'
    <HTML><HEAD><meta http-equiv="refresh" content="150";><TITLE>Current Conditions at  ,  </TITLE></HEAD><BODY background="Clouds.jpg"><P><FONT size=4></FONT></P><P><TABLE border=0 cellSpacing=0 cellPadding=0 width="90%" align=center  height=50 > </TABLE></P><P align=center><FONT size=5  color=darkred><STRONG><A NAME = "Current">Current Weather Conditions at  ,  </A></STRONG></FONT></P><P align=center><FONT size=4  color=darkred><STRONG><A NAME = "Current">As of:  10.06.16 16:41</A></STRONG></FONT></P><TABLE  cellspacing=1 cellpadding=0 width="85%"  align=center  border=1>    <TR height=20>   <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Temperature:</FONT></STRONG></TD>    <TD Width="22%" align=left> <b> 22.8°C</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Dewpoint:</FONT></STRONG></TD>  <TD Width=100 align=left><b>5.7°C</b></TD>  </TR>   <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Humidity:</FONT></STRONG></TD>    <TD Width=100 align=left><b> 33% </b></TD>    <TD Width="33%"><STRONG><FONT face="Tw Cen MT">Wind Chill:</FONT></STRONG></TD>  <TD Width="11%" align=left><b>22.8°C</b></TD>  </TR>   <TR height=20>   <TD Width="15%"><STRONG><FONT face="Tw Cen MT">Wind:</FONT></STRONG> </TD>    <TD Width="18%" align=left><b> SSW at 1.8&nbsp;m/s</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">THW Index:</FONT></STRONG></TD>  <TD Width=100 align=left><b>21.8°C</b></TD>  </TR>  <TR height=20>   <TD Width=200><STRONG><FONT face="Tw Cen MT">Barometer:</STRONG></FONT></TD>    <TD Width=100 align=left> <b> 754.2&nbsp;mm and Falling Slowly</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Heat Index:</FONT></STRONG></TD>  <TD Width=100 align=left><b> 21.8°C</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Today's Rain:</FONT></STRONG></TD>  <TD Width=100 align=left> <b>0.0&nbsp;mm</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Monthly Rain:</FONT></STRONG></TD>    <TD Width=100 
align=left><b>13.5&nbsp;mm</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Storm Total:</FONT></STRONG></TD>    <TD Width=100 align=left><b>0.0&nbsp;mm</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Yearly Rain:</FONT></STRONG></TD>    <TD Width=100 align=left><b>101.3&nbsp;mm</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Current Rain Rate:</FONT></STRONG></TD>  <TD Width=100 align=left> <b>0.0&nbsp;mm/hr</b></TD>    <TD Width=200><STRONG><FONT face="Tw Cen MT">Solar Radiation:</FONT></STRONG></TD>    <TD Width=100 align=left><b>517&nbsp;W/m?</b></TD>  </TR>  <TR height=20>    <TD Width=200><STRONG><FONT face="Tw Cen MT">UV:</FONT></STRONG></TD>    <TD Width=100 align=left><b>---&nbsp;index</b></TD>  </TR> </TABLE><br><P align=center><FONT size=4  color=darkred><STRONG><A NAME = "Current">Sunrise:   4:41&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sunset:  20:37</A></STRONG></FONT></P>  <TABLE align=center>  <TR width="100%"><TD>  <font size = 4 face="verdana,arial" color=darkred><STRONG>This Site Powered by:</STRONG></font></TD></TR><TR align=center><TD><a href="http://www.davisnet.com" target=HI><img src="Davis Logo.jpg"></a></TD></TR></TABLE></P><TABLE  cellspacing=1 cellpadding=0 width="85%"  align=center  border=1>    </BODY></HTML>
    '@
    
    #$wb = Get-Content C:\html\file.html -Raw
    $WebRequest  = New-Object -ComObject "HTMLFile"
    $WebRequest.IHTMLDocument2_write($wb)

    ## Extract the tables out of the web request
    $tables = @($WebRequest.getElementsByTagName("TABLE"))
    $table = $tables[$TableNumber]
    $titles = @()
    $rows = @($table.Rows)

    ## Go through all of the rows in the table
    foreach($row in $rows)
    {
        $cells = @($row.Cells)

        ## If we've found a table header, remember its titles
        if($cells[0].tagName -eq "TH")
        {
            $titles = @($cells | % { ("" + $_.InnerText).Trim() })
            continue
        }

        ## If we haven't found any table headers, make up names "P1", "P2", etc.
        if(-not $titles)
        {
            $titles = @(1..($cells.Count + 2) | % { "P$_" })
        }

        ## Now go through the cells in the row. For each, try to find the
        ## title that represents that column and create a hashtable mapping those
        ## titles to content
        $resultObject = [Ordered] @{}
        for($counter = 0; $counter -lt $cells.Count; $counter++)
        {
            $title = $titles[$counter]
            if(-not $title) { continue }

            $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
        }

        ## And finally cast that hashtable to a PSCustomObject
        $result += [PSCustomObject] $resultObject
    }

    $result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1

    Output:


    P.S. If only a single line is needed:

    (gc C:\html\file.html -Raw) -replace "&nbsp;"," " -match "(Solar Radiation:).+<b>(.+)\B</b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
    Solar Radiation: 517 W/m?
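
If a regular expression feels fragile, the same pair can also be picked out of the objects the script already builds. A minimal sketch, assuming the generated column names P1 to P4 from the script above; the sample rows are typed in by hand only to keep the snippet self-contained:

```powershell
# Sample rows shaped like the objects the script above collects in $result.
$result = @(
    [PSCustomObject]@{ P1 = 'Storm Total:';       P2 = '0.0 mm';    P3 = 'Yearly Rain:';     P4 = '101.3 mm' }
    [PSCustomObject]@{ P1 = 'Current Rain Rate:'; P2 = '0.0 mm/hr'; P3 = 'Solar Radiation:'; P4 = '517 W/m?' }
)

# Labels sit in columns P1 and P3, values in P2 and P4, so check both pairs.
$label = 'Solar Radiation:'
$solar = foreach ($row in $result) {
    if ($row.P1 -eq $label)     { $row.P2 }
    elseif ($row.P3 -eq $label) { $row.P4 }
}

# Build one quoted CSV line for the importing program.
$line = '"{0}","{1}"' -f $label, $solar
$line   # "Solar Radiation:","517 W/m?"
```

Piping `$line` to `Out-File` would then write just this one record instead of the whole table.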

    • Edited by Kazun, June 13, 2016, 16:05
    • Proposed as answer by Vector BCO (Moderator), June 13, 2016, 20:09
    • Marked as answer by Anton Sashev Ivanov, June 14, 2016, 6:01
    June 13, 2016, 15:38

All replies

  • Thank you.

    In order:

    1. I have Windows 7 x64 with PowerShell 2.0, which I updated to 3.0 from here:

    https://www.microsoft.com/en-us/download/details.aspx?id=34595

    2. The original code worked perfectly; I adapted it to my needs:

    $TableNumber = 1
    $result = @()

    $wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw
    $WebRequest  = New-Object -ComObject "HTMLFile"
    $WebRequest.IHTMLDocument2_write($wb)

    ## Extract the tables out of the web request
    $tables = @($WebRequest.getElementsByTagName("TABLE"))
    $table = $tables[$TableNumber]
    $titles = @()
    $rows = @($table.Rows)

    ## Go through all of the rows in the table
    foreach($row in $rows)
    {
        $cells = @($row.Cells)

        ## If we've found a table header, remember its titles
        if($cells[0].tagName -eq "TH")
        {
            $titles = @($cells | % { ("" + $_.InnerText).Trim() })
            continue
        }

        ## If we haven't found any table headers, make up names "P1", "P2", etc.
        if(-not $titles)
        {
            $titles = @(1..($cells.Count + 2) | % { "P$_" })
        }

        ## Now go through the cells in the row. For each, try to find the
        ## title that represents that column and create a hashtable mapping those
        ## titles to content
        $resultObject = [Ordered] @{}
        for($counter = 0; $counter -lt $cells.Count; $counter++)
        {
            $title = $titles[$counter]
            if(-not $title) { continue }

            $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
        }

        ## And finally cast that hashtable to a PSCustomObject
        $result += [PSCustomObject] $resultObject
    }

    $result | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Out-File D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm.csv


    3. I also made a .cmd file (to run the loop):

    :start1
    set process1=powershell.exe 
    powershell %~dp02.ps1
    goto checker1
    :check1
    cls
    echo Process %process1% is running...
    :checker1
    tasklist /FI "IMAGENAME eq %process1%" /NH | findstr /i "%process1%">nul
    if %errorLevel% == 0 goto :check1
    ping -n 60 localhost > Nul
    goto :start1
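
The loop can also be written in PowerShell itself instead of a .cmd wrapper. A rough sketch: the function name Invoke-Periodically and its parameters are made up for illustration, and the pause relies only on Start-Sleep:

```powershell
# Hypothetical helper: runs a script block repeatedly with a pause between runs.
function Invoke-Periodically {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $Action,
        [int] $IntervalSeconds = 5,   # pause between runs
        [int] $MaxRuns = 0            # 0 means run until interrupted (Ctrl+C)
    )

    $runs = 0
    while ($MaxRuns -le 0 -or $runs -lt $MaxRuns) {
        try {
            & $Action
        } catch {
            # Report terminating errors and keep looping.
            Write-Warning $_.Exception.Message
        }
        $runs++
        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Example: two quick runs, one second apart.
Invoke-Periodically -Action { Get-Date -Format 'HH:mm:ss' } -IntervalSeconds 1 -MaxRuns 2
```

To convert the report every 5 seconds, a call such as `Invoke-Periodically -Action { & 'D:\Temp\27\2.ps1' } -IntervalSeconds 5` should do, where 2.ps1 is the conversion script from step 2.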


    4. One question remains about the "Solar Radiation: 517 W/m?" line. I replaced

    $wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw

    with

    $wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw -replace "&nbsp;"," " -match "(Solar Radiation:).+<b>(.+)\B</b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}

    and it produced an error:

    C:\Windows\system32>powershell D:\Temp\27\2.ps1
    Get-Content : A parameter cannot be found that matches parameter name 'replace'.
    At D:\Temp\27\2.ps1:6 char:72
    + $wb = Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw -replace  ...
    +                                                                        ~~~~~~~~
        + CategoryInfo          : InvalidArgument: (:) [Get-Content], ParameterBindingException
        + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.PowerShell.Commands.GetContentCommand
    

    Thanks again. The main task is solved :)

    June 14, 2016, 10:34
  • The parentheses are missing here:
    $wb = (Get-Content D:\Temp\27\Weather_Summary_Vantage_Pro_Plus.htm -Raw) -replace "&nbsp;"," " -match "(Solar Radiation:).+<b>(.+)\B</b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
    June 14, 2016, 10:56
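
The reason is how PowerShell parses the line: after a command name, a token such as -replace is read as a parameter of that command, so the command has to be wrapped in parentheses before operators can act on its output. A small illustration, using Write-Output as a stand-in for Get-Content:

```powershell
# Without parentheses this line would fail with NamedParameterNotFound,
# because "-replace" would be bound as a parameter of Write-Output:
#   Write-Output 'Solar&nbsp;Radiation' -replace '&nbsp;', ' '

# With parentheses the command runs first and the operator works on its result.
$text = (Write-Output 'Solar&nbsp;Radiation') -replace '&nbsp;', ' '
$text   # Solar Radiation
```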
  • Error:

    Cannot index into a null array.
    At D:\Temp\27\Weather_CSV.ps1:6 char:149
    + ... /b></TD>" | % {"{0} {1}" -f $matches[1],$matches[2]}
    +                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
        + FullyQualifiedErrorId : NullArray

    June 14, 2016, 12:30
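
This error indicates that $matches was never populated, i.e. -match did not succeed on the real file. One possible cause (not confirmed in the thread) is a line break between the label and its value: by default . does not match a newline. Below is a sketch that tolerates line breaks and checks for success before reading the groups. The HTML fragment is adapted from the sample in the question, with a line break placed inside the cell on purpose, and the pattern is an assumption rather than the one from the answer:

```powershell
# Shortened fragment of the report; the line break inside the cell is deliberate.
$html = @'
<TD Width=200><STRONG><FONT face="Tw Cen MT">Solar Radiation:</FONT></STRONG></TD>    <TD Width=100
align=left><b>517&nbsp;W/m?</b></TD>  </TR>
'@

# (?s) lets "." match newlines; the lazy quantifiers stop at the first <b>...</b> pair.
$pattern = '(?s)(Solar Radiation:).+?<b>(.+?)</b>'
$m = [regex]::Match(($html -replace '&nbsp;', ' '), $pattern)

if ($m.Success) {
    # Prints: Solar Radiation: 517 W/m?
    '{0} {1}' -f $m.Groups[1].Value, $m.Groups[2].Value
} else {
    Write-Warning 'Solar Radiation value not found'
}
```

Testing `$m.Success` first means a report that does not contain the field yields a warning instead of the null-array error.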