Remove HTML from array

  • Question

  • Hi,

    POSH newb here...I know just enough to steal other people's code and stitch it together. I have a script that extracts text from XML files and dumps it to a table in an HTML report. Recently the contents of the XML files changed so that the element containing the text now contains HTML in addition to the desired text content, so "This is text" is now:

    <html><body><div data-format="PresentationML" data-version="2.0" class="wysiwyg"><p>This is text</p></div></body></html>

    I've found several articles that have a simple RegEx, e.g.

    $1 = '"<html><body><div data-format="PresentationML" data-version="2.0" class="wysiwyg"><p>This is text</p></div></body></html>"'
    $1 = $1 -replace '<[^>]+>',''

    That RegEx works on a single line, but if I add it to my script the resulting table is blank. Here is the working script:

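    #Open file dialog from here: https://gallery.technet.microsoft.com/scriptcenter/GUI-popup-FileOpenDialog-babd911d
    #Load the Windows Forms assembly that provides OpenFileDialog
    Add-Type -AssemblyName System.Windows.Forms
    #File selection dialog
    $openFileDialog = New-Object System.Windows.Forms.OpenFileDialog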
    $openFileDialog.initialDirectory = [System.IO.Directory]::GetCurrentDirectory()   
    $openFileDialog.title = "Select File to Import"   
    $openFileDialog.filter = "XML files (*.xml)|*.xml"
    $openFileDialog.ShowHelp = $True   
    Write-Host "Select Downloaded Settings File... (see FileOpen Dialog)" -ForegroundColor Green  
    $result = $openFileDialog.ShowDialog()   
    if ($result -eq "OK") {
        Write-Host "Selected Downloaded Settings File:"  -ForegroundColor Green
        Write-Host "Import Settings File Imported!" -ForegroundColor Green
    }
    else { Write-Host "Import Settings File Cancelled!" -ForegroundColor Yellow }
    #Formatting for HTML
    $a = "<style>"
    $a = $a + "BODY{background-color:white;}"
    $a = $a + "TABLE{border-width: 3px;border-style: solid;border-color: black;border-collapse: collapse;}"
    $a = $a + "TH{border-width: 3px;padding: 3px;border-style: solid;border-color: black;}"
    $a = $a + "TD{border-width: 3px;padding: 3px;border-style: solid;border-color: black;}"
    $a = $a + "</style>"
    #Extract content
    $Record = ([xml](Get-Content $OpenFileDialog.filename)).log.record
    #Remove HTML tags
    #$Record =  $Record -replace  '<[^>]+>',''
    #Get file creation date & assign to variable
    $Created = (Get-Item $OpenFileDialog.filename).LastWriteTime.ToString('yyyy-MM-dd')
    #Array of calculated-property hashtables used to select desired elements
    $props = @(
        @{n = 'Date'; e = {$_.messageInfo.messageTimestamp}; }
        @{n = 'Sender'; e = {$_.initiator.user.companyUserEmail}; }
        @{n = 'Content'; e = {$_.messageInfo.content."#cdata-section"}; }
    )
    #Generate HTML report using date as file name
    $Record | Select $props | ConvertTo-Html -Head $a > ($Created + ".html")

    Any insight on how to get the RegEx working, or an alternative approach, will be deeply appreciated.



    Thursday, June 28, 2018 6:00 PM

All replies

  • Could you post the content of $Record? Try [string] instead of [xml].
    • Edited by DumbleD0re Thursday, July 18, 2019 1:35 PM
    Thursday, July 18, 2019 1:32 PM
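
    A possible explanation for the blank table, offered as a sketch and not a confirmed diagnosis: -replace always returns plain strings, so running it against $Record would replace the XML elements with strings that no longer have messageInfo or initiator properties, leaving Select with nothing to pick up. One alternative is to leave $Record alone and strip the tags from the content string only, inside the calculated property. The sample XML below is hypothetical; its element names are assumed from the property paths used in the script in the question.

```powershell
# Hypothetical sample data standing in for the real file. The element names
# are assumptions taken from the property paths in the original script.
$sample = @'
<log>
  <record>
    <messageInfo>
      <messageTimestamp>2018-06-28T18:00:00</messageTimestamp>
      <content><![CDATA[<html><body><div data-format="PresentationML" data-version="2.0" class="wysiwyg"><p>This is text</p></div></body></html>]]></content>
    </messageInfo>
    <initiator>
      <user>
        <companyUserEmail>user@example.com</companyUserEmail>
      </user>
    </initiator>
  </record>
</log>
'@

# $Record stays XML; no -replace is applied to it directly.
$Record = ([xml]$sample).log.record

# The tag-stripping regex runs on the CDATA string alone, per record.
$props = @(
    @{n = 'Date';    e = { $_.messageInfo.messageTimestamp } }
    @{n = 'Sender';  e = { $_.initiator.user.companyUserEmail } }
    @{n = 'Content'; e = { $_.messageInfo.content.'#cdata-section' -replace '<[^>]+>', '' } }
)

$rows = @($Record | Select-Object $props)
$rows | Format-List
```

    With this sample, the Content property of the single row comes out as "This is text". In the original script, the same change would amount to adding the -replace to the 'Content' entry of $props and leaving the commented-out line on $Record removed.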