Hi,
POSH newb here... I know just enough to steal other people's code and stitch it together. I have a script that extracts text from XML files and dumps it to a table in an HTML report. Recently the contents of the XML files changed so that the element containing the text now wraps it in HTML markup, so "This is text" is now:
<html><body><div data-format="PresentationML" data-version="2.0" class="wysiwyg"><p>This is text</p></div></body></html>
I've found several articles that have a simple RegEx, e.g.
$1 = '"<html><body><div data-format="PresentationML" data-version="2.0" class="wysiwyg"><p>This is text</p></div></body></html>"'
$1 = $1 -replace '<[^>]+>',''
That RegEx works on a single string, but when I enable it in my script (the commented-out line under "#Remove HTML tags" below), the resulting table is blank. Here is the working script, with that line commented out:
<#
Open file dialog from here: https://gallery.technet.microsoft.com/scriptcenter/GUI-popup-FileOpenDialog-babd911d
#>
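#Load Windows Forms so the dialog type is available outside the ISE
Add-Type -AssemblyName System.Windows.Forms
#File selection dialog
$openFileDialog = New-Object System.Windows.Forms.OpenFileDialog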
$openFileDialog.initialDirectory = [System.IO.Directory]::GetCurrentDirectory()
$openFileDialog.title = "Select File to Import"
$openFileDialog.filter = "XML files (*.xml)|*.xml"
$openFileDialog.ShowHelp = $True
Write-Host "Select Downloaded Settings File... (see FileOpen Dialog)" -ForegroundColor Green
$result = $openFileDialog.ShowDialog()
$result
if ($result -eq "OK") {
    Write-Host "Selected Downloaded Settings File:" -ForegroundColor Green
    $OpenFileDialog.filename
    Write-Host "Import Settings File Imported!" -ForegroundColor Green
}
else {
    Write-Host "Import Settings File Cancelled!" -ForegroundColor Yellow
}
#Formatting for HTML
$a = "<style>"
$a = $a + "BODY{background-color:white;}"
$a = $a + "TABLE{border-width: 3px;border-style: solid;border-color: black;border-collapse: collapse;}"
$a = $a + "TH{border-width: 3px;padding: 3px;border-style: solid;border-color: black;}"
$a = $a + "TD{border-width: 3px;padding: 3px;border-style: solid;border-color: black;}"
$a = $a + "</style>"
#Extract content
$Record = ([xml](Get-Content $OpenFileDialog.filename)).log.record
#Remove HTML tags
#$Record = $Record -replace '<[^>]+>',''
#Get file creation date & assign to variable
$Created = (Get-Item $OpenFileDialog.filename).LastWriteTime.ToString('yyyy-MM-dd')
#Array of calculated properties (hashtables) used to select desired elements
$props = @(
    @{n = 'Date'; e = {$_.messageInfo.messageTimestamp}; }
    @{n = 'Sender'; e = {$_.initiator.user.companyUserEmail}; }
    @{n = 'Content'; e = {$_.messageInfo.content."#cdata-section"}; }
)
#Generate HTML report using date as file name
$Record | Select-Object $props | ConvertTo-Html -Head $a > ($Created + ".html")
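One possible direction, sketched under assumptions and not tested against the real files: the commented-out `$Record = $Record -replace ...` line applies the RegEx to the whole collection of XML nodes, which turns each node into a plain string that no longer has `messageInfo` or `initiator` properties, so every column comes out empty. Applying `-replace` only to the CDATA text inside the 'Content' expression may avoid that. The sample XML below is hypothetical; its layout is guessed from the property paths in the script, and `user@example.com` and the timestamp are made-up values.

```powershell
# Hypothetical sample data; element names are assumed from the property paths used in the script
$sample = [xml]@'
<log>
  <record>
    <initiator><user><companyUserEmail>user@example.com</companyUserEmail></user></initiator>
    <messageInfo>
      <messageTimestamp>2020-01-01T00:00:00</messageTimestamp>
      <content><![CDATA[<html><body><div data-format="PresentationML" data-version="2.0" class="wysiwyg"><p>This is text</p></div></body></html>]]></content>
    </messageInfo>
  </record>
</log>
'@

# Same calculated properties as the script, except that 'Content' strips the tags
# from the CDATA string itself instead of from the whole $Record collection
$props = @(
    @{n = 'Date';    e = { $_.messageInfo.messageTimestamp } }
    @{n = 'Sender';  e = { $_.initiator.user.companyUserEmail } }
    @{n = 'Content'; e = { $_.messageInfo.content.'#cdata-section' -replace '<[^>]+>', '' } }
)

# Build the rows; in the real script these would be piped on to ConvertTo-Html
$rows = $sample.log.record | Select-Object $props
$rows
```

In the full script the only change would be the 'Content' line in `$props`; the `#Remove HTML tags` line would stay commented out or be deleted.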
Any insight on how to get the RegEx working, or an alternative approach, will be deeply appreciated.
Thanks,
M-