Introduction

SharePoint has an out-of-box feature to extract metadata such as To, From, CC and Subject from Exchange's EML file and save it in a column, however, SharePoint cannot do the same for MSG files hence this article describes how to use power shell to extract metadata from MSG files and update a SharePoint list.

Setting the scene

Consider a scenario where you have several MSG files that you want to store in a SharePoint document library and in the process extract metadata from it. The MSG files are in a document library and a list will have the metadata and somehow a relationship is built between them.

Requirements

1. PowerShell with SharePoint2010 or 2013 module

2. Notepad to edit the PS1 file

3. Temporary location to extract the Exchange MSG files

4. An open source application to unzip MSG files. In this example we are using 7-zip.

5. A SharePoint document library and list.

Document Library and List

A document library called 'EmailDocs' currently exists with several MSG files present there. We also have a custom list called 'EmailDocs2' with new columns added. Below are screenshots of both EmailDocs and EmailDocs2's settings.

EmailDocs

Below is a screenshot of the Subject column. If you have created the EmailDocs with the Content Type as Document the you should have a 'Copy Source' column available.

EmailDocs2

The PowerShell scripts

There are two scripts, one is to download the MSG files from the document library and second is to extract the metadata and add them to the list. You can also CLICK HERE to down the ZIP file that has both the scripts combined.

Download the MSG files

In the below script, add the destination folder location, URL of your web application and complete URL of your document library.

######################## Start Variables ########################

 $destination = "C:\Temp\scripts\msgextract\msgextract\msg"

 $webUrl = "http://sp13test.christianfamily.biz/"

 $listUrl = "http://sp13test.christianfamily.biz/EmailDocs/"

 ##############################################################

 

 $web = Get-SPWeb -Identity $webUrl

 $list = $web.GetList($listUrl)

 

 function ProcessFolder {

 param($folderUrl)

 $folder = $web.GetFolder($folderUrl)

 foreach ($file in $folder.Files) {

 #Ensure destination directory

 $destinationfolder = $destination

 if (!(Test-Path -path $destinationfolder))

 {

 $dest = New-Item $destinationfolder -type directory

 }

 #Download file

 $binary = $file.OpenBinary()

 $stream = New-Object System.IO.FileStream($destinationfolder + "/" + $file.Name), Create

 $writer = New-Object System.IO.BinaryWriter($stream)

 $writer.write($binary)

 $writer.Close()

 }

}

 

 #Download root files

 ProcessFolder($list.RootFolder.Url)

 #Download files in folders

 foreach ($folder in $list.Folders) {

 ProcessFolder($folder.Url)

 }

 

 

 

Extract the metadata

 

 For the script to run successfully the below three separate folders are created. They are,

 1. MSG - This is where the MSG files are temporarily downloaded.

 2. BIN - This is where the 7-Zip application in saved.

 3. MSGEXTRACTED - This is where the MSG files are extracted. In order for this script to be reusable, this folder is created at the beginning and then deleted at the end.

 

#msgextract.ps1

 #Version 1 by Daniel Christian

 

 Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

 

 #date

 $date = Get-Date -format F

 

 #Set directories

 $bin = "C:\Temp\scripts\msgextract\msgextract\bin"

 $BaseDir = "C:\Temp\scripts\msgextract\msgextract"

 $CapDir = $hst + "_" + $date

 $msgDir = $BaseDir + "\msg\"

 

 # Destination site collection

 $WebURL = "http://sp13test.christianfamily.biz"

 

 # Destination list name

 $listName = "EmailDocs2"

 

 #Get the SPWeb object and save it to a variable

 $web = Get-SPWeb -identity $WebURL

 

 #Get the SPList object to retrieve the list

 $list = $web.Lists[$listName]

 

 #Get all items in this list and save them to a variable

 $items = $list.items

 

 #Array to Hold Result - PSObjects

 $ListItemCollection = @()

 

 #Add a new folder name

 $foldernumber = 0

 

 #7zip

 $7z = "C:\Temp\scripts\msgextract\msgextract\bin\7z.exe"

 

 New-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg -ItemType directory

 

 Function meta_extract($_msg){

 New-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber\ -ItemType directory

 $OutputDir = $BaseDir + "\extractedmsg\" + $foldernumber

 &$7z x $_msg -oC:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber\ |out-null

 }

 

 Function fileSorter{

 $msg_filters =

 @(

 @{ptrn='0037'; class="Subject"},

 @{ptrn='0070'; class="Topic"},

 @{ptrn='0044'; class="Rcvd repr name"},

 @{ptrn='0C1A'; class="Sender name"},

 @{ptrn='0E03'; class="CC"},

 @{ptrn='0E04'; class="To"},

 @{ptrn='1046'; class="From"}

 )

 

 $data = gci C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber

 

 #Adding items to the list

 $newItem = New-Object PSObject

 $newItem = $list.items.Add()

 

 foreach($f in $data){

 foreach($c in $msg_filters){

 if ($f -match $c.ptrn){

 $OutName = "C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber" + $c.class + ".txt"

 $msgdata = gc C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber\$f|out-file $outName

 (gc $outName) -replace "`0", "" | sc $outName

 $tmphtml = "<div><msg>" + $c.class + "</msg>"

 $tmpdata = gc $outName

 $ListItemCollection += $tmpdata

 $tmphtml = "<p>" + $tmpdata + "</p></div>"

 

 #Adding items to the list

 if ($c.class -match "subject"){

 $newItem["Subject"] = $tmpdata

 }

 

 if ($c.class -match "Rcvd repr name"){

 $newItem["To"] = $tmpdata

 }

 

 if ($c.class -match "CC"){

 $newItem["CC"] = $tmpdata

 }

 

 if ($c.class -match "sender name"){

 $newItem["From"] = $tmpdata

 }

 

 }

 }

 }

 $newItem["Topic"] = $_.Name

 $newItem.Update()

 Remove-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg\$foldernumber -recurse

 }

 

 Function Main{

 $files = Get-ChildItem $msgDir\*.msg

 $files|foreach{

 meta_extract $_

 Write-host $_.Name.replace(".msg","").replace("FW ","").replace("Fw ","").replace("RE ","").replace("Re ","") -foregroundcolor black -backgroundcolor yellow

 Write-host "Metadata Extracted!" -foregroundcolor red -backgroundcolor yellow

 fileSorter $_

 $foldernumber += 1

 }

 Remove-Item -Path C:\Temp\scripts\msgextract\msgextract\extractedmsg -recurse

Remove-Item -Path C:\Temp\scripts\msgextract\msgextract\msg\*.msg -recurse

}

Main

Video

Take a look at the below video which walks you through the different steps of the above two scripts. This video also demonstrates the difference between EML and MSG files:

(Click Here to view a high resolution view)

Conclusion

You can take the above method a step further to download only the new MSG files that have been added, extract the metadata from it and update the list. This can very easily be done by adding a Yes/No column with default No and calling it something similar to 'DataExtracted' such that after the data is extracted the script will change the No to Yes. Also, you will have to add both the scripts to a Task Schedule on the server to run as often as you would like it.