Use PowerShell to Parse Email Message Headers—Part 1

Summary: Guest Blogger Thiyagu teaches how to use Windows PowerShell to parse and analyze email message headers.

Microsoft Scripting Guy Ed Wilson here. Thiyagarajan Parthiban is our guest blogger today with an interesting article about using Windows PowerShell to analyze Exchange email. First, though, let’s learn something about Thiyagu.

I am the founder of the Singapore PowerShell User Group. I am an Exchange administrator, and I have been scripting for more than seven years now. Before Windows PowerShell, I did most of my scripting in VBScript. With Windows PowerShell, I automate Exchange/Active Directory tasks, and I am also good at WMI, ADSI, and generating custom reports. I have developed custom applications in C# for automation. I love to automate things!

You will find me on my blog.

Take it away, Thiyagu!

Last year, a survey estimated the number of email messages sent every day across the globe at approximately 294 billion, which works out to roughly 3.4 million messages per second. By the way, 90 percent of all email is either spam or viruses.

Every email message you send or receive carries a piece of information called a message header. There are different ways to view this message header, depending on which email client you are using. For example, here are instructions for getting the message header of an email message if you are using Outlook 2010. The following figure shows how a message header looks.

This text contains a lot of detail about the message you have received. RFC 822 (since superseded by RFC 2822 and RFC 5322) defines how information about an email message is placed into this header.

For now, focus on the green box in the preceding figure. This section records how the message got to your inbox. On its way there, an email message passes through several servers. That is our main focus in today’s post: we want to parse this data out of the messy text and present it in a nice little table so that you can understand what really happened with that email message.

Whenever you try to make sense of an email message header, read it from the bottom up. The section in the figure above contains only four entries, but each one is wrapped across several lines. Here is how it looks after removing the unwanted line breaks:

It already looks a lot cleaner after you remove those extra line breaks. Take a look at the boxes and circles in the image above, and read it like this:

The email was received from server “Corp.red.com ([16.25.5.17])” by server “Singapore.red.com ([15.60.22.16])” with protocol “mapi id 14.01.0323.002” at time “Wed, 13 Jul 2011 18:50:16 +0800”. The offset is UTC +8, which is Singapore time, and the name of the receiving server, “Singapore.red.com”, confirms this.

If that did not make much sense, it might help to visualize it like in the following figure.

Now, this server will send the message again to another server and so on. Each trip from server to server is called a hop. The hop chain continues until it finally reaches your inbox. At times, there might be a delay when going through one of these chains. Maybe a server was busy or there was too much load, which could cause delays to email being delivered to your inbox. In this example, we have four lines, so we have four hops for the email to reach your mailbox.
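As a rough sketch of the two ideas above (joining wrapped lines and counting hops), the snippet below builds a small two-hop header and processes it. The first hop uses the values quoted in this post. The second hop, the server name Edge.red.com, and the places where the lines wrap are all made up for illustration, so treat this as an assumption-laden example and not as the contents of the real header in the figure.

```powershell
# Two hops: the first uses the values quoted in this post; the second
# (Edge.red.com) is hypothetical. The wrap points are assumed as well.
$header = @(
    'Received: from Corp.red.com ([16.25.5.17]) by'
    ' Singapore.red.com ([15.60.22.16]) with mapi id'
    ' 14.01.0323.002; Wed, 13 Jul 2011 18:50:16 +0800'
    'Received: from Edge.red.com ([16.25.5.1]) by'
    ' Corp.red.com ([16.25.5.17]) with ESMTP;'
    ' Wed, 13 Jul 2011 18:50:10 +0800'
) -join "`r`n"

# A line break followed by spaces or tabs continues the previous line,
# so replace that whole sequence with a single space.
$unfolded = $header -replace '\r?\n[ \t]+', ' '

# After unfolding, each hop sits on one line that starts with "Received:".
$hops = @($unfolded -split '\r?\n' | Where-Object { $_ -like 'Received:*' })
$hops.Count   # 2
```

The most recent hop is at the top, which is why the header is read from the bottom up.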

Enough of theory. Let’s talk about Windows PowerShell, starting with another figure.

We have to extract the pieces of information between the above-mentioned sections to form our objects. You can see from the preceding screenshot that the text follows a pattern. Luckily, this is where regular expressions come to the rescue. Read this great article by PowerShell MVP Tome about regular expressions and Windows PowerShell.

We need to get four pieces of information from each line:

All text after Received: from, up to the word by. This is our Received From Server information.
All text after by, up to the word with. This is the server that receives the email from the server above.
All text after with, up to a ; (semicolon). This is the protocol.
The next 32 to 36 characters (whitespace or non-whitespace) after the ; (semicolon). This data is the date.

Here is a sample date:

Wed, 20 Jul 2011 22:28:16 -0700

This is the longest a date gets in the standard format, so a window of 32 to 36 characters may pick up some other data as well. Sometimes there is an extra space or a new line, so I give myself a buffer and remove the unwanted data from the string later.
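To see where the 32 to 36 character window comes from, here is a small sketch that measures the sample date and converts it. It assumes the date string has already been trimmed of surrounding whitespace; the variable names are only for illustration.

```powershell
$sample = 'Wed, 20 Jul 2011 22:28:16 -0700'

# 31 characters, plus the single space that follows the semicolon, gives 32.
$sample.Length   # 31

# The cast understands the day name and the UTC offset, and converts the
# value to local time; ToUniversalTime() makes it independent of the machine.
$time = [datetime]$sample
$utc  = $time.ToUniversalTime()   # 21 Jul 2011 05:28:16 UTC
```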

Here is the regular expression pattern I came up with:

$regexFrom1 = 'Received: from([\s\S]*?)by([\s\S]*?)with([\s\S]*?);([(\s\S)*]{32,36})(?:\s\S*?)'

Can you believe that the above regular expression pattern does all four of the things I described above? If you are good at Windows PowerShell and still haven’t used regular expressions, you are missing an important weapon in your Windows PowerShell arsenal.
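To try the pattern on a single hop, a minimal sketch like the following may help. The hop is built from the values quoted earlier in this post, and the trailing "X-Sample: end" line is invented so that the pattern has some text after the date to stop against (it needs at least one whitespace character after the captured date).

```powershell
# One hop from the values in this post, followed by a made-up header line.
$sample = 'Received: from Corp.red.com ([16.25.5.17]) by Singapore.red.com ([15.60.22.16])' +
          ' with mapi id 14.01.0323.002; Wed, 13 Jul 2011 18:50:16 +0800' +
          "`n" + 'X-Sample: end'

$pattern = 'Received: from([\s\S]*?)by([\s\S]*?)with([\s\S]*?);([(\s\S)*]{32,36})(?:\s\S*?)'

if ($sample -match $pattern) {
    # $Matches[0] is the whole match; 1 through 4 are the capture groups.
    $from = $Matches[1].Trim()   # Corp.red.com ([16.25.5.17])
    $by   = $Matches[2].Trim()   # Singapore.red.com ([15.60.22.16])
    $with = $Matches[3].Trim()   # mapi id 14.01.0323.002
    $time = $Matches[4].Trim()   # Wed, 13 Jul 2011 18:50:16 +0800
}
```

Each group comes back with a leading space (and sometimes a line break), which is why the values are trimmed before use.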

Note Check out this webcast by Tome. It is a great introduction to regular expressions.

Because we do not know how the text is laid out in the message header, it is best to read the whole file as one long string and work with that. Here is a technique to read a file into one big string.

$text = [System.IO.File]::OpenText("C:\Scripts\msg6.txt").ReadToEnd()
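As an alternative sketch, [System.IO.File]::ReadAllText also returns the whole file as one string and closes the file when it is done, so no reader is left open. The example below writes a small temporary file first so that it can run anywhere; the file name and its contents are placeholders, not the msg6.txt file used in this post.

```powershell
# Create a throwaway file to read back (placeholder name and content).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-header.txt'
Set-Content -Path $path -Value @('Received: from a by b', ' with SMTP; Wed, 13 Jul 2011 18:50:16 +0800')

# Read the entire file, line breaks included, into a single string.
$text = [System.IO.File]::ReadAllText($path)

# On Windows PowerShell 3.0 and later, this does the same thing:
# $text = Get-Content -Path $path -Raw

Remove-Item -Path $path
```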

This file has the same information as the first screenshot in this post. I wanted to write a function that takes $text as input, parses the string, packages the parsed data in an array of PSObjects, and returns it. I used Select-String along with the regular expression pattern and iterated through all the matches I got.

Here is how I did that:

Function Process-ReceivedFrom
{
    Param($text)

    # One capture group each for the sending server, receiving server, protocol, and date.
    $regexFrom1 = 'Received: from([\s\S]*?)by([\s\S]*?)with([\s\S]*?);([(\s\S)*]{32,36})(?:\s\S*?)'
    $fromMatches = $text | Select-String -Pattern $regexFrom1 -AllMatches

    if ($fromMatches)
    {
        $rfArray = @()
        $fromMatches.Matches | foreach {
            # Clean-string is a helper function that is not shown in this post.
            $from = Clean-string $_.groups[1].value
            $by = Clean-string $_.groups[2].value
            $with = Clean-string $_.groups[3].value

            # Shorten the protocol to its name when it starts with SMTP or ESMTP.
            Switch -wildcard ($with)
            {
                "SMTP*" {$with = "SMTP"}
                "ESMTP*" {$with = "ESMTP"}
                default {}
            }

            $time = Clean-string $_.groups[4].value

            # Build one object per hop.
            $fromhash = @{
                ReceivedFromFrom = $from
                ReceivedFromBy = $by
                ReceivedFromWith = $with
                ReceivedFromTime = [Datetime]$time
            }
            $fromArray = New-Object -TypeName PSObject -Property $fromhash
            $rfArray += $fromArray
        }
        $rfArray
    }
    else
    {
        return $null
    }
}
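The function above calls a helper named Clean-string whose body does not appear in this post. The following is only a guess at what such a helper might do, based on the earlier remark about removing unwanted spaces and new lines; the author's actual implementation may differ.

```powershell
# Hypothetical stand-in for the Clean-string helper (not the author's code).
function Clean-String {
    param([string]$Text)

    # Collapse every run of whitespace, including line breaks, into one
    # space, then remove leading and trailing whitespace.
    return ($Text -replace '\s+', ' ').Trim()
}

Clean-String " Singapore.red.com`r`n   ([15.60.22.16]) "   # Singapore.red.com ([15.60.22.16])
```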

To explain the regular expression a little bit:

'Received: from([\s\S]*?)by([\s\S]*?)with([\s\S]*?);([(\s\S)*]{32,36})(?:\s\S*?)'

Each parenthesized part of the pattern is captured into a group, and you can access the groups through the Matches property. The exception is the last part: in the world of regular expressions, “(?:” marks a group that is not captured. Select-String returns its results as objects of this class: Microsoft.PowerShell.Commands.MatchInfo.
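A tiny sketch with a much simpler pattern shows the same mechanics. The input string and pattern here are made up for illustration; only Select-String, its -AllMatches switch, and the Matches and Groups properties are the real pieces.

```powershell
# One capturing group (the letter) and one non-capturing group (the digit).
$result = 'a1 b2 c3' | Select-String -Pattern '([a-z])(?:\d)' -AllMatches

$result.GetType().FullName        # Microsoft.PowerShell.Commands.MatchInfo
$result.Matches.Count             # 3
$result.Matches[0].Groups.Count   # 2: group 0 is the whole match, group 1 is the letter

# The digit was matched but not stored in a group of its own.
$letters = $result.Matches | ForEach-Object { $_.Groups[1].Value }   # a, b, c
```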

I just loop through the matches and build a PSObject for each one. Now, if I pipe the output of the function to Out-GridView, I see what is shown in the following figure:

Read the next part tomorrow, where I show how I put the pieces together to get delay information from different hops and then finally to build a GUI tool for this functionality.

Thiyagu, this is an excellent article. Thank you for sharing your time with us and for sharing your expertise with the Windows PowerShell community. I am really looking forward to part 2 tomorrow!

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy
