Use Windows PowerShell to Parse RSS Feeds

Doctor Scripto

Summary: Microsoft PowerShell MVP, Will Anderson, talks about using Windows PowerShell to parse RSS feeds.

Microsoft Scripting Guy, Ed Wilson, is here. Today I welcome recent Windows PowerShell MVP and new guest blogger, Will Anderson.

Hi there, fellow scripters!

Last week as I took my seat on a connecting flight in New York from beautiful Charlotte, North Carolina, my thoughts drifted to the wonderful experiences and memories that I took with me from the Windows PowerShell Summit. One thought in particular crossed my mind. It was a challenge posed to me by a member of the Windows PowerShell team.

“Will,” he said, “One of the challenges our team faces is juggling back and forth between working on new releases and managing the feedback we receive from the community.”

My interest was piqued as he continued, “A couple of times a week, we have to review all of the feedback on Connect. Depending on what we’re working on, we usually triage the bug reports or the feature requests, and we’re usually looking for the ones with the most votes so we know which ones the community is asking for most. We don’t have time to whip up a new tool, so do you think you’d be up to the challenge of creating one?”

Challenge accepted!

Initially, I attempted to leverage Invoke-WebRequest against a search URL on the Connect website. But parsing through the data, which consisted of many pages, would have been too cumbersome, and it would not necessarily retrieve the data I wanted.

Instead, I decided to request the Most Recent Requests RSS feed to retrieve the data I wanted. I start by invoking a web request against that feed and saving the output to an XML file that I can parse:

Invoke-WebRequest -Uri 'https://connect.microsoft.com/rss/99/RecentFeedbackForConnection.xml' -OutFile C:\scripts\ConnectFeed.xml

Now, I can reference the file I created and start diving into the XML for our data:

   [xml]$Content = Get-Content C:\scripts\ConnectFeed.xml
   $Feed = $Content.rss.channel
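If you want to see what the [xml] cast gives you without downloading anything, here is a small sketch that uses a made-up, two-item feed. The feed text, titles, and links are invented for illustration; the only assumption is that the real feed has the same rss/channel/item shape.

```powershell
# Hypothetical sample feed; only the rss/channel/item shape is assumed to match the real one.
$SampleXml = @'
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First item</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second item</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>
'@

# Casting to [xml] produces an XmlDocument; child elements are exposed as properties.
[xml]$SampleContent = $SampleXml
$SampleFeed = $SampleContent.rss.channel

# Repeated <item> elements come back as a collection that can be enumerated.
$SampleTitles = @($SampleFeed.item | ForEach-Object { $_.title })
$SampleTitles
```

Each item behaves like an object whose child elements are properties, which is what makes the loop in the next step possible.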

Now that we have something to look at, let’s gather some data. We’ll start by getting the last time the request was updated, the description, category, author, and of course, the link so the Windows PowerShell team can go straight to the article when they’re ready to work on it.

Of course, the data we’re looking for is in multiple elements called Item, so the script loops through them one at a time to pull out the data we want. Note that I’m casting $msg.updated to a DateTime object. This is to make sure that we can do some cool filtering by date and time later.

   ForEach ($msg in $Feed.Item){
       [PSCustomObject]@{
           'LastUpdated' = [datetime]$msg.updated
           'Description' = $msg.description
           'Category' = $msg.category
           'Author' = $msg.author
           'Link' = $msg.link
       }#EndPSCustomObject
   }#EndForEach

Now we get the following return:

Image of command output

So far, that looks good. But looking at the description, I see some data that I think would be really useful as objects on their own. I’m referring mostly to the Up-Votes and Down-Votes, but I think we could break out pretty much everything after that break.

So let’s split off that section of the string into useable chunks and get rid of any oddball spaces in the line:

(($msg.description).split('<BR>')).split(',').trim()

I execute again and we get this:

Image of command output
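A side note on that Split call: in Windows PowerShell, the .Split() method treats the string '<BR>' as a set of separate characters (<, B, R, and >) rather than as one delimiter, so a capital B or R anywhere in the text also causes a split. Newer PowerShell versions that run on .NET Core pick a different overload and split on the whole string. If that difference matters for your feed, the -split operator treats its argument as a single, case-insensitive pattern in both. Here is a sketch on a made-up description string (the text and the counts are invented):

```powershell
# Made-up description in the assumed "<text><BR><comma-separated counts>" layout.
$SampleDescription = 'Better Remoting docs<BR>12 up-votes, 3 down-votes, 1 validations'

# -split treats '<BR>' as one pattern, so the B and R in the text are left alone.
$Parts = $SampleDescription -split '<BR>'

# Split the trailing part on commas and trim the leftover spaces.
$Fields = ($Parts[-1] -split ',').Trim()
$Fields
```

This is an alternative, not what the script in this post uses. The Select-Object -Last 7 step that follows was written against the output of .Split(), so switching would mean rechecking how many trailing elements to keep.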

Aha! Now we have all of this magnificent data, but it’s string data, and not very sortable. So I'm going to create some rules to fix that issue. Let’s start by focusing on those last seven string objects I created overall, and specifically target the Up-Votes and Down-Votes for our test. Then I’ll add them to our PSCustomObject:

   ForEach ($msg in $Feed.Item){
       # Keep only the trailing fields that hold the counts
       $ParseData = (($msg.description).split('<BR>')).split(',').trim() | Select-Object -Last 7
       ForEach ($Datum in $ParseData){
           # The number is the first word of each field
           If ($Datum -like "*up*"){[int]$Upvote = ($Datum).split(' ') | Select-Object -First 1}#EndIf
           If ($Datum -like "*down*"){[int]$Downvote = ($Datum).split(' ') | Select-Object -First 1}#EndIf
       }#EndForEach
       [PSCustomObject]@{
           'LastUpdated' = [datetime]$msg.updated
           'Description' = $msg.description
           'Category' = $msg.category
           'Author' = $msg.author
           'Link' = $msg.link
           'UpVotes' = $Upvote
           'DownVotes' = $Downvote
       }#EndPSCustomObject
   }#EndForEach

When I execute this, we get the following:

Image of command output

Now we’re cooking with gas! Let’s go ahead and add the rest of our objects to the mix. I'll create a rule for each object in the mix:

   ForEach ($msg in $Feed.Item){
       $ParseData = (($msg.description).split('<BR>')).split(',').trim() | Select-Object -Last 7
       ForEach ($Datum in $ParseData){
           If ($Datum -like "*up*"){[int]$Upvote = ($Datum).split(' ') | Select-Object -First 1}#EndIf
           If ($Datum -like "*down*"){[int]$Downvote = ($Datum).split(' ') | Select-Object -First 1}#EndIf
           If ($Datum -like "*validations*"){[int]$Validation = ($Datum).split(' ') | Select-Object -First 1}#EndIf
           If ($Datum -like "*workarounds*"){[int]$Workaround = ($Datum).split(' ') | Select-Object -First 1}#EndIf
           If ($Datum -like "*comments*"){[int]$Comment = ($Datum).split(' ') | Select-Object -First 1}#EndIf
           If ($Datum -like "*feedback*"){[int]$FeedBackID = ($Datum).split(' ') | Select-Object -Last 1}#EndIf
       }#EndForEach
       [PSCustomObject]@{
           'LastUpdated' = [datetime]$msg.updated
           'Description' = $msg.description
           'Category' = $msg.category
           'Author' = $msg.author
           'Link' = $msg.link
           'UpVotes' = $Upvote
           'DownVotes' = $Downvote
           'Validations' = $Validation
           'WorkArounds' = $Workaround
           'Comments' = $Comment
           'FeedbackID' = $FeedBackID
       }#EndPSCustomObject
   }#EndForEach

Here is the result:

That looks a lot better—except we still have the text that we’ve sorted out left in the description. Let’s see if I can remove that.

I’ll do this by using the IndexOf method to find the position of the HTML line break tag (<BR>) that separates the description from the data we parsed out for our PSCustomObject. Then, I’ll use the Substring method to keep only the text that comes before the tag:

$Description = ($msg.description).Substring(0,($msg.description).IndexOf('<BR>'))
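One caveat with this approach: IndexOf returns -1 when the tag is not found, and Substring throws an error in that case. If some items might not contain the tag, a defensive version could look like the following sketch. Get-DescriptionText is a hypothetical helper name that is not part of the original script, and the sample strings are made up.

```powershell
# Hypothetical helper: returns the text before the first <BR>, or the whole string if there is none.
function Get-DescriptionText {
    param([string]$Text)

    # OrdinalIgnoreCase also matches a lowercase <br>.
    $Index = $Text.IndexOf('<BR>', [System.StringComparison]::OrdinalIgnoreCase)
    if ($Index -lt 0) { return $Text }
    return $Text.Substring(0, $Index)
}

Get-DescriptionText 'Add a -Force switch<BR>4 up-votes, 0 down-votes'
Get-DescriptionText 'No counts here'
```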

Now I replace the $msg.description in our PSCustomObject with our new description variable and execute the command. Here is the result:

Image of command output

We have a much cleaner, more readable view. I'll wrap this up in a function, Get-ConnectFeedback, to make it easy to use in a one-liner. Now our friends on the Windows PowerShell team can easily dig into the bug reports, for example, those last updated more than 21 days ago, sorted by up-votes:

(Get-ConnectFeedback).where({$PSitem.LastUpdated -lt (Get-Date).AddDays(-21) -and $PSitem.Category -eq 'Bug' }) | Sort-Object UpVotes

Image of command output
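The function itself is not listed in this post. As a rough idea of how the pieces above could fit together, here is a minimal sketch. It is not the published script: the -Path parameter, its default value, the use of the -split operator, and the hashtable of counts are all assumptions made for this example, and it reads a feed file that was saved earlier instead of downloading one.

```powershell
# Sketch only. The -Path parameter and its default are assumptions; the published script may differ.
function Get-ConnectFeedback {
    param(
        # Feed file saved earlier with Invoke-WebRequest -OutFile.
        [string]$Path = 'C:\scripts\ConnectFeed.xml'
    )

    [xml]$Content = Get-Content -Path $Path -Raw

    foreach ($msg in $Content.rss.channel.item) {
        # Fresh table per item, so a missing field never inherits the previous item's value.
        $Counts = @{}

        # Text before the first <BR> is the description; the rest is the comma-separated data.
        $Text, $Data = $msg.description -split '<BR>', 2

        foreach ($Datum in ("$Data" -split ',')) {
            $Words = $Datum.Trim() -split ' '
            switch -Wildcard ($Datum) {
                '*up*'          { $Counts.UpVotes     = [int]($Words[0]) }
                '*down*'        { $Counts.DownVotes   = [int]($Words[0]) }
                '*validations*' { $Counts.Validations = [int]($Words[0]) }
                '*workarounds*' { $Counts.WorkArounds = [int]($Words[0]) }
                '*comments*'    { $Counts.Comments    = [int]($Words[0]) }
                '*feedback*'    { $Counts.FeedbackID  = [int]($Words[-1]) }
            }
        }

        [PSCustomObject]@{
            LastUpdated = [datetime]$msg.updated
            Description = $Text
            Category    = $msg.category
            Author      = $msg.author
            Link        = $msg.link
            UpVotes     = $Counts.UpVotes
            DownVotes   = $Counts.DownVotes
            Validations = $Counts.Validations
            WorkArounds = $Counts.WorkArounds
            Comments    = $Counts.Comments
            FeedbackID  = $Counts.FeedbackID
        }
    }
}
```

Because the function emits one object per feed item, it can be piped straight into Where-Object and Sort-Object, or filtered with the .where() method as in the one-liner above.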

We’ve managed to pull down an RSS feed, parse through the data to find the information that is useful to us, and separate it into useable objects that a one-line command can filter and sort for easy reading. Now the Windows PowerShell team has a little more time to build awesome new features, and we’ve learned another way to use Windows PowerShell as an excellent tool for gathering information on the web!

You can download the entire script from the Script Center Repository: Parse RSS Feeds with PowerShell.

~Will

Thanks, Will, for writing this up and sharing it with our readers.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy 
