Welcome back to the RegEx crash course. Last time we talked about the basic symbols we plan to use as our foundation. This week, we will be learning a new way to leverage our patterns for data extraction and how to rip our extracted data into pieces we care about.
[RegEx]
The [Regex] data type has some cool static members, but we’re mostly going to play with the plural method matches(<data>,<pattern>). If you don’t know what static members are, you can check this post or this help data.
A lot of the time, when we work with RegEx we are using it to extract everything that matches our pattern in a large amount of data. Using $matches like we did in the previous posts means we have to write a lot of looping and if statements. With [regex]::matches() we can condense all that, and it works on one big blob of text instead of just a list of individual lines. This means that if there is more than one match per line, we still get all of them!
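Here is a quick sketch of that difference. The sample text and addresses are made up for illustration (they are not from the MOCK_DATA file), and the pattern is the simple email pattern from the earlier post:

```powershell
# Made-up sample text: two lines, with two addresses on the first line.
$text = "alice@example.com, bob@example.org`ncarol@example.net"

# -match against a single string stops at the first hit and fills $Matches.
$null = $text -match '\w+@\w+\.\w+'
$firstOnly = $Matches[0]

# [regex]::Matches() returns every hit in the whole string.
$all = [regex]::Matches($text, '\w+@\w+\.\w+').Value

$firstOnly   # alice@example.com
$all.Count   # 3
```

To get the same three addresses with -match, you would have to split the text up and loop over the pieces yourself.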
If we take a look at some sample data that it returns, we can see that we actually get a pretty rich match object:
Groups   : {0}
Success  : True
Name     : 0
Captures : {0}
Index    : 3534
Length   : 23
Value    : ecowpland1d@myspace.com
The thing we care about is the Value property, but you’ll notice it even tells you the starting character (Index) and how many characters long the match is (Length).
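If you want to convince yourself what Index and Length mean, a small sketch like this one can help. The sample string is made up, and the $checks variable is just a name I picked for the example:

```powershell
# Made-up sample text, not the MOCK_DATA file from the post.
$text = "Contact: alice@example.com or bob@example.org"

$checks = foreach ($m in [regex]::Matches($text, '\w+@\w+\.\w+')) {
    [pscustomobject]@{
        Index  = $m.Index     # zero-based position where the match starts
        Length = $m.Length    # number of characters in the match
        # Cutting the original text at Index/Length gives back the Value.
        Slice  = $text.Substring($m.Index, $m.Length)
        Value  = $m.Value
    }
}
$checks
```

Slice and Value should be identical on every row, which is handy when you need to know where in the original text a match came from.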
Let’s take a look at how we might modify the email match from the earlier post to use this:
#grab our data as one big blob (-raw)
$file = get-content "$PSScriptRoot\MOCK_DATA.txt" -raw
#make our pattern
$regex = "\w+@\w+\.\w+"
#extract all matches and display the value property
[RegEx]::Matches($file,$regex).value
Now it all looks a lot sleeker, go [RegEx]!
Grouping
Grouping is a way that we can logically break up our extraction. I use this for 2 main reasons:
- The data I want isn’t unique on its own, but the data around it is. Now I can match the unique piece and rip out what I want to use.
- The data I want is all there, but I plan to use pieces of it for different things
Grouping can be done by wrapping sections of your pattern in parentheses. The full pattern will always match as group 0, which is why we were typing $matches[0] to start. Each individual group then gets pulled out in numerical order, counting the opening parentheses from left to right.
Maybe we grab some data by copy/pasting out of Outlook and it looks like this: Brenda Seamon <bseamon0@bbc.co.uk>
If it’s in a large block of text, we might want to use RegEx to extract it like we have before. This time, we want to grab the first name, last name, and email to use for different things. They’re all in our data, and we can use grouping to pull them out individually.
Let’s start by finding a pattern that gets all of our data. I used this one: $pattern = "\w+\s+\w+\s+<\w+@\w+\.\w+>"
- 1+ word characters (first name)
- 1+ space characters
- 1+ word characters (last name)
- 1+ space characters
- <
- The rest of our email pattern, like we used before.
- >
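We can see that it works with our test. Note that the test address ends in bbc.co: this simple pattern only allows one dot after the @, so it would not match the bbc.co.uk address shown earlier.
$data = "Brenda Seamon <bseamon0@bbc.co>"
$pattern = "\w+\s+\w+\s+<\w+@\w+\.\w+>"
$data -match $pattern
$matches[0]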
Now that we know it works, let’s try grouping up the pieces we want by putting parens around the first, last and email sections: $pattern = "(\w+)\s+(\w+)\s+<(\w+@\w+\.\w+)>"
$data = "Brenda Seamon <bseamon0@bbc.co>"
$pattern = "(\w+)\s+(\w+)\s+<(\w+@\w+\.\w+)>"
$data -match $pattern
$matches[0]
"All match: {0}
First name: {1}
Last name: {2}
Email: {3}
" -f $matches[0],$matches[1],$matches[2],$matches[3]
All match: Brenda Seamon <bseamon0@bbc.co>
First name: Brenda
Last name: Seamon
Email: bseamon0@bbc.co
We can also name these groups using ?<NAME> inside of the parens. This makes our pattern look really bananas if you see it without context: $pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$data = "Brenda Seamon <bseamon0@bbc.co>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$data -match $pattern
$matches[0]
"All match: {0}
First name: {1}
Last name: {2}
Email: {3}
" -f $matches[0],$matches["first"],$matches["last"],$matches["email"]
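If you are curious what $matches actually holds after a named-group match, a quick sketch like this lets you poke at it. It reuses the same data and pattern as above:

```powershell
$data = "Brenda Seamon <bseamon0@bbc.co>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"

# Throw away the True/False result; we only want $Matches filled in.
$null = $data -match $pattern

$Matches.GetType().Name   # Hashtable
$Matches.Keys             # the group names plus 0 (key order is not guaranteed)
$Matches.email            # dot notation also works for the named keys
```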
Notice $matches is a hash table, and our group names become the keys. Let’s try grabbing the groups using [RegEx]:
$data = "Brenda Seamon <bseamon0@bbc.co>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$results = [Regex]::Matches($data,$pattern)
$results[0].groups["email"].value
It’s a bit more work, since we need to keep track of which result we are on in a collection of matches. This scales nicely for looping though!
$data = "Brenda Seamon <bseamon0@bbc.co>, Joeann Brotherwood <jbrotherwood1@house.gov>, Jake Duffan <jduffan2@google.ru>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"
$results = [Regex]::Matches($data,$pattern)
$people = @()
foreach($person in $results)
{
    $obj = [pscustomobject]@{
        "First Name" = $person.Groups["first"].value
        "Last Name" = $person.Groups["last"].value
        Email = $person.Groups["email"].value
    }
    $people += $obj
}
$people
First Name Last Name   Email
---------- ---------   -----
Brenda     Seamon      bseamon0@bbc.co
Joeann     Brotherwood jbrotherwood1@house.gov
Jake       Duffan      jduffan2@google.ru
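One possible variation on that loop, offered as a sketch rather than the one right way: let the foreach statement emit the objects and collect them in a single assignment. This avoids $people += $obj, which builds a brand-new array on every pass and can get slow with a lot of matches.

```powershell
$data = "Brenda Seamon <bseamon0@bbc.co>, Joeann Brotherwood <jbrotherwood1@house.gov>, Jake Duffan <jduffan2@google.ru>"
$pattern = "(?<first>\w+)\s+(?<last>\w+)\s+<(?<email>\w+@\w+\.\w+)>"

# Whatever the loop body outputs is gathered into $people.
$people = foreach ($person in [regex]::Matches($data, $pattern)) {
    [pscustomobject]@{
        "First Name" = $person.Groups["first"].Value
        "Last Name"  = $person.Groups["last"].Value
        Email        = $person.Groups["email"].Value
    }
}
$people
```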
Hopefully you’ve had fun playing with RegEx so far. We will take a look at some other symbols and some little tricks we can use grouping and ? for in future posts!
As always, don’t forget to rate, comment and share! Let me know what you think of the content and what topics you’d like to see me blog about in the future.
Nicely done, Kory! I learned something about regex today.
This is how I would have written the output of your code, just because I love to keep things short:
$results | foreach {
    [PSCustomObject]@{
        FirstName = $_.Groups['first'].value
        LastName = $_.Groups['last'].value
        Email = $_.Groups['email'].value
    }
}