October 7th, 2016

PowerShell regex crash course – Part 2 of 5

Doctor Scripto
Scripter

Summary: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell.

Hello! I’m Thomas Rayner, a proud Cloud and Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (@MrThomasRayner), or posting on my blog, workingsysadmin.com. This week, I’m presenting a five-part crash course about how to use regular expressions in PowerShell. Regular expressions are sequences of characters that define a search pattern, mainly for use in pattern matching with strings. Regular expressions are extremely useful to extract information from text such as log files or documents. This isn’t meant to be a comprehensive series but rather, just as the name says, a crash course. So, buckle up!

Many people are intimidated by regular expressions, or “regex”. If you see something like '(\d{1,3}\.){3}(\d{1,3})' and your eyes start glazing over, don’t worry. By the end of this series, you’ll have the skills to recognize that this pattern matches IP addresses. To the uninitiated, big strings of seemingly random characters look indecipherable, but regex is an incredibly powerful tool that any PowerShell pro needs to have a grip on.
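If you want to see that pattern in action before we break it down, here is a minimal sketch. The variable names and sample strings are made up for illustration. Note that the pattern only checks the shape of the text (four groups of one to three digits separated by dots), not whether each number is a valid octet.

```powershell
# The pattern from the paragraph above, stored in a variable for reuse.
$ipPattern = '(\d{1,3}\.){3}(\d{1,3})'

# True: four groups of 1-3 digits separated by dots
$looksLikeIp = '192.168.1.10' -match $ipPattern

# False: nothing in this string has that shape
$notAnIp = 'hello world' -match $ipPattern

# True: the pattern checks the shape only, not the 0-255 range
$outOfRange = '999.999.999.999' -match $ipPattern
```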

Today, I am going to introduce special characters. These characters within regex have a different meaning than you might assume. For example, \d is used to match “any digit”. Neat, right? You can use these special characters to unlock the power of regex and start to create more complex patterns. Let’s jump into some different special characters and some examples.

| Special character | Meaning | Example |
| --- | --- | --- |
| . (period) | Matches any single character | 'something' -match 'some.hing' #returns true because the 't' is a single character that matches this pattern |
| \n | Matches a newline character | See the here-string example below the table |
| \t | Matches a tab character | Works just like the newline example, except it matches tabs instead of new lines |
| \d | Matches any digit (0-9) | 'testing123' -match '\d' #returns true because digits are present |
| \D | Matches any non-digit | '1234' -match '\D' #returns false because every character in the string is a digit |
| \w | Matches a word character (a letter, digit, or underscore) | 'hello123' -match '\w' #returns true because word characters are present |
| \W | Matches a non-word character | 'hello123' -match '\W' #returns false because every character is a word character |
| \s | Matches a whitespace character | ' ' -match '\s' #returns true because between the quotation marks is a single space |
| \S | Matches a non-whitespace character | ' ' -match '\S' #returns false because all the characters are whitespace |
| \ | Use \ to escape special characters | \. matches a dot and \\ matches a backslash |
| ^ | Matches the start of a string | |
| $ | Matches the end of a string | |

The \n example uses a multi-line here-string:

@"
This is some multi
Line text
"@ -match '\n' #returns true because there is a new line between 'multi' and 'Line'
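A few rows in the table above have no example of their own (tabs, escaping, and the two anchors). Here is a short sketch that fills those in. The sample strings and variable names are my own, chosen only for illustration.

```powershell
# Tab: `t inside a double-quoted string is PowerShell's own escape for a tab.
$hasTab = "column1`tcolumn2" -match '\t'        # True

# Escaping: a bare . matches any character, while \. matches only a literal dot.
$anyChar    = 'fileXtxt' -match 'file.txt'      # True
$literalDot = 'fileXtxt' -match 'file\.txt'     # False
$realDot    = 'file.txt' -match 'file\.txt'     # True

# \\ matches one literal backslash.
$hasBackslash = 'C:\Temp' -match '\\'           # True

# Anchors: ^ pins the match to the start of the string, $ to the end.
$startsWithS = 'something' -match '^s'          # True
$endsWithS   = 'something' -match 's$'          # False, the string ends with 'g'
```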

You could write something like this.

'something123' -match 'some.hing12\d'  #returns true

The pattern here is “the letters s o m e, any single character, h i n g 1 2, and then any digit”, which is a match for our string “something123”. Now consider these two examples.

'something123' -match '^\d'  #returns false

'something123' -match '\d$'  #returns true

The first one is false because the pattern is “the start of the string, followed by any digit”, and our string starts with the letter “s”, not a digit. The second one is true because it means “any digit followed by the end of the string”, and “something123” ends with the digit 3. You can start to see how you can string these pieces together to make more elaborate patterns.
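To show how these pieces combine, here is a small sketch that uses both anchors in one pattern, so the whole string has to fit rather than just part of it. The extra test strings are hypothetical.

```powershell
# Start of string, 'some', any single character, 'hing', exactly three digits, end of string.
$pattern = '^some.hing\d\d\d$'

$exact     = 'something123'    -match $pattern   # True
$extraText = 'something123abc' -match $pattern   # False, text follows the digits
$prefix    = 'xsomething123'   -match $pattern   # False, text precedes 'some'
$fewDigits = 'something12'     -match $pattern   # False, only two digits
```

Without the anchors, the second and third strings would match, because -match only needs to find the pattern somewhere inside the string.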

Now, I’d like to introduce another mechanism in PowerShell to work with regex. There’s a [regex] type accelerator that offers a whole bunch of functionality. So far, we’ve been working with Boolean responses (true or false) that tell us whether a string matches a specific pattern. What if you want to get the specific value in a string that matches a pattern? Let me show you.

[regex]::matches('abc123','\d').value  #returns 1, 2, 3 in an array

Here, I’m looking for the digits in “abc123” using the matches() method of the [regex] accelerator. The first parameter is the string to look within, and the second is the regex pattern to match. By default, lots of information is returned, and I’m just isolating the value property. I’ll get an array back of all the matches in that string. How about another example?
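If you are curious about the rest of that information, here is a minimal sketch that looks at a few other properties of the returned match objects. Index, Length, and Value are properties of the .NET Match class. The variable name $found is just for illustration.

```powershell
# Keep the whole collection instead of just the Value property.
$found = [regex]::Matches('abc123', '\d')

$found.Count        # 3 matches in total
$found[0].Value     # 1
$found[0].Index     # 3, the zero-based position of the first digit
$found[0].Length    # 1, each match is a single character

# Show position, length, and text for every match.
$found | Select-Object Index, Length, Value
```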

[regex]::matches('abc123','\d\d\d').value  #returns 123 in a string

At first glance, this looks very similar. The difference here is that the pattern I’m looking for is three digits, not just one. The pattern of three digits only occurs once, and so that is the value that gets returned.
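Here is a quick side-by-side sketch of that difference, using a slightly longer, made-up string so the three-digit pattern has more than one place to match.

```powershell
# One digit at a time: every digit is its own match.
$single = [regex]::Matches('abc123456', '\d').Value        # 1, 2, 3, 4, 5, 6

# Three digits at a time: matches do not overlap, so there are two of them.
$triple = [regex]::Matches('abc123456', '\d\d\d').Value    # 123, 456

# Not enough digits in a row: the collection comes back empty.
$none = [regex]::Matches('abc12', '\d\d\d')
$none.Count                                                # 0
```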

Join me next week, and we’ll talk about grouping and character classes!

Thanks, Thomas! You’ve really got my interest charged up with regex! Time to see what fun I can have!

I invite you to follow the Scripting Guys on Twitter and Facebook. If you have any questions, send email to them at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow.

Until then, always remember that, with Great PowerShell comes Great Responsibility.

Sean Kearney Honorary Scripting Guy Cloud and Datacenter Management MVP

