PowerShell regex crash course – Part 3 of 5

Doctor Scripto

Summary: Thomas Rayner, Microsoft Cloud & Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell.

Hello! I’m Thomas Rayner, a proud Cloud & Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (@MrThomasRayner), or posting on my blog, workingsysadmin.com. This week, I’m presenting a five-part crash course about how to use regular expressions in PowerShell. Regular expressions are sequences of characters that define a search pattern, mainly for use in pattern matching with strings. Regular expressions are extremely useful to extract information from text such as log files or documents. This isn’t meant to be a comprehensive series but rather, just as the name says: a crash course. So, buckle up!

Many people are intimidated by regular expressions, or “regex”. If you see something like ‘(\d{1,3}\.){3}(\d{1,3})’ and your eyes start glazing over, don’t worry. By the end of this series, you’ll have the skills to recognize that this pattern matches IP addresses. For the uninitiated, big strings of seemingly random characters appear indecipherable, but regex is an incredibly powerful tool that any PowerShell pro needs to have a grip on.

Already, we have introduced a few different regex concepts. We started with some quantifiers and then moved on to explain special characters. Now, let’s talk about character classes. Enter the wonderful world of brackets!

First, curly braces. That’s right: { and }. We use curly braces to signify the number of times that we want a specific pattern or character to occur in our matches. Let’s look at an example.

'something123' -match '\d{3}'  #returns true

We know that the \d part of this pattern matches any digit. The part in the curly braces is new. What we’re saying here is that we want the part that comes right before the curly braces to occur the number of times shown within the curly braces. So, we want any digit, and we want that to happen three times. If we change that number, we change what we’re looking to match.

'something123' -match '\d{4}'  #returns false

This comes back false because we want four digits in a row, and that pattern doesn’t exist in “something123”. We can also specify ranges.

'something123' -match '\d{2,4}'  #returns true

Here, we’re saying, “I want any digit between two and four times”. When two numbers are separated by a comma, the first is the minimum number of occurrences and the second is the maximum. You can also write {2,} to specify “two or more times”.
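To see what each quantifier actually grabs, here is a small sketch that reads the matched text back out of the automatic $Matches variable (which this post hasn't covered, so treat it as an aside). The input string 'something12345' and the variable names are made up for illustration. Note that an unanchored \d{3} still succeeds on a longer run of digits; it simply matches the first three.

```powershell
# {n,m} is greedy: it takes as many repetitions as it can, up to the maximum.
$twoToFour  = 'something12345' -match '\d{2,4}'   # True
$greedyText = $Matches[0]                         # '1234'

# {n,} has no upper bound, so it consumes the whole run of digits.
$twoOrMore = 'something12345' -match '\d{2,}'     # True
$openText  = $Matches[0]                          # '12345'

# {n} on its own matches the first n digits it finds, even in a longer run.
$exactlyThree = 'something12345' -match '\d{3}'   # True
$exactText    = $Matches[0]                       # '123'
```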

Second, round () brackets. In regex, we can use round brackets to group things. Think of it like algebra: whatever is inside the brackets is treated as a single unit and is returned together as part of the match, so a quantifier that follows the brackets applies to the whole group. Let’s put this together with what we learned about curly braces.

'hello123hello123hello123' -match '(hello123){3}'  #returns true

I’m telling PowerShell to look for the pattern “hello123” three times. “hello123” is the pattern that’s within the round brackets, and I want it three times. You can expand on this concept.

'hello123hello123 something else' -match '(hello123){1,4}\s?something'  #returns true

Maybe this isn’t the most practical example, but what I’m matching is “one to four occurrences of ‘hello123’, followed by zero or one whitespace character, followed by ‘something’”. Because that pattern is present, this line returns true. Now, we are ready to examine the example in one of the earlier paragraphs more closely.
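As a side note on why the brackets matter, the sketch below compares the same quantifier with and without a group, then peeks at the automatic $Matches variable to see what the grouped pattern captured. The variable names are only illustrative.

```powershell
# Without brackets, {3} applies only to the character right before it (the 3),
# so the pattern means "hello12" followed by three 3s.
$noGroupRepeated = 'hello123hello123hello123' -match 'hello123{3}'    # False
$noGroupTriple3  = 'hello12333' -match 'hello123{3}'                  # True

# With brackets, {3} applies to the whole group.
$withGroup = 'hello123hello123hello123' -match '(hello123){3}'        # True

# $Matches[0] holds the full match; $Matches[1] holds the first group.
# For a repeated group, that is the text of its last repetition.
$null  = 'hello123hello123 something else' -match '(hello123){1,4}\s?something'
$whole = $Matches[0]   # 'hello123hello123 something'
$group = $Matches[1]   # 'hello123'
```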

'' -match '(\d{1,3}\.){3}\d{1,3}'

Let’s break it down.

  • \d{1,3} – looking for one to three digits
  • \. – looking for a period

Therefore, \d{1,3}\. is looking for one to three digits, followed by a period.

  • (\d{1,3}\.){3} – looking for three occurrences of “one to three digits followed by a period”

So, we have “three occurrences of ‘one to three digits followed by a period’ and then one to three more digits”. Sounds like an IP address, right? This would also match something like 999.830.60.450, which is not a valid IP address, because the pattern checks only how many digits appear, not their values. The example still works as an illustration, though.
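If you need to reject values like 999.830.60.450, one option is to let regex check the shape and plain PowerShell check the numbers. This is only a sketch under a few assumptions: Test-IPv4Like is a hypothetical helper name, the sample addresses are made up, and the ^ and $ anchors are added so that the whole string has to fit the pattern rather than just a piece of it.

```powershell
function Test-IPv4Like {
    param([string]$Text)

    # Shape check: four groups of one to three digits separated by periods,
    # anchored so nothing else may appear before or after.
    if ($Text -notmatch '^(\d{1,3}\.){3}\d{1,3}$') { return $false }

    # Range check: every octet must be no larger than 255.
    foreach ($octet in ($Text -split '\.')) {
        if ([int]$octet -gt 255) { return $false }
    }
    return $true
}

$validAddress   = Test-IPv4Like '192.168.1.10'      # True
$octetsTooLarge = Test-IPv4Like '999.830.60.450'    # False
$extraText      = Test-IPv4Like 'ip: 192.168.1.10'  # False, because of the anchors
```

Splitting the work this way keeps the pattern readable; doing the 0 to 255 check purely in regex is possible but much harder to read.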

What about square [] brackets? In regex, we use square brackets to denote a set. That might mean a range of characters or an array of characters that we’re interested in.

'something' -match '[f-q]$' #returns true

'something' -match '[h-q]$'  #returns false

In the first example, we’re looking for the pattern “a letter between f and q, followed by the end of the line”. Because “something” ends in g, which is between f and q, the pattern is a match. In the second example, we’re looking for “a letter between h and q, followed by the end of the line”, which doesn’t exist since g falls outside that range. Regex itself is case sensitive, but PowerShell’s -match operator ignores case by default; use -cmatch when case matters.
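To make the case behavior concrete, here is a short comparison of -match with its case-sensitive sibling, -cmatch. The uppercase range [F-Q] is just an illustrative variation on the example above.

```powershell
# -match ignores case, so an uppercase range still matches the lowercase g.
$defaultMatch = 'something' -match '[F-Q]$'     # True

# -cmatch is case sensitive, so the uppercase range no longer matches.
$caseSensitiveUpper = 'something' -cmatch '[F-Q]$'   # False

# With the lowercase range, -cmatch matches again.
$caseSensitiveLower = 'something' -cmatch '[f-q]$'   # True
```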

You can negate a set, too, using the ^ symbol. That is to say, match “not this character”.

'something' -match '[^q]$'  #returns true

'something' -match '[^g]$'  #returns false

The first example says, “something that is not a q followed by the end of the line” which matches our string of “something”. The next example says, “something that is not a g followed by the end of the line” which returns false, because “something” ends in a g followed by the end of the line.
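A few more variations may help here. In this sketch, the sample strings and variable names are invented for illustration. It shows a set that lists individual characters, a negated set with several characters, and two details that are easy to trip over: a negated set still has to match one character, and ^ only means “not” when it is the first thing inside the square brackets.

```powershell
# A set can simply list the characters you care about.
$endsInGOrQ = 'something' -match '[gq]$'          # True

# A negated set can list several characters: "ends in something that is not a vowel".
$endsInNonVowel = 'something' -match '[^aeiou]$'  # True
$bananaNonVowel = 'banana' -match '[^aeiou]$'     # False, it ends in a

# A negated set still needs a character to match, so an empty string fails.
$emptyInput = '' -match '[^q]$'                   # False

# Outside the brackets, ^ means "start of the line" instead of "not".
$startsWithNonQ = 'something' -match '^[^q]'      # True
```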

If you like examples, stay tuned! Next Friday, I’m going to talk about lookaheads and lookbehinds, then the next day is ALL EXAMPLES!

Wow, Thomas! I never knew working with regular expressions could be this easy! I’m going to have my calendar set to read up on next week’s post! Can’t wait!

I invite you to follow the Scripting Guys on Twitter and Facebook. If you have any questions, send email to them at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow.

Until then, always remember that, with Great PowerShell comes Great Responsibility.

Sean Kearney Honorary Scripting Guy Cloud and Datacenter Management MVP

