Regular Expressions (REGEX): Introduction

Kory Thacher

Hi all, this week I’ll be talking about Regular Expressions. I’ve got a few posts planned to get you set up and going with some basic Regex.

Regex is used for extracting and validating data. Essentially, you can think of Regex as Windows wildcards on steroids. Anytime we need to match data with more precision than the *s and ?s that Windows gives us, we have Regex.

Regex has a reputation for being difficult and confusing, but it really isn’t so bad when you get used to it. The biggest contributors to Regex’s reputation are:

  1. Regex uses its own set of symbols. From PowerShell, we need to generate plain-text Regex strings and send them into the parser, which means escaping special PowerShell symbols so they get passed through correctly.
  2. Regex is confusing to read, but easier to write. I like to joke that Regex is a write-only language, because when you see data and write a pattern in plain English, it’s not so bad to build the pattern out of symbols. However, when you see a bunch of symbols by themselves, it looks like a bunch of spaghetti code. Additionally, there are a bunch of different ways people might build a pattern for the same data. When you use Regex, make sure to leave friendly comments for anyone viewing your code later.
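
To make both points a little more concrete, here is a minimal sketch using only built-in PowerShell. The `$5` text and the variable names are made up for illustration, and the symbols inside the commented pattern are the ones covered later in this series.

```powershell
# Point 1: single quotes hand the text to the Regex parser untouched.
$single = '\$5'       # three characters: \ $ 5

# Double quotes let PowerShell expand $5 (an unset variable here) first,
# so only the backslash survives.
$double = "\$5"

# If double quotes are required, a backtick escapes the dollar sign.
$escaped = "\`$5"

# Point 2: the (?x) option tells the .NET Regex engine to ignore whitespace
# in the pattern and to treat # as the start of a comment.
$commented = @'
(?x)     # allow whitespace and comments inside the pattern
\d{3}    # three digits
-        # a dash
\d{3}    # three more digits
-        # a dash
\d{4}    # four more digits
'@

'982-674-7597' -match $commented
```

Defaulting to single quotes for patterns sidesteps most of the escaping trouble, and the commented form is one way to leave those friendly notes right next to the symbols they describe.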

With that in mind, let’s take a look at a sample about why you should care, and then in later posts we will break it down and learn more.

Maybe we have some fake data, like this:

MOCK_DATA

We’ll work with just numbers in this case and try to extract those phone numbers. In plain English, we can look at the data and say all the phone numbers break down like this:

  1. 3 numbers
  2. dash character
  3. 3 more numbers
  4. dash character
  5. 4 more numbers

Now, we could get false positives, but since we can see the data we can call it “good enough” 🙂

In Regex, we can use \d to say “look for a digit” and {min,max} to specify a quantity; a single number, such as {3}, means exactly that many. We’ll talk more about these symbols later. With that in mind, our pattern could look something like \d{3}-\d{3}-\d{4}
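
As a quick check, the pattern can be tried against a few strings at the prompt. Apart from the first string, which appears in the results further down, the samples here are made up for illustration. The last one shows the kind of false positive mentioned above.

```powershell
$regex = '\d{3}-\d{3}-\d{4}'

'982-674-7597'     -match $regex   # True: 3 digits, dash, 3 digits, dash, 4 digits
'982-674-759'      -match $regex   # False: only 3 digits in the last group
'982 674 7597'     -match $regex   # False: spaces instead of dashes
'12982-674-759712' -match $regex   # True: the pattern is found inside a longer run of digits
```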

Now, to use Regex, I’m going to utilize the -match operator and the automatic variable $Matches. After a successful match, $Matches[0] holds the matched text. All we need to do is put these pieces together:
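
Before going into a loop, here is a small sketch of how those two pieces behave on a single string. The sample text is made up for illustration.

```powershell
$text = 'Call 982-674-7597 after lunch'   # made-up sample line

# -match returns $true or $false when the left side is a single string
$found = $text -match '\d{3}-\d{3}-\d{4}'

if ($found)
{
    # $Matches is only filled in by a successful match; index 0 is the whole match
    $number = $Matches[0]
    $number
}
```

One thing to watch for: a failed -match leaves $Matches as it was, so it is safest to read it only inside the if.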

#grab our data
$file = Get-Content "$PSScriptRoot\MOCK_DATA.txt"

#make our pattern (single quotes keep PowerShell from touching the symbols)
$regex = '\d{3}-\d{3}-\d{4}'

#loop through each line
foreach ($line in $file)
{
    #if our line contains our pattern, write the matched data to the screen
    if ($line -match $regex)
    {
        $Matches[0]
    }
}
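
Note that -match stops at the first hit on each line. If a line could hold more than one phone number, one option is the .NET [regex]::Matches method, which returns every hit. This is a sketch on made-up lines rather than the MOCK_DATA file, and it is an alternative to the loop above, not a replacement for it.

```powershell
# made-up sample lines; the second one holds two numbers
$lines = @(
    'home: 982-674-7597'
    'work: 275-545-2825, cell: 275-609-0729'
    'no number here'
)

$regex = '\d{3}-\d{3}-\d{4}'

$numbers = foreach ($line in $lines)
{
    # Matches returns a collection of Match objects; Value is the matched text
    foreach ($m in [regex]::Matches($line, $regex))
    {
        $m.Value
    }
}

$numbers
```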

Results:

982-674-7597
275-545-2825
275-609-0729
570-808-4168
726-131-4847
912-974-5105
351-131-8303
938-281-7352
737-424-9922
198-238-7774
199-866-6315
967-153-4550
730-103-5861
464-747-2670
473-232-5315
173-795-8209
424-484-7750
388-383-4977
328-526-8012
710-232-3341
537-744-9215
343-679-9591
404-643-4727
654-476-2559
986-109-0938
199-790-8042
340-974-7318
522-411-1281
874-705-5922
982-223-7617
456-820-5936
157-781-8516
508-552-8426
913-814-8741
318-716-1850
198-231-8411
148-900-9662
544-416-2598
353-429-1125
316-568-4160
425-256-2700
790-673-7772
493-734-9005
813-496-0519
981-114-6637
763-797-9753
820-648-4784
824-511-8491
293-878-6488
832-704-8998

You can find the code in this GitHub repo

Hopefully this gets you excited about Regex! In a couple weeks I’ll do another post breaking down the basic symbols.

As always, don’t forget to rate, comment and share! Let me know what you think of the content and what topics you’d like to see me blog about in the future.
