Hi all, this week I’ll be talking about Regular Expressions. I’ve got a few posts planned to get you set up and going with some basic Regex.
Regex is used for extracting and validating data. Essentially, you can think of Regex as Windows wildcards on steroids. Any time we need to match data with a little more precision than the *s and ?s that Windows gives us, we have Regex.
Regex has a reputation for being difficult and confusing, but it really isn’t so bad when you get used to it. The biggest contributors to Regex’s reputation are:
- Regex uses its own set of symbols. From PowerShell, we need to generate plain-text Regex strings and send them into the parser. This means escaping special PowerShell symbols so they get passed through correctly.
- Regex is confusing to read, but easier to write. I like to joke that Regex is a write-only language, because when you see the data and describe a pattern in plain English, it’s not so bad to build that pattern out of symbols. However, when you see a bunch of symbols by themselves, it looks like spaghetti code. Additionally, there are many different ways people might build a pattern for the same data. When you use Regex, make sure to leave friendly comments for anyone viewing your code later.
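
To make the escaping point concrete, here is a small sketch. The variable names and sample strings are made up for illustration. Single-quoted PowerShell strings are handed to the Regex parser exactly as typed, while double-quoted strings expand anything that looks like a variable first. For characters that are special to Regex itself, [regex]::Escape can add the backslashes for you.

```powershell
# Hypothetical variable, only here to show what expansion does to a pattern
$digits = 'oops'

# Double quotes: PowerShell expands $digits before Regex ever sees the string
$double = "\d+$digits"

# Single quotes: the text is passed through exactly as typed
$single = '\d+$digits'

$double   # \d+oops
$single   # \d+$digits

# [regex]::Escape backslash-escapes characters that are special to Regex,
# so the dot below matches a literal dot instead of "any character"
$literal = [regex]::Escape('3.14')
$literal                 # 3\.14
'3.14' -match $literal   # True
'3x14' -match $literal   # False
```

When a pattern contains no PowerShell variables on purpose, single quotes are usually the safer default.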
With that in mind, let’s take a look at a sample about why you should care, and then in later posts we will break it down and learn more.
Maybe we have some fake data, like this:
We’ll work with just numbers in this case and try to extract those phone numbers. In plain English, we can look at the data and say all the phone numbers break down like this:
- 3 numbers
- dash character
- 3 more numbers
- dash character
- 4 more numbers
Now, we could get false positives, but since we can see the data we can call it “good enough” 🙂
In Regex, we can use \d to say “look for a digit” and {min,max} to specify a quantity; a single value such as {3} means exactly that many. We’ll talk more about these symbols later. With that in mind, our pattern could look something like this: \d{3}-\d{3}-\d{4}
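
As a quick sanity check, the sketch below tries the pattern against a few made-up strings (none of them come from the data file). It also writes the same pattern in Regex's (?x) mode, which ignores whitespace inside the pattern and allows # comments, so the plain-English breakdown can sit right next to the symbols.

```powershell
$regex = '\d{3}-\d{3}-\d{4}'

# Made-up sample strings, for illustration only
'Call 555-010-0199 today' -match $regex   # True
'Call 555-0199 today'     -match $regex   # False: only two groups of digits

# A false positive: the pattern also matches inside a longer run of digits
'99555-010-019977' -match $regex          # True

# Same pattern with comments; (?x) tells Regex to ignore pattern whitespace
$commented = @'
(?x)
\d{3}   # 3 numbers
-       # dash character
\d{3}   # 3 more numbers
-       # dash character
\d{4}   # 4 more numbers
'@
'Call 555-010-0199 today' -match $commented   # True
```

The commented form is longer, but it is one way to leave those friendly comments for whoever reads the pattern later.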
Now, to use Regex, I’m going to utilize the -Match operator and the built-in variable $matches, whose first element, $matches[0], holds the matched data. All we need to do is put these pieces together:
    #grab our data
    $file = get-content "$PSScriptRoot\MOCK_DATA.txt"

    #make our pattern
    $regex = "\d{3}-\d{3}-\d{4}"

    #loop through each line
    foreach ($line in $file)
    {
        #if our line contains our pattern, write the matched data to the screen
        if($line -match $regex)
        {
            $matches[0]
        }
    }
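
One thing to be aware of: -match stops at the first match in a string, so a line holding two phone numbers would only report one. If that matters for your data, a possible variation (not the approach used above) is the .NET [regex]::Matches method, which returns every match in the input. The sample lines below are invented so the sketch runs without MOCK_DATA.txt.

```powershell
$regex = '\d{3}-\d{3}-\d{4}'

# Invented lines standing in for the contents of a data file
$lines = @(
    'home 555-010-0101, work 555-010-0102'
    'no phone number on this line'
    'cell 555-010-0103'
)

$found = foreach ($line in $lines) {
    # Matches returns a collection with one Match object per hit
    foreach ($m in [regex]::Matches($line, $regex)) {
        $m.Value
    }
}

$found
# 555-010-0101
# 555-010-0102
# 555-010-0103
```

For data with at most one number per line, the simpler -match loop does the same job.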
Results:
    982-674-7597
    275-545-2825
    275-609-0729
    570-808-4168
    726-131-4847
    912-974-5105
    351-131-8303
    938-281-7352
    737-424-9922
    198-238-7774
    199-866-6315
    967-153-4550
    730-103-5861
    464-747-2670
    473-232-5315
    173-795-8209
    424-484-7750
    388-383-4977
    328-526-8012
    710-232-3341
    537-744-9215
    343-679-9591
    404-643-4727
    654-476-2559
    986-109-0938
    199-790-8042
    340-974-7318
    522-411-1281
    874-705-5922
    982-223-7617
    456-820-5936
    157-781-8516
    508-552-8426
    913-814-8741
    318-716-1850
    198-231-8411
    148-900-9662
    544-416-2598
    353-429-1125
    316-568-4160
    425-256-2700
    790-673-7772
    493-734-9005
    813-496-0519
    981-114-6637
    763-797-9753
    820-648-4784
    824-511-8491
    293-878-6488
    832-704-8998
You can find the code in this GitHub repo
Hopefully this gets you excited about Regex! In a couple of weeks I’ll do another post breaking down the basic symbols.
As always, don’t forget to rate, comment and share! Let me know what you think of the content and what topics you’d like to see me blog about in the future.