Regular Expressions (REGEX): Basic symbols

Kory Thacher

Welcome back to the RegEx guide. Last post we talked a little bit about the basics of RegEx and its uses. I mentioned the most important thing is to understand the symbols. Today we’ll ease in with some of the basics to get us going, but later we will expand on these and see some other options we have.

. is used to represent any single character other than a newline, so it will feel very similar to the Windows wildcard ?.
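
To see the dot in action, here is a minimal sketch using -match on a few made-up strings. The sample words and variable names are my own assumptions for illustration, not something from MOCK_DATA:

```powershell
# Hypothetical sample strings, chosen only to illustrate the dot
$matchesCat = 'cat' -match 'c.t'   # . stands in for the "a"
$matchesCut = 'cut' -match 'c.t'   # . stands in for the "u"
$matchesCt  = 'ct'  -match 'c.t'   # nothing sits between c and t, so no match

# Write the three results to the screen
$matchesCat, $matchesCut, $matchesCt
```

The first two should come back True and the last one False, because . needs exactly one character to be there.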

\ is the escape character for RegEx, and it has two jobs:

  1. Take special properties away from special characters: \. would be used to represent a literal dot character. \\ is used for a literal backslash character.
  2. Add special properties to a normal character: \d is used to look for any digit (we’ll see more of these in a bit)
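
Both jobs of the escape character can be sketched with a few quick -match tests. The file name, path and variable names below are hypothetical examples I made up for this post:

```powershell
# Job 1: take special properties away from a special character
$unescapedDot = 'file_txt' -match 'file.txt'    # True: the bare . happily matches "_"
$escapedDot   = 'file_txt' -match 'file\.txt'   # False: \. only matches a real dot
$literalDot   = 'file.txt' -match 'file\.txt'   # True
$backslash    = 'C:\Temp'  -match 'C:\\Temp'    # True: \\ matches one literal backslash

# Job 2: add special properties to a normal character
$hasDigit = 'room 7' -match '\d'                # True: \d finds the 7
$theDigit = $Matches[0]                         # holds '7'

$unescapedDot, $escapedDot, $literalDot, $backslash, $hasDigit, $theDigit
```

The first line is the one to watch out for: forgetting the \ in front of a dot usually still matches, just not for the reason you wanted.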

We can use {} to specify quantity in a few different ways by attaching them to characters or symbols.

  1. {exact number} so something like \d{2} says “look for exactly two digits”
  2. {min,max} so something like \d{2,4} says “look for at least two digits, and keep grabbing them until you have four”
  3. {min,} will check for a minimum with no max cap, so \d{2,} says “look for at least 2 digits, but keep grabbing them until you see something that isn’t a digit”
  4. + is a shortcut for {1,} so you can say “one or more”
  5. * is a shortcut for {0,} so you can say “zero or more” (be careful with that one!)
  6. ? is a shortcut for {0,1} so you can say “this may or may not be here”. It could be useful for links that may or may not have an “s”, as in “http”/“https”
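
The quantifiers above can be compared side by side by running each one against the same string and looking at $matches[0]. This is just a sketch: the sample string, the example.com URLs and the variable names are my own assumptions.

```powershell
# Hypothetical sample string with five digits in a row
$sample = 'id 12345'

# $null = ... hides the True/False result so we can focus on what was matched
$null = $sample -match '\d{2}';   $exactlyTwo = $Matches[0]   # '12'
$null = $sample -match '\d{2,4}'; $twoToFour  = $Matches[0]   # '1234'
$null = $sample -match '\d{2,}';  $twoOrMore  = $Matches[0]   # '12345'
$null = $sample -match '\d+';     $oneOrMore  = $Matches[0]   # '12345'

# * is happy with zero digits, so it "matches" the empty spot before the "i"
$null = $sample -match '\d*';     $zeroOrMore = $Matches[0]   # '' (empty string)

# ? makes the "s" optional
$null = 'http://example.com'  -match 'https?'; $plain  = $Matches[0]   # 'http'
$null = 'https://example.com' -match 'https?'; $secure = $Matches[0]   # 'https'

$exactlyTwo, $twoToFour, $twoOrMore, $oneOrMore, $zeroOrMore, $plain, $secure
```

The \d* line is why * needs care: the match succeeds, but it grabs nothing, because zero digits at the very start of the string already satisfies the pattern.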

Character classes like \d are the real meat & potatoes for building out RegEx and getting some useful patterns. These are case sensitive: the lowercase versions are covered here, and we will talk about the uppercase versions in another post. These three are the most common ones to get started with:

  1. \d looks for digits
  2. \s looks for whitespace
  3. \w looks for word characters
  4. We will talk about \p in a future post to match more specific symbol groups.
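
Here is a quick sketch of the three classes working on one string. The sample text and variable names are hypothetical, picked only so each class has something to find:

```powershell
# Hypothetical sample string
$sample = 'Order 66 shipped'

$null = $sample -match '\d+';       $digits   = $Matches[0]   # '66'
$null = $sample -match '\s';        $space    = $Matches[0]   # ' ' (a single space)
$null = $sample -match '\w+';       $word     = $Matches[0]   # 'Order'
$null = $sample -match '\w+\s\d+';  $combined = $Matches[0]   # 'Order 66'

# \w is broader than letters: it also covers digits and the underscore
$null = 'user_1 logged in' -match '\w+'; $userName = $Matches[0]   # 'user_1'

$digits, $space, $word, $combined, $userName
```

Note that -match stops at the first match it finds, which is why \w+ returns 'Order' and not every word in the string.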

Let’s put it together and try a couple of things. We’ll still use -match and $matches[0] for now, but we’ll use some other things to leverage RegEx once we are comfortable with the basic symbols.

We’ll use the same shell as we had in the last post and the same MOCK_DATA as before. This time, let’s match emails. Try it yourself first!

Hint:

The emails all seem to be a first initial followed by a last name (sometimes with a number on the end), so just some word characters. Then an @ symbol, more word characters, then a dot, then more word characters!

Answer:

"\w+@\w+\.\w+"

Putting it together:

# Grab our data
$file = Get-Content "$PSScriptRoot\MOCK_DATA.txt"

# Make our pattern
$regex = "\w+@\w+\.\w+"

# Loop through each line
foreach ($line in $file)
{
    # If our line contains our pattern, write the matched data to the screen
    if ($line -match $regex)
    {
        $matches[0]
    }
}

Results:

bseamon0@bbc.co
jbrotherwood1@house.gov
jduffan2@google.ru
eleates3@home.pl
spaquet4@about.com
ltrainer5@squarespace.com
kgrotty6@pinterest.com
chilliam7@amazon.co
mlumber8@reference.com
chuitson9@free.fr
ntrewa@imgur.com
iferneyhoughb@jigsy.com
washlingc@slideshare.net
acrushamd@flavors.me
blundbecke@unblog.fr
aadairf@spiegel.de
bwilderg@photobucket.com
mcurrmh@shareasale.com
baberkirderi@netscape.com
rgrzelewskij@twitpic.com
rproomk@reddit.com
tnernl@deviantart.com
cgodartm@reverbnation.com
xbosdetn@xing.com
ktippetto@webs.com
ameneyerp@illinois.edu
jhicksq@amazon.com
bspoorsr@answers.com
lbriffetts@businessweek.com
tmethringhamt@instagram.com
mberryu@businessinsider.com
mschankev@blog.com
lgoodredgew@tinyurl.com
dgoaksx@timesonline.co
ncornuauy@about.com
msculleyz@wisc.edu
abenettolo10@dot.gov
ipaaso11@cdc.gov
hdowse12@usatoday.com
splacidi13@dyndns.org
rdadswell14@newsvine.com
csalsberg15@telegraph.co
cpimmocke16@senate.gov
jvader17@disqus.com
amerton18@jimdo.com
eclitsome19@clickbank.net
jmelmore1a@elpais.com
hscotney1b@soundcloud.com
rcouling1c@statcounter.com
ecowpland1d@myspace.com
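
One thing worth knowing about this pattern: it only asks for a single dot after the @, and \w does not include dots. The sketch below uses two made-up addresses (they are my own assumptions, not rows from MOCK_DATA) to show what that means for longer addresses:

```powershell
# Same pattern as above
$regex = '\w+@\w+\.\w+'

# Hypothetical address with a two-part ending: the match stops after the first part
$null = 'user@example.co.uk' -match $regex
$twoPartEnding = $Matches[0]    # 'user@example.co'

# Hypothetical address with a dot in the name: only the piece after the dot is grabbed
$null = 'first.last@example.com' -match $regex
$dottedName = $Matches[0]       # 'last@example.com'

$twoPartEnding, $dottedName
```

That is fine for a first pass with the basic symbols, and it is the kind of thing we can tighten up once we have more tools to work with.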

Once again, you can find it on git here.

Hope you’re enjoying RegEx so far, and starting to see how it can be pretty useful! Next time we will take a look at grouping to extract different pieces of data, and using [regex] instead of just $matches.

As always, don’t forget to rate, comment and share! Let me know what you think of the content and what topics you’d like to see me blog about in the future.
