{"id":80355,"date":"2016-10-21T00:01:32","date_gmt":"2016-10-21T07:01:32","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/?p=80355"},"modified":"2019-02-18T09:10:24","modified_gmt":"2019-02-18T16:10:24","slug":"powershell-regex-crash-course-part-4-of-5","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/powershell-regex-crash-course-part-4-of-5\/","title":{"rendered":"PowerShell regex crash course \u2013 Part 4 of 5"},"content":{"rendered":"<p><strong>Summary<\/strong>: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell.<\/p>\n<p>Hello! I\u2019m Thomas Rayner, a proud Cloud &amp; Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (<a target=\"_blank\" href=\"http:\/\/twitter.com\/MrThomasRayner\">@MrThomasRayner<\/a>), or posting on my blog, <a target=\"_blank\" href=\"http:\/\/workingsysadmin.com\/\">workingsysadmin.com<\/a>. This week, I\u2019m presenting a five-part crash course about how to use regular expressions in PowerShell. Regular expressions are sequences of characters that define a search pattern, mainly for use in pattern matching with strings. Regular expressions are extremely useful to extract information from text such as log files or documents. This isn\u2019t meant to be a comprehensive series but rather, just as the name says, a crash course. So, buckle up!<\/p>\n<p>Many people are intimidated by regular expressions, or \u201cregex\u201d. If you see something like <strong>\u2018(\\d{1,3}\\.){3}(\\d{1,3})\u2019<\/strong> and your eyes start glazing over, don\u2019t worry. By the end of this series, you\u2019ll have the skills to identify that pattern matches IP addresses. 
For the uninitiated, big strings of seemingly random characters appear indecipherable, but regex is an incredibly powerful tool that any PowerShell pro needs to have a grip on.<\/p>\n<p>From what I&#8217;ve seen, lookaheads and lookbehinds are very underused by PowerShell users who write regex. So far, everything we&#8217;ve looked at (quantifiers, special characters, character classes, and groups) matches characters in a string. For instance, <strong>&#8216;\\w{3}\\s{2}&#8217;<\/strong> will match three word characters followed by two whitespace characters. Lookaheads and lookbehinds are different, though. The best way to describe them is to say that lookaheads and lookbehinds match locations <em>between<\/em> characters, rather than matching characters themselves. Stay with me here.<\/p>\n<p>Say you have a string &#8220;domain\\username&#8221;, and you want to take just the username part. There are a lot of ways to do this in regex. Here are a few.<\/p>\n<p style=\"padding-left: 60px\"><code>'domain\\username' -replace '\\w+\\\\',''<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>'domain\\username' -replace '.*\\\\',''<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('domain\\username','\\w+$').value<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('domain\\username','[^\\\\]+$').value<\/code><\/p>\n<p>The first two examples are variations of &#8220;look for something that has a backslash after it, and replace it with nothing&#8221;. The last two examples look for &#8220;word characters (or characters that are not a backslash) up to the end of the string&#8221;. There is another way to squeeze the juice out of this lemon, though.<\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('domain\\username','(?&lt;=\\\\).+$').value<\/code><\/p>\n<p>Let\u2019s take a closer look at the regex pattern here. 
<strong>(?&lt;=\\\\)<\/strong> is a lookbehind. What it means is \u201cmatch the space between characters where the character on the left is a backslash\u201d. Because the \u201cusername\u201d part of \u201cdomain\\username\u201d comes right after a \u201c\\\u201d, this part matches the space between the characters \u201c\\\u201d and \u201cu\u201d. Then, the pattern <strong>.+$<\/strong> takes every character until the end of the string.<\/p>\n<p>You can also do a lookahead.<\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('domain\\username','.+?(?=\\\\)').value<\/code><\/p>\n<p>Here I\u2019m just getting the \u201cdomain\u201d part of this string. The pattern <strong>.+?<\/strong> matches all the characters that it takes to get to the next part of the pattern, which is <strong>(?=\\\\)<\/strong>. That pattern is a lookahead which matches the space between the \u201cn\u201d in \u201cdomain\u201d and the \u201c\\\u201d that comes after.<\/p>\n<p>You can make a lookahead or lookbehind into a negative lookahead or negative lookbehind by replacing the \u201c=\u201d part with \u201c!\u201d. Consider the following example.<\/p>\n<p style=\"padding-left: 60px\"><code>@('something','this one bad') | where { $_ -match 's(?!\\s)' }\u00a0 #returns \u2018something\u2019 only<\/code><\/p>\n<p>Hypothetically, we have an array of two items, and I am interested only in items that have an \u201cs\u201d but not where a space comes right after the \u201cs\u201d. Weird, but it\u2019s a simple example just for this crash course.<\/p>\n<p>Here\u2019s a full table of the different lookahead and lookbehind syntax.<\/p>\n<table width=\"100%\">\n<tbody>\n<tr bgcolor=\"black\">\n<th width=\"144\"><span style=\"color: white\"><strong><em>Syntax<\/em><\/strong><\/span><\/th>\n<th width=\"479\"><span style=\"color: white\"><strong>Meaning \u2013 Example<\/strong><\/span><\/th>\n<\/tr>\n<tr>\n<td width=\"144\"><em>(?=&lt;pattern&gt;)<\/em><\/td>\n<td width=\"479\">Lookahead. 
Matches the space between two characters where the text to the right matches &lt;pattern&gt;.<\/p>\n<p><code>[regex]::matches('something','<strong>(?=e)<\/strong>').value<\/code><\/p>\n<p>This will match the space between the \u201cm\u201d and \u201ce\u201d in \u201csomething\u201d because the pattern searches for a space where the character on the right is an \u201ce\u201d. Because a space between characters has no width, the value returned is an empty string.<\/td>\n<\/tr>\n<tr bgcolor=\"lightblue\">\n<td width=\"144\"><em>(?!&lt;pattern&gt;)<\/em><\/td>\n<td width=\"479\">Negative lookahead. Matches the space between two characters where the text to the right does not match &lt;pattern&gt;.<\/p>\n<p><code>[regex]::matches('something','<strong>m(?!q)<\/strong>').value<\/code><\/p>\n<p>This will match the \u201cm\u201d in \u201csomething\u201d because the pattern searches for an \u201cm\u201d where the following character is not \u201cq\u201d. The lookahead only checks the space between the \u201cm\u201d and the \u201ce\u201d, so the value returned is just \u201cm\u201d.<\/td>\n<\/tr>\n<tr>\n<td width=\"144\"><em>(?&lt;=&lt;pattern&gt;)<\/em><\/td>\n<td width=\"479\">Lookbehind. Matches the space between two characters where the text to the left matches &lt;pattern&gt;.<\/p>\n<p><code>[regex]::matches('something','<strong>(?&lt;=m)<\/strong>').value<\/code><\/p>\n<p>This will match the space between the \u201cm\u201d and the \u201ce\u201d in \u201csomething\u201d because the pattern searches for a space where the character on the left is \u201cm\u201d.<\/td>\n<\/tr>\n<tr bgcolor=\"lightblue\">\n<td width=\"144\"><em>(?&lt;!&lt;pattern&gt;)<\/em><\/td>\n<td width=\"479\">Negative lookbehind. 
Matches the space between two characters where the text to the left does not match &lt;pattern&gt;.<\/p>\n<p><code>[regex]::matches('something','<strong>(?&lt;!q)e<\/strong>').value<\/code><\/p>\n<p>This will match the \u201ce\u201d in \u201csomething\u201d because the pattern searches for an \u201ce\u201d that is not preceded by \u201cq\u201d. The lookbehind only checks the space between the \u201cm\u201d and the \u201ce\u201d, so the value returned is just \u201ce\u201d.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>I can hear you asking, \u201cWhat\u2019s the point?\u201d Why would you want to use a lookahead or lookbehind when the syntax can look so confusing? Well, let\u2019s use an example to illustrate the point. Say I have a string, \u2018this\\is\\a string\u2019, where I want the \u201cis\\a\u201d part. Let\u2019s also say that the actual words \u201cthis is a string\u201d aren\u2019t always those exact words. It could be \u2018why\\is\\this something\u2019 too. Why? Because it\u2019s an example.<\/p>\n<p>I could do this.<\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('this\\is\\a string','\\\\[^\\\\].+?\\s').value<\/code><\/p>\n<p>That\u2019s pretty good. I\u2019ve got the pattern \u201ca single backslash followed by as many characters as it takes to get to a space\u201d. But that includes the leading \u201c\\\u201d before \u201cis\u201d, and it\u2019s hard to see here, but there\u2019s a space character following \u201ca\u201d. Okay, let\u2019s strip them out.<\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('this\\is\\a string','\\\\[^\\\\].+?\\s').value.trim('\\ ')<\/code><\/p>\n<p>Not bad. That will return the part that we care about, and it works in most cases. 
<strong>.trim()<\/strong> takes an array of characters, and we passed it the backslash and the space character to trim those off the start and end.<\/p>\n<p>Let\u2019s see how I would tackle this with lookaheads and lookbehinds.<\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches('this\\is\\a string','(?&lt;=\\\\).+?(?=\\s)').value<\/code><\/p>\n<p>The pattern here is \u201cthe space between characters where the character on the left is a backslash, followed by as many characters as it takes to get to whitespace\u201d.<\/p>\n<p>Which one performs better depends on the context in which you use regex. In this particular example, the non-lookahead example with <strong>.trim()<\/strong> is actually a couple ticks faster on average in my tests. In larger files with more complicated string manipulation (maybe splitting, trimming, replacing, then joining is needed), the lookaheads are going to save you a lot of processing time.<\/p>\n<p>I don\u2019t personally see a ton of people using lookaheads and lookbehinds, but they are a powerful tool and I think the more tools you have in your toolbox, the better equipped you are to handle challenges.<\/p>\n<p>Tune in tomorrow for a bunch of examples!<\/p>\n<p>That was some amazing content, Thomas!\u00a0 I know for a fact I\u2019m going to be ripping through text files with regular expressions this weekend to see what I can do with them! Thanks!<\/p>\n<p>I invite you to follow the Scripting Guys on <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguystwitter\">Twitter<\/a> and <a href=\"http:\/\/bit.ly\/scriptingguysfacebook\">Facebook<\/a>. If you have any questions, send email to them at <a target=\"_blank\" href=\"mailto:scripter@microsoft.com\">scripter@microsoft.com<\/a>, or post your questions on the <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingforum\">Official Scripting Guys Forum<\/a>. 
See you tomorrow.<\/p>\n<p>Until then, always remember that with Great PowerShell comes Great Responsibility.<\/p>\n<p><strong>Sean Kearney<\/strong>\nHonorary Scripting Guy\nCloud and Datacenter Management MVP<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell. Hello! I\u2019m Thomas Rayner, a proud Cloud &amp; Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (@MrThomasRayner), or posting on my blog, workingsysadmin.com. This week, I\u2019m [&hellip;]<\/p>\n","protected":false},"author":596,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[568,641],"tags":[56,652,45],"class_list":["post-80355","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hey-scripting-guy","category-windows-powershell","tag-guest-blogger","tag-thomas-rayner","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Summary: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell. Hello! I\u2019m Thomas Rayner, a proud Cloud &amp; Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (@MrThomasRayner), or posting on my blog, workingsysadmin.com. 
This week, I\u2019m [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/80355","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=80355"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/80355\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=80355"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=80355"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=80355"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}