{"id":80375,"date":"2016-10-28T08:43:35","date_gmt":"2016-10-28T15:43:35","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/?p=80375"},"modified":"2019-02-18T09:10:23","modified_gmt":"2019-02-18T16:10:23","slug":"powershell-regex-crash-course-part-5-of-5","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/powershell-regex-crash-course-part-5-of-5\/","title":{"rendered":"PowerShell regex crash course \u2013 Part 5 of 5"},"content":{"rendered":"<p><strong>Summary<\/strong>: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell.<\/p>\n<p>Hello! I\u2019m Thomas Rayner, a proud Cloud and Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (<a target=\"_blank\" href=\"http:\/\/twitter.com\/MrThomasRayner\">@MrThomasRayner<\/a>), or posting on my blog, <a target=\"_blank\" href=\"http:\/\/workingsysadmin.com\/\">workingsysadmin.com<\/a>. This week, I\u2019m presenting a five-part crash course about how to use regular expressions in PowerShell. Regular expressions are sequences of characters that define a search pattern, mainly for use in pattern matching with strings. Regular expressions are extremely useful to extract information from text such as log files or documents. This isn\u2019t meant to be a comprehensive series but rather, just as the name says, a crash course. So, buckle up!<\/p>\n<p>Today, all I\u2019m going to do is run through examples. Some touch on other posts from this series, but others are brand new. Enjoy!<\/p>\n<p>Here\u2019s a quick way to get the values between quotation marks in a string. 
Say you have the following string.<\/p>\n<p style=\"padding-left: 60px\"><code>$s = @\"<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>Here is: \"Some data\"<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>Here's \"some other data\"<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>this is \"important\" data<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>\"@<\/code><\/p>\n<p>If you just want the \u201cSome data\u201d, \u201csome other data\u201d, and \u201cimportant\u201d parts, you could do this a couple of ways.<\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches($s,'(?&lt;=\\\").+?(?=\\\")').value<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>[regex]::matches($s,'\".+?\"').value.trim('\"')<\/code><\/p>\n<p>Both return the desired results for this string. The first one uses a lookbehind and a lookahead to match only the characters between quotation marks. The second one does basically the same thing but includes the quotation marks themselves, so I trim them afterwards. In this case, the lookahead\/lookbehind example seems to be consistently faster in my tests. One caveat: because the lookahead doesn\u2019t consume the closing quotation mark, the first pattern also returns the text between two quoted strings that share a line, so the second approach is safer for input like that.<\/p>\n<p>How about this quick way of detecting whether a string has non-alphabetic characters in it?<\/p>\n<p style=\"padding-left: 60px\"><code>$string1 = 'something'<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>$string2 = 'some@thing'<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>$string1 -match '[^a-zA-Z]'\u00a0 #returns false \u2013 no special chars<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>$string2 -match '[^a-zA-Z]'\u00a0 #returns true \u2013 has special chars<\/code><\/p>\n<p>In this example, <strong>[^a-zA-Z]<\/strong> matches any single character that isn\u2019t a lowercase or uppercase letter from a to z, so the statement is true for any string that contains at least one such character.<\/p>\n<p>How about checking whether an integer is a specific number of digits long?<\/p>\n<p style=\"padding-left: 60px\"><code>[int]$v6 = 849032<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>[int]$v2 = 23<\/code><\/p>\n<p 
style=\"padding-left: 60px\"><code>$v6 -match '^\\d{6}$'<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>$v2 -match '^\\d{6}$'<\/code><\/p>\n<p><strong>$v6<\/strong> is an int that is six digits long. <strong>$v2<\/strong> is an int that is only two digits long. In the last two lines, we\u2019re testing to see if each variable matches the pattern \u2018^\\d{6}$\u2019 (<strong>-match<\/strong> converts the int to a string first), which is regex speak for \u201cstart of the line, any digit, exactly six of them, end of the line\u201d. The first one will be true because it\u2019s six digits, and the second one will be false. You could also use something like \u2018^\\d{4,6}$\u2019 to validate that the int is between four and six digits long.<\/p>\n<p>Now, let\u2019s see if a string starts or ends with a specific character (or pattern).<\/p>\n<p style=\"padding-left: 60px\"><code>'something\\' -match '.+?\\\\$'\u00a0 #returns true<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>'something' -match '.+?\\\\$'\u00a0 #returns false<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>'\\something' -match '^\\\\.+?'\u00a0 #returns true<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>'something' -match '^\\\\.+?'\u00a0 #returns false<\/code><\/p>\n<p>In the first two examples, I\u2019m checking to see if the string ends with a backslash. In the last two examples, I\u2019m checking to see if the string starts with one. The regex pattern being matched for the first two is <strong>.+?\\\\$<\/strong>. What does that mean? Well, the first part <strong>.+?<\/strong> means \u201cone or more of any character, but only as many as it takes to get to the next part of the regex\u201d. The second part <strong>\\\\<\/strong> means \u201ca backslash\u201d. Because <strong>\\<\/strong> is the escape character, we\u2019re basically escaping the escape character. The last part <strong>$<\/strong> is the signal for the end of the line. 
Effectively, what we have is \u201canything at all, where the last thing on the line is a backslash\u201d, which is exactly what we\u2019re looking for. In the last two examples, I\u2019ve just moved the <strong>\\\\<\/strong> to the start of the pattern and started with <strong>^<\/strong> instead of ending with <strong>$<\/strong>, because <strong>^<\/strong> is the signal for the start of the line.<\/p>\n<p>Sometimes, you\u2019re given a path to a file system location that\u2019s poorly formatted. Sometimes, you\u2019re given thousands of them. Well, here\u2019s an easy way to normalize those paths.<\/p>\n<p style=\"padding-left: 60px\"><code>'c:\\some\/awful\/oops\\here\\we-go.txt' -replace '\/','\\'<\/code><\/p>\n<p>Quick and easy. Anywhere there\u2019s a \u201c\/\u201d, replace it with a \u201c\\\u201d.<\/p>\n<p>What if you want to replace something with a modified version of the original value?<\/p>\n<p style=\"padding-left: 60px\"><code>'this is something' -replace 's[oqr]mething','$0 fun'<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>'this is sqmething' -replace 's[oqr]mething','$0 fun'<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>'this is srmething' -replace 's[oqr]mething','$0 fun'<\/code><\/p>\n<p>Check that out. Here\u2019s what gets returned.<\/p>\n<p style=\"padding-left: 60px\"><code>this is something fun<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>this is sqmething fun<\/code><\/p>\n<p style=\"padding-left: 60px\"><code>this is srmething fun<\/code><\/p>\n<p>In all three examples, we\u2019re looking for the pattern \u201cs, followed by o or q or r, followed by mething\u201d. What am I replacing it with? Whatever part of the string matched (signified by <strong>$0<\/strong>), plus the word \u201cfun\u201d. If you have multiple groups (you create a capture group by enclosing part of the pattern in round brackets), you can use $1, $2, and so on to indicate which group you want to refer to. Notice that I used single quotes. 
If you use double quotes, PowerShell expands <strong>$0<\/strong> as a variable before <strong>-replace<\/strong> ever sees it.<\/p>\n<p>You could replace using a calculated value, too, using the <strong>[regex]<\/strong> accelerator.<\/p>\n<p style=\"padding-left: 60px\"><code>[Regex]::Replace('192.168.1.100', '\\d{1,3}$', {param($old) [Int]$old.Value + 1})<\/code><\/p>\n<p>This will return \u201c192.168.1.101\u201d. The <strong>[regex]::Replace()<\/strong> method has an overload that accepts a <strong>MatchEvaluator<\/strong> delegate in place of a replacement string, and PowerShell converts the scriptblock you pass after the pattern into one. The scriptblock receives each match as <strong>$old<\/strong> and returns the replacement text. In this example, we\u2019re replacing something in the string \u201c192.168.1.100\u201d. What we\u2019re replacing matches the pattern \u201c1 to 3 digits followed by the end of the string\u201d, and we\u2019re replacing it with the old value plus 1. Cool, right?<\/p>\n<p>One last weird one. What if you have a string that reads like \u201cthis this is a fun string\u201d and you want to remove the duplicate \u201cthis\u201d? Regex to the rescue again!<\/p>\n<p style=\"padding-left: 60px\"><code>'this this is a fun string' -replace '\\b(\\w+)(\\s+\\1){1,}\\b','$1'<\/code><\/p>\n<p>Alright, what is going on here? We\u2019re feeding a string into the <strong>-replace<\/strong> operator. What\u2019s the pattern we\u2019re looking for? Well, it\u2019s <strong>\\b(\\w+)(\\s+\\1){1,}\\b<\/strong>, of course. Let\u2019s break it down. The first part, <strong>\\b<\/strong>, is \u201cthe boundary of a word\u201d. Second is <strong>(\\w+)<\/strong>, which matches all the word characters until it gets to something that isn\u2019t a word character. Because it\u2019s wrapped in round brackets, it\u2019s also the first capture group. Third is <strong>(\\s+\\1){1,}<\/strong>, which means \u201cone or more whitespace characters followed by the same text that the first capture group matched\u201d, one or more times. <strong>\\1<\/strong> is a backreference to <strong>(\\w+)<\/strong>: groups are numbered by their opening brackets from left to right, and group 0 is always the entire match, which is why <strong>$0<\/strong> worked in the earlier example. The fourth part of the pattern is another word boundary. 
So, where we have a word boundary followed by a word, followed by that word again at least one time, followed by a word boundary, we want to replace it. We replace it with <strong>$1<\/strong>, which is the original word we matched, so the result is \u201cthis is a fun string\u201d. Still with me?<\/p>\n<p>Also, every PowerTip this week has been a regex example, so check those out too!<\/p>\n<p>These are just some off-the-cuff examples of regex in action. Regex is so robust, and has so many applications, that it would take months to do a fully comprehensive series of posts. What I presented this week is merely a crash course \u2013 something to get your feet wet, introduce you to some concepts, and give you a jumping-off point to dig deeper on your own. There are lots of regex resources out there. You just need to be motivated and look for them.<\/p>\n<p>This wraps up my regex crash course! I hope you learned something. Still confused? That\u2019s okay, too. The best way to get better at regex is to start using it and keep practicing. Don\u2019t be afraid. Regex is complicated but also immensely powerful, so it is definitely in your interest to at least get a rudimentary regex education.<\/p>\n<p>See you next time!<\/p>\n<p>Excellent work, Thomas! Thanks to your posts, I\u2019m feeling a lot more up to speed on making regular expressions useful to me!<\/p>\n<p>I invite you to follow the Scripting Guys on <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguystwitter\">Twitter<\/a> and <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguysfacebook\">Facebook<\/a>. If you have any questions, send email to them at <a target=\"_blank\" href=\"mailto:scripter@microsoft.com\">scripter@microsoft.com<\/a>, or post your questions on the <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingforum\">Official Scripting Guys Forum<\/a>. 
See you tomorrow.<\/p>\n<p>Until then, always remember that with Great PowerShell comes Great Responsibility.<\/p>\n<p><strong>Sean Kearney<\/strong>\nHonorary Scripting Guy\nCloud and Datacenter Management MVP<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell. Hello! I\u2019m Thomas Rayner, a proud Cloud and Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (@MrThomasRayner), or posting on my blog, workingsysadmin.com. This week, I\u2019m [&hellip;]<\/p>\n","protected":false},"author":596,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[568,687,641],"tags":[56,652,45],"class_list":["post-80375","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hey-scripting-guy","category-regular-expressions","category-windows-powershell","tag-guest-blogger","tag-thomas-rayner","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Summary: Thomas Rayner, Microsoft Cloud and Datacenter Management MVP, shows the basics of working with regular expressions in PowerShell. Hello! I\u2019m Thomas Rayner, a proud Cloud and Datacenter Management Microsoft MVP, filling in for The Scripting Guy! this week. You can find me on Twitter (@MrThomasRayner), or posting on my blog, workingsysadmin.com. 
This week, I\u2019m [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/80375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=80375"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/80375\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=80375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=80375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=80375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}