{"id":695,"date":"2018-08-14T00:21:17","date_gmt":"2018-08-14T08:21:17","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/koryt\/?p=695"},"modified":"2019-10-09T05:27:34","modified_gmt":"2019-10-09T13:27:34","slug":"regular-expressions-regex-grouping-regex","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/regular-expressions-regex-grouping-regex\/","title":{"rendered":"Regular Expressions (REGEX): Grouping &amp; [RegEx]"},"content":{"rendered":"<p>Welcome back to the RegEx crash course. Last time we talked about the <a href=\"https:\/\/blogs.msdn.microsoft.com\/koryt\/?p=685\">basic symbols<\/a> we plan to use as our foundation. This week, we will be learning a new way to leverage our patterns for data extraction and how to rip our extracted data into pieces we care about.<\/p>\n<h3>[RegEx]<\/h3>\n<p>The <code>[Regex]<\/code>\u00a0data type has some cool static members, but we&#8217;re mostly going to play with the plural method\u00a0<code>matches(&lt;data&gt;,&lt;pattern&gt;)<\/code>. If you don&#8217;t know what static members are, you can check <a href=\"https:\/\/devblogs.microsoft.com\/scripting\/2018\/\">this post<\/a> or <a href=\"https:\/\/docs.microsoft.com\/en-us\/powershell\/scripting\/getting-started\/cookbooks\/using-static-classes-and-methods?view=powershell-6\">this help data<\/a>.<\/p>\n<p>A lot of the time, when we work with RegEx, we are using it to extract <em>everything<\/em> that matches our pattern in a large amount of data. Using <code>$matches<\/code>\u00a0like we did in the previous posts means we have to write a lot of looping and <code>if<\/code> statements. With <code>[regex]::matches()<\/code> we can condense all that, and it works on a big blob of text instead of just a list of individual lines. 
This means that if there is more than one match per line, we can still get them all!<\/p>\n<p>If we take a look at some sample data that it returns, we can see that we actually get a pretty rich match object:<\/p>\n<pre class=\"lang:default decode:true\">Groups : {0}\r\nSuccess : True\r\nName : 0\r\nCaptures : {0}\r\nIndex : 3534\r\nLength : 23\r\nValue : ecowpland1d@myspace.com<\/pre>\n<p>The thing we care about is the\u00a0<code>value<\/code>\u00a0property, but you&#8217;ll notice it even tells you the starting character (<code>Index<\/code>) and how many characters long the match is (<code>Length<\/code>).<\/p>\n<p>Let&#8217;s take a look at how we might modify the email match from the earlier post to use this:<\/p>\n<pre class=\"lang:ps decode:true\">#grab our data as one big blob (-raw)\r\n$file = get-content \"$PSScriptRoot\\MOCK_DATA.txt\" -raw\r\n\r\n#make our pattern\r\n$regex = \"\\w+@\\w+\\.\\w+\"\r\n\r\n#extract all matches and display the value property\r\n[RegEx]::Matches($file,$regex).value\r\n<\/pre>\n<p>Now it all looks a lot sleeker, go [RegEx]!<\/p>\n<h4>Grouping<\/h4>\n<p>Grouping is a way that we can logically break up our extraction. I use this for 2 main reasons:<\/p>\n<ol>\n<li>The data I want isn&#8217;t unique on its own, but the data around it is. Now I can match the <em>unique<\/em> piece and rip out what I <em>want<\/em> to use.<\/li>\n<li>The data I want is all there, but I plan to use pieces of it for different things.<\/li>\n<\/ol>\n<p>Grouping can be done by wrapping sections of your pattern in parentheses. The full pattern will always match as group <code>0<\/code>, which is why we were typing <code>$matches[0]<\/code>\u00a0to start. Each individual group then gets pulled out in numerical order, counted by opening parenthesis from left to right, starting at <code>1<\/code>.<\/p>\n<p>Maybe we grab some data by copy\/pasting out of Outlook and it looks like this: <code>Brenda Seamon &lt;bseamon0@bbc.co&gt;<\/code>.\u00a0If it&#8217;s in a large block of text, we might want to use RegEx to extract it like we have before. 
This time, we want to grab the first name, last name, and email address so we can use each piece separately. They&#8217;re all in our data, and we can use grouping to pull them out individually.<\/p>\n<p>Let&#8217;s start by finding a pattern that gets all of our data. I used this one: <code>$pattern = \"\\w+\\s+\\w+\\s+&lt;\\w+@\\w+\\.\\w+&gt;\"<\/code><\/p>\n<ol>\n<li>1+ word characters (first name)<\/li>\n<li>1+ whitespace characters<\/li>\n<li>1+ word characters (last name)<\/li>\n<li>1+ whitespace characters<\/li>\n<li>&lt;<\/li>\n<li>The rest of our email pattern, like we used before.<\/li>\n<li>&gt;<\/li>\n<\/ol>\n<p>We can see that it works with our test:<\/p>\n<pre class=\"lang:ps decode:true\">$data = \"Brenda Seamon &lt;bseamon0@bbc.co&gt;\"\r\n$pattern = \"\\w+\\s+\\w+\\s+&lt;\\w+@\\w+\\.\\w+&gt;\"\r\n$data -match $pattern\r\n$matches[0]<\/pre>\n<p>Now that we know it works, let&#8217;s try grouping up the pieces we want by putting parens around the first, last, and email sections: <code>$pattern = \"(\\w+)\\s+(\\w+)\\s+&lt;(\\w+@\\w+\\.\\w+)&gt;\"<\/code><\/p>\n<pre class=\"lang:ps decode:true\">$data = \"Brenda Seamon &lt;bseamon0@bbc.co&gt;\"\r\n$pattern = \"(\\w+)\\s+(\\w+)\\s+&lt;(\\w+@\\w+\\.\\w+)&gt;\"\r\n$data -match $pattern\r\n$matches[0]\r\n\r\n\"All match: {0}\r\nFirst name: {1}\r\nLast name: {2}\r\nEmail: {3}\r\n\" -f $matches[0],$matches[1],$matches[2],$matches[3]<\/pre>\n<pre class=\"lang:default decode:true\">All match: Brenda Seamon &lt;bseamon0@bbc.co&gt;\r\nFirst name: Brenda\r\nLast name: Seamon\r\nEmail: bseamon0@bbc.co\r\n<\/pre>\n<p>We can also name these groups using <code>?&lt;NAME&gt;<\/code> inside of the parens. 
This makes our pattern look really bananas if we see it without context: <code>$pattern = \"(?&lt;first&gt;\\w+)\\s+(?&lt;last&gt;\\w+)\\s+&lt;(?&lt;email&gt;\\w+@\\w+\\.\\w+)&gt;\"<\/code><\/p>\n<pre class=\"lang:ps decode:true \">$data = \"Brenda Seamon &lt;bseamon0@bbc.co&gt;\"\r\n$pattern = \"(?&lt;first&gt;\\w+)\\s+(?&lt;last&gt;\\w+)\\s+&lt;(?&lt;email&gt;\\w+@\\w+\\.\\w+)&gt;\"\r\n$data -match $pattern\r\n$matches[0]\r\n\r\n\"All match: {0}\r\nFirst name: {1}\r\nLast name: {2}\r\nEmail: {3}\r\n\" -f $matches[0],$matches[\"first\"],$matches[\"last\"],$matches[\"email\"]<\/pre>\n<p>Notice <code>$matches<\/code>\u00a0is a hash table, and our group names become the keys. Let&#8217;s try grabbing the groups using [RegEx]:<\/p>\n<pre class=\"lang:ps decode:true\">$data = \"Brenda Seamon &lt;bseamon0@bbc.co&gt;\"\r\n$pattern = \"(?&lt;first&gt;\\w+)\\s+(?&lt;last&gt;\\w+)\\s+&lt;(?&lt;email&gt;\\w+@\\w+\\.\\w+)&gt;\"\r\n$results = [Regex]::Matches($data,$pattern)\r\n$results[0].groups[\"email\"].value<\/pre>\n<p>It&#8217;s a bit more work, since we need to keep track of which result we are on in a group of matches. 
This scales nicely for looping, though!<\/p>\n<pre class=\"lang:ps decode:true\">$data = \"Brenda Seamon &lt;bseamon0@bbc.co&gt;, Joeann Brotherwood &lt;jbrotherwood1@house.gov&gt;, Jake Duffan &lt;jduffan2@google.ru&gt;\"\r\n$pattern = \"(?&lt;first&gt;\\w+)\\s+(?&lt;last&gt;\\w+)\\s+&lt;(?&lt;email&gt;\\w+@\\w+\\.\\w+)&gt;\"\r\n$results = [Regex]::Matches($data,$pattern)\r\n$people = @()\r\n\r\n#build one object per match from its named groups\r\nforeach($person in $results)\r\n{\r\n    $obj = [pscustomobject]@{\r\n        \"First Name\" = $person.Groups[\"first\"].value\r\n        \"Last Name\" = $person.Groups[\"last\"].value\r\n        Email = $person.Groups[\"email\"].value\r\n    }\r\n\r\n    $people += $obj\r\n}\r\n\r\n$people<\/pre>\n<pre class=\"lang:default decode:true\">First Name Last Name   Email\r\n---------- ---------   -----\r\nBrenda     Seamon      bseamon0@bbc.co\r\nJoeann     Brotherwood jbrotherwood1@house.gov\r\nJake       Duffan      jduffan2@google.ru\r\n<\/pre>\n<p>Hopefully you&#8217;ve had fun playing with RegEx so far. We will take a look at some other symbols and some little tricks we can use grouping and <code>?<\/code> for in future posts!<\/p>\n<p>As always, don&#8217;t forget to rate, comment and share! Let me know what you think of the content and what topics you&#8217;d like to see me blog about in the future.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome back to the RegEx crash course. Last time we talked about the basic symbols we plan to use as our foundation. This week, we will be learning a new way to leverage our patterns for data extraction and how to rip our extracted data into pieces we care about. 
[RegEx] The [Regex]\u00a0data type has [&hellip;]<\/p>\n","protected":false},"author":7300,"featured_media":87096,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1738],"tags":[2221,2125,377,174],"class_list":["post-695","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-powershell","tag-kory-thacher","tag-koryt","tag-powershell","tag-regular-expressions"],"acf":[],"blog_post_summary":"<p>Welcome back to the RegEx crash course. Last time we talked about the basic symbols we plan to use as our foundation. This week, we will be learning a new way to leverage our patterns for data extraction and how to rip our extracted data into pieces we care about. [RegEx] The [Regex]\u00a0data type has [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/695","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/7300"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=695"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/695\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=695"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=695"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft
.com\/scripting\/wp-json\/wp\/v2\/tags?post=695"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}