{"id":15561,"date":"2011-02-18T00:01:00","date_gmt":"2011-02-18T00:01:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2011\/02\/18\/speed-up-array-comparisons-in-powershell-with-a-runtime-regex\/"},"modified":"2011-02-18T00:01:00","modified_gmt":"2011-02-18T00:01:00","slug":"speed-up-array-comparisons-in-powershell-with-a-runtime-regex","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/speed-up-array-comparisons-in-powershell-with-a-runtime-regex\/","title":{"rendered":"Speed Up Array Comparisons in Powershell with a Runtime Regex"},"content":{"rendered":"<p><b>Summary<\/b>: Learn how to speed up array comparisons in Windows PowerShell by using a runtime regular expression<\/p>\n<p><img decoding=\"async\" height=\"34\" width=\"34\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" align=\"left\" alt=\"Hey, Scripting Guy! Question\" border=\"0\" title=\"Hey, Scripting Guy! Question\" \/><\/p>\n<p>&nbsp; Hey, Scripting Guy! I am interested in speeding up comparisons of arrays when I use Windows PowerShell. Can you help me?<\/p>\n<p>&mdash;CR<\/p>\n<p><img decoding=\"async\" height=\"34\" width=\"34\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" align=\"left\" alt=\"Hey, Scripting Guy! Answer\" border=\"0\" title=\"Hey, Scripting Guy! Answer\" \/> Hello CR, <\/p>\n<p>Microsoft Scripting Guy, Ed Wilson, here. We are still in our Guest Blogger Week, so I will turn your question over to today&rsquo;s guest blogger, Rob Campbell. 
<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/4744.HSG-2-18-11-1_2B7C607C.jpg\"><img decoding=\"async\" height=\"451\" width=\"304\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/4670.HSG-2-18-11-1_thumb_41EEFF03.jpg\" alt=\"Photo of Rob Campbell\" border=\"0\" title=\"Photo of Rob Campbell\" style=\"border-bottom: 0px;border-left: 0px;padding-left: 0px;padding-right: 0px;border-top: 0px;border-right: 0px;padding-top: 0px\" \/><\/a><\/p>\n<p>Here is Rob&rsquo;s description of his scripting background: <\/p>\n<blockquote>\n<p>I work at a medium-to-large corporate financial institution as an AD and Exchange administrator. I&rsquo;ve worked in IT for over 35 years, starting as a night operator on IBM System 3 and 360 series mainframes. I never really got the hang of VB, but I love <a target=\"_blank\" href=\"http:\/\/technet.microsoft.com\/en-us\/scriptcenter\/powershell.aspx\">Windows PowerShell<\/a>. I have done a few large-scale scripts, but most of my scripting work is ad-hoc, on-demand reports and maintenance changes. I don&rsquo;t know that I can lay claim to any particular area of specialization. The nearest I ever came up with for a good job description was &ldquo;tactical logician.&rdquo;<\/p>\n<\/blockquote>\n<p>Occasionally, you&rsquo;ll need to compare the contents of a pair of arrays of strings in Windows PowerShell, and there are some nice built-in operators that help you do that, notably <b>&ndash;contains<\/b>, <b>-notcontains<\/b>, and <b>compare-object<\/b>. Normally I&rsquo;ll use the <b>&ndash;contains<\/b> and <b>-notcontains<\/b> operators. 
Here, we find all the elements of array <b>$b<\/b> that do and do not appear in array <b>$a<\/b>:<\/p>\n<blockquote>\n<p>$a = &quot;red.&quot;,&quot;blue.&quot;,&quot;yellow.&quot;,&quot;green.&quot;,&quot;orange.&quot;,&quot;purple.&quot;<\/p>\n<p>$b = &quot;blue.&quot;,&quot;green.&quot;,&quot;orange.&quot;,&quot;white.&quot;,&quot;gray.&quot;<\/p>\n<p>$b |? {$a -contains $_}<\/p>\n<p>blue.<\/p>\n<p>green.<\/p>\n<p>orange.<\/p>\n<p>$b |? {$a -notcontains $_}<\/p>\n<p>white.<\/p>\n<p>gray.<\/p>\n<\/blockquote>\n<p>This is great for single instances, repetitive comparisons, or relatively small numbers of arrays. It is concise and intuitive. However, sometimes you need to compare an array to a collection of thousands of arrays. This could be email recipient addresses in message tracking logs, AD group memberships, NTFS access control lists, or virtually any large collection of objects that have multivalued string properties that you need to compare to some other array of string properties. For this kind of scenario, there is another method that provides much better performance: a regular expression (regex) match.<\/p>\n<p>Regular expression matches are most commonly used to match single strings, and we are talking about array comparisons, that is, comparing one set of multiple values to another set of multiple values. However, a regular expression can be written to match multiple values at once by using the alternation operator (<b>|<\/b>).<\/p>\n<p>Some characters are reserved for use as metacharacters in regular expressions, and they must be escaped with a backslash to be interpreted literally in the match. 
One of these is the period, so for our example, a regular expression to match all the values in our <b>$a<\/b> array would look like this:<\/p>\n<p>[regex] $a_regex = &#39;^(red\\.|blue\\.|yellow\\.|green\\.|orange\\.|purple\\.)$&#39;<\/p>\n<p>All of the values we want to match are grouped in parentheses and separated by a pipe symbol. The following code uses this regex to replace our <b>-contains<\/b> construct with <b>-match<\/b> operations.<\/p>\n<blockquote>\n<p>$b -match $a_regex<\/p>\n<p>blue.<\/p>\n<p>green.<\/p>\n<p>orange.<\/p>\n<p>$b -notmatch $a_regex<\/p>\n<p>white.<\/p>\n<p>gray.<\/p>\n<\/blockquote>\n<p>Both methods produce the same result, but there is a substantial difference in the amount of time it takes for them to do it.<\/p>\n<p>The following code simulates comparing <b>$a<\/b> to 10,000 <b>$b<\/b> arrays, finding the common and different elements in each one, by repeating the same comparison 10,000 times and timing it.<\/p>\n<blockquote>\n<p>$counter = 1..10000<\/p>\n<p>$test = measure-command {<\/p>\n<p>foreach ($i in $counter){<\/p>\n<p>$b | where {$a -contains $_}<\/p>\n<p>$b | where {$a -notcontains $_}<\/p>\n<p>}<\/p>\n<p>} <\/p>\n<p>$test.totalseconds<\/p>\n<p>9.2625084<\/p>\n<\/blockquote>\n<p>Here we run the same test using our regex match 10,000 times.<\/p>\n<blockquote>\n<p>$counter = 1..10000<\/p>\n<p>$test = measure-command {<\/p>\n<p>foreach ($i in $counter){<\/p>\n<p>$b -match $a_regex<\/p>\n<p>$b -notmatch $a_regex<\/p>\n<p>}<\/p>\n<p>} <\/p>\n<p>$test.totalseconds<\/p>\n<p>0.6718625<\/p>\n<\/blockquote>\n<p>The actual test times will vary depending on the processor and background processes, but I have found the alternation regex to consistently perform many times faster than the <b>-contains<\/b> operators.<\/p>\n<p>Now, how can we use this in a script when what is in <b>$a<\/b> isn&rsquo;t known until runtime? Simple. 
We wait until runtime to create our regular expression, based on what is in <b>$a<\/b>.<\/p>\n<p>First, we want to make sure that we do a literal match on our incoming array elements. Fortunately, the regex type contains a static method (escape) that will do this for us. So, starting with our <b>$a<\/b> array, first we escape all the special characters in each element:<\/p>\n<blockquote>\n<p>$a |foreach {[regex]::escape($_)}<\/p>\n<p>red\\.<\/p>\n<p>blue\\.<\/p>\n<p>yellow\\.<\/p>\n<p>green\\.<\/p>\n<p>orange\\.<\/p>\n<p>purple\\.<\/p>\n<\/blockquote>\n<p>All the reserved characters are now escaped for us. Next, we join the elements of our array with the pipe symbol that denotes alternation:<\/p>\n<blockquote>\n<p>($a |foreach {[regex]::escape($_)}) -join &quot;|&quot;<\/p>\n<p>red\\.|blue\\.|yellow\\.|green\\.|orange\\.|purple\\.<\/p>\n<\/blockquote>\n<p>We are almost there. All that is left is to add our grouping parentheses, anchors, and any regex options we want. I&rsquo;m going to add the (<b>?i<\/b>) that denotes a case-insensitive regex to this one, and use the beginning-of-line (<b>^<\/b>) and end-of-line (<b>$<\/b>) anchors:<\/p>\n<blockquote>\n<p>&#39;(?i)^(&#39; + (($a |foreach {[regex]::escape($_)}) -join &quot;|&quot;) + &#39;)$&#39;<\/p>\n<p>(?i)^(red\\.|blue\\.|yellow\\.|green\\.|orange\\.|purple\\.)$<\/p>\n<\/blockquote>\n<p>There is our regex. 
All that is left is to assign it to a variable and cast it to the proper type:<\/p>\n<blockquote>\n<p>[regex] $a_regex = &#39;(?i)^(&#39; + (($a |foreach {[regex]::escape($_)}) -join &quot;|&quot;) + &#39;)$&#39;<\/p>\n<p>$a_regex.tostring()<\/p>\n<p>(?i)^(red\\.|blue\\.|yellow\\.|green\\.|orange\\.|purple\\.)$<\/p>\n<\/blockquote>\n<p>Now our test script looks like this:<\/p>\n<blockquote>\n<p>$a = &quot;red.&quot;,&quot;blue.&quot;,&quot;yellow.&quot;,&quot;green.&quot;,&quot;orange.&quot;,&quot;purple.&quot;<\/p>\n<p>$b = &quot;blue.&quot;,&quot;green.&quot;,&quot;orange.&quot;,&quot;white.&quot;,&quot;gray.&quot;<\/p>\n<p>[regex] $a_regex = &#39;(?i)^(&#39; + (($a |foreach {[regex]::escape($_)}) -join &quot;|&quot;) + &#39;)$&#39;<\/p>\n<p>$b -match $a_regex<\/p>\n<p>blue.<\/p>\n<p>green.<\/p>\n<p>orange.<\/p>\n<p>$b -notmatch $a_regex<\/p>\n<p>white.<\/p>\n<p>gray.<\/p>\n<\/blockquote>\n<p>CR, that is all there is to using regular expressions to speed up comparisons of arrays. Thank you, Rob, for sharing with us today. This brings Guest Blogger Week to a close. Join me tomorrow for <a target=\"_blank\" href=\"http:\/\/blogs.technet.com\/b\/heyscriptingguy\/archive\/tags\/weekend+scripter\/\">Weekend Scripter<\/a>. <\/p>\n<p>I invite you to follow me on <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguystwitter\">Twitter<\/a> and <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingguysfacebook\">Facebook<\/a>. If you have any questions, send email to me at <a href=\"mailto:scripter@microsoft.com\">scripter@microsoft.com<\/a>, or post your questions on the <a target=\"_blank\" href=\"http:\/\/bit.ly\/scriptingforum\">Official Scripting Guys Forum<\/a>. See you tomorrow. 
Until then, peace.<\/p>\n<p><b>Ed Wilson, Microsoft Scripting Guy<\/b><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: Learn how to speed up array comparisons in Windows PowerShell by using a runtime regular expression &nbsp; Hey, Scripting Guy! I am interested in speeding up comparisons of arrays when I use Windows PowerShell. Can you help me? &mdash;CR Hello CR, Microsoft Scripting Guy, Ed Wilson, here. We are still in our Guest Blogger [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[18,56,223,3,4,45],"class_list":["post-15561","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-arrays-hash-tables-and-dictionary-objects","tag-guest-blogger","tag-rob-campbell","tag-scripting-guy","tag-scripting-techniques","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Summary: Learn how to speed up array comparisons in Windows PowerShell by using a runtime regular expression &nbsp; Hey, Scripting Guy! I am interested in speeding up comparisons of arrays when I use Windows PowerShell. Can you help me? &mdash;CR Hello CR, Microsoft Scripting Guy, Ed Wilson, here. 
We are still in our Guest Blogger [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/15561","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=15561"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/15561\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=15561"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=15561"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=15561"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}