{"id":14263,"date":"2010-04-23T07:00:00","date_gmt":"2010-04-23T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2010\/04\/23\/why-cant-i-get-my-regular-expression-pattern-to-match-words-that-begin-with\/"},"modified":"2010-04-23T07:00:00","modified_gmt":"2010-04-23T07:00:00","slug":"why-cant-i-get-my-regular-expression-pattern-to-match-words-that-begin-with","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20100423-00\/?p=14263","title":{"rendered":"Why can&#039;t I get my regular expression pattern to match words that begin with %?"},"content":{"rendered":"<p>A customer asked for help writing a regular expression that, in the customer&#8217;s words, matched the string <code>%1<\/code> when it appeared as a standalone word.<\/p>\n<table border=\"1\">\n<tr>\n<th>Match<\/th>\n<th>No match<\/th>\n<\/tr>\n<tr>\n<td><tt><u>%1<\/u><\/tt><\/td>\n<td><tt>%1b<\/tt><\/td>\n<\/tr>\n<tr>\n<td><tt>:<u>%1<\/u>:<\/tt><\/td>\n<td><tt>x%1<\/tt><\/td>\n<\/tr>\n<\/table>\n<p> One of the things that people often forget to do when asking a question is to <a href=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2010\/04\/22\/10000406.aspx\"> describe the things that they tried and what the results were<\/a>. 
This is important information to include, because it saves the people who try to answer the question from wasting their time repeating the things that you already tried.<\/p>\n<table border=\"1\">\n<tr>\n<th>Pattern<\/th>\n<th>String<\/th>\n<th>Result<\/th>\n<th>Expected<\/th>\n<\/tr>\n<tr>\n<td><tt>\\b%1\\b<\/tt><\/td>\n<td><tt>%1<\/tt><\/td>\n<td>No match<\/td>\n<td>Match<\/td>\n<\/tr>\n<tr>\n<td><tt>\\b%1\\b<\/tt><\/td>\n<td><tt>:%1:<\/tt><\/td>\n<td>No match<\/td>\n<td>Match<\/td>\n<\/tr>\n<tr>\n<td><tt>\\b%1\\b<\/tt><\/td>\n<td><tt>x%1<\/tt><\/td>\n<td>Match<\/td>\n<td>No match<\/td>\n<\/tr>\n<tr>\n<td><tt>^..$<\/tt><\/td>\n<td><tt>%1<\/tt><\/td>\n<td>Match<\/td>\n<td>Match<\/td>\n<\/tr>\n<\/table>\n<p> That last entry was just to make sure that the test app was working, a valuable step when chasing a problem: First, make sure the problem is where you think it is. If the <tt>^..$<\/tt> hadn&#8217;t worked, then the problem would not have been with the regular expression but with some other part of the program.\n &#8220;Is the <tt>\\b<\/tt> operator broken?&#8221;\n No, the <tt>\\b<\/tt> operator is working just fine. The problem is that the <tt>\\b<\/tt> operator doesn&#8217;t do what you think it does.\n For those not familiar with this notation, well, first you were probably confused by the <tt>\\b<\/tt> in the original question and skipped the rest of this article. Anyway, <tt>\\w<\/tt> matches A through Z (either uppercase or lowercase), a digit 0 through 9, or an underscore. (It&#8217;s actually more complicated than that, but the above description is good enough for the current discussion.) By contrast, <tt>\\W<\/tt> matches every other character. And in regular expression speak, a &#8220;word&#8221; is a maximal contiguous string of <tt>\\w<\/tt> characters. Finally, the <tt>\\b<\/tt> operator matches the location between a <tt>\\w<\/tt> and a <tt>\\W<\/tt>, treating the beginning and end of the string as an invisible <tt>\\W<\/tt>. 
I will stop mentioning the pretend <tt>\\W<\/tt> at the ends of the string; just mentally insert them where applicable.\n Okay, let&#8217;s go back to the original regular expression of <tt>\\b%1\\b<\/tt>. Notice that the percent sign is not one of the things which is matched by <tt>\\w<\/tt>. Therefore, in order for the <tt>\\b<\/tt> that comes before it to match, the character before the percent sign must be a <tt>\\w<\/tt>. That way, the <tt>\\b<\/tt> comes between a <tt>\\w<\/tt> (the preceding character) and a <tt>\\W<\/tt> (the percent sign). The pattern <tt>\\b%1\\b<\/tt> means &#8220;A percent sign which comes after a <tt>\\w<\/tt>, followed by a 1 which comes before a <tt>\\W<\/tt>.&#8221;\n Looking at it another way, the string <tt>%1<\/tt> breaks down like this:<\/p>\n<table border=\"1\">\n<tr>\n<td><tt>\\W<\/tt><\/td>\n<td>beginning of string (virtual)<\/td>\n<\/tr>\n<tr>\n<td><tt>\\W<\/tt><\/td>\n<td><tt>%<\/tt><\/td>\n<\/tr>\n<tr>\n<td><tt>\\w<\/tt><\/td>\n<td><tt>1<\/tt><\/td>\n<\/tr>\n<tr>\n<td><tt>\\W<\/tt><\/td>\n<td>end of string (virtual)<\/td>\n<\/tr>\n<\/table>\n<p> There is a <tt>\\b<\/tt> between the <tt>%<\/tt> and the <tt>1<\/tt> and another one between the <tt>1<\/tt> and the end of the string, but there is no <tt>\\b<\/tt> before the percent sign, because that location has <tt>\\W<\/tt> on both sides.\n The question started off on the wrong foot: You are having trouble writing a regular expression that matches a word that begins with <tt>%<\/tt> because <i>there are no words which begin with <tt>%<\/tt><\/i>. The percent sign is not a <tt>\\w<\/tt> and therefore cannot be part of a word.\n What the customer is looking for is something more like <tt>(?&lt;!\\w)%1\\b<\/tt>, a regular expression which means <i>a percent sign not preceded by a <tt>\\w<\/tt>, followed by a 1 which comes before a <tt>\\W<\/tt><\/i>.<\/p>\n<p> The customer realized the mistake once it was pointed out. 
&#8220;I keep forgetting that I can&#8217;t get <tt>%<\/tt> included in <tt>\\w<\/tt> just because I want it to.&#8221; <\/p>\n<p><a href=\"http:\/\/blogs.msdn.com\/michkap\/\">Michael Kaplan<\/a> <a href=\"http:\/\/blogs.msdn.com\/michkap\/archive\/2008\/11\/10\/9056364.aspx\">covered this same topic some time ago<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A customer asked for help writing a regular expression that, in the customer&#8217;s words, matched the string %1 when it appeared as a standalone word. Match No match %1 %1b :%1: x%1 One of the things that people often forget to do when asking a question is to describe the things that they tried and [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-14263","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>A customer asked for help writing a regular expression that, in the customer&#8217;s words, matched the string %1 when it appeared as a standalone word. 
Match No match %1 %1b :%1: x%1 One of the things that people often forget to do when asking a question is to describe the things that they tried and [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/14263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=14263"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/14263\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=14263"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=14263"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=14263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}