{"id":9973,"date":"2011-08-04T07:00:00","date_gmt":"2011-08-04T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2011\/08\/04\/why-doesnt-b-match-word-boundaries-correctly\/"},"modified":"2011-08-04T07:00:00","modified_gmt":"2011-08-04T07:00:00","slug":"why-doesnt-b-match-word-boundaries-correctly","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20110804-00\/?p=9973","title":{"rendered":"Why doesn&#039;t \\b match word boundaries correctly?"},"content":{"rendered":"<p>A colleague of mine was having trouble getting the <code>\\b<\/code> metacharacter in a regular expression to work. Of course, when somebody asks a question like that, you first have to establish what their definition of &#8220;work&#8221; is. Fortunately, he provided some examples:<\/p>\n<table border=\"1\" style=\"border-collapse: collapse\" cellpadding=\"3\">\n<tr>\n<td><code>Regex.IsMatch(\"foo\", @\"\\b\" + @\"foo\" + @\"\\b\")<\/code><\/td>\n<td>true<\/td>\n<\/tr>\n<tr>\n<td><code>Regex.IsMatch(\"%1\" , @\"\\b\" + @\"%1\"&nbsp; + @\"\\b\")<\/code><\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td><code>Regex.IsMatch(\"%1\" , @\"\\b\" + @\"\\%1\" + @\"\\b\")<\/code><\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td><code>Regex.IsMatch(\"%1\" , @\"\\b\" + @\"\\%1\" + @\"\\b\")<\/code><\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td><code>Regex.IsMatch(\"%1\" , @\"..\")<\/code><\/td>\n<td>true<\/td>\n<\/tr>\n<tr>\n<td><code>Regex.IsMatch(\"%1\" , @\"%1\")<\/code><\/td>\n<td>true<\/td>\n<\/tr>\n<\/table>\n<p> &#8220;The last two entries are just sanity checks to make sure I didn&#8217;t make some stupid mistake like passing the parameters in the wrong order. I want to search for a string that contains <tt>%1<\/tt> with word boundaries on either side, something I would normally use <tt>\\b<\/tt> for. Is there something special about the % character? 
Notice that the match succeeds when I look for the word <tt>foo<\/tt>.&#8221;\n Everything is working as it should. Recall that the <tt>\\b<\/tt> metacharacter matches when there is a <tt>\\w<\/tt> on one side and a <tt>\\W<\/tt> on the other, where the beginning and end of the string are treated as if they were <tt>\\W<\/tt>.\n The string <tt>%1<\/tt> therefore breaks down as<\/p>\n<table border=\"1\" style=\"border-collapse: collapse\" cellpadding=\"3\">\n<tr>\n<td align=\"right\">virtual <tt>\\W<\/tt><\/td>\n<td>&nbsp;beginning of string<\/td>\n<\/tr>\n<tr>\n<td align=\"right\"><tt>\\W<\/tt><\/td>\n<td>&nbsp;% is not an alphanumeric or _<\/td>\n<\/tr>\n<tr>\n<td align=\"right\"><tt>\\w<\/tt><\/td>\n<td>&nbsp;1 is a digit<\/td>\n<\/tr>\n<tr>\n<td align=\"right\">virtual <tt>\\W<\/tt><\/td>\n<td>&nbsp;end of string<\/td>\n<\/tr>\n<\/table>\n<p> The only points where <tt>\\b<\/tt> would match are immediately before and after the 1, since those are the transition points between <tt>\\w<\/tt> and <tt>\\W<\/tt> and vice versa. In particular, the location immediately before the percent sign does not match since it is <a href=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2009\/09\/23\/9898231.aspx\"> surrounded<\/a> by <tt>\\W<\/tt> on both sides.<\/p>\n<p> My colleague responded, &#8220;D&#8217;oh! I keep forgetting that % won&#8217;t act like a <tt>\\w<\/tt> just because I want it to.&#8221; <\/p>\n","protected":false},"excerpt":{"rendered":"<p>A colleague of mine was having trouble getting the \\b metacharacter in a regular expression to work. Of course, when somebody asks a question like that, you first have to establish what their definition of &#8220;work&#8221; is. 
Fortunately, he provided some examples: Regex.IsMatch(&#8220;foo&#8221;, @&#8221;\\b&#8221; + @&#8221;foo&#8221; + @&#8221;\\b&#8221;) true Regex.IsMatch(&#8220;%1&#8243; , @&#8221;\\b&#8221; + @&#8221;%1&#8243;&nbsp; + [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-9973","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>A colleague of mine was having trouble getting the \\b metacharacter in a regular expression to work. Of course, when somebody asks a question like that, you first have to establish what their definition of &#8220;work&#8221; is. Fortunately, he provided some examples: Regex.IsMatch(&#8220;foo&#8221;, @&#8221;\\b&#8221; + @&#8221;foo&#8221; + @&#8221;\\b&#8221;) true Regex.IsMatch(&#8220;%1&#8243; , @&#8221;\\b&#8221; + @&#8221;%1&#8243;&nbsp; + 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/9973","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=9973"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/9973\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=9973"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=9973"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=9973"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}