{"id":31113,"date":"2006-05-22T10:00:08","date_gmt":"2006-05-22T10:00:08","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2006\/05\/22\/how-do-i-write-a-regular-expression-that-matches-an-ipv4-dotted-address\/"},"modified":"2006-05-22T10:00:08","modified_gmt":"2006-05-22T10:00:08","slug":"how-do-i-write-a-regular-expression-that-matches-an-ipv4-dotted-address","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20060522-08\/?p=31113","title":{"rendered":"How do I write a regular expression that matches an IPv4 dotted address?"},"content":{"rendered":"<p>\nWriting a regular expression that matches an IPv4 dotted address is either\neasy or hard, depending on how good a job you want to do.\nIn fact, to make things easier, let&#8217;s match only the decimal\ndotted notation, leaving out the hexadecimal variant,\nas well as the non-dotted variants.\n<\/p>\n<p>\nFor the purpose of this discussion,\nI&#8217;ll restrict myself to the common subset\nof the regular expression languages\nshared by perl, JScript, and the .NET Framework, and\nI&#8217;ll assume ECMA mode, wherein <code>\\d<\/code> matches only the characters\n0 through 9.\n(By default, in the .NET Framework,\n<a HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2004\/03\/09\/86555.aspx\">\n<code>\\d<\/code> matches any decimal digit, not just 0 through 9<\/a>.)\n<\/p>\n<p>\nThe easiest version is just to take any string of four decimal\nnumbers separated by periods.<\/p>\n<pre>\n\/^\\d+\\.\\d+\\.\\d+\\.\\d+$\/\n<\/pre>\n<p>\nThis is nice as far as it goes, but it erroneously accepts\nstrings like &#8220;448.90210.0.65535&#8221;.\nA proper decimal dotted address has no value larger than 255.\nBut writing a regular expression that matches the integers 0 through 255\nis hard work because\nregular expressions don&#8217;t understand arithmetic;\nthey operate purely textually.\nTherefore, you have to describe the integers 0 through 255 in purely\ntextual 
means.\n<\/p>\n<ul>\n<li>Any single digit is valid (representing 0 through 9).\n<li>Any nonzero digit followed by another digit is valid\n    (representing 10 through 99).\n<li>A &#8220;1&#8221; followed by two digits is valid (100 through 199).\n<li>A &#8220;2&#8221; followed by &#8220;0&#8221; through &#8220;4&#8221; followed by another digit is valid\n    (200 through 249).\n<li>A &#8220;25&#8221; followed by &#8220;0&#8221; through &#8220;5&#8221; is valid (250 through 255).\n<\/ul>\n<p>\nGiven this textual breakdown of the integers 0 through 255,\nyour first try would be something like this\n(the parentheses are required because alternation has the lowest precedence;\nwithout them, the <code>^<\/code> would anchor only the first alternative\nand the <code>$<\/code> only the last):\n<\/p>\n<pre>\n\/^(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])$\/\n<\/pre>\n<p>\nThis can be shrunk a bit by recognizing that the first two rules above\ncould be combined into\n<\/p>\n<ul>\n<li>Any digit, optionally preceded by a nonzero digit, is valid.\n<\/ul>\n<p>\nyielding\n<\/p>\n<pre>\n\/^([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])$\/\n<\/pre>\n<p>\nNow we just have to do this four times with periods in between:\n<\/p>\n<div STYLE=\"overflow: auto;width: 100%\">\n<pre>\n\/^([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])$\/\n<\/pre>\n<\/div>\n<p>\nCongratulations, we have just taken a simple description of the\ndotted decimal notation in words and converted it into a monstrous\nregular expression that is basically unreadable.\nImagine you were maintaining a program and stumbled across this\nregular expression.\nHow long would it take you to figure out what it did?\n<\/p>\n<p>\nOh, and it might not be right yet,\nbecause some parsers accept leading zeroes\nin front of each decimal value without affecting it.\n(For example, 127.0.0.001 is the same as 127.0.0.1.\nOn the other hand, some parsers treat a leading zero as an octal prefix.)\nUpdating our regular expression to accept leading decimal zeroes means\nthat we now have\n<\/p>\n<div STYLE=\"overflow: auto;width: 
\n<pre>\n\/^0*([1-9]?\\d">
100%\">\n<pre>\n\/^0*([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.0*([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.0*([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.0*([1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])$\/\n<\/pre>\n<\/div>\n<p>\nThis is why I both love and hate regular expressions.\nThey are a great way to express simple patterns.\nAnd they are a horrific way to express complicated ones.\nRegular expressions are probably the world&#8217;s most popular\nwrite-only language.\n<\/p>\n<p>\nAha, but you see, all this time diving into regular expressions\nwas a mistake,\nbecause we failed to figure out\n<a HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2006\/03\/23\/558887.aspx\">\nwhat the actual problem was<\/a>.\nThis was a case of somebody &#8220;solving&#8221; half of their problem\nand then asking for help with the other half:\n&#8220;I have a string and I want to check whether it is a dotted decimal\nIPv4 address.\nI know, I&#8217;ll write a regular expression!\nHey, can anybody help me write this regular expression?&#8221;\n<\/p>\n<p>\nThe real problem was not &#8220;How do I write a regular expression to\nrecognize a dotted decimal IPv4 address?&#8221;\nThe real problem was simply &#8220;How do I recognize a dotted decimal IPv4\naddress?&#8221;\nAnd with this broader goal in mind, you recognize that limiting\nyourself to a regular expression only made the problem harder.\n<\/p>\n<pre>\nfunction isDottedIPv4(s)\n{\n \/\/ The regular expression checks only the shape: four runs of digits separated by periods.\n var match = s.match(\/^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)$\/);\n \/\/ Arithmetic checks the range; &lt;= converts each captured string to a number.\n return match != null &amp;&amp;\n        match[1] &lt;= 255 &amp;&amp; match[2] &lt;= 255 &amp;&amp;\n        match[3] &lt;= 255 &amp;&amp; match[4] &lt;= 255;\n}\nWScript.StdOut.WriteLine(isDottedIPv4(\"127.0.0.001\"));\nWScript.StdOut.WriteLine(isDottedIPv4(\"448.90210.0.65535\"));\nWScript.StdOut.WriteLine(isDottedIPv4(\"microsoft.com\"));\n<\/pre>\n<p>\nAnd this was just a simple dotted decimal IPv4 address.\nWoe unto you if you decide you want to\n<a 
HREF=\"http:\/\/blogs.msdn.com\/larryosterman\/archive\/2005\/01\/07\/348548.aspx\">\nparse e-mail addresses<\/a>.\n<\/p>\n<p>\nDon&#8217;t make regular expressions do what they&#8217;re not good at.\nIf you want to match a simple pattern, then match a simple pattern.\nIf you want to do math, then do math.\nAs commenter Maurits put it,\n&#8220;<a HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2006\/03\/22\/558007.aspx#559985\">The trick is not to spend time developing a combination hammer\/screwdriver,\nbut just use a hammer and a screwdriver<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Writing a regular expression that matches an IPv4 dotted address is either easy or hard, depending on how good a job you want to do. In fact, to make things easier, let&#8217;s match only the decimal dotted notation, leaving out the hexadecimal variant, as well as the non-dotted variants. For the purpose of this discussion, [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-31113","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Writing a regular expression that matches an IPv4 dotted address is either easy or hard, depending on how good a job you want to do. In fact, to make things easier, let&#8217;s match only the decimal dotted notation, leaving out the hexadecimal variant, as well as the non-dotted variants. 
For the purpose of this discussion, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/31113","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=31113"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/31113\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=31113"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=31113"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=31113"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}