{"id":12833,"date":"2010-09-17T07:00:00","date_gmt":"2010-09-17T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2010\/09\/17\/whats-up-with-the-strange-treatment-of-quotation-marks-and-backslashes-by-commandlinetoargvw\/"},"modified":"2010-09-17T07:00:00","modified_gmt":"2010-09-17T07:00:00","slug":"whats-up-with-the-strange-treatment-of-quotation-marks-and-backslashes-by-commandlinetoargvw","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20100917-00\/?p=12833","title":{"rendered":"What&#039;s up with the strange treatment of quotation marks and backslashes by CommandLineToArgvW"},"content":{"rendered":"<p>The way the <code>CommandLineToArgvW<\/code> function treats quotation marks and backslashes has raised eyebrows at times. Let&#8217;s look at the problem space, and then see what algorithm would work.\n Here are some sample command lines and what you presumably want them to be parsed as:<\/p>\n<table border=\"1\" rules=\"all\" style=\"border-collapse: collapse\">\n<tbody>\n<tr>\n<th>Command line<\/th>\n<th>Result<\/th>\n<\/tr>\n<tr>\n<td valign=\"baseline\"><code>program.exe \"hello there.txt\"<\/code><\/td>\n<td valign=\"baseline\"><code>program.exe<br \/>                               hello there.txt<\/code><\/td>\n<\/tr>\n<tr>\n<td valign=\"baseline\"><code>program.exe \"C:\\Hello there.txt\"<\/code><\/td>\n<td valign=\"baseline\"><code>program.exe<br \/>                               C:\\Hello there.txt<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p> In the first example, we want quotation marks to protect spaces.\n In the second example, we want to be able to enclose a path in quotation marks to protect the spaces. Backslashes inside the path have no special meaning; they are copied as any other normal character.\n So far, the rule is simple: Inside quotation marks, just copy until you see the matching quotation marks. 
Now here&#8217;s another wrinkle:<\/p>\n<table border=\"1\" rules=\"all\" style=\"border-collapse: collapse\">\n<tbody>\n<tr>\n<th>Command line<\/th>\n<th>Result<\/th>\n<\/tr>\n<tr>\n<td valign=\"baseline\"><code>program.exe \"hello\\\"there\"<\/code><\/td>\n<td valign=\"baseline\"><code>program.exe<br \/>                               hello\"there<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p> In the third example, we want to embed a quotation mark inside a quoted string by protecting it with a backslash.\n Okay, to handle this case, we say that a backslash which precedes a quotation mark protects the quotation mark. The backslash itself should disappear; its job is to protect the quotation mark and not to be part of the string itself. (If we kept the backslash, then it would not be possible to put a quotation mark into the command line parameter without a preceding backslash.)\n But what if you wanted a backslash at the end of the string? Then you protect the backslash with a backslash, leaving the quotation mark unprotected.<\/p>\n<table border=\"1\" rules=\"all\" style=\"border-collapse: collapse\">\n<tbody>\n<tr>\n<th>Command line<\/th>\n<th>Result<\/th>\n<\/tr>\n<tr>\n<td valign=\"baseline\"><code>program.exe \"hello\\\\\"<\/code><\/td>\n<td valign=\"baseline\"><code>program.exe<br \/>                               hello\\<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p> Okay, so what did we come up with?\n We want a backslash before a quotation mark to protect the quotation mark, and we want a backslash before a backslash to protect the backslash (so you can end a string with a backslash). Otherwise, we want the backslash to be given no special treatment.\n The <code>CommandLineToArgvW<\/code> function therefore works like this:<\/p>\n<ul>\n<li>A string of backslashes not followed by a quotation mark     has no special meaning. 
<\/li>\n<li>An even number of backslashes followed by a quotation mark     is treated as pairs of protected backslashes, followed by     a word terminator. <\/li>\n<li>An odd number of backslashes followed by a quotation mark     is treated as pairs of protected backslashes, followed by     a protected quotation mark. <\/li>\n<\/ul>\n<p> The backslash rule is confusing, but it&#8217;s necessary to permit the very important second example, where you can just put quotation marks around a path without having to go in and double all the internal path separators.\n Personally, I would have chosen a different backslash rule:<\/p>\n<blockquote class=\"m\"><p>  <b>Warning &#8211; these are not the actual backslash rules. These are Raymond&#8217;s hypothetical &#8220;If I ran the world&#8221; backslash rules.<\/b> <\/p>\n<ul>\n<li>A backslash followed by another backslash produces a backslash. <\/li>\n<li>A backslash followed by a quotation mark produces a quotation mark. <\/li>\n<li>A backslash followed by anything else is just a backslash followed     by that other character. <\/li>\n<\/ul>\n<\/blockquote>\n<p> I prefer these rules because they can be implemented by a state machine. On the other hand, they make quoting regular expressions a total nightmare. They also break <code>\"\\\\server\\share\\path with spaces\"<\/code>, which is pretty much a deal-breaker. Hm, perhaps a better set of rules would be<\/p>\n<blockquote class=\"m\"><p>  <b>Warning &#8211; these are not the actual backslash rules. These are Raymond&#8217;s second attempt at hypothetical &#8220;If I ran the world&#8221; backslash rules.<\/b> <\/p>\n<ul>\n<li>Backslashes have no special meaning at all. <\/li>\n<li>If you are outside quotation marks, then a     <tt>\"<\/tt> takes you inside quotation marks but generates no output. <\/li>\n<li>If you are inside quotation marks, then     a sequence of 2N quotation marks represents N quotation marks in     the output. 
<\/li>\n<li>If you are inside quotation marks, then     a sequence of 2N+1 quotation marks represents N quotation marks in     the output, and then you exit quotation marks. <\/li>\n<\/ul>\n<\/blockquote>\n<p> This can also be implemented by a state machine, and quoting an existing string is very simple: Stick a quotation mark in front, a quotation mark at the end, and double all the internal quotation marks.\n But what&#8217;s done is done, and the first set of backslash rules is what <code>CommandLineToArgvW<\/code> implements. And since the behavior has been shipped and documented, it can&#8217;t change.\n If you don&#8217;t like these parsing rules, then feel free to write your own parser that follows whatever rules you like.<\/p>\n<p> <b>Bonus chatter<\/b>: Quotation marks are even more screwed up. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>The way the CommandLineToArgvW function treats quotation marks and backslashes has raised eyebrows at times. Let&#8217;s look at the problem space, and then see what algorithm would work. Here are some sample command lines and what you presumably want them to be parsed as: Command line Result program.exe &#8220;hello there.txt&#8221; program.exe hello there.txt program.exe &#8220;C:\\Hello [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-12833","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>The way the CommandLineToArgvW function treats quotation marks and backslashes has raised eyebrows at times. Let&#8217;s look at the problem space, and then see what algorithm would work. 
Here are some sample command lines and what you presumably want them to be parsed as: Command line Result program.exe &#8220;hello there.txt&#8221; program.exe hello there.txt program.exe &#8220;C:\\Hello [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/12833","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=12833"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/12833\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=12833"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=12833"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=12833"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}