{"id":633,"date":"2014-06-27T07:00:00","date_gmt":"2014-06-27T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2014\/06\/27\/undefined-behavior-can-result-in-time-travel-among-other-things-but-time-travel-is-the-funkiest\/"},"modified":"2024-11-04T11:19:37","modified_gmt":"2024-11-04T19:19:37","slug":"20140627-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20140627-00\/?p=633","title":{"rendered":"Undefined behavior can result in time travel (among other things, but time travel is the funkiest)"},"content":{"rendered":"<p>The C and C++ languages are notorious for the very large section of the map labeled <a href=\"http:\/\/en.wikipedia.org\/wiki\/Here_be_dragons\"> <i>here be dragons<\/i><\/a>, or more formally, <i>undefined behavior<\/i>.<\/p>\n<p>When undefined behavior is invoked, anything is possible. For example, <a href=\"http:\/\/markshroyer.com\/2012\/06\/c-both-true-and-false\/\"> a variable can be both true and false<\/a>. John Regehr has <a href=\"http:\/\/blog.regehr.org\/archives\/759\"> a list of interesting examples<\/a>, as well as some <a href=\"http:\/\/blog.regehr.org\/archives\/767\"> winners<\/a> of the ensuing contest.<\/p>\n<p>Consider the following function:<\/p>\n<pre>int table[4];\r\nbool exists_in_table(int v)\r\n{\r\n    for (int i = 0; i &lt;= 4; i++) {\r\n        if (table[i] == v) return true;\r\n    }\r\n    return false;\r\n}\r\n<\/pre>\n<p><!-- A HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2013\/10\/11\/10455907.aspx#10456678\" --> What does this have to do with time travel, you ask? Hang on, impatient one.<\/p>\n<p>First of all, you might notice the off-by-one error in the loop control. The result is that the function reads one past the end of the <code>table<\/code> array before giving up. A classical compiler wouldn&#8217;t particularly care. It would just generate the code to read the out-of-bounds array element (despite the fact that doing so is a violation of the language rules), and it would return <code>true<\/code> if the memory one past the end of the array happened to match.<\/p>\n<p>A post-classical compiler, on the other hand, might perform the following analysis:<\/p>\n<ul>\n<li>The first four times through the loop, the function might return <code>true<\/code>.<\/li>\n<li>When <code>i<\/code> is 4, the code performs undefined behavior. Since undefined behavior lets me do anything I want, I can totally ignore that case and proceed on the assumption that <code>i<\/code> is never 4. (If the assumption is violated, then something unpredictable happens, but that&#8217;s okay, because undefined behavior grants me permission to be unpredictable.)<\/li>\n<li>The case where <code>i<\/code> is 5 never occurs, because in order to get there, I first have to get through the case where <code>i<\/code> is 4, which I have already assumed cannot happen.<\/li>\n<li>Therefore, all legal code paths return <code>true<\/code>.<\/li>\n<\/ul>\n<p>As a result, a post-classical compiler can optimize the function to<\/p>\n<pre>bool exists_in_table(int v)\r\n{\r\n    return true;\r\n}\r\n<\/pre>\n<p>Okay, so that&#8217;s already kind of weird. A function got optimized to basically nothing due to undefined behavior. Note that even if the value isn&#8217;t in the table (not even in the illegal-to-access <a href=\"http:\/\/www.amazon.com\/dp\/0800195175?tag=tholneth-20\"> fifth element<\/a>), the function will <i>still return true<\/i>.<\/p>\n<p>Now we can take this post-classical behavior one step further: Since the compiler can assume that undefined behavior never occurs (because if it did, then the compiler is allowed to do <i>anything it wants<\/i>), the compiler can use undefined behavior to guide optimizations.<\/p>\n<pre>int value_or_fallback(int *p)\r\n{\r\n return p ? *p : 42;\r\n}\r\n<\/pre>\n<p>The above function accepts a pointer to an integer and either returns the pointed-to value or (if the pointer is null) returns the fallback value 42. So far so good.<\/p>\n<p>Let&#8217;s add a line of debugging to the function.<\/p>\n<pre>int value_or_fallback(int *p)\r\n{\r\n <span style=\"border: solid 1px currentcolor;\">printf(\"The value of *p is %d\\n\", *p);<\/span>\r\n return p ? *p : 42;\r\n}\r\n<\/pre>\n<p>This new line introduces a bug: It dereferences the pointer <code>p<\/code> without checking if it is null. This tiny bug actually has wide-ranging consequences. A post-classical compiler will optimize the function to<\/p>\n<pre>int value_or_fallback(int *p)\r\n{\r\n printf(\"The value of *p is %d\\n\", *p);\r\n <span style=\"border: solid 1px currentcolor;\">return *p;<\/span>\r\n}\r\n<\/pre>\n<p>because it observes that the null pointer check is no longer needed: If the pointer were null, then the <code>printf<\/code> already engaged in undefined behavior, so the compiler is allowed to do anything in the case the pointer is null (including acting as if it weren&#8217;t).<\/p>\n<p>Okay, so that&#8217;s not too surprising. That may even be an optimization you expect from a compiler. (For example, if the ternary operator was hidden inside a macro, you would have expected the compiler to remove the test that is provably false.)<\/p>\n<p>But a post-classical compiler can now use this buggy function to start doing time travel.<\/p>\n<pre>void unwitting(bool door_is_open)\r\n{\r\n if (door_is_open) {\r\n  walk_on_in();\r\n } else {\r\n  ring_bell();\r\n\r\n  \/\/ wait for the door to open using the fallback value\r\n  fallback = value_or_fallback(nullptr);\r\n  wait_for_door_to_open(fallback);\r\n }\r\n}\r\n<\/pre>\n<p>A post-classical compiler can optimize this entire function to<\/p>\n<pre>void unwitting(bool door_is_open)\r\n{\r\n walk_on_in();\r\n}\r\n<\/pre>\n<p>Huh?<\/p>\n<p>The compiler observed that the call <code>value_<wbr \/>or_<wbr \/>fallback(<wbr \/>nullptr)<\/code> invokes undefined behavior on all code paths. Propagating this analysis backward, the compiler then observes that if <code>door_<wbr \/>is_<wbr \/>open<\/code> is false, then the <code>else<\/code> branch invokes undefined behavior on all code paths. Therefore, <i>the entire <code>else<\/code> branch can be treated as unreachable<\/i>.\u00b2<\/p>\n<p>Okay, now here comes the time travel:<\/p>\n<pre>void keep_checking_door()\r\n{\r\n for (;;) {\r\n  printf(\"Is the door open? \");\r\n  fflush(stdout);\r\n  char response;\r\n  if (scanf(\"%c\", &amp;response) != 1) return;\r\n  bool door_is_open = response == 'Y';\r\n  unwitting(door_is_open);\r\n }\r\n}\r\n<\/pre>\n<p>A post-modern compiler may propagate the analysis that &#8220;if <code>door_<wbr \/>is_<wbr \/>open<\/code> is false, then the behavior is undefined&#8221; and rewrite this function to<\/p>\n<pre>void keep_checking_door()\r\n{\r\n for (;;) {\r\n  printf(\"Is the door open? \");\r\n  fflush(stdout);\r\n  char response;\r\n  if (scanf(\"%c\", &amp;response) != 1) return;\r\n  bool door_is_open = response == 'Y';\r\n  <span style=\"border: solid 1px currentcolor; border-bottom: none;\">if (!door_is_open) abort();<\/span>\r\n  <span style=\"border: solid 1px currentcolor; border-top: none;\">walk_on_in();              <\/span>\r\n }\r\n}\r\n<\/pre>\n<p>Observe that even though the original code rang the bell before crashing, the rewritten function skips over ringing the bell and just crashes immediately. You might say that the compiler <i>went back in time and unrung the bell<\/i>.<\/p>\n<p>This &#8220;going back in time&#8221; is possible even for objects with external visibility like files, because the standard allows for <i>anything at all<\/i> to happen when undefined behavior is encountered. And that includes hopping in a time machine and pretending you never called <code>fwrite<\/code>.<\/p>\n<p>Even if you claim that the compiler is not allowed to perform time travel,\u00b9 it&#8217;s still possible to see earlier operations become undone. For example, it&#8217;s possible that the undefined operation resulted in the file buffers being corrupted, so the data never actually got written. Even if the buffers were flushed, the undefined operation may have resulted in a call to <code>ftruncate<\/code> to logically remove the data you just wrote. Or it may have resulted in a <code>Delete\u00adFile<\/code> to delete the file you thought you had created.<\/p>\n<p>All of these behaviors have the same observable effect, namely that the earlier action appears not to have occurred. Whether they actually occurred and were reversed or never occurred at all is moot from a compiler-theoretic point of view.<\/p>\n<p>The compiler may as well have propagated the effect of the undefined operation backward in time.<\/p>\n<p>\u00b9 For the record, the standard explicitly permits time travel in the face of undefined behavior:<\/p>\n<blockquote class=\"q\"><p>However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (<u>not even with regard to operations preceding the first undefined operation<\/u>).<\/p><\/blockquote>\n<p>(Emphasis mine.)<\/p>\n<p>\u00b2 Another way of looking at this transformation is that the compiler saw that the <code>else<\/code> branch invokes undefined behavior on all code paths, so it rewrote the code as<\/p>\n<pre>void unwitting(bool door_is_open)\r\n{\r\n if (door_is_open) {\r\n  walk_on_in();\r\n } else {\r\n  <span style=\"border: solid 1px currentcolor;\">walk_on_in();<\/span>\r\n }\r\n}\r\n<\/pre>\n<p>taking advantage of the rule that undefined behavior allows anything to happen, so in this case, it decided that &#8220;anything&#8221; was &#8220;calling <code>walk_<wbr \/>on_<wbr \/>in<\/code> by mistake.&#8221;<\/p>\n<p><b>Bonus chatter<\/b>: Note that there are some categories of undefined behavior which may not be obvious. For example, dereferencing a null pointer is undefined behavior <i>even if you try to counteract the dereference before it does anything dangerous<\/i>.<\/p>\n<pre>int *p = nullptr;\r\nint&amp; i = *p;\r\nfoo(&amp;i); \/\/ undefined\r\n<\/pre>\n<p>You might think that the <code>&amp;<\/code> and the <code>*<\/code> cancel out and the result is as if you had written <code>foo(p)<\/code>, but the fact that you created a reference to a nonexistent object, even if you never carried through on it, invokes undefined behavior (\u00a78.5.3(1)).<\/p>\n<p><b>Related reading<\/b>: What Every C Programmer Should Know About Undefined Behavior, <a href=\"http:\/\/blog.llvm.org\/2011\/05\/what-every-c-programmer-should-know.html\"> Part 1<\/a>, <a href=\"http:\/\/blog.llvm.org\/2011\/05\/what-every-c-programmer-should-know_14.html\"> Part 2<\/a>, <a href=\"http:\/\/blog.llvm.org\/2011\/05\/what-every-c-programmer-should-know_21.html\"> Part 3<\/a>.<\/p>\n<p><b>Update<\/b>: Broke the <code>&amp;*<\/code> into two lines because it is the lone <code>*<\/code> that is the problem.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The rules of logic no longer apply when you cross the line.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-633","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>The rules of logic no longer apply when you cross the line.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=633"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/633\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}