{"id":110091,"date":"2024-08-02T07:00:00","date_gmt":"2024-08-02T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110091"},"modified":"2024-08-02T08:52:43","modified_gmt":"2024-08-02T15:52:43","slug":"20240802-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20240802-00\/?p=110091","title":{"rendered":"The difference between undefined behavior and ill-formed C++ programs"},"content":{"rendered":"<p>The C++ language has two large categories of &#8220;don&#8217;t do that&#8221; known as <i>undefined behavior<\/i> and <i>ill-formed program<\/i>. What&#8217;s the difference?<\/p>\n<p>Undefined behavior (commonly abbreviated <i>UB<\/i>) is a runtime concept. If a program does something which the language specified as &#8220;a program isn&#8217;t allowed to do that&#8221;, then the behavior at runtime is undefined: The program is permitted by the standard to do anything it wants. Furthermore, the effect of undefined behavior can <a title=\"Undefined behavior can result in time travel (among other things, but time travel is the funkiest)\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20140627-00\/?p=633\"> go backward in time<\/a> and invalidate operations that occurred prior to the undefined behavior. It can do things like <a title=\"Undefined behavior can literally erase your hard disk\" href=\"https:\/\/blog.tchatzigiannakis.com\/undefined-behavior-can-literally-erase-your-hard-disk\/\"> execute dead code<\/a>. 
However, if your program avoids the code paths which trigger undefined behavior, then you are safe.<\/p>\n<pre>int nervous(bool is_scary, int n)\r\n{\r\n    if (is_scary) {\r\n        return 100 \/ n;\r\n    } else {\r\n        return 0;\r\n    }\r\n}\r\n\r\nint main()\r\n{\r\n    return nervous(false, 0);\r\n}\r\n<\/pre>\n<p>There is no undefined behavior in this program because <code>nervous<\/code> is called with <code>is_scary<\/code> set to <code>false<\/code>, so the <code>return 100 \/ n<\/code> never executes, and we avoid the division by zero.<\/p>\n<p>Avoiding code paths with undefined behavior is something you do all the time.<\/p>\n<pre>\/\/ Check the pointer before using it\r\nif (p != nullptr) p-&gt;DoSomething();\r\n\r\n\/\/ Avoid division by zero\r\nreturn (n == 0) ? 0 : 100 \/ n;\r\n\r\nint a[5]{};\r\n\/\/ Check array index (both bounds, in case n is signed) before using it\r\nif (n &gt;= 0 &amp;&amp; n &lt; 5) return a[n];\r\n<\/pre>\n<p>Even if a program contains undefined behavior, the compiler is still obligated to produce a runnable program. And if the code path containing undefined behavior is not executed, then the program&#8217;s behavior is still constrained by the standard.<\/p>\n<p>By comparison, an <i>ill-formed<\/i> program is a program that breaks one of the rules for how programs are written. For example, maybe you try to modify a variable declared as <code>const<\/code>, or maybe you call a function that returns <code>void<\/code> and try to store the result into a variable.<\/p>\n<p>There are two subcategories of ill-formed programs. One is the plain vanilla ill-formed program, for which the standard requires a <i>diagnostic<\/i>, meaning that the standard requires that the compiler report the error, and the compiler is no longer obligated to produce a runnable program. 
Indeed, by default, most compilers will refuse to produce <i>any<\/i> program at all, runnable or not.<\/p>\n<pre>const int size = 4;\r\n\r\nvoid expand()\r\n{\r\n    \/\/ ill-formed: Modifying a const variable.\r\n    size = 9;\r\n}\r\n<\/pre>\n<p>The above program is ill-formed because it tries to modify a const variable, and the compiler reports a compile-time error and refuses to produce a program. The plain vanilla ill-formed programs aren&#8217;t scary because the compiler lets you know that you broke a rule and typically refuses to let you proceed until you fix it.<\/p>\n<p>The other subcategory of ill-formed programs is the scary one: ill-formed no diagnostic required, commonly abbreviated <i>IFNDR<\/i>. These are programs which are ill-formed, but for which the standard does not require the compiler to report an error. The compiler is welcome to do so if it chooses, but it is also permitted to remain silent. In practice, IFNDR is used to describe things which are &#8220;bad&#8221; but which compilers are not equipped to detect. For example, if you have two translation units (standard-speak for &#8220;a .cpp file&#8221;), both of them must agree on the bodies of any inline functions.<\/p>\n<pre>\/\/ file1.cpp\r\n\r\ninline int magic() { return 42; }\r\nint get_value() { return magic(); }\r\n\r\n\/\/ file2.cpp\r\n\r\nextern int get_value();\r\ninline int magic() { return 99; }\r\n\r\nint main(int argc, char** argv)\r\n{\r\n    if (argc &gt; 1) return get_value();\r\n    return 0;\r\n}\r\n<\/pre>\n<p>In the above example, we have a project that consists of two .cpp files. The two files disagree on what the <code>int magic()<\/code> inline function does, which is a category of IFNDR. The compiler is permitted but not required to detect this mismatch, and if you run the resulting program, the results are undefined: If you run the resulting program with a command line argument, the <code>get_value()<\/code> function might return 42. It might return 99. 
It might return 31415. It might reformat your hard drive. It might hang.<\/p>\n<p>Even worse, even if you run the program with no command line options (so that <code>get_value()<\/code> is never called), the results are <i>still<\/i> undefined. It could still reformat your hard drive.<\/p>\n<p>That&#8217;s what&#8217;s scary about IFNDR: the resulting program is <i>already broken even before you run it<\/i>. If a program contains IFNDR, the standard imposes no requirements on the behavior when you run the resulting program. The program is <i>fundamentally invalid<\/i>, and no good will come out of running it.<\/p>\n<p>In practice, IFNDR causes your program to behave erratically: In the above example, when you try to debug the program, you may observe that calls to <code>magic()<\/code> return values that don&#8217;t make sense because the compiler chose to use a copy of the inline function that is different from the one you expected. If your IFNDR results from an inconsistent class declaration, you may experience memory corruption because two different parts of the program disagree on what the class layout is.<\/p>\n<p>A common source of IFNDR is making changes to a class dependent upon a preprocessor setting that is different in different .cpp files.<\/p>\n<pre>\/\/ common.h\r\n\r\nstruct Widget\r\n{\r\n    Widget();\r\n\r\n    \u27e6 more stuff \u27e7\r\n\r\n#ifdef EXTRA_WIDGET_DEBUGGING\r\n    Logger m_logger;\r\n\r\n    void Log(std::string const&amp; message) {\r\n        m_logger.log(message);\r\n    }\r\n#else\r\n    void Log(std::string const&amp; message) {\r\n        \/\/ no extra logging\r\n    }\r\n#endif\r\n\r\n    std::string m_name;\r\n};\r\n<\/pre>\n<p>If two .cpp files include this common header file, and one of them defines <code>EXTRA_<wbr \/>WIDGET_<wbr \/>DEBUGGING<\/code> but the other does not, then you have a big problem because they will disagree on <code>sizeof(Widget)<\/code>, they will disagree on where the <code>m_name<\/code> is, and 
they will disagree on what the <code>Log()<\/code> function does. The result is that any use of the <code>Widget<\/code> will choose one definition or the other, and if not everybody chooses consistently (and in practice, there&#8217;s a good chance they won&#8217;t), then you have memory corruption on your hands.<\/p>\n<p>Visual Studio has <a title=\"Diagnosing Hidden ODR Violations in Visual C++ (and fixing LNK2022)\" href=\"https:\/\/devblogs.microsoft.com\/cppblog\/diagnosing-hidden-odr-violations-in-visual-c-and-fixing-lnk2022\/\"> an unofficial command line option to help identify certain classes of IFNDR<\/a>. And you can code defensively and use <a title=\"Using #pragma detect_mismatch to help catch ODR violations\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20160803-00\/?p=94015\"> <code>#pragma detect_mismatch<\/code><\/a> to help catch these types of mismatches.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>They are two kinds of undefined-ness, one for runtime and one for compile-time.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-110091","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>They are two kinds of undefined-ness, one for runtime and one for 
compile-time.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110091"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110091\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}