{"id":106489,"date":"2022-04-18T07:00:00","date_gmt":"2022-04-18T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106489"},"modified":"2022-04-18T07:14:53","modified_gmt":"2022-04-18T14:14:53","slug":"20220418-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220418-00\/?p=106489","title":{"rendered":"The x86 architecture is the weirdo, part 2"},"content":{"rendered":"<p>Some time ago I noted that <a title=\"The x86 architecture is the weirdo\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20040914-00\/?p=37873\"> The x86 architecture is the weirdo<\/a>. (And by x86 I mean specifically x86-32.) I was reminded by the compiler folks of another significant place where the x86 architecture is different from all the others, and that&#8217;s in how Windows structured exceptions are managed.<\/p>\n<p>On Windows, all the other architectures track exception handling by using unwind codes and other information declared as metadata. If you step through a function on any other architecture, you won&#8217;t see any instructions related to exception handling. Only when an exception occurs does the system look up the instruction pointer in the exception-handling information in the metadata, and use that to decide what to do: Which exception handler should run? What objects need to be destructed? That sort of thing.<\/p>\n<p>But the x86 is the weirdo. On Windows, the x86 tracks exception information at runtime. When control enters a function that needs to deal with exceptions (either because it wants to handle the exception, or just because it wants to run destructors when an exception is thrown out of the function), the code must create an entry in a linked list threaded through the stack and anchored by the value in <code>fs:[0]<\/code>. 
In the Microsoft Visual C++ implementation, the linked list node also <a title=\"The case of the orphaned critical section despite being managed by an RAII type\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20181228-00\/?p=100585\"> contains an integer which represents the current progress through the function<\/a>, and that integer is updated whenever there is a change to the list of objects requiring destruction. It is updated immediately after the construction of an object completes, and immediately before the destruction of an object commences.<\/p>\n<p>This special integer is a real pain in the neck, because the optimizer sees it as a dead store and really wants to optimize it out. Indeed sometimes, it really <i>is<\/i> a dead store, but sometimes it isn&#8217;t.<\/p>\n<p>Consider:<\/p>\n<pre>struct S { S(); ~S(); };\r\n\r\nvoid f1();\r\nvoid f2();\r\n\r\nS g()\r\n{\r\n    S s1;\r\n    f1();\r\n    S s2;\r\n    f2();\r\n    return S();\r\n}\r\n<\/pre>\n<p>The code generation for this function goes like this:<\/p>\n<pre>struct ExceptionNode\r\n{\r\n    ExceptionNode* next;\r\n    int (__stdcall *handler)(PEXCEPTION_POINTERS);\r\n    int state;\r\n};\r\n\r\nS g()\r\n{\r\n    \/\/ Create a new node\r\n    ExceptionNode node;\r\n    node.next = fs:[0];\r\n    node.handler = exception_handler_function;\r\n    node.state = -1; \/\/ nothing needs to be destructed\r\n\r\n    \/\/ Make it the new head of the linked list\r\n    fs:[0] = &amp;node;\r\n\r\n    construct s1;\r\n    node.state = 0; \/\/ s1 needs to be destructed\r\n\r\n    f1();\r\n\r\n    construct s2;\r\n    node.state = 1; \/\/ s1 and s2 need to be destructed\r\n\r\n    f2();\r\n\r\n    construct return value;\r\n    node.state = 2; \/\/ s1, s2, and return value need to be destructed\r\n\r\n    node.state = 3; \/\/ s1 and return value need to be destructed\r\n    destruct s2;\r\n\r\n    node.state = 4; \/\/ return value needs to be destructed\r\n    destruct s1;\r\n}\r\n<\/pre>\n<p>The unwind 
state variable is updated whenever the list of &#8220;objects requiring destruction&#8221; changes. As far as the optimizer is concerned, all of these updates to <code>state<\/code> look like dead stores, since it seems that nobody reads them.<\/p>\n<p>Aha, but somebody does read them: The <code>exception_<wbr \/>handler_<wbr \/>function<\/code>. The problem is that the call to the <code>exception_<wbr \/>handler_<wbr \/>function<\/code> is invisible: It is called when an exception is thrown by the <code>f1()<\/code> or <code>f2()<\/code> function, or by the destructor of the <code>S<\/code> objects.\u00b9<\/p>\n<p>But wait, some of these really are dead stores. For example, the assignment of 2 to <code>node.state<\/code> is a dead store, because it is immediately followed by a store of 3, and there is nothing in between, so no exception could occur while the value is 2. Similarly, the store of 3 is dead because the destructor of <code>S<\/code> is implicitly <code>noexcept<\/code>.\u00b9 And the store of 4 is dead for the same reason: No exception can occur when destructing <code>s1<\/code>.<\/p>\n<p>Further dead store elimination becomes possible if <code>f1<\/code> or <code>f2<\/code> are changed to <code>noexcept<\/code>.<\/p>\n<p>So the optimizer is in a tricky spot here: It wants to eliminate dead stores, but the simple algorithm for identifying dead stores doesn&#8217;t work here because of the potential for exceptions.<\/p>\n<p>Coroutines make this even worse: When a coroutine suspends, the exception-handling node needs to be copied from the stack into the coroutine frame, and then removed from the stack frame. And when the coroutine resumes, the state needs to be copied from the coroutine frame back into the stack, and linked into the chain of exception handlers.<\/p>\n<p>Knowing exactly when to do this unlinking and relinking is tricky, because you still have to catch exceptions that occur in <code>await_suspend<\/code> and store them in the promise. 
But we learned that <code>await_suspend<\/code> is fragile because the coroutine may have resumed and run to completion before <code>await_suspend<\/code> returns.<\/p>\n<pre>void await_suspend(coroutine_handle&lt;&gt; handle)\r\n{\r\n  arrange_for_resumption(handle);\r\n  throw oops; \/\/ who catches this?\r\n}\r\n<\/pre>\n<p>The language says that the thrown exception is caught by the coroutine framework, which calls <code>promise.unhandled_exception()<\/code>. But the promise may no longer exist!<\/p>\n<p>Dealing with all these crazy edge cases makes exception handling on x86, and <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/cpp20-coroutine-improvements-in-visual-studio-2019-version-16-11\/\"> particularly exception handling on x86 <i>in coroutines<\/i><\/a>, quite a complicated undertaking.<\/p>\n<p><b>Bonus reading<\/b>: <a title=\"Zero-cost exceptions aren't zero cost\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220228-00\/?p=106296\"> Zero-cost exceptions aren&#8217;t zero cost<\/a>.<\/p>\n<p>\u00b9 Destructors default to noexcept if no members or base classes have potentially-throwing destructors, but you can mark your destructor as potentially-throwing,\u00b2 and then exceptions thrown from destructors become something the compiler has to worry about.<\/p>\n<p>\u00b2 Please don&#8217;t do that.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It does everything differently, because of course it does.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-106489","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>It does everything differently, because of course it 
does.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106489","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106489"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106489\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106489"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106489"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106489"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}