{"id":110738,"date":"2025-01-09T07:00:00","date_gmt":"2025-01-09T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110738"},"modified":"2025-01-13T13:08:00","modified_gmt":"2025-01-13T21:08:00","slug":"20250109-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250109-00\/?p=110738","title":{"rendered":"Inside STL: Waiting for a <CODE>std::atomic&lt;std::shared_ptr&lt;T&gt;&gt;<\/CODE> to change, part 2"},"content":{"rendered":"<p>Last time, <a title=\"Deeper inside STL: Waiting for a std::atomic&lt;std::shared_ptr&lt;T&gt;&gt; to change, part 1\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250108-00\/?p=110732\"> we looked at how the Microsoft C++ standard library implements <code>wait<\/code> and <code>notify_*<\/code> for <code>std::atomic&lt;std::shared_ptr&lt;T&gt;&gt;<\/code><\/a>. Today, we&#8217;ll look at the other library that (as of this writing) implements <code>std::atomic&lt;std::shared_ptr&lt;T&gt;&gt;<\/code>: libstdc++.<\/p>\n<p>The first thing to note is that the traditional &#8220;wait for a value to change&#8221; mechanism on unix is the futex, but futexes (<a title=\"Gesellschaft zur St\u00e4rkung der Verben: The German Society for the Irregularization of Verbs\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250107-01\/?p=110724\">futexen<\/a>?) 
are limited to 4-byte values, which is insufficient for a 64-bit pointer, much less the <i>two<\/i> pointers inside a <code>shared_ptr<\/code>.<\/p>\n<p>At this point, I will refer you to learn about <a title=\"Implementing C++20 atomic waiting in libstdc++\" href=\"https:\/\/developers.redhat.com\/articles\/2022\/12\/06\/implementing-c20-atomic-waiting-libstdc\"> how libstdc++ implements waits on atomic values<\/a>, particularly the section on <a href=\"https:\/\/developers.redhat.com\/articles\/2022\/12\/06\/implementing-c20-atomic-waiting-libstdc#how_to_handle_those_types_that_do_not_fit_in_a___platform_wait_t-h2\"> how it handles types that do not fit in a <code>__platform_wait_t<\/code><\/a>. The remainder of this discussion will treat that as an already-solved problem and focus on the shared pointer part.<\/p>\n<p>Okay, back to <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L729\"> <code>atomic&lt;<wbr \/>shared_ptr&lt;T&gt;&gt;::<wbr \/>wait()<\/code><\/a>:<\/p>\n<pre>\/\/ atomic&lt;shared_ptr&lt;T&gt;&gt;::wait\r\nvoid\r\nwait(value_type __old,\r\n     memory_order __o = memory_order_seq_cst) const noexcept\r\n{\r\n    _M_impl.wait(std::move(__old), __o);\r\n}\r\n<\/pre>\n<p>When you wait on a <code>shared_ptr<\/code>, the work is done by <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L606\"> <code>_Sp_atomic::wait<\/code><\/a>:<\/p>\n<pre>\/\/ _Sp_atomic&lt;shared_ptr&lt;T&gt;&gt;::wait\r\nvoid\r\nwait(value_type __old, memory_order __o) const noexcept\r\n{\r\n    auto __pi = _M_refcount.lock(memory_order_acquire);\r\n    if (_M_ptr == __old._M_ptr &amp;&amp; __pi == __old._M_refcount._M_pi)\r\n      _M_refcount._M_wait_unlock(__o);\r\n    else\r\n      _M_refcount.unlock(memory_order_relaxed);\r\n}\r\n<\/pre>\n<p>The code locks the 
<code>shared_ptr<\/code> (by <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L455\"> setting the bottom bit of the control block pointer<\/a>, <a title=\"Inside STL: The atomic shared_ptr\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241219-00\/?p=110663\">as we discussed earlier<\/a>), then checks whether the stored pointer and control block pointer both match. If not, then the wait is satisfied, and we release the lock and return. Otherwise, we ask <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L496\"> <code>_Atomic_count::<wbr \/>_M_wait_unlock<\/code><\/a> to finish the wait.<\/p>\n<pre>\/\/ _Atomic_count::_M_wait_unlock\r\nvoid\r\n_M_wait_unlock(memory_order __o) const noexcept\r\n{\r\n    auto __v = _M_val.fetch_sub(1, memory_order_relaxed);\r\n    _M_val.wait(__v &amp; ~_S_lock_bit, __o);\r\n}\r\n\r\nmutable __atomic_base&lt;uintptr_t&gt; _M_val{0};\r\n<\/pre>\n<p>As the name suggests, <code>_M_wait_unlock<\/code> clears the lock bit (thereby unlocking the shared pointer) and then waits for the value to change from its current value.<\/p>\n<p>Meanwhile, <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L735\"> the <code>notify_*<\/code> methods<\/a> do something similar:<\/p>\n<pre>\/\/ atomic&lt;shared_ptr&lt;T&gt;&gt;::notify_*\r\nvoid\r\nnotify_one() noexcept\r\n{\r\n    _M_impl.notify_one();\r\n}\r\n\r\nvoid\r\nnotify_all() noexcept\r\n{\r\n    _M_impl.notify_all();\r\n}\r\n<\/pre>\n<p>They forward to <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L735\"> <code>_Sp_atomic::<wbr \/>notify_*<\/code><\/a>:<\/p>\n<pre>\/\/ 
_Sp_atomic&lt;shared_ptr&lt;T&gt;&gt;::notify_*\r\nvoid\r\nnotify_one() noexcept\r\n{\r\n    _M_refcount.notify_one();\r\n}\r\n\r\nvoid\r\nnotify_all() noexcept\r\n{\r\n    _M_refcount.notify_all();\r\n}\r\n<\/pre>\n<p>And those forward to <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/dc01f249db5c4d08b76dc2783b1539290a800f2d\/libstdc%2B%2B-v3\/include\/bits\/shared_ptr_atomic.h#L505\"> <code>_Atomic_count::<wbr \/>notify_*<\/code><\/a>:<\/p>\n<pre>\/\/ _Atomic_count::notify_*\r\nvoid\r\nnotify_one() noexcept\r\n{\r\n    _M_val.notify_one();\r\n}\r\n\r\nvoid\r\nnotify_all() noexcept\r\n{\r\n    _M_val.notify_all();\r\n}\r\n\r\nmutable __atomic_base&lt;uintptr_t&gt; _M_val{0};\r\n<\/pre>\n<p>which forward the notify to the atomic value.<\/p>\n<p>So at the end of the day, waiting on and notifying an atomic shared pointer boils down to waiting on and notifying its control block pointer.<\/p>\n<p>But hang on a second. The language specification says that a wait on an atomic shared pointer is satisfied when <i>either<\/i> the stored pointer <i>or<\/i> the control block pointer changes. But this code waits only for the control block pointer to change. 
Do we have a bug?<\/p>\n<p>Let&#8217;s write a test program to see whether our theory holds up, or whether there&#8217;s something else (like MSVC&#8217;s exponential backoff) that saves us.<\/p>\n<pre>#include &lt;atomic&gt;\r\n#include &lt;exception&gt;\r\n#include &lt;memory&gt;\r\n#include &lt;chrono&gt;\r\n#include &lt;thread&gt;\r\n\r\nstd::shared_ptr&lt;int&gt; q = std::make_shared&lt;int&gt;(42);\r\nstd::atomic&lt;std::shared_ptr&lt;int&gt;&gt; p = q;\r\n\r\nvoid signaler()\r\n{\r\n    std::this_thread::sleep_for(std::chrono::seconds(1));\r\n    p.store({ q, nullptr });\r\n    p.notify_one();\r\n    std::this_thread::sleep_for(std::chrono::seconds(1));\r\n    std::terminate();\r\n}\r\n\r\nint main(int, char**)\r\n{\r\n    std::thread(signaler).detach();\r\n    p.wait(q);\r\n    return 0;\r\n}\r\n<\/pre>\n<p>This program starts a thread that waits one second to give the main thread a chance to reach <code>p.wait()<\/code>. It then changes the atomic shared pointer by modifying only the stored pointer and reusing the control block, and then notifies the main thread. If the program is still running after another second, then the wait was not woken, and we terminate the program.<\/p>\n<p>Meanwhile, after starting the signaler thread, the main thread waits on the atomic shared pointer, and when the wait is satisfied, it exits the program.<\/p>\n<p>You expect this program to exit cleanly. The signaling thread modifies the atomic shared pointer, which satisfies the wait. (Even if we didn&#8217;t sleep one second before modifying the atomic shared pointer, the wait would still be satisfied because the value in the atomic shared pointer no longer matches <code>q<\/code>.)<\/p>\n<p>In practice, this program crashes at the <code>std::<wbr \/>terminate<\/code>.<\/p>\n<p>So it looks like we found a bug in libstdc++. (<code>-dumpversion<\/code> says 14.2.0.) A wait on an atomic shared pointer is not woken if the shared pointer changed only its stored pointer and not its control block. 
The atomic wait should be a 16-byte wait that covers both the stored pointer and the control block pointer.<\/p>\n<p><b>Bonus chatter<\/b>: I find it interesting that the language specification <a title=\"P1644R0: Add wait\/notify to atomic&lt;shared_ptr&lt;T&gt;&gt;\" href=\"http:\/\/wg21.link\/p1644r0\"> added wait\/notify support to atomic shared pointers<\/a> as an afterthought, with barely any discussion or contemplation, as if it had been deemed too trivial to be worth worrying about. And two of the three major implementations messed it up. (What about the third major implementation, clang&#8217;s libc++? Oh, <a title=\"Inside STL: The atomic shared_ptr\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241219-00\/?p=110663\"> they haven&#8217;t implemented it yet<\/a>!)<\/p>\n<p>I was curious about this topic because the first thing that struck me about notify\/wait on atomic shared pointers was &#8220;Gosh, shared pointers are twice the size of regular pointers. I wonder how the implementations manage to wait atomically on something that is larger than a register?&#8221; And when I dug into the implementations, I found that the answer was &#8220;not correctly.&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Digging into the libstdc++ implementation.<\/p>\n","protected":false},"author":1069,"featured_media":110434,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-110738","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Digging into the libstdc++ 
implementation.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110738"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110738\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/110434"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110738"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110738"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}