{"id":109892,"date":"2024-06-13T07:00:00","date_gmt":"2024-06-13T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=109892"},"modified":"2024-06-13T09:27:42","modified_gmt":"2024-06-13T16:27:42","slug":"20240613-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20240613-00\/?p=109892","title":{"rendered":"Lock-free reference-counting a TLS slot using atomics, part 2"},"content":{"rendered":"<p>Last time, we wrote a class that <a title=\"Lock-free reference-counting a TLS slot using atomics, part 1\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20240612-00\/?p=109887\">allocated a TLS slot on demand and freed the TLS slot when the last client disconnected<\/a>. We finished with a version that used a mutex, and we noted that profiling might reveal that the mutex is too expensive because the code is constantly creating and destroying <code>Tls\u00adUsage<\/code> objects. (Then again, it might not, in which case, you may as well stick with the mutex.)<\/p>\n<p>If you just follow the traditional pattern for singleton construction, you might come up with something like this for lazy allocation of the TLS slot:<\/p>\n<pre>struct TlsManager\r\n{\r\n    std::atomic&lt;DWORD&gt; m_count = 0;\r\n    std::atomic&lt;DWORD&gt; m_tls = TLS_OUT_OF_INDEXES;\r\n\r\n    \/\/ Don't use this code. 
See text.\r\n    DWORD Acquire()\r\n    {\r\n        if (++m_count != 1) {\r\n            return m_tls.load();\r\n        }\r\n\r\n        \/\/ Lazy-create the TLS slot.\r\n        auto tls = TlsAlloc();\r\n        THROW_LAST_ERROR_IF(tls == TLS_OUT_OF_INDEXES);\r\n        DWORD previous = TLS_OUT_OF_INDEXES;\r\n        if (!m_tls.compare_exchange_strong(previous, tls)) {\r\n            \/\/ Lost the race.\r\n            TlsFree(tls);\r\n            tls = previous;\r\n        }\r\n        return tls;\r\n    }\r\n};\r\n<\/pre>\n<p>The thinking here is that if the previous reference count was already nonzero, then we can count on the TLS already having been allocated. Otherwise, we are the ones who incremented the count from zero to one, so we have to create it, using the standard singleton creation pattern.<\/p>\n<p>You might wonder, &#8220;Wait, if we attempt to create the TLS slot only when the reference count goes from 0 to 1, then how could the <code>compare_<wbr \/>exchange_<wbr \/>strong<\/code> fail?&#8221; That&#8217;s a fair question, and thinking about it leads you to discovering why this code is wrong.<\/p>\n<p>The danger is that after we increment the reference count from 0 to 1, another thread may call <code>Acquire<\/code> and increment the count from 1 to 2. That other thread sees that the reference count was already nonzero, so it says, &#8220;Well, then clearly <i>I<\/i> don&#8217;t need to initialize the TLS slot because the guy who incremented from 0 to 1 is responsible for doing that,&#8221; and returns with the existing TLS slot.<\/p>\n<p>The problem is that &#8220;the guy responsible for doing that&#8221; is not finished doing that.<\/p>\n<p>One way to solve this is to make it the responsibility of <i>every<\/i> increment to initialize the TLS. 
Even if you incremented from 1 to 2, it&#8217;s possible that the thread that incremented from 0 to 1 hasn&#8217;t finished the initialization yet, so you have to try to do it too, just in case.<\/p>\n<p>This does mean that every increment has to attempt an initialization, which is a lot of wasted calls to <code>TlsAlloc<\/code>. We can avoid those wasted calls by peeking at <code>m_tls<\/code> and returning early if we see that its value is not <code>TLS_<wbr \/>OUT_<wbr \/>OF_<wbr \/>INDEXES<\/code>.<\/p>\n<pre>    DWORD Acquire()\r\n    {\r\n        \/\/ Every caller counts itself, then checks the slot.\r\n        ++m_count;\r\n\r\n        <span style=\"border: solid 1px currentcolor; border-bottom: none;\">DWORD previous = m_tls.load();       <\/span>\r\n        <span style=\"border: 1px currentcolor; border-style: none solid;\">if (previous != TLS_OUT_OF_INDEXES) {<\/span>\r\n        <span style=\"border: 1px currentcolor; border-style: none solid;\">    \/\/ Already created               <\/span>\r\n        <span style=\"border: 1px currentcolor; border-style: none solid;\">    return previous;                 <\/span>\r\n        <span style=\"border: solid 1px currentcolor; border-top: none;\">}                                    <\/span>\r\n\r\n        \/\/ Lazy-create the TLS slot.\r\n        auto tls = TlsAlloc();\r\n        THROW_LAST_ERROR_IF(tls == TLS_OUT_OF_INDEXES);\r\n        \/\/ previous is TLS_OUT_OF_INDEXES here, which is the value we expect to replace.\r\n        if (!m_tls.compare_exchange_strong(previous, tls)) {\r\n            \/\/ Lost the race.\r\n            TlsFree(tls);\r\n            tls = previous;\r\n        }\r\n        return tls;\r\n    }\r\n<\/pre>\n<p>But wait, the adventure is only beginning. The <code>Release<\/code> method is much trickier. A na\u00efve attempt might go like this:<\/p>\n<pre>    \/\/ Don't use this code. 
See text.\r\n    void Release()\r\n    {\r\n        if (--m_count == 0) {\r\n            TlsFree(m_tls.exchange(TLS_OUT_OF_INDEXES));\r\n        }\r\n    }\r\n<\/pre>\n<p>The idea here is that if we decrement to zero, then we atomically reset <code>m_tls<\/code> back to its &#8220;no TLS&#8221; sentinel value and free the TLS slot that used to be there.<\/p>\n<p>Unfortunately, this doesn&#8217;t work because it&#8217;s possible that after the <code>--m_count<\/code> decrements the counter to zero, but before we can free the TLS slot in <code>m_tls<\/code>, another thread sneaks in and calls <code>Acquire()<\/code>.<\/p>\n<p>That concurrent call to <code>Acquire()<\/code> bumps the reference count back up to 1, and then sees that we already have a TLS slot allocated, so it just returns. The <code>Release()<\/code> function then proceeds to exchange and free the TLS slot that the caller of <code>Acquire()<\/code> is now using.<\/p>\n<table style=\"border-collapse: collapse;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td style=\"border: solid 1px currentcolor; padding: 1ex;\">Thread 1<\/td>\n<td style=\"border: solid 1px currentcolor; padding: 1ex;\">Thread 2<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px currentcolor; border-style: none solid; padding: 1ex 1ex 0 1ex;\"><code>Release()<\/code><br \/>\n\u00a0\u00a0<code>--m_count<\/code> (decrements to zero)<\/td>\n<td style=\"border: 1px currentcolor; border-style: none solid; padding: 1ex 1ex 0 1ex;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px currentcolor; border-style: none solid; padding: 0 1ex;\">\u00a0<\/td>\n<td style=\"border: 1px currentcolor; border-style: none solid; padding: 0 1ex;\"><code>Acquire()<\/code><br \/>\n\u00a0\u00a0<code>++m_count<\/code> (increments to one)<br \/>\n\u00a0\u00a0<code>previous = m_tls.load()<\/code> (valid TLS)<br \/>\n\u00a0\u00a0<code>return previous;<\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px currentcolor; border-style: none solid solid; padding: 0 1ex 1ex 
1ex;\">\u00a0\u00a0<code>tls = m_tls.exchange(INVALID)<\/code><br \/>\n\u00a0\u00a0<code>TlsFree(tls)<\/code> (oops! frees in-use TLS!)<\/td>\n<td style=\"border: 1px currentcolor; border-style: none solid solid; padding: 0 1ex 1ex 1ex;\">\u00a0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Addressing this race between <code>Acquire()<\/code> and <code>Release()<\/code> is going to require a different approach. We&#8217;ll investigate further next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Getting it is easy. Getting rid of it is hard.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-109892","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Getting it is easy. Getting rid of it is 
hard.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/109892","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=109892"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/109892\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=109892"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=109892"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=109892"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}