{"id":98375,"date":"2018-03-29T07:00:00","date_gmt":"2018-03-29T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=98375"},"modified":"2020-03-30T06:36:51","modified_gmt":"2020-03-30T13:36:51","slug":"20180329-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180329-00\/?p=98375","title":{"rendered":"What&#8217;s up with <CODE>compare_<\/CODE><CODE>exchange_<\/CODE><CODE>weak<\/CODE> anyway?"},"content":{"rendered":"<p><a href=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180328-00\/?p=98365\">Last time<\/a>, I left you with a homework assignment: <a href=\"https:\/\/www.youtube.com\/watch?v=ZQFzMfHIxng\">Watch this video on <code>std::atomic<\/code><\/a>.<\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=ZQFzMfHIxng&amp;t=33m03s\">At time code 33:03<\/a>, the presenter notes the weak version of compare-exchange (which is permitted to fail even if the value matches the expected value) and <a href=\"https:\/\/www.youtube.com\/watch?v=ZQFzMfHIxng&amp;t=36m26s\">tries to reverse-engineer<\/a> what kind of hardware would require this operation, eventually settling on a NUMA architecture where cross-node memory accesses can time out.<\/p>\n<p>But there&#8217;s no need to speculate about something that exotic, because the answer is all around us. In fact, it&#8217;s probably happening right now on a computer in the presenter&#8217;s pocket.<\/p>\n<p>Most RISC processors do not have a compare-exchange instruction. Instead, they use a <i>load locked\/store conditional<\/i> pattern. This pattern is employed by the ARM architecture, and we also saw it for <a href=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20170817-00\/?p=96835\">Alpha AXP<\/a>, and we&#8217;ll see it later for MIPS and PowerPC.<\/p>\n<p>The <i>load locked\/store conditional<\/i> pattern goes like this:<\/p>\n<ul>\n<li>Issue a <i>load locked<\/i> instruction which reads a value from memory and instructs the processor to monitor that location for writes from other processors.<\/li>\n<li>Perform some computations.<\/li>\n<li>Issue a <i>store conditional<\/i> instruction which writes a value to the same memory location that was locked, provided the processor can prove that the memory has not been written to in the interim.<\/li>\n<\/ul>\n<p>The conditional store can fail if another processor has written to the memory, or memory on the same cache line or other unit of monitoring granularity, or if the processor took an interrupt.<\/p>\n<p>On an ARM, a strong compare-exchange contains a loop because the only way that <code>compare_<\/code><code>exchange_<\/code><code>strong<\/code> is permitted to fail is when the current value of the atomic variable does not match the expected value. If the failure reason was because of contention, then the strong version must perform an internal retry loop until the operation succeeds, or until the failure condition is met.<\/p>\n<pre>    ; r0 is the proposed new value\r\n    ; r1 is the expected old value\r\n    ; r2 is the address of the atomic variable\r\n\r\nretry:\r\n    DMB                     ; data memory barrier\r\n    LDREX   r3, [r2]        ; load current value and lock it\r\n    CMP     r3, r1          ; is it what we expected?\r\n    BNE     fail            ; N: operation failed\r\n                            ; actual current value is in r3\r\n\r\n    STREX   r4, r0, [r2]    ; try to store new value\r\n    CBNZ    r4, retry       ; lost the lock, try again\r\n    DMB                     ; data memory barrier\r\n<\/pre>\n<p>Consider the compare-exchange loop in the code sample in the presentation:<\/p>\n<pre>    do { new_n-&gt;next = old_h; }\r\n    while (!head.compare_exchange_strong(old_h, new_n));\r\n<\/pre>\n<p>The <code>compare_<\/code><code>exchange_<\/code><code>strong<\/code> has an embedded loop, and it&#8217;s part of another loop. So we have to generate two loops:<\/p>\n<pre>    ; r0 is new_n\r\n    ; r1 is old_h\r\n    ; r2 is the address of the atomic variable \"head\"\r\n\r\nouter_loop:\r\n    STR     r1, [r0]        ; new_n-&gt;next = old_h\r\n\r\nretry:\r\n    DMB                     ; data memory barrier\r\n    LDREX   r3, [r2]        ; locked load of head\r\n    CMP     r3, r1          ; is it what we expected?\r\n    BNE     fail            ; N: operation failed\r\n\r\n    STREX   r4, r0, [r2]    ; try to store new value\r\n    CBNZ    r4, retry       ; lost the lock, try again\r\n\r\n    DMB                     ; data memory barrier\r\n\r\n    ; succeeded - continue with code that comes after\r\n\r\n    ...\r\n\r\n    ; This code goes at the end of the function because ARM\r\n    ; statically predicts forward-jumps as not-taken.\r\nfail:\r\n    DMB                     ; data memory barrier\r\n    MOV     r1, r3          ; old_h = current value of head\r\n    B       outer_loop      ; restart the outer loop\r\n<\/pre>\n<p>The outer loop drives the loop written by the C++ programmer. The inner loop is the one required by <code>compare_<\/code><code>exchange_<\/code><code>strong<\/code>.<\/p>\n<p>The weak version avoids this nested loop:<\/p>\n<pre>    do { new_n-&gt;next = old_h; }\r\n    while (!head.compare_exchange_<span style=\"color: blue;\">weak<\/span>(old_h, new_n));\r\n<\/pre>\n<p>With this version, the compiler can simply bail out at the first sign of trouble. It avoids having to create a separate <code>fail<\/code> label and reduces register pressure because it doesn&#8217;t need to carry the expected and actual values through the (no-longer present) inner loop.<\/p>\n<pre>    ; r0 is new_n\r\n    ; r1 is old_h\r\n    ; r2 is the address of the atomic variable \"head\"\r\n\r\nouter_loop:\r\n    STR     r1, [r0]        ; new_n-&gt;next = old_h\r\n\r\n    MOV     r3, r1          ; save old_h before we overwrite it\r\n    DMB                     ; data memory barrier\r\n    LDREX   r1, [r2]        ; locked load of head into old_h\r\n    CMP     r3, r1          ; is it what we expected?\r\n    BNE     outer_loop      ; N: retry with revised old_h\r\n\r\n    STREX   r3, r0, [r2]    ; try to store new value\r\n    CBNZ    r3, outer_loop  ; lost the lock, try again\r\n\r\n    DMB                     ; data memory barrier\r\n\r\n    ; succeeded - continue with code that comes after\r\n<\/pre>\n<p>When should you prefer the strong version of compare-exchange as opposed to the weak version? We&#8217;ll take up that question next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s handy for certain classes of processors.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-98375","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>It&#8217;s handy for certain classes of processors.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/98375","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=98375"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/98375\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=98375"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=98375"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=98375"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}