{"id":99485,"date":"2018-08-14T07:00:00","date_gmt":"2018-08-14T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99485"},"modified":"2019-03-13T00:38:19","modified_gmt":"2019-03-13T07:38:19","slug":"20180814-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180814-00\/?p=99485","title":{"rendered":"The PowerPC 600 series, part 7: Atomic memory access and cache coherency"},"content":{"rendered":"<p>On the PowerPC 600 series, memory accesses to suitably-aligned locations by a single register are atomic,&sup1; meaning that even in the face of a conflicting operation on another processor, the result will be the entire previous value or the entire final value, never a mix of the two. <\/p>\n<p>To perform atomic update operations (load-modify-store, also known as interlocked operations), you use the <code>lwarx<\/code> and <code>stwcx.<\/code> instructions: <\/p>\n<pre>\n    lwarx   rd, ra\/0, rb      ; load rd from ra\/0 + rb and reserve\n    stwcx.  rd, ra\/0, rb      ; store rd conditionally to ra\/0 + rb, update cr0\n<\/pre>\n<p>Note that the only supported addressing mode is <code>x<\/code>. No plain instruction, and no <code>u<\/code> forms. <\/p>\n<p>The <code>lwarx<\/code> instruction loads a word and creates a reservation which monitors the memory for changes. Any modification to that address or an address nearby causes the reservation to be lost. The definition of &#8220;nearby&#8221; is left up to the processor. <\/p>\n<p>The <code>stwcx.<\/code> instruction tries to store <var>rd<\/var> to memory. The store will succeed if the reservation is still in effect and the store is to the same address as the most recent <code>lwarx<\/code>. The result of the operation is reported in the <var>eq<\/var> bit of <var>cr0<\/var>: <var>eq<\/var> is set on success and clear on failure. 
The instruction also updates the other bits of <var>cr0<\/var> by clearing the <var>lt<\/var> and <var>gt<\/var> bits and capturing the summary overflow bit. <\/p>\n<p>Note that the <code>stwcx.<\/code> instruction ends with a dot because it implicitly updates <var>cr0<\/var>. There is no undotted form. <\/p>\n<p>Regardless of whether the store succeeded, the reservation is cleared. <\/p>\n<p>If you attempt to store back to a location different from the most recent preceding <code>lwarx<\/code>, and the reservation is still valid, the store might or might not succeed, and the <var>eq<\/var> bit will be unpredictable; it need not reflect the actual success of the store. So don&#8217;t do that.&sup2; <\/p>\n<p>If you&#8217;ve seen the other RISC architecture atomic operations, this should feel very familiar. Here&#8217;s a sample interlocked increment: <\/p>\n<pre>\n    ; atomically increment the word stored at address r3\nloop:\n    lwarx   r4, 0, r3         ; load with reservation\n    addi    r4, r4, 1         ; increment\n    stwcx.  r4, 0, r3         ; store conditional\n    bne-    loop              ; if failed (unlikely), try again\n    ; on exit r4 contains incremented value\n<\/pre>\n<p>You are allowed to abandon a reservation. For example, a compare-exchange starts with a reservation, but if the value is incorrect, it just gives up without ever storing anything. <\/p>\n<pre>\n    ; if the word at r3 is equal to r4, then replace it with r5\nloop:\n    lwarx   r6, 0, r3         ; load with reservation\n    cmpw    r6, r4            ; contains correct value?\n    bne-    stop              ; if not, then give up\n    stwcx.  
r5, 0, r3         ; store conditional\n    bne-    loop              ; if failed (unlikely), try again\nstop:\n    ; r6 contains previous value stored at r3\n<\/pre>\n<p>As noted above, simple accesses to suitably-aligned locations are atomic, and you can use the <code>lwarx<\/code>\/<code>stwcx.<\/code> instructions to construct more complex atomic operations, but none of those instructions imposes any memory ordering. In practice, the interlocked operations will usually erect a memory barrier before and\/or after the atomic update. <\/p>\n<pre>\n    sync                      ; full memory barrier\n    isync                     ; acquire\n    lwsync                    ; release\n<\/pre>\n<p>The <code>sync<\/code> instruction is a full memory barrier. <\/p>\n<p>The <code>isync<\/code> instruction officially discards prefetch, but that has a side effect of preventing future memory operations from starting (because they were discarded), which is effectively an acquire. You usually use it after taking a lock, so that reads intended to be under the lock do not get advanced to before the lock is taken. <\/p>\n<p>The <code>lwsync<\/code> instruction waits for preceding loads and stores to complete, but allows future loads to start. You usually use it just before releasing a lock, so that all accesses that were intended to be protected by the lock are finished before the lock is dropped. <\/p>\n<p>And then there&#8217;s this guy: <\/p>\n<pre>\n    eieio                     ; enforce in-order execution of I\/O\n<\/pre>\n<p>This instruction is so famous <a HREF=\"https:\/\/en.wikipedia.org\/wiki\/Enforce_In-order_Execution_of_I\/O\">it has its own Wikipedia page<\/a>. Somebody worked really hard to <a HREF=\"https:\/\/en.wikipedia.org\/wiki\/Old_MacDonald_Had_a_Farm\">backronym that mnemonic<\/a>. It&#8217;s intended as a memory barrier for memory-mapped I\/O, but it is generally useful as well. 
It acts like a lightweight <code>lwsync<\/code>: It ensures that all pending stores are completed, but it does not prevent future loads from starting or force preceding loads to complete. You can use this just before exiting a lock if the purpose of the lock was to update some data rather than to read some data. The compiler, of course, doesn&#8217;t usually have this level of insight into your code, so you&#8217;re unlikely to see this in practice. <\/p>\n<p>There are other types of barriers but you&#8217;re not likely to encounter them. There are also special instructions to tell the processor that you&#8217;ve written new code to memory, so it should discard any prefetch or instruction cache. <\/p>\n<p>When reading code, you don&#8217;t need to worry too much about the distinctions between these different types of barriers. You can assume that the compiler used the correct barrier. (Well, unless you&#8217;re chasing a compiler bug.) <\/p>\n<p>The PowerPC permits implementations to have separate I-cache and D-cache, so you cannot assume that writing code to memory will immediately take effect at execution. You have to explicitly tell the processor that instructions have changed. This is mostly relevant only for jitters, so I won&#8217;t go into details. I never had to debug a jitter on this guy, and even if I were called upon to do it, I&#8217;d just assume that whoever wrote the memory barrier stuff knew what they were doing. <\/p>\n<p><a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180815-00\/?p=99495\">Next time<\/a>, we&#8217;ll look at control flow instructions and their absurd mnemonics. <\/p>\n<p>&sup1; Although not available in little-endian mode, there are instructions in big-endian mode that can load and store multiple registers. Each individual register access is atomic if suitably aligned, but the entire operation is not. <\/p>\n<p>&sup2; Interrupts and traps do not clear the reservation. 
This means that if the operating system wants to perform a context switch, it needs to perform a <code>stwcx.<\/code> to a harmless location to force the reservation to be cleared. Otherwise, the thread being switched to might be in the middle of an atomic operation, and its <code>stwcx.<\/code> might succeed based on the previous thread&#8217;s reservation! This is a rare case where you will intentionally perform a <code>stwcx.<\/code> to an address that doesn&#8217;t match the preceding <code>lwarx<\/code>. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to avoid a break-up.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99485","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>How to avoid a 
break-up.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99485","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99485"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99485\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99485"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99485"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}