{"id":96845,"date":"2017-08-18T07:00:00","date_gmt":"2017-08-18T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=96845"},"modified":"2019-03-13T01:18:07","modified_gmt":"2019-03-13T08:18:07","slug":"20170818-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20170818-00\/?p=96845","title":{"rendered":"The Alpha AXP, part 10: Atomic updates to byte and word memory units"},"content":{"rendered":"<p>Today we&#8217;re going to do a little exercise based on what we&#8217;ve learned so far. We learned how to perform byte and word <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20170815-00\/?p=96816\">loads<\/a> and <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20170816-00\/?p=96825\">stores<\/a> to memory. And we also learned how to perform atomic memory operations on longs and quads. But how about atomic memory operations on bytes and words? <\/p>\n<p>We will have to put together what we&#8217;ve learned: Combine the byte and word access patterns with the atomic memory update pattern. <\/p>\n<p>To recap: The sequence for reading an aligned word in memory goes like this: <\/p>\n<pre>\n    LDQ_U  t1, (t0)\n    EXTWL  t1, t0, t1\n<\/pre>\n<p>The sequence for writing an aligned word in memory goes like this: <\/p>\n<pre>\n    LDQ_U   t5, (t0)                  ; t5 = yyBA xxxx\n    INSWL   t1, t0, t3                ; t3 = 00ba 0000\n    MSKWL   t5, t0, t5                ; t5 = yy00 xxxx\n    BIS     t5, t3, t5                ; t5 = yyba xxxx\n    STQ_U   t5, (t0)\n\n    ; Byte sequence is the same, except you use INSBL and MSKBL\n<\/pre>\n<p>And the sequence for an atomic quad update goes like this: <\/p>\n<pre>\nretry:\n    LDQ_L   t1, (t0)        ; load locked\n    ... 
calculate new value of t1 based on old value ...\n    STQ_C   t1, (t0)        ; store conditional\n                            ; t1 = 1 if store was successful\n    BEQ     t1, failed      ; jump if store failed\n    ... continue execution ...\n\nfailed:\n    BR      zero, retry     ; try again\n<\/pre>\n<p>What we need to do is insert the byte or word extraction, calculation, and insertion code where it says &#8220;calculate new value of <var>t1<\/var> based on old value&#8221;. The catch is that there is no <code>LDQ_LU<\/code> instruction. You can read unaligned or you can read locked, but you can&#8217;t read unaligned and locked at the same time. <\/p>\n<p>Fortunately, this is easy to work around: We emulate the behavior of <code>LDQ_U<\/code> in software. Recall that <code>LDQ_U<\/code> is the same as <code>LDQ<\/code> except that it ignores the bottom 3 bits of the address. So let&#8217;s mask out the bottom 3 bits of the address ourselves. <\/p>\n<pre>\n    ; atomically increment the word at the aligned address t0\n    BIC     t0, #7, t3      ; force-align t0 to t3 (clear bottom 3 bits)\nretry:\n    LDQ_L   t1, (t3)        ; load locked\n    ... calculate new value of t1 based on old value ...\n    STQ_C   t1, (t3)        ; store conditional\n                            ; t1 = 1 if store was successful\n    BEQ     t1, failed      ; jump if store failed\n    ... continue execution ...\n\nfailed:\n    BR      zero, retry     ; try again\n<\/pre>\n<p>Okay, we&#8217;ve successfully emulated the <code>LDQ_LU<\/code> and <code>STQ_LU<\/code> instructions. 
Now to do the extraction, calculation, and insertion: <\/p>\n<pre>\n    ; atomically increment the word at the aligned address t0\n    BIC     t0, #7, t3      ; force-align t0 to t3 (clear bottom 3 bits)\nretry:\n    LDQ_L   t1, (t3)        ; load locked\n                            ; t1 = yyBA xxxx\n\n    ; Extract (the byte offset comes from t0, not the aligned t3)\n    EXTWL   t1, t0, t2      ; t2 = 0000 00BA (the word value)\n\n    ; Calculate\n    ADDL    t2, #1, t2      ; increment t2\n\n    ; Insert\n    INSWL   t2, t0, t2      ; t2 = 00ba 0000\n    MSKWL   t1, t0, t1      ; t1 = yy00 xxxx\n    BIS     t1, t2, t1      ; t1 = yyba xxxx\n\n    STQ_C   t1, (t3)        ; store conditional\n                            ; t1 = 1 if store was successful\n    BEQ     t1, failed      ; jump if store failed\n    ... continue execution ...\n\nfailed:\n    BR      zero, retry     ; try again\n<\/pre>\n<p>Fortunately, our extraction, calculation, and insertion can be performed in fewer than 20 instructions, with no additional memory accesses and no use of potentially emulated instructions, so it all fits between the <code>LDQ_L<\/code> and <code>STQ_C<\/code>. <\/p>\n<p><b>Exercise<\/b>: What could we do if our calculation required additional memory accesses or required more than 20 instructions? 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Putting together some things we&#8217;ve learned.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-96845","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Putting together some things we&#8217;ve learned.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/96845","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=96845"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/96845\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=96845"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=96845"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=96845"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}