Today we’re going to do a little exercise based on what we’ve learned so far. We learned how to perform byte and word loads and stores to memory. And we also learned how to perform atomic memory operations on longs and quads. But how about atomic memory operations on bytes and words?
We will have to put together what we’ve learned: Combine the byte and word access patterns with the atomic memory update pattern.
To recap: The sequence for reading an aligned word in memory goes like this:
    LDQ_U   t1, (t0)
    EXTWL   t1, t0, t1
The sequence for writing an aligned word in memory goes like this:
    LDQ_U   t5, (t0)        ; t5 = yyBA xxxx
    INSWL   t1, t0, t3      ; t3 = 00ba 0000
    MSKWL   t5, t0, t5      ; t5 = yy00 xxxx
    BIS     t5, t3, t5      ; t5 = yyba xxxx
    STQ_U   t5, (t0)
    ; Byte sequence is the same, except you use INSBL and MSKBL
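If the register diagrams are hard to follow, here is a rough C model of what the word extract, insert, and mask steps do to a quad. This is only a sketch: the function names `extwl`, `inswl`, `mskwl`, `read_word`, and `write_word` are made up for illustration, the quad is passed by value instead of being loaded from memory, and a little-endian layout is assumed.

```c
#include <stdint.h>

/* Hypothetical C models of three Alpha byte-manipulation instructions.
   'addr' plays the role of the address register; only its low 3 bits
   are used, to select the byte position inside the quad. */
static uint64_t extwl(uint64_t quad, uint64_t addr) {
    return (quad >> (8 * (addr & 7))) & 0xFFFF;
}

static uint64_t inswl(uint64_t value, uint64_t addr) {
    return (value & 0xFFFF) << (8 * (addr & 7));
}

static uint64_t mskwl(uint64_t quad, uint64_t addr) {
    return quad & ~((uint64_t)0xFFFF << (8 * (addr & 7)));
}

/* Read the aligned word at addr out of the quad that contains it. */
uint16_t read_word(uint64_t quad, uint64_t addr) {
    return (uint16_t)extwl(quad, addr);
}

/* Return the quad with the aligned word at addr replaced by value:
   punch a hole with the mask, then OR in the shifted new value. */
uint64_t write_word(uint64_t quad, uint64_t addr, uint16_t value) {
    return mskwl(quad, addr) | inswl(value, addr);
}
```

The write is a read-modify-write of the whole quad, which is exactly why it is not atomic on its own.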
And the sequence for an atomic quad update goes like this:
retry:
    LDQ_L   t1, (t0)        ; load locked
    ... calculate new value of t1 based on old value ...
    STQ_C   t1, (t0)        ; store conditional
                            ; t1 = 1 if store was successful
    BEQ     t1, failed      ; jump if store failed
    ... continue execution ...

failed:
    BR      zero, retry     ; try again
What we need to do is insert the byte or word extraction, calculation, and insertion code where it says “calculate new value of t1 based on old value”. The trick is that there is no LDQ_LU instruction. You can read for unaligned or you can read locked, but you can’t read for unaligned locked.
Fortunately, this is easy to work around: We emulate the behavior of LDQ_U in software. Recall that LDQ_U is the same as LDQ except that it ignores the bottom 3 bits of the address. So let’s mask out the bottom 3 bits of the address.
    ; atomically increment the word at the aligned address t0
    BIC     t0, #7, t3      ; t3 = t0 with the bottom 3 bits cleared
retry:
    LDQ_L   t1, (t3)        ; load locked
    ... calculate new value of t1 based on old value ...
    STQ_C   t1, (t3)        ; store conditional
                            ; t1 = 1 if store was successful
    BEQ     t1, failed      ; jump if store failed
    ... continue execution ...

failed:
    BR      zero, retry     ; try again
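As an aside, the address arithmetic is easy to check in C. Clearing the bottom 3 bits means masking off the value 7, and those same 3 bits are the byte position that the extract, insert, and mask instructions look at. The helper names `containing_quad` and `byte_in_quad` below are invented for this sketch.

```c
#include <stdint.h>

/* Address of the aligned quad that contains addr.
   This is the address LDQ_U would actually load from. */
uintptr_t containing_quad(uintptr_t addr) {
    return addr & ~(uintptr_t)7;
}

/* Byte position of addr inside that quad (0 through 7). */
unsigned byte_in_quad(uintptr_t addr) {
    return (unsigned)(addr & 7);
}
```

Note that the original, unmasked address is still needed afterward, because it is the only place the byte position is recorded.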
Okay, we’ve successfully emulated the nonexistent LDQ_LU and STQ_CU instructions. Now to do the extraction, calculation, and insertion:
    ; atomically increment the word at the aligned address t0
    BIC     t0, #7, t3      ; t3 = t0 with the bottom 3 bits cleared
retry:
    LDQ_L   t1, (t3)        ; load locked
                            ; t1 = yyBA xxxx

    ; Extract (the byte position comes from the original address t0)
    EXTWL   t1, t0, t2      ; t2 = 0000 00BA (the word value)

    ; Calculate
    ADDL    t2, #1, t2      ; increment t2

    ; Insert
    INSWL   t2, t0, t2      ; t2 = 00ba 0000
    MSKWL   t1, t0, t1      ; t1 = yy00 xxxx
    BIS     t1, t2, t1      ; t1 = yyba xxxx

    STQ_C   t1, (t3)        ; store conditional
                            ; t1 = 1 if store was successful
    BEQ     t1, failed      ; jump if store failed
    ... continue execution ...

failed:
    BR      zero, retry     ; try again
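For readers who think better in C, here is a sketch of the same extract, calculate, insert loop using C11 atomics. It is an analogue, not a translation: `atomic_compare_exchange_weak` stands in for the load-locked/store-conditional pair, and the helper name `atomic_increment_word` and its parameters are invented for this example. It assumes a little-endian layout and an aligned word (byte offset 0, 2, 4, or 6).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically increment the 16-bit word at the given byte offset
   inside an aligned 64-bit quad. Returns the new word value. */
uint16_t atomic_increment_word(_Atomic uint64_t *quad, unsigned offset) {
    unsigned shift = 8 * (offset & 7);
    uint64_t mask = (uint64_t)0xFFFF << shift;
    uint64_t old_quad = atomic_load(quad);     /* plays the role of LDQ_L */
    uint64_t new_quad;
    uint16_t new_word;

    do {
        /* Extract and calculate; the cast discards any carry out of the word. */
        new_word = (uint16_t)((old_quad >> shift) + 1);
        /* Mask out the old word and insert the new one. */
        new_quad = (old_quad & ~mask) | ((uint64_t)new_word << shift);
        /* Plays the role of STQ_C and BEQ: on failure, old_quad is
           refreshed with the current contents and the loop retries. */
    } while (!atomic_compare_exchange_weak(quad, &old_quad, new_quad));

    return new_word;
}
```

As in the assembly version, the neighboring bytes in the quad are written back unchanged, and the update succeeds only if nobody modified the quad in the meantime.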
Fortunately, our extraction, calculation, and insertion could be performed in under 20 instructions with no additional memory access, and no use of potentially-emulated instructions, so it all fits between the LDQ_L and STQ_C.
Exercise: What could we do if our calculation required additional memory access or required more than 20 instructions?