Last time, I left you with a homework assignment: Watch this video on std::atomic
.
At time code 33:03, the presenter notes the weak version of compare-exchange (which is permitted to fail even if the value matches the expected value) and tries to reverse-engineer what kind of hardware would require this operation, eventually settling on a NUMA architecture where cross-node memory accesses can time out.
But there’s no need to speculate about something that exotic, because the answer is all around us. In fact, it’s probably happening right now on a computer in the presenter’s pocket.
Most RISC processors do not have a compare-exchange instruction. Instead, they use a load locked/store conditional pattern. This pattern is employed by the ARM architecture, and we also saw it for Alpha AXP, and we’ll see it later for MIPS and PowerPC.
The load locked/store conditional pattern goes like this:
- Issue a load locked instruction which reads a value from memory and instructs the processor to monitor that location for writes from other processors.
- Perform some computations.
- Issue a store conditional instruction which writes a value to the same memory location that was locked, provided the processor can prove that the memory has not been written to in the interim.
The conditional store can fail if another processor has written to the memory, or memory on the same cache line or other unit of monitoring granularity, or if the processor took an interrupt.
On an ARM, a strong compare-exchange contains a loop because the only way that compare_
exchange_
strong
is permitted to fail is when the current value of the atomic variable does not match the expected value. If the failure reason was because of contention, then the strong version must perform an internal retry loop until the operation succeeds, or until the failure condition is met.
; r0 is the proposed new value ; r1 is the expected old value ; r2 is the address of the atomic variable retry: DMB ; data memory barrier LDREX r3, [r2] ; load current value and lock it CMP r3, r1 ; is it what we expected? BNE fail ; N: operation failed ; actual current value is in r3 STREX r4, r0, [r2] ; try to store new value CBNZ r4, retry ; lost the lock, try again DMB ; data memory barrier
Consider the compare-exchange loop in the code sample in the presentation:
do { new_n->next = old_h; } while (!head.compare_exchange_strong(old_h, new_n));
The compare_
exchange_
strong
has an embedded loop, and it’s part of another loop. So we have to generate two loops:
; r0 is new_n ; r1 is old_h ; r2 is the address of the atomic variable "head" outer_loop: STR r1, [r0] ; new_n->next = old_h retry: DMB ; data memory barrier LDREX r3, [r2] ; locked load of head CMP r3, r1 ; is it what we expected? BNE fail ; N: operation failed STREX r4, r0, [r2] ; try to store new value CBNZ r4, retry ; lost the lock, try again DMB ; data memory barrier ; succeeded - continue with code that comes after ... ; This code goes at the end of the function because ARM ; statically predicts forward-jumps as not-taken. fail: DMB ; data memory barrier MOV r1, r3 ; old_h = current value of head B outer_loop ; restart the outer loop
The outer loop drives the loop written by the C++ programmer. The inner loop is the one required by compare_
exchange_
strong
.
The weak version avoids this nested loop:
do { new_n->next = old_h; }
while (!head.compare_exchange_weak(old_h, new_n));
With this version, the compiler can simply bail out at the first sign of trouble. It avoids having to create a separate fail
label and reduces register pressure because it doesn’t need to carry the expected and actual values through the (no-longer present) inner loop.
; r0 is new_n ; r1 is old_h ; r2 is the address of the atomic variable "head" outer_loop: STR r1, [r0] ; new_n->next = old_h MOV r3, r1 ; save old_h before we overwrite it DMB ; data memory barrier LDREX r1, [r2] ; locked load of head into old_h CMP r3, r1 ; is it what we expected? BNE outer_loop ; N: retry with revised old_h STREX r3, r0, [r2] ; try to store new value CBNZ r3, outer_loop ; lost the lock, try again DMB ; data memory barrier ; succeeded - continue with code that comes after
When should you prefer the strong version of compare-exchange as opposed to the weak version? We’ll take up that question next time.
0 comments