When managing reference counts, there is an asymmetry between incrementing and decrementing: Incrementing the reference count can use relaxed semantics, but decrementing requires release semantics (and destroying requires acquire semantics).
The asymmetry may strike you as odd, but maybe it shouldn’t. After all, it’s not surprising that it’s easier to pull your toys out than to put them away.
Incrementing a reference count can be done with relaxed semantics (no memory ordering with respect to other memory locations) because the object is not at risk of being destroyed, and any memory operations that occur after the increment may as well have occurred before the increment. Incrementing a reference count doesn’t really impose any ordering requirements on memory accesses to the object.
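In std::atomic terms, the relaxed increment can be sketched like this (the class and member names here are illustrative, not from any particular library):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical minimal reference-counted object.
struct RefCounted {
    std::atomic<uint32_t> m_references{1};

    void AddRef() noexcept {
        // Relaxed is enough: the caller already holds a reference, so the
        // object can't be destroyed out from under us, and the increment
        // needs no ordering with respect to other memory accesses.
        m_references.fetch_add(1, std::memory_order_relaxed);
    }
};
```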
Decrementing a reference count is a different story.
The danger with decrementing a reference count is that the object is destructed when the reference count goes to zero. Now, maybe you didn’t decrement the reference count to zero, but it’s possible that another thread decrements it to zero after you do. Therefore, any decrement must be done with release semantics so that any straggling writes to memory are visible to the destructing thread before it frees the memory. One reason is that you want the destructor to see a consistent object. And even if a delayed write doesn’t affect consistency, you don’t want it to complete after the memory is freed. That would be a use-after-free, which is undefined behavior. In practice, it corrupts whatever object was subsequently allocated into the memory previously occupied by the destructed object.
Meanwhile, the thread that decrements the reference count to zero must perform an acquire to ensure that it doesn’t start destructing the object until all previous writes have drained.
There are two approaches to this double responsibility on the decrement.
One is to decrement with release semantics, and then establish an acquire fence if you realize that you are the one to do the decrement. This is the strategy employed by C++/WinRT:
static uint32_t __stdcall Release(fast_abi_forwarder* self) noexcept
{
    uint32_t const remaining =
        self->m_references.fetch_sub(1, std::memory_order_release) - 1;
    if (remaining == 0) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete self;
    }
    return remaining;
}
Another approach is to use an acquire-release on the decrement, thereby avoiding the need for a separate acquire when the reference count goes to zero. This is the strategy employed by Microsoft’s STL:
void _Decref() noexcept { // decrement use count
    if (_MT_DECR(_Uses) == 0) {
        _Destroy();
        _Decwref();
    }
}

void _Decwref() noexcept { // decrement weak reference count
    if (_MT_DECR(_Weaks) == 0) {
        _Delete_this();
    }
}
where _MT_DECR is defined as

#define _MT_DECR(x) _INTRIN_ACQ_REL(_InterlockedDecrement)(reinterpret_cast<volatile long*>(&x))

and _INTRIN_ACQ_REL selects the acquire-release variant of the atomic intrinsic, or at least the closest variant the processor supports that is at least as strong as acquire-release.
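In portable std::atomic terms, the acquire-release decrement boils down to something like this sketch (names are illustrative; the flag is a stand-in so the effect is observable):

```cpp
#include <atomic>
#include <cstdint>

bool g_destroyed = false; // observable stand-in for "the object was freed"

struct RefCounted {
    std::atomic<uint32_t> m_references{1};

    ~RefCounted() { g_destroyed = true; }

    void Release() noexcept {
        // acq_rel: the release half publishes this thread's writes to
        // whichever thread performs the final decrement; the acquire half
        // ensures that, if we are that thread, we observe every other
        // thread's writes before the destructor runs.
        if (m_references.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
        }
    }
};
```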
LLVM’s libc++ also uses acquire-release, as does GCC’s libstdc++.
There's an extra wrinkle in the strategy used by shared_ptr and weak_ptr in LLVM's libc++. For the weak count, libc++ first does an acquire load of the weak reference count, and if this is the last reference, it doesn't even do the decrement. Only if it's not the last one does it do the acq_rel decrement. This saves a potentially expensive atomic store in the extremely common case of going from a ref count of 1 to 0, at the expense of an unnecessary load when there are a lot of weak_ptrs around. I even left a big comment...
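A sketch of that load-then-decrement idea (this shows the shape of the optimization, not libc++'s actual code; libc++ stores the count biased by one, and the names here are invented):

```cpp
#include <atomic>

struct ControlBlock {
    std::atomic<long> m_weaks{1}; // 1-based here for clarity
    bool m_freed = false;         // stand-in for deallocating the control block

    void ReleaseWeak() noexcept {
        // Fast path: if ours is the last weak reference, no other thread can
        // be holding one, so skip the read-modify-write entirely. The acquire
        // load synchronizes with the acq_rel decrements of earlier releases.
        if (m_weaks.load(std::memory_order_acquire) == 1) {
            m_freed = true;
        } else if (m_weaks.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            m_freed = true;
        }
    }
};
```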