Last time, we created an awaitable signal that can be awaited multiple times, but noted that it performed a lot of kernel transitions. Let’s implement the entire thing in user mode.
struct awaitable_event
{
    void set() const { shared->set(); }

    auto await_ready() const noexcept
    {
        return shared->await_ready();
    }

    auto await_suspend(
        std::experimental::coroutine_handle<> handle) const
    {
        return shared->await_suspend(handle);
    }

    auto await_resume() const noexcept
    {
        return shared->await_resume();
    }

private:
    struct state
    {
        std::atomic<bool> signaled = false;
        winrt::slim_mutex mutex;
        std::vector<std::experimental::coroutine_handle<>> waiting;

        void set()
        {
            std::vector<std::experimental::coroutine_handle<>> ready;
            {
                auto guard = winrt::slim_lock_guard(mutex);
                signaled.store(true, std::memory_order_relaxed);
                std::swap(waiting, ready);
            }
            for (auto&& handle : ready) handle();
        }

        bool await_ready() const noexcept
        {
            return signaled.load(std::memory_order_relaxed);
        }

        bool await_suspend(
            std::experimental::coroutine_handle<> handle)
        {
            auto guard = winrt::slim_lock_guard(mutex);
            if (signaled.load(std::memory_order_relaxed)) return false;
            waiting.push_back(handle);
            return true;
        }

        void await_resume() const noexcept { }
    };

    std::shared_ptr<state> shared = std::make_shared<state>();
};
The awaitable_event contains a shared_ptr to an internal state object, which is where all the work really happens. Operations on the awaitable_event are all forwarded to the state object, so all of the public methods are relatively uninteresting. The excitement happens in the state object, so let’s focus on that.
To wait for the awaitable_event, we begin with await_ready, which returns whether the event is already signaled. If it is already signaled, then await_ready returns true, which bypasses the suspension entirely. An event that represents “initialization complete” will spend nearly all of its time in the signaled state, and this short-circuit gives the compiler an optimized path that doesn’t have to spill register variables in the case that the event is already signaled.
If the event is not signaled, then we get to await_suspend. We take the lock and check a second time whether the event has been signaled. If so, then we return false, meaning “I reject the suspension. Keep running.”¹ On the other hand, if the event is truly not signaled, then we push the coroutine handle onto our list of waiting coroutine handles, and we’re done.
To signal the event, we take the lock, mark the event as signaled, and swap out the vector of waiting coroutine handles for an empty list. These coroutine handles are now ready: We iterate over the vector and resume each one.
This works relatively well, except that once you have a large number of waiting coroutines (say, because initialization is taking a really long time), the push_back onto the vector might take a long time if the vector needs to be reallocated. The operation is still amortized O(1), but the cost of an individual push_back can be as high as O(n).
Furthermore, the push_back can throw an exception due to low memory (note that await_suspend is not marked noexcept).
We’ll address both of these issues next time.
¹ I always have to pause to think whenever I get to the return statements in the await_ready and await_suspend methods, because the return values have opposite senses. I have to remember that you want to “suspend if not ready”.
I think there’s a potential data race on signaled. If await_ready() and set() execute concurrently, there can be a concurrent read and write to signaled, giving UB.
You’re right. Changed to a std::atomic with explicitly relaxed access.
I think the lock orders it. signaled is only changed while the lock is held, and await_ready is just an optimization anyway. If await_ready returns false, await_suspend will be called. That is protected by the lock, and signaled is checked again there. So even if set and await_ready race, the lock ensures the correct outcome.
If the program has a data race, you can’t say it just works; the presence of a data race is UB. You cannot reason about the behavior of a program when a data race is present. And it doesn’t matter if signaled is only changed by one thread concurrently if it can be read while it’s being changed; that’s still a data race.