Creating a co_await awaitable signal that can be awaited multiple times, part 3

Raymond Chen

Last time, we created an awaitable signal that can be awaited multiple times, but noted that it required a lot of kernel transitions. Let’s implement the entire thing in user mode.

#include <atomic>
#include <memory>
#include <utility>
#include <vector>
#include <experimental/coroutine>
#include <winrt/base.h>

struct awaitable_event
{
  void set() const { shared->set(); }

  auto await_ready() const noexcept
  {
    return shared->await_ready();
  }

  auto await_suspend(
    std::experimental::coroutine_handle<> handle) const
  {
    return shared->await_suspend(handle);
  }

  auto await_resume() const noexcept
  {
    return shared->await_resume();
  }

private:
  struct state
  {
    std::atomic<bool> signaled = false;
    winrt::slim_mutex mutex;
    std::vector<std::experimental::coroutine_handle<>> waiting;

    void set()
    {
      std::vector<std::experimental::coroutine_handle<>> ready;
      {
        // Mark as signaled and take ownership of the waiters under the lock.
        auto guard = winrt::slim_lock_guard(mutex);
        signaled.store(true, std::memory_order_relaxed);
        std::swap(waiting, ready);
      }
      // Resume the waiters after the lock has been released.
      for (auto&& handle : ready) handle();
    }

    bool await_ready() const noexcept
    { return signaled.load(std::memory_order_relaxed); }

    bool await_suspend(
      std::experimental::coroutine_handle<> handle)
    {
      auto guard = winrt::slim_lock_guard(mutex);
      // Check again under the lock: set() may have run since await_ready.
      if (signaled.load(std::memory_order_relaxed)) return false;
      waiting.push_back(handle);
      return true;
    }

    void await_resume() const noexcept { }
  };

  std::shared_ptr<state> shared = std::make_shared<state>();
};

The awaitable_event contains a shared_ptr to an internal state object, which is where all the work really happens. Operations on the awaitable_event are all forwarded to the state object, so all of the public methods are relatively uninteresting. The excitement happens in the state object, so let’s focus on that.

To wait for the awaitable_event, we begin with await_ready, which returns whether the event is already signaled. If it is already signaled, then await_ready returns true, which bypasses the suspension entirely. An event that represents “initialization complete” will spend nearly all of its time in the signaled state, and this short-circuit gives an optimized path for the compiler so it doesn’t have to spill register variables in the case that the event is already signaled.

If the event is not signaled, then we get to await_suspend. We take the lock and check a second time whether the event has been signaled, because set may have run in between the two calls. If so, then we return false, meaning “I reject the suspension. Keep running.”¹

On the other hand, if the event is truly not signaled, then we push the coroutine handle onto our list of waiting coroutine handles, and we’re done.

To signal the event, we take the lock, mark the event as signaled, and swap out the vector of waiting coroutine handles for an empty list. These coroutine handles are now ready: We iterate over the vector and resume each one.

This works relatively well, except that once you have a large number of waiting coroutines (say, because initialization is taking a really long time), the push_back on the vector might take a long time if the vector needs to be reallocated. The operation is still amortized O(1), but the per-instance cost can be as high as O(n).
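The reallocation cost is easy to observe. The following sketch counts how many push_back calls changed the vector's capacity; each of those calls is one that had to move every existing element. The helper name `count_reallocations` is hypothetical, and it uses a vector of int rather than coroutine handles purely to keep the example small.

```cpp
#include <cstddef>
#include <vector>

// Counts the push_back calls that changed the capacity, that is,
// the calls that paid the O(n) reallocation cost.
std::size_t count_reallocations(std::size_t n)
{
  std::vector<int> v;
  std::size_t reallocations = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    auto before = v.capacity();
    v.push_back(static_cast<int>(i));
    if (v.capacity() != before) ++reallocations;
  }
  return reallocations;
}
```

The exact count depends on the growth policy of your standard library, but it is far smaller than n, which is what makes the operation amortized O(1). The unlucky caller who triggers a reallocation still pays for all n elements while holding the lock.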

Furthermore, the push_back can throw an exception due to low memory (note that await_suspend is not marked noexcept).

We’ll address both of these issues next time.

¹ I always have to pause to think whenever I get to the return statements in the await_ready and await_suspend methods, because the return values have opposite sense. I have to remember that you want to “suspend if not ready”.
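One way to keep the two senses straight is to write down how the results combine when both methods return bool. This is only a model of the rule, not code the compiler generates, and the function name `ends_up_suspended` is made up.

```cpp
// The coroutine stays suspended only if await_ready said "not ready"
// and await_suspend then said "yes, suspend".
// (When await_ready returns true, await_suspend is never called,
// so its result is irrelevant in that case.)
constexpr bool ends_up_suspended(bool await_ready_result,
                                 bool await_suspend_result)
{
  return !await_ready_result && await_suspend_result;
}

static_assert(!ends_up_suspended(true, true));    // ready: keep running
static_assert(!ends_up_suspended(false, false));  // suspension rejected
static_assert(ends_up_suspended(false, true));    // actually suspended
```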



  • Jason Cobb 0

    I think there’s a potential data race on signaled. If await_ready() and set() execute concurrently, there can be a concurrent read and write to signaled, giving UB.

    • Sunil Joshi 0

      I think the lock orders it. Signaled is only changed when the lock is held. await_ready is just an optimisation anyway. If await_ready returns false, await_suspend will be called. That is protected by the lock. And signalled is again checked. So even if set and await_ready race, the lock ensures the correct outcome.

      • Jason Cobb 0

        If the program has a data race, you can’t say it just works; the presence of a data race is UB. You cannot reason about the behavior of a program when a data race is present. And it doesn’t matter if signaled is only changed by one thread concurrently if it can be read while it’s being changed; that’s still a data race.

    • Raymond Chen (Microsoft employee) 0

      You’re right. Changed to a std::atomic with explicitly relaxed access.
