Last time, we created an awaitable signal that can be awaited multiple times, but noted that it performed a lot of kernel transitions. Let’s implement the entire thing in user mode.
struct awaitable_event
{
    void set() const { shared->set(); }

    auto await_ready() const noexcept
    {
        return shared->await_ready();
    }

    auto await_suspend(
        std::experimental::coroutine_handle<> handle) const
    {
        return shared->await_suspend(handle);
    }

    auto await_resume() const noexcept
    {
        return shared->await_resume();
    }

private:
    struct state
    {
        std::atomic<bool> signaled = false;
        winrt::slim_mutex mutex;
        std::vector<std::experimental::coroutine_handle<>> waiting;

        void set()
        {
            std::vector<std::experimental::coroutine_handle<>> ready;
            {
                auto guard = winrt::slim_lock_guard(mutex);
                signaled.store(true, std::memory_order_relaxed);
                std::swap(waiting, ready);
            }
            for (auto&& handle : ready) handle();
        }

        bool await_ready() const noexcept
        {
            return signaled.load(std::memory_order_relaxed);
        }

        bool await_suspend(
            std::experimental::coroutine_handle<> handle)
        {
            auto guard = winrt::slim_lock_guard(mutex);
            if (signaled.load(std::memory_order_relaxed)) return false;
            waiting.push_back(handle);
            return true;
        }

        void await_resume() const noexcept { }
    };

    std::shared_ptr<state> shared = std::make_shared<state>();
};
The awaitable_event contains a shared_ptr to an internal state object, which is where all the work really happens. Operations on the awaitable_event are all forwarded to the state object, so all of the public methods are relatively uninteresting. The excitement happens in the state object, so let’s focus on that.
To wait for the awaitable_event, we begin with await_ready, which returns whether the event is already signaled. If it is already signaled, then await_ready returns true, which bypasses the suspension entirely. An event that represents “initialization complete” will spend nearly all of its time in the signaled state, and this short-circuit gives the compiler an optimized path that doesn’t have to spill register variables in the case that the event is already signaled.
If the event is not signaled, then we get to await_suspend. We take the lock and check a second time whether the event has been signaled. If so, then we return false, meaning “I reject the suspension. Keep running.”¹ On the other hand, if the event is truly not signaled, then we push the coroutine handle onto our list of waiting coroutine handles, and we’re done.
To signal the event, we take the lock, mark the event as signaled, and swap out the vector of waiting coroutine handles for an empty list. These coroutine handles are now ready: We iterate over the vector and resume each one.
This works relatively well, except that once you have a large number of waiting coroutines (say, because initialization is taking a really long time), the push_back onto the vector might take a long time if the vector needs to be reallocated. The operation is still amortized O(1), but the cost of an individual push_back can be as high as O(n).
Furthermore, the push_back can throw an exception due to low memory (note that await_suspend is not marked noexcept).
We’ll address both of these issues next time.
¹ I always have to pause to think whenever I get to the return statements in the await_ready and await_suspend methods, because the return values have opposite senses. I have to remember that you want to “suspend if not ready”.
I think there’s a potential data race on signaled. If await_ready() and set() execute concurrently, there can be a concurrent read and write to signaled, giving UB.
You’re right. Changed to a std::atomic with explicitly relaxed access.
I think the lock orders it. signaled is only changed while the lock is held, and await_ready is just an optimization anyway. If await_ready returns false, await_suspend will be called. That is protected by the lock, and signaled is checked again there. So even if set and await_ready race, the lock ensures the correct outcome.
If the program has a data race, you can’t say it just works; the presence of a data race is UB. You cannot reason about the behavior of a program when a data race is present. And it doesn’t matter if signaled is only changed by one thread concurrently if it can be read while it’s being changed; that’s still a data race.