I was experiencing occasional crashes in C++/WinRT’s resume_foreground
function when it tries to resume execution on a dispatcher queue. Here’s a simplified version of that function:
auto resume_foreground(DispatcherQueue const& dispatcher)
{
    struct awaitable
    {
        DispatcherQueue m_dispatcher;
        bool m_queued = false;

        bool await_ready() { return false; }

        bool await_suspend(coroutine_handle<> handle)
        {
            m_queued = m_dispatcher.TryEnqueue([handle] { handle(); });
            return m_queued;
        }

        bool await_resume() { return m_queued; }
    };
    return awaitable{ dispatcher };
}
All you need to know about the DispatcherQueue object is that the TryEnqueue method takes a delegate and schedules it to run on the dispatcher queue’s thread. If it is unable to do so (say, because the thread has already exited), then the function returns false. The return value of the TryEnqueue method is the result of the co_await.
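As a minimal illustration (not from the article, and assuming the Windows.System projection of DispatcherQueue), a direct call to TryEnqueue looks like this, with a false return meaning the delegate will never run:

#include <winrt/Windows.System.h>

void post_work(winrt::Windows::System::DispatcherQueue const& dispatcher)
{
    bool queued = dispatcher.TryEnqueue([]
    {
        // Runs later, on the dispatcher queue's thread, if that thread is still alive.
    });

    if (!queued)
    {
        // The dispatcher queue's thread has already exited, so the delegate
        // will never be invoked; fall back or report failure here.
    }
}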
Let’s walk through how this function is intended to work.
The resume_foreground function returns an object that acts as its own awaiter. When a co_await occurs, the coroutine first calls await_ready, which returns false, meaning “Go ahead and suspend me.”
Next, the coroutine calls await_suspend. This method tries to queue the resumption of the coroutine onto the dispatcher thread and remembers whether it succeeded in the m_queued member variable.
Returning the value of m_queued means that if the continuation was successfully scheduled (true), the coroutine remains suspended until it is resumed when the handle is invoked. On the other hand, if the continuation was not successfully scheduled (false), then the suspension is abandoned, and execution resumes immediately on the same thread.
Either way, when the coroutine resumes, it is told whether the rescheduling onto the dispatcher thread succeeded.
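To make the intended behavior concrete, here is a hypothetical caller (not from the article) that consumes the awaiter; the bool produced by the co_await reports whether execution is now on the dispatcher queue’s thread:

winrt::fire_and_forget update_status(winrt::Windows::System::DispatcherQueue dispatcher)
{
    // ... background work on whatever thread we happen to be on ...

    bool on_dispatcher = co_await resume_foreground(dispatcher);
    if (!on_dispatcher)
    {
        // The dispatcher's thread is gone; we resumed on the original thread,
        // so don't touch anything that must be used from the dispatcher thread.
        co_return;
    }

    // Safe to touch dispatcher-thread-affine state here.
}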
Okay, now that you see how it is intended to work, can you spot the defect?
This code violates one of the rules we gave when we were getting started with awaitable objects: Once you arrange for the handle to be called, you cannot access any member variables, because the coroutine may have resumed before await_suspend finishes.
And that’s what’s happening here: The dispatcher queue is running the lambda even before await_suspend can save the answer into m_queued. As a result, the code crashes (if you’re lucky) or corrupts memory (if you’re not).
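To see the race in isolation, here is a deliberately defective awaiter (a hypothetical sketch that uses a plain std::thread in place of a dispatcher queue) with the same flaw: the write to m_queued can happen after the coroutine has already resumed and destroyed the awaiter.

#include <coroutine>
#include <thread>

struct bad_resume_on_new_thread
{
    bool m_queued = false;

    bool await_ready() { return false; }

    bool await_suspend(std::coroutine_handle<> handle)
    {
        // The new thread may invoke handle() immediately...
        std::thread([handle] { handle(); }).detach();

        // ...so by the time we get here, the coroutine may have resumed,
        // finished, and destroyed this awaiter: this write is a use-after-free.
        m_queued = true;
        return m_queued;
    }

    bool await_resume() { return m_queued; }
};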
So we need to make sure the lambda doesn’t race ahead of await_suspend.
Next time, we’ll make our first attempt to fix this.
(The fact that I call it our first attempt gives you a clue that it may take more than one try.)
I dealt with this problem while wrapping coroutines around I/O in C#. What would happen is that I would begin an OVERLAPPED operation but the completed OVERLAPPED would get posted to the thread pool before I even received the continuation delegate from the async builder state machine! This made the awaiter logic more complicated.
The way I solved it was to have an Interlocked.Decrement(...) countdown variable inside the awaiter initialized to 2 if the API...
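The comment describes a C# awaiter, but the countdown idea translates directly. Here is a hedged C++ sketch (the names and structure are assumptions, not the commenter’s actual code): the count starts at 2, the I/O completion and await_suspend each decrement it once, and whichever side reaches zero knows both events have happened and may resume the coroutine.

#include <atomic>
#include <coroutine>

struct overlapped_awaiter
{
    std::atomic<int> m_pending{ 2 };      // one for completion, one for suspension
    std::coroutine_handle<> m_handle;

    // Called from the I/O completion callback, which may fire before
    // await_suspend has even run.
    void on_completed()
    {
        if (m_pending.fetch_sub(1, std::memory_order_acq_rel) == 1)
        {
            m_handle.resume();            // suspension already published the handle
        }
    }

    bool await_ready() { return false; }

    bool await_suspend(std::coroutine_handle<> handle)
    {
        m_handle = handle;                // publish the handle first

        // If the completion already decremented, we lost the race to resume
        // on the completion thread; returning false resumes the coroutine
        // right here instead.
        return m_pending.fetch_sub(1, std::memory_order_acq_rel) != 1;
    }

    void await_resume() { }
};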
And someone was wondering why I might want to do EnterCriticalSection from one thread and release it in another.
async IO appears to be a good reason.
This feels similar to IoMarkIrpPending, STATUS_PENDING, and PendingReturned in WDM drivers. Conditionally transferring ownership to another execution agent.
How about this kind of solution:
/* In case the lambda is dequeued and run
* before TryEnqueue even returns. */
m_queued = true;
if (m_dispatcher.TryEnqueue([handle]
...
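The comment is truncated, so the following is only a guess at the rest of its control flow, written against the simplified awaiter above: pre-setting m_queued means the success path never touches members after the enqueue, and the failure path still may, because the lambda is then guaranteed never to run.

bool await_suspend(coroutine_handle<> handle)
{
    // Record the "success" answer up front, in case the lambda is dequeued
    // and the coroutine resumes before TryEnqueue even returns.
    m_queued = true;

    if (m_dispatcher.TryEnqueue([handle] { handle(); }))
    {
        // Do not touch any members here: the coroutine may already have
        // resumed and destroyed this awaiter.
        return true;
    }

    // The lambda will never run, so we still own the awaiter and may safely
    // record the failure and resume on the current thread.
    m_queued = false;
    return false;
}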