A customer was using the v2 task_sequencer class we developed some time ago. (Here’s the v1 task sequencer.) They found that they occasionally suffered from stack overflow crashes.
QueueTaskAsync::<lambda_2>::operator()+0x714
std::coroutine_handle<void>::resume+0x6c
task_sequencer::chained_task::complete+0x88
task_sequencer::completer::~completer+0x58
QueueTaskAsync::<lambda_2>::operator()+0xaf8
std::coroutine_handle<void>::resume+0x6c
task_sequencer::chained_task::complete+0x88
task_sequencer::completer::~completer+0x58
QueueTaskAsync::<lambda_2>::operator()+0xaf8
std::coroutine_handle<void>::resume+0x6c
task_sequencer::chained_task::complete+0x88
task_sequencer::completer::~completer+0x58
QueueTaskAsync::<lambda_2>::operator()+0xaf8
std::coroutine_handle<void>::resume+0x6c
task_sequencer::chained_task::complete+0x88
task_sequencer::completer::~completer+0x58
QueueTaskAsync::<lambda_2>::operator()+0xaf8
std::coroutine_handle<void>::resume+0x6c
task_sequencer::chained_task::complete+0x88
task_sequencer::completer::~completer+0x58
...
Reading from the bottom up (to see the sequence chronologically), a coroutine completed, so we resumed the lambda coroutine inside QueueTaskAsync:
auto task = [](auto&& current, auto&& makerParam,
               auto&& contextParam, auto& suspend)
    -> Async
{
    completer completer{ std::move(current) };
    auto maker = std::move(makerParam);
    auto context = std::move(contextParam);
    co_await suspend;
    co_await context;
    co_return co_await maker();
}(current, std::forward<Maker>(maker),
  winrt::apartment_context(), suspend);
When one completer destructs, it resumes the co_await suspend in this lambda. The lambda then switches to the correct thread (which we don’t see on the stack because we are already on the correct thread), asks the maker to start the next coroutine (which we don’t see on the stack because it returned), and then awaits that coroutine. We don’t see that coroutine on the stack either, which means that it completed synchronously. And then that’s the end of the lambda, so its completer destructs, which resumes the next queued lambda, all without ever returning from the first completer’s destructor.
Therefore, we run into this problem if there is a long sequence of queued tasks, all of which complete synchronously: each completion resumes the next task on the same stack, and the stack never gets a chance to unwind.
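To make the failure mode concrete, here’s a hypothetical repro sketch (names invented, assumes C++/WinRT): park the first task on an event, queue a pile of synchronously-completing tasks behind it, and then release the first task.
winrt::handle g_gate{ CreateEventW(nullptr, TRUE, FALSE, nullptr) };

winrt::Windows::Foundation::IAsyncAction GateAsync()
{
    co_await winrt::resume_on_signal(g_gate.get()); // stays pending
}

winrt::Windows::Foundation::IAsyncAction NopAsync()
{
    co_return; // completes synchronously
}

void Repro(task_sequencer& sequencer)
{
    sequencer.QueueTaskAsync(GateAsync);
    for (int i = 0; i < 100000; i++) {
        sequencer.QueueTaskAsync(NopAsync);
    }
    // Every queued task now completes in a cascade on one stack.
    SetEvent(g_gate.get());
}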
So what can we do about it?
We could force the stack to unwind by throwing a co_await winrt::resume_background() into the lambda after the co_await suspend, so that the coroutine resumes on a background thread’s fresh stack, releasing the thread it was resumed on so it can unwind.
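Applied to the lambda above, the change would look something like this (a sketch; everything else stays the same):
auto task = [](auto&& current, auto&& makerParam,
               auto&& contextParam, auto& suspend)
    -> Async
{
    completer completer{ std::move(current) };
    auto maker = std::move(makerParam);
    auto context = std::move(contextParam);
    co_await suspend;
    co_await winrt::resume_background(); // fresh stack; old thread unwinds
    co_await context;
    co_return co_await maker();
}(current, std::forward<Maker>(maker),
  winrt::apartment_context(), suspend);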
This does soak up a threadpool thread in the case that the apartment_context refers to a single-threaded apartment, because IContextCallback::ContextCallback blocks the calling thread while the callback is running. Most people don’t worry about this problem, but I do, because I’ve had to debug deadlocks that trace back to threadpool exhaustion, where all the threads are just waiting for another thread to be ready or to finish doing something.
The customer noted that the task_sequencer is always used from the same thread, which happens to be a UI thread. So we can give the task sequencer a DispatcherQueue that it can use to get back to the UI thread asynchronously via TryEnqueue().
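Awaiting a dispatcher queue works roughly like this simplified awaiter (a sketch, not the actual C++/WinRT resume_foreground implementation): the resumption is posted via TryEnqueue, so the completing thread’s stack unwinds immediately instead of running the next task inline.
struct queue_awaiter
{
    winrt::Windows::System::DispatcherQueue queue;
    bool await_ready() const noexcept { return false; }
    bool await_suspend(std::coroutine_handle<> h) const
    {
        // Post the resumption to the queue's thread. If the queue
        // is shutting down, TryEnqueue fails (returns false) and
        // we resume synchronously.
        return queue.TryEnqueue([h] { h.resume(); });
    }
    void await_resume() const noexcept { }
};
Here’s the task sequencer, updated to accept an optional DispatcherQueue: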
struct task_sequencer
{
task_sequencer(
    winrt::Windows::System::DispatcherQueue const& queue = nullptr)
    : m_queue(queue) {}
task_sequencer(const task_sequencer&) = delete;
void operator=(const task_sequencer&) = delete;
private:
using coro_handle = std::coroutine_handle<>;
struct suspender
{
bool await_ready() const noexcept { return false; }
void await_suspend(coro_handle h)
noexcept { handle = h; }
void await_resume() const noexcept { }
coro_handle handle;
};
// Sentinel address meaning "this task has completed".
static void* completed()
{ return reinterpret_cast<void*>(1); }
struct chained_task
{
    chained_task(void* state = nullptr) : next(state) {}

    // Called by the next task: remember its handle, or resume it
    // immediately if this task has already completed.
    void continue_with(coro_handle h) {
        if (next.exchange(h.address(),
            std::memory_order_acquire) != nullptr) {
            h();
        }
    }

    // Called when this task completes: resume the next task, if
    // one has been chained on.
    void complete() {
        auto resume = next.exchange(completed());
        if (resume) {
            coro_handle::from_address(resume).resume();
        }
    }

    // nullptr = not completed, no continuation yet.
    // completed() = completed, no continuation yet.
    // otherwise = address of the chained continuation.
    std::atomic<void*> next;
};
struct completer
{
    // Runs when the task's coroutine finishes (even on
    // exceptional paths), unblocking the next task in the chain.
    ~completer()
    {
        chain->complete();
    }
    std::shared_ptr<chained_task> chain;
};
winrt::slim_mutex m_mutex;
// Where queued tasks should be started; nullptr = background thread.
winrt::Windows::System::DispatcherQueue m_queue{ nullptr };
std::shared_ptr<chained_task> m_latest =
    std::make_shared<chained_task>(completed());
public:
template<typename Maker>
auto QueueTaskAsync(Maker&& maker) -> decltype(maker())
{
    auto node = std::make_shared<chained_task>();
    suspender suspend;
    using Async = decltype(maker());
    // Everything the coroutine needs is passed as a parameter and
    // copied into the coroutine frame before the first suspension;
    // captures would dangle once this function returns.
    auto task = [](auto current, auto&& makerParam,
                   auto queue, auto& suspend)
        -> Async
    {
        completer completer{ std::move(current) };
        auto maker = std::move(makerParam);
        co_await suspend;
        if (queue == nullptr) {
            co_await winrt::resume_background();
        } else {
            co_await winrt::resume_foreground(queue);
        }
        co_return co_await maker();
    }(node, std::forward<Maker>(maker),
      m_queue, suspend);
    {
        winrt::slim_lock_guard guard(m_mutex);
        m_latest.swap(node); // node now holds the previous task
    }
    node->continue_with(suspend.handle);
    return task;
}
};
You provide a DispatcherQueue when you create the task_sequencer, so that the task sequencer knows which thread the tasks should be started on. If you pass nullptr (or don’t bother to provide a parameter at all), then they start on a background thread. Otherwise, they start on the thread corresponding to the dispatcher queue.
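For example, a hypothetical caller might look like this (names invented; assumes a C++/WinRT UI thread with a Windows::System dispatcher queue):
// e.g., members of some UI object, initialized on the UI thread
task_sequencer m_sequencer{
    winrt::Windows::System::DispatcherQueue::GetForCurrentThread() };

winrt::Windows::Foundation::IAsyncAction DoStepAsync(); // some async work

winrt::fire_and_forget OnClick()
{
    // Tasks start in queue order on the UI thread; the TryEnqueue
    // hop keeps a run of synchronous completions from recursing.
    co_await m_sequencer.QueueTaskAsync([] { return DoStepAsync(); });
}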