September 9th, 2022

The case of the APC that never arrives

Raymond Chen

A customer encountered found that sometimes, their application hung in its clean up code. Here’s a simplified version.

bool ShuttingDown = false;

void MainThread()
{
    DWORD id;
    auto hThread = CreateThread(nullptr, 0, WorkerThread,
                                nullptr, 0, &id); // succeeds

    BlahBlahBlah(); // do useful work

    // Time to clean up. Post an APC to the worker thread
    // to tell it that it's time to go home.
    QueueUserAPC(WakeWorker, hThread, 0); // succeeds

    WaitForSingleObject(hThread, INFINITE); // hangs

    CloseHandle(hThread);
}

void CALLBACK WakeWorker(ULONG_PTR)
{
    ShuttingDown = true;
}

DWORD CALLBACK WorkerThread(void*)
{
    // Do work until shut down.
    do
    {
        // All work is posted via APCs.
        SleepEx(INFINITE, TRUE);
    } while (!ShuttingDown);

    return 0;
}

The idea is that the program has a worker thread to, y’know, do some work. All of the work items are posted via QueueUserAPC, and the worker thread simply calls SleepEx alertably, over and over again. Each call to SleepEx sleeps the thread until an APC is queued, at which point it returns (because the bAlertable parameter is TRUE).

One way of looking at this design is that it’s a sneaky way of making the operating system manage your work queue for you. Another way of looking at it is as a single-threaded I/O completion port.

Call it what you will.

Anyway, the problem is that the worker thread is stuck in SleepEx, as if it never got the WakeWorker APC that tells it to exit.

But is that really the problem?

The customer was able to get some full memory dumps of systems that got into this state, and the telling detail is that the ShuttingDown variable was set to true. So it wasn’t that the WakeWorker APC never arrived. It totally did arrive and set ShuttingDown to true. Yet somehow that didn’t wake up the SleepEx.

One of my colleagues offered the possibility that when the code entered the SleepEx loop, the WakeWorker APC had already run. If that’s the case, then the SleepEx is going to wait for an APC that has already arrived.

I was able to confirm with a test program that this was indeed a possibility.

DWORD apc = 0;

void CALLBACK WakeWorker(ULONG_PTR)
{
    apc = GetCurrentThreadId();
}

DWORD CALLBACK WorkerThread(void*)
{
    if (apc == GetCurrentThreadId()) DebugBreak();
    return 0;
}

int __cdecl main()
{
    DWORD id;
    while (true)
    {
        apc = 0;
        auto h = CreateThread(nullptr, 0, WorkerThread, nullptr, 0, &id);
        QueueUserAPC(WakeWorker, h, 0);
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
        Sleep(10);
    }
    // notreached
}

This program creates a thread and immediately queues an APC to it. The thread procedure checks if the APC already ran on the same thread, and if so, it breaks into the debugger.

If you run this program, it breaks immediately. If you set a breakpoint on the WakeWorker function, you’ll see that it does indeed run on the thread before the WorkerThread function starts.

From the stack trace on that breakpoint, it appears that the kernel processes APCs during the early phases of thread creation, so if you had any queued APCs, they will get processed before the thread procedure even starts.

So that’s the bug: The ShuttingDown flag was already set by the time the thread procedure started, but the code assumes that it can only be set by an APC that is processed by the SleepEx.

The fix is simple: Change the loop from a do...while to a while, so that the test is at the top of the loop instead of at the bottom.

DWORD CALLBACK WorkerThread(void*)
{
    // Do work until shut down.
    while (!ShuttingDown)
    {
        // All work is posted via APCs.
        SleepEx(INFINITE, TRUE);
    }

    return 0;
}

The case where this happens is if the BlahBlahBlah() function returns very quickly, allowing the QueueUserAPC to win the race against the WorkerThread.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

18 comments

Discussion is closed. Login to edit/delete existing comments.

Sort by :

Newest

Newest Popular Oldest

Ismo Salonen September 11, 2022 0

How dow changing the loop to a “while().. ” help in general case ? There is still possible execution race ( no datarace as all runs in one thread) if there is alertable function between the test and SleepEx. That was my quess for the bug but the example did not give enough informarion to make judgement call.
- Harry Johnston September 14, 2022 1
  
  It wouldn’t really make sense to have any other code in that loop, but if you did, you’d have to check for the exit flag after every step that performs an alertable wait.
Leif Strand September 9, 2022 1

"From the stack trace on that breakpoint, it appears that the kernel processes APCs during the early phases of thread creation, so if you had any queued APCs, they will get processed before the thread procedure even starts."

Indeed, this is documented behavior:

"If an application queues an APC before the thread begins running, the thread begins by calling the APC function. After the thread calls an APC function, it calls the APC functions for all APCs in its APC queue."

-- https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-queueuserapc

My understanding is that this is precisely how the DllMain routines fire before main(), or before the thread start function: they...
Read more
“From the stack trace on that breakpoint, it appears that the kernel processes APCs during the early phases of thread creation, so if you had any queued APCs, they will get processed before the thread procedure even starts.”

Indeed, this is documented behavior:

“If an application queues an APC before the thread begins running, the thread begins by calling the APC function. After the thread calls an APC function, it calls the APC functions for all APCs in its APC queue.”

— https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-queueuserapc

My understanding is that this is precisely how the DllMain routines fire before main(), or before the thread start function: they are executed inside an APC that was queued before the process/thread starts.

Read less
- Henke37 September 10, 2022 0
  
  I feel like I read somewhere in the hundreds of pages that is Windows Internals that APCs are used during thread startup for uhm, something? I’d pull a quote, but that’s too many pages.
Joshua Hudson September 9, 2022 0

Wait how does that fix work? My first hypothesis was the APC woke up in the function prolog for SleepEx.
- Sunil Joshi September 9, 2022 0
  
  Before it sleeps, it checks whether shutdown is already set. If it’s already set then it doesn’t go to sleep.
  The bug is that it goes to sleep after shutdown is already set but without checking for that which means the APC has already run.
  The APC was run by the kernel before entering the thread procedure, as it appears the kernel drains user APCs before running the thread procedure.
  - Joshua Hudson September 9, 2022 0
    
    “The bug is that it goes to sleep after shutdown is already set but without checking for”
    
    This doesn’t answer the question. There’s still a time between checking shutdown and making the system call.
  - Sunil Joshi September 11, 2022 0
    
    This is a distinction between Win32 and Unix signal handlers; also, David Culter’s early OS VMS. Win32 does not believe in allowing user mode code to be interrupted by other user mode code running on the same thread. This is because it is almost impossible to reason about that sort of situation – are we holding the heap lock? Who knows? So, APCs only run when the thread announced it is ready for them. This way, it is possible to reason about re-entrance issues.
  - Louis Wilson September 9, 2022 2
    
    According to this MSDN page a user-mode APC only executes when the thread is in an “alertable state”, which is during SleepEx and WaitForMultipleObjectsEx & friends. It won’t fire between arbitrary instructions.
  - Neil Rashbrook September 10, 2022 0
    
    And this presumably why the customer thought their code was safe, because they hadn’t read the QueueUserAPC documentation well enough to notice that an APC can also execute during thread startup.
  - Joshua Hudson September 9, 2022 1
    
    And there’s the answer. You live you learn.
Adam Rosenfield September 9, 2022 0

ShuttingDown should also be declared volatile so that the compiler doesn’t “helpfully” optimize the code by hoisting it out of the loop, but that’s not the point of this article.
- MGetz September 9, 2022 · Edited 0
  
  In this case no is not only not necessary but may cause weird behavior, it's important to remember that is not atomic and does not solve data race issues. Using satisfies that use case but if you look at the code they are queuing the User APC onto the "owning" thread. Which means the code has no undefined behavior because it's inherently data race free. Could this in theory be improved by using an atomic? Potentially an acquire operation just incase the thread got moved cross CPU and the cache is not coherent but it should...
  Read more
  In this case no volatile is not only not necessary but may cause weird behavior, it’s important to remember that volatile is not atomic and does not solve data race issues. Using std::atomic_bool satisfies that use case but if you look at the code they are queuing the User APC onto the “owning” thread. Which means the code has no undefined behavior because it’s inherently data race free. Could this in theory be improved by using an atomic? Potentially an acquire operation just incase the thread got moved cross CPU and the cache is not coherent but it should be fine. I’d personally probably use the atomic with an acquire and release operations because I’m paranoid.
  
  Read less
  - Douglas Hill September 9, 2022 · Edited 0
    
    Actually, wouldn’t the APC be vaguely equivalent to a signal handler? In signal handlers, the only things C (and C++) give you safe access to are lock-free atomic types and `volatile sig_atomic_t`. The only atomic that’s guaranteed to be lock-free is `atomic_flag`, although Win32 probably gives you more.
  - Harry Johnston September 9, 2022 1
    
    No. From the program’s point of view the APC is called in a perfectly normal way by the SleepEx function. There are no restrictions analogous to those in a signal handler.
  - MGetz September 9, 2022 · Edited 1
    
    No, User mode APCs don't break execution like Unix signal handlers do, which is a major distinction. They can only be triggered on a thread that's in an "Alertable state" which is what the second parameter of does. It's closer to how apples grand central dispatch works honestly. You don't have to worry about clobbering an active stack like you do with signal handlers either. Inherently the thread you're running on was 'stopped' for all practical purposes until you got run. Also you can queue them directly onto a specific thread which this code does. I'm pretty sure Raymond...
    Read more
    No, User mode APCs don’t break execution like Unix signal handlers do, which is a major distinction. They can only be triggered on a thread that’s in an “Alertable state” which is what the second parameter of SleepEx does. It’s closer to how apples grand central dispatch works honestly. You don’t have to worry about clobbering an active stack like you do with signal handlers either. Inherently the thread you’re running on was ‘stopped’ for all practical purposes until you got run. Also you can queue them directly onto a specific thread which this code does. I’m pretty sure Raymond has covered cases where APC never run because the thread never went alertable. That said if you wanted to be very very sure you’d use atomic types as I mentioned. That means that when the well meaning programmer™️ comes in later and tries to use ShuttingDown cross thread where they shouldn’t be it won’t cause stupid bugs it will just be slower that it needs to be.
    
    Read less
  - Kalle Niemitalo September 12, 2022 · Edited 0
    
    There is now QueueUserAPC2 as a documented way to queue a special user-mode APC that does not require an alertable state.
  - Sunil Joshi September 9, 2022 1
    
    Also volatile on non-x86/x64 systems may cause poor performance as the compiler may try to keep things sequentially consistent when acquire / release will do.