August 23rd, 2024

What if I need to wait for more than MAXIMUM_WAIT_OBJECTS threads?

A customer had a test that created a lot of threads, and they wanted the test to wait for all of the threads to exit before proceeding to the next step. However, there were more than MAXIMUM_WAIT_OBJECTS threads, which is more handles than a single Wait­For­Multiple­Objects call accepts. What is the best way to wait for all of them?

The customer noted that the documentation had a few suggestions. One is to divide the objects into groups of size at most MAXIMUM_WAIT_OBJECTS and for each group, create a thread to call Wait­For­Multiple­Objects. Another suggestion is to call Register­Wait­For­Single­Object on each handle.

The customer thought these approaches were unnecessarily complicated. What about just dividing the objects into groups of size at most MAXIMUM_WAIT_OBJECTS and just going into a loop calling Wait­For­Multiple­Objects on each group? “Is there some subtlety that we’re missing?”

Process handles and thread handles have the property that waits on them are idempotent. Waiting on a process or thread handle waits for the process or thread to exit, but it has no effect on the process or thread itself. This is different from some other types of handles. Waiting on semaphores, mutexes, and auto-reset events has side effects: consuming a semaphore token, taking ownership of the mutex, and resetting the auto-reset event, respectively.

Furthermore, process and thread handles also have the property that once they become signaled, they never become unsignaled. This means that once you have successfully waited on them to become signaled, you don’t have to worry about the possibility that in the future, they might not be signaled any more.

Therefore, if you are waiting for a group of process and thread handles all to be signaled, you are free to wait for them in any order. You don’t need the special behavior of Wait­For­Multiple­Objects, which produces no wait side effects until all of the objects are signaled simultaneously.

So yes, you can wait for them in blocks of MAXIMUM_WAIT_OBJECTS. But really, even that is too much work. You can just wait for them one at a time.

for (auto&& handle : m_threadHandles)
{
    REQUIRE(WaitForSingleObject(handle, INFINITE)
            == WAIT_OBJECT_0);
}
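
If you do prefer to wait in blocks, as the customer proposed, it might look something like the sketch below. The names WaitForAllInBlocks and WaitForAllThreads are hypothetical helpers, not part of Win32, and the sketch assumes that every handle is a process or thread handle, so that waiting on the blocks one after another is safe for the reasons given above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Win32): hands the handles to waitAll
// in consecutive blocks of at most maxBlock entries. waitAll(block, count)
// must return true once every handle in the block is signaled.
template <typename Handle, typename WaitAllFn>
bool WaitForAllInBlocks(const std::vector<Handle>& handles,
                        std::size_t maxBlock, WaitAllFn waitAll)
{
    assert(maxBlock > 0);
    for (std::size_t start = 0; start < handles.size(); start += maxBlock)
    {
        std::size_t count = std::min(maxBlock, handles.size() - start);
        if (!waitAll(handles.data() + start, count))
        {
            return false; // stop at the first block whose wait fails
        }
    }
    return true;
}

#ifdef _WIN32
#include <windows.h>

// Assumed usage: every handle is a process or thread handle.
inline bool WaitForAllThreads(const std::vector<HANDLE>& threads)
{
    return WaitForAllInBlocks(threads, MAXIMUM_WAIT_OBJECTS,
        [](const HANDLE* block, std::size_t count)
        {
            // bWaitAll = TRUE: return only when the whole block is signaled.
            DWORD result = WaitForMultipleObjects(
                static_cast<DWORD>(count), block, TRUE, INFINITE);
            return result < WAIT_OBJECT_0 + count;
        });
}
#endif
```

The block logic is kept separate from the Win32 call so that it can be exercised with a fake wait function. Compared with the one-at-a-time loop, this saves some system calls at the cost of more code.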

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

7 comments


  • Acc Reg

    Performance question: won’t this cost more, since it makes many NtWaitForSingleObject system calls instead of one NtWaitForMultipleObjects call?

  • Jan Ringoš · Edited

    This post has finally pushed me to investigate how the Vista+ Thread Pool manages to (on Windows 8+) wait for more than 64 events on a single thread. I've been wondering if I can reuse the underlying tech (NT API) to use it with custom I/O Completion Port, outside of the system Thread Pool. It turns out it's pretty simple.

    Is there any particular reason this functionality hasn’t been lifted to the Win32 API for general use?

    • Luca Bacci · Edited

      I believe there's a good reason for the MAXIMUM_WAIT_OBJECTS limit. The system doesn't know what arguments have changed between two consecutive calls to WaitForMultipleObjects(Ex); you may have just added (or removed) one HANDLE, or you may have replaced ALL HANDLEs. As such the system has to scan every passed HANDLE in each invocation. In addition the system has to remove the waits on return and re-arm them on the next call (because some waits have...
      • 紅樓鍮

        operates like and on Unix, and the limitations you mentioned are also limitations of and . Problem is, , Linux's rough equivalent to IOCP, supports most types of file descriptors that exist on Linux, including eventfd; IOCP on the other hand only supports a small set of operations that are mostly just reading, writing and accepting connections, and that's despite Jan's discovery that the NT kernel apparently does contain all necessary...
      • Jan Ringoš · Edited

        The reasoning on the limit makes sense. Which makes it even more curious why there’s no API that would allow applications to be more efficient. And yes, assigning handles to IOCP solves it only partially, as it, for example, can’t handle acquiring Mutexes.

  • 紅樓鍮 · Edited

    When I first learned about threads in C++ it took me a bit of time to wrap my head around the fact that, given , you can simply call to wait for all of them to finish despite the fact that the threads themselves may finish in any order. Of course, with even explicitly calling has become unnecessary.