August 23rd, 2024

What if I need to wait for more than `MAXIMUM_WAIT_OBJECTS` threads?

Raymond Chen

A customer had a test that created a lot of threads, and they wanted the test to wait for all of the threads to exit before proceeding to the next step. However, there were more than MAXIMUM_WAIT_OBJECTS threads, which is more handles than a single WaitForMultipleObjects call accepts. What is the best way to wait for all of them?

The customer noted that the documentation had a few suggestions. One is to divide the objects into groups of size at most MAXIMUM_WAIT_OBJECTS and for each group, create a thread to call WaitForMultipleObjects. Another suggestion is to call RegisterWaitForSingleObject on each handle.

The customer thought these approaches were unnecessarily complicated. What about just dividing the objects into groups of size at most MAXIMUM_WAIT_OBJECTS and just going into a loop calling WaitForMultipleObjects on each group? “Is there some subtlety that we’re missing?”
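The customer's proposal might look something like the following sketch. The names ForEachBlock and WaitForAllInBlocks are hypothetical helpers invented for illustration, and the Windows-specific part assumes the handles are process or thread handles only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: calls fn(offset, count) for consecutive blocks of at
// most maxBlock items covering [0, total). Stops early and returns false as
// soon as fn returns false.
template <typename Fn>
bool ForEachBlock(std::size_t total, std::size_t maxBlock, Fn&& fn)
{
    assert(maxBlock > 0); // a zero block size would never make progress
    for (std::size_t offset = 0; offset < total; offset += maxBlock)
    {
        std::size_t count = std::min(maxBlock, total - offset);
        if (!fn(offset, count)) return false;
    }
    return true;
}

#ifdef _WIN32
// Hypothetical helper: waits for every handle to become signaled, one block
// at a time. Intended only for process and thread handles.
bool WaitForAllInBlocks(const std::vector<HANDLE>& handles)
{
    return ForEachBlock(handles.size(), MAXIMUM_WAIT_OBJECTS,
        [&](std::size_t offset, std::size_t count)
        {
            DWORD result = WaitForMultipleObjects(
                static_cast<DWORD>(count), handles.data() + offset,
                TRUE /* bWaitAll */, INFINITE);
            // With bWaitAll, success is any value in the range
            // [WAIT_OBJECT_0, WAIT_OBJECT_0 + count).
            return result - WAIT_OBJECT_0 < count;
        });
}
#endif
```

With 130 handles and a block size of 64, the helper visits blocks of 64, 64, and 2 handles, in that order.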

Process handles and thread handles have the property that waits on them are idempotent. Waiting on a process or thread handle waits for the process or thread to exit, but it has no effect on the process or thread itself. This is different from some other types of handles: waiting on semaphores, mutexes, and auto-reset events has side effects, namely consuming a semaphore token, taking ownership of the mutex, and resetting the auto-reset event.

Furthermore, process and thread handles also have the property that once they become signaled, they never become unsignaled. This means that once you have successfully waited on them to become signaled, you don’t have to worry about the possibility that in the future, they might not be signaled any more.

Therefore, if you are waiting for a group of process and thread handles all to be signaled, you have the liberty to wait for them in any order. You don’t need the special behavior of WaitForMultipleObjects, which doesn’t apply any wait side effects until all the objects become signaled simultaneously.

So yes, you can wait for them in blocks of MAXIMUM_WAIT_OBJECTS. But really, even that is too much work. You can just wait for them one at a time.

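// Thread handles stay signaled once the thread has exited,
// so the order in which we wait for them does not matter.
for (auto&& handle : m_threadHandles)
{
    REQUIRE(WaitForSingleObject(handle, INFINITE)
            == WAIT_OBJECT_0);
}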

7 comments

Acc Reg August 27, 2024 0

Performance question here: wouldn’t this cost more system calls, since it makes one NtWaitForSingleObject call per handle instead of a single NtWaitForMultipleObjects call?
Jan Ringoš August 24, 2024 · Edited 1

This post has finally pushed me to investigate how the Vista+ Thread Pool manages to (on Windows 8+) wait for more than 64 events on a single thread. I’ve been wondering if I can reuse the underlying tech (NT API) to use it with custom I/O Completion Port, outside of the system Thread Pool. It turns out it’s pretty simple.

Is there any particular reason this functionality hasn’t been lifted to the Win32 API for general use?
People are still battling with the MAXIMUM_WAIT_OBJECTS limit.

- Luca Bacci August 25, 2024 · Edited 1
  
  I believe there’s a good reason for the MAXIMUM_WAIT_OBJECTS limit. The system doesn’t know which arguments have changed between two consecutive calls to WaitForMultipleObjects(Ex); you may have just added (or removed) one HANDLE, or you may have replaced ALL HANDLEs. As such, the system has to scan every passed HANDLE on each invocation. In addition, the system has to remove the waits on return and re-arm them on the next call (because some waits have side effects, e.g. mutexes). All of that is costly, and to lower the overhead you spread the arguments into batches and make multiple parallel invocations.
  
  With IOCP on the other hand you push wait items and the system pops them on completion. The system has to do much less work that way. That said, IOCPs are different in even more ways, like being able to wake up one thread only from a set of waiting threads.
  
  - Acc Reg August 27, 2024 0
    
    Why would you want to make two consecutive calls to WaitForMultipleObjects(Ex)? Isn’t it the case that kernel handles are not reference-counted?
  - 紅樓鍮 August 26, 2024 1
    
    WaitForMultipleObjects operates like select and poll on Unix, and the limitations you mentioned are also limitations of select and poll. Problem is, epoll, Linux’s rough equivalent to IOCP, supports most types of file descriptors that exist on Linux, including eventfd; IOCP on the other hand only supports a small set of operations that are mostly just reading, writing and accepting connections, and that’s despite Jan’s discovery that the NT kernel apparently does contain all necessary facilities to use events and mutexes with IOCP after all.
    
  - Jan Ringoš August 25, 2024 · Edited 1
    
    The reasoning on the limit makes sense. Which makes it even more curious why there’s no API that would allow applications to be more efficient. And yes, assigning handles to IOCP solves it only partially, as it, for example, can’t handle acquiring Mutexes.
紅樓鍮 August 23, 2024 · Edited 0

When I first learned about threads in C++ it took me a bit of time to wrap my head around the fact that, given std::thread t(…), u(…), v(…);, you can simply call t.join(); u.join(); v.join(); to wait for all of them to finish despite the fact that the threads themselves may finish in any order. Of course, with std::jthread even explicitly calling join() has become unnecessary.

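As a rough sketch of the pattern this comment describes, the following portable C++ joins threads in creation order even though they tend to finish in the opposite order. The function name RunAndJoinAll and the staggered sleeps are illustrative assumptions, not something taken from the comment.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: starts `count` threads, joins them in creation order,
// and returns how many of them ran to completion.
std::size_t RunAndJoinAll(std::size_t count)
{
    std::atomic<std::size_t> finished{0};
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        threads.emplace_back([&finished, i, count]
        {
            // Earlier threads sleep longer, so later threads tend to exit first.
            std::this_thread::sleep_for(
                std::chrono::milliseconds(5 * (count - i)));
            ++finished;
        });
    }
    // join() returns promptly for a thread that has already finished, and a
    // finished thread never becomes "unfinished" again, so the order of the
    // joins does not affect the outcome.
    for (auto& t : threads)
    {
        t.join();
    }
    // Once every join has returned, every thread body has completed.
    assert(finished.load() == count);
    return finished.load();
}
```

The total time spent in the loop is bounded by the slowest thread, not by the sum of the individual waits, which is the same reason waiting on thread handles one at a time works.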