Why not just share a single event across all critical sections?

Raymond Chen

Neil Rashbrook wonders why there isn’t a single event that all critical sections share. When a critical section is exited and there is a waiting thread, you “signal all threads waiting on critical sections”. One thread gets the critical section, and threads that are waiting for unrelated critical sections just conclude that they lost a nonexistent race and go back to sleep.

There are a few problems with this idea.

The first problem is specific to the way Windows kernel events work. How would you “signal all threads waiting on critical sections” with a kernel event? If you use an automatic-reset event, then only one thread will wake. In order to wake all threads, you need a manual-reset event. But how do you know when to reset the event? You want to reset the event once all threads have woken up, but you have no way of knowing when that has happened.
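
To make the dilemma concrete, here is a sketch of what the manual-reset version might look like. MINI_CS and g_csEvent are illustrative stand-ins, not actual Windows internals:

    #include <windows.h>

    typedef struct { LONG owner; } MINI_CS; // toy critical section
    HANDLE g_csEvent; // shared manual-reset event, created elsewhere with
                      // CreateEvent(NULL, TRUE, FALSE, NULL)

    void ToyLeave(MINI_CS* cs)
    {
        InterlockedExchange(&cs->owner, 0);
        SetEvent(g_csEvent); // releases every thread waiting on any MINI_CS
        // Now what? We want to ResetEvent(g_csEvent) once every waiter has
        // woken, but nothing tells us when that has happened. Resetting
        // immediately just reinvents PulseEvent, with the race shown next.
    }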

You might think of using PulseEvent and then realize that you have no easy way of closing the race window between a thread deciding that it needs to go to sleep and the thread actually going to sleep. You would have to use something like SignalObjectAndWait, but that means that every critical section acquisition would need to take a kernel mutex so it could prevent the owner from pulsing the event before the waiter could reach the WaitForSingleObject on the kernel event. But if you’re going to take a kernel mutex on every acquisition, then you’ve defeated the purpose of the critical section, which is to be a lightweight alternative to a kernel mutex! You may as well just use a kernel mutex as your critical section.
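
Continuing the toy sketch from above, here is the waiter side with that race window marked:

    void ToyEnter(MINI_CS* cs)
    {
        while (InterlockedCompareExchange(&cs->owner, 1, 0) != 0) {
            // <-- race window: a PulseEvent that fires here is lost, because
            //     PulseEvent releases only threads already inside a wait.
            WaitForSingleObject(g_csEvent, INFINITE);
        }
    }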

And all that is on top of the fact that PulseEvent is fundamentally flawed.

Now, maybe you could come up with an alternative to a kernel mutex and a kernel event. Say, a global slim reader-writer lock and a paired condition variable. The condition variable lets you notify_all in a reliable way, and then every thread checks whether its critical section is available. Could we use that for our critical section design?
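
Such a design might look something like this sketch, where g_waitLock and g_anyCsLeft are hypothetical process-wide objects shared by every MINI_CS from the earlier sketch:

    SRWLOCK g_waitLock = SRWLOCK_INIT;
    CONDITION_VARIABLE g_anyCsLeft = CONDITION_VARIABLE_INIT;

    void CvEnter(MINI_CS* cs)
    {
        AcquireSRWLockExclusive(&g_waitLock);
        while (InterlockedCompareExchange(&cs->owner, 1, 0) != 0) {
            // Atomically releases g_waitLock and sleeps; every CvLeave
            // anywhere in the process wakes us, even for unrelated locks.
            SleepConditionVariableSRW(&g_anyCsLeft, &g_waitLock, INFINITE, 0);
        }
        ReleaseSRWLockExclusive(&g_waitLock);
    }

    void CvLeave(MINI_CS* cs)
    {
        // Release ownership under the lock so a waiter cannot check and
        // then sleep in between, which would lose the wake.
        AcquireSRWLockExclusive(&g_waitLock);
        InterlockedExchange(&cs->owner, 0);
        ReleaseSRWLockExclusive(&g_waitLock);
        WakeAllConditionVariable(&g_anyCsLeft); // wake everybody; one wins
    }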

I guess you could do that, but you wouldn’t want to. Waking every thread that’s waiting for a critical section, even though you know that only one will succeed, is the definition of a thundering herd problem. You take a lot of context switches, lose a lot of CPU cache, page in a lot of memory, and waste a lot of CPU time with all the pointless work. You really want to wake just one candidate thread and leave the others sleeping. That’s why our critical section built out of WaitOnAddress uses WakeByAddressSingle.
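
For contrast, here is a minimal sketch of a lock built from WaitOnAddress (the version from the earlier article has more to it; this just shows the single-wake behavior):

    #include <windows.h>
    #pragma comment(lib, "Synchronization.lib")

    typedef struct { LONG claimed; } ADDR_LOCK;

    void AddrLockEnter(ADDR_LOCK* lock)
    {
        while (InterlockedCompareExchange(&lock->claimed, 1, 0) != 0) {
            LONG busy = 1;
            // Sleeps only while lock->claimed still equals busy, so the
            // wake from AddrLockLeave cannot be lost.
            WaitOnAddress(&lock->claimed, &busy, sizeof(busy), INFINITE);
        }
    }

    void AddrLockLeave(ADDR_LOCK* lock)
    {
        InterlockedExchange(&lock->claimed, 0);
        // Wake exactly one candidate; everybody else stays asleep.
        WakeByAddressSingle(&lock->claimed);
    }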

Note that the candidate thread you wake may not actually succeed at claiming the critical section, because another thread may sneak in and claim it before your candidate finishes waking up. Although this sounds unfair, unfairness is actually a feature, because it avoids lock convoys.


7 comments


  • Neil Rashbrook

    Wow, I had no idea that threads spend so much time waiting on critical sections.

    • smf

      It depends on what they are doing, or not doing.

      If they have nothing to do, then they will be waiting.
      If they have something to do that involves talking to something else, then they are also waiting.

      Most threads are waiting most of the time.

      • Neil Rashbrook

        I do realise that most threads spend most of their time waiting, but surely they’re not mostly waiting on critical sections? There are plenty of other synchronisation primitives for them to wait on.

  • Ian Boyd

    My concern with the proposal that you wake up everyone waiting on a critical section is that it might wake up everyone waiting on a critical section.

  • Dzmitry Konanka

    I guess here you could use SuspendThread:

    EnterCriticalSection(cs)
    {
        for (;;) {
            // Claim the lock; also succeeds on a recursive acquisition.
            DWORD prev = InterlockedCompareExchange(&cs->locker, GetCurrentThreadId(), 0);
            if (prev == 0 || prev == GetCurrentThreadId()) break;
            HANDLE wh;
            if (DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
                                GetCurrentProcess(), &wh,
                                THREAD_SUSPEND_RESUME, FALSE, 0)) {
                if (AddIntoWaiters(cs, wh)) {
                    // Sleep until the owner resumes us in LeaveCriticalSection.
                    SuspendThread(GetCurrentThread());
                } else { // last resort if running in an out-of-memory condition
                    CloseHandle(wh);
                    Sleep(10);
                }
            } else { // last resort if running in an out-of-handles condition
                Sleep(10);
            }
        }
        cs->locks++;
    }

    LeaveCriticalSection(cs)
    {
        cs->locks--;
        if (cs->locks == 0) {
            DWORD prev = InterlockedExchange(&cs->locker, 0);
            assert(prev == GetCurrentThreadId());
            HANDLE wh = FetchWaiter(cs);
            if (wh != INVALID_HANDLE_VALUE) {
                // ResumeThread returns the previous suspend count; 0 means the
                // waiter hasn't suspended itself yet, so spin until it has.
                while (ResumeThread(wh) == 0) Sleep(0);
                CloseHandle(wh);
            }
        } else {
            assert(cs->locks > 0);
        }
    }

    where AddIntoWaiters/FetchWaiter manage an internal per-critical-section lock-free list

    • Joshua Hudson

      Actually that’s pretty clever, and a good use of SuspendThread for once. It would need a little more work to be production-ready, mainly because there’s a much better way to handle compare-exchange on singly-linked lists.

      • Dzmitry Konanka

        You mean putting the waiters-list entry on the stack frame of EnterCriticalSection (to avoid dealing with dynamic allocations)? Or something else?
        There is another thing to optimize, however: instead of calling DuplicateHandle/CloseHandle each time, open the handle once per thread and cache it in a TLS variable (cleaning it up with an FlsAlloc’ed callback). Or, if we’re part of the OS, reserve space for a cached thread handle in the TEB.
        Also note that this approach provides predictable ‘fair’ FIFO scheduling and makes it easy (and ‘free’) to implement prioritization. Consider an EnterCriticalSectionQuickAsPossible that puts the current thread at the head of the waiters list, while the ‘usual’ EnterCriticalSection puts it at the end.
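
        A minimal sketch of that per-thread handle cache (GetCachedSelfHandle is a hypothetical helper; a real version would do the FlsAlloc once during process initialization):

            static DWORD g_flsIndex = FLS_OUT_OF_INDEXES;

            static VOID WINAPI CloseCachedHandle(PVOID h)
            {
                if (h) CloseHandle((HANDLE)h); // runs automatically at thread exit
            }

            HANDLE GetCachedSelfHandle(void)
            {
                if (g_flsIndex == FLS_OUT_OF_INDEXES)
                    g_flsIndex = FlsAlloc(CloseCachedHandle);

                HANDLE h = (HANDLE)FlsGetValue(g_flsIndex);
                if (!h) {
                    if (!DuplicateHandle(GetCurrentProcess(), GetCurrentThread(),
                                         GetCurrentProcess(), &h,
                                         THREAD_SUSPEND_RESUME, FALSE, 0))
                        return NULL; // out of handles; fall back to the Sleep loop
                    FlsSetValue(g_flsIndex, h);
                }
                return h;
            }

        LeaveCriticalSection would then skip the CloseHandle on the fetched waiter handle, since the cached handle stays alive for the thread’s lifetime.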
