The history of Win32 critical sections so far

Raymond Chen

Raymond

The CRITICAL_SECTION structure has gone through
a lot of changes since its introduction back oh so many decades ago.
The amazing thing is that as long as you stick to the documented API,
your code is completely unaffected.

Initially, the critical section object had an owner field
to keep track of which thread entered the critical section, if any.
It also had
a lock count to keep track of how many times the owner thread
entered the critical section, so that the critical section would
be released when the matching number of
Leave­Critical­Section calls was made.
And there was an auto-reset event used to manage contention.
We’ll look more at that event later.
(It’s actually more complicated than this, but the details
aren’t important.)

If you’ve ever looked at the innards of a critical section
(for entertainment purposes only),
you may have noticed that the lock count was off by one:
The lock count was the number of active calls to
Enter­Critical­Section minus one.
The bias was needed because the original version of the
interlocked increment and decrement operations

returned only the sign of the result, not the revised value
.
Biasing the result by 1 means that all three states could be detected:
Unlocked (negative), locked exactly once (zero), reentrant lock (positive).
(It’s actually more complicated than this, but the details
aren’t important.)

If a thread tries to enter a critical section but can’t
because the critical section is owned by another thread,
then it sits and waits on the contention event.
When the owning thread releases all its claims on the critical section,
it signals the event to say,
“Okay, the door is unlocked.
The next guy can come in.”

The contention event is allocated only when contention occurs.
(This is what older versions of MSDN meant when they said that
the event is “allocated on demand.”)
Which leads to a nasty problem:
What if contention occurs,
but the attempt to create the contention event fails?
Originally,
the answer was “The kernel raises an out-of-memory exception.”

Now you’d think that a clever program could catch this exception
and try to recover from it, say, by unwinding everything that led
up to the exception.
Unfortunately, the weakest link in the chain is the critical section
object itself:
In the original version of the code,
the out-of-memory exception was raised while the critical section was
in an unstable state.
Even if you managed to catch the exception and unwind everything you could,
the critical section was itself irretrievably corrupted.

Another problem with the original design became apparent on multiprocessor
systems: If a critical section was typically held for a very brief time,
then by the time you called into kernel to wait on the contention event,
the critical section was already freed!

void SetGuid(REFGUID guid)
{
 EnterCriticalSection(&g_csGuidUpdate);
 g_theGuid = guid;
 LeaveCriticalSection(&g_csGuidUpdate);
}
void GetGuid(GUID *pguid)
{
 EnterCriticalSection(&g_csGuidUpdate);
 *pguid = g_theGuid;
 LeaveCriticalSection(&g_csGuidUpdate);
}

This imaginary code uses a critical section to protect accesses
to a GUID.
The actual protected region is just nine instructions long.
Setting up to wait on a kernel object is way,
way more than nine instructions.
If the second thread immediately waited on the critical section
contention event,
it would find that by the time the kernel got around to entering
the wait state, the event would say,
“Dude, what took you so long? I was signaleded, like, a bazillion
cycles ago!”

Windows 2000 added the
Initialize­Critical­Section­And­Spin­Count
function,
which lets you avoid the problem where waiting for a critical section
costs more than the code the critical section was protecting.
If you initialize with a spin count, then when a thread tries to
enter the critical section and can’t,
it goes into a loop trying to enter it over and over again,
in the hopes that it will be released.

“Are we there yet?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?”

If the critical section is not released after the requested number
of iterations,
then the old slow wait code is executed.

Note that spinning on a critical section is helpful only on
multiprocessor systems,
and only in the case where you know that all the protected code
segments are very short in duration.
If the critical section is held for a long time,
then spinning is wasteful since the odds that the critical section
will become free during the

spin cycle

are very low,
and you wasted a bunch of CPU.

Another feature added in Windows 2000 is the ability to
preallocate the contention event.
This avoids the dreaded
“out of memory” exception in
Enter­Critical­Section,
but at a cost of a kernel event for every critical section,
whether actual contention occurs or not.

Windows XP solved the problem of the dreaded “out of memory”
exception by using a fallback algorithm that could be used
if the contention event could not be allocated.
The fallback algorithm is not as efficient, but at least
it avoids the “out of memory” situation.
(Which is a good thing, because nobody really expects
Enter­Critical­Section to fail.)
This also means that requests for the contention event to be preallocated
are now ignored, since the reason for preallocating
(avoiding the “out of memory” exception) no longer exists.

(And while they were there, the kernel folks also fixed
Initialize­Critical­Section so that
a failed initialization left the critical section object in
a stable state so you could safely try again later.)

In Windows Vista, the internals of the critical section object
were rejiggered once again,
this time to add convoy resistance.

The internal bookkeeping completely changed
;
the lock count is no longer a 1-biased count of the number of
Enter­Critical­Section calls which are pending.
As a special concession to backward compatibility with people
who violated the API contract and

looked directly at the internal data structures
,
the new algorithm goes to some extra effort to ensure that
if a program breaks the rules and
looks at a specific offset inside the critical section
object,
the value stored there is −1 if and only if the critical section
is unlocked.

Often, people will remark that “your compatibility problems would go
away if you just open-sourced the operating system.”
I think there is some confusion over what “go away” means.
If you release the source code to the operating system,
it makes it even easier for people to take undocumented
dependencies on it,
because they no longer have the barrier of “Well, I can’t find any
documentation, so maybe it’s not documented.”
They can just read the source code and say,
“Oh, I see that if the critical section is unlocked,
the Lock­Count variable has the value −1.”
Boom, instant undocumented dependency.
Compatibility is screwed.
(Unless what people are saying “your compatibility problems would
go away if you just open-sourced all applications,
so that these problems can be identified and fixed as soon as they
are discovered.”)

Exercise:
Why isn’t it important that the fallback algorithm be highly efficient?

Raymond Chen
Raymond Chen

Follow Raymond