A special case of the singleton constructor is simply lazy-initializing a bunch of variables. In a single-threaded application you can do something like this:
// suppose that any valid values for a and b stipulate that
// a ≥ 0 and b ≥ a. Therefore, -1 is never a valid value,
// and we use it to mean "not yet initialized".
int a = -1, b = -1;
void LazyInitialize()
{
if (a != -1) return; // initialized already
a = calculate_nominal_a();
b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
a = max(0, a);
b = max(a, b);
}
This works fine in a single-threaded program, but if the program is multi-threaded, then two threads might end up trying to lazy-initialize the variables, and there are race conditions which can result in one thread using values before they have been initialized:
| Thread 1 | Thread 2 |
|---|---|
if (a != -1) [not taken] |
|
a = calculate_nominal_a(); // returns 2 |
|
if (a != -1) return; // premature return! |
Observe that if the first thread is pre-empted after
the value for a is initially set,
the second thread will think that everything is initialized
and may end up using an uninitialized b.
“Aha,” you say, “that’s easy to fix.
Instead of a,
I’ll just use b to tell if initialization is complete.”
void LazyInitialize()
{
if (b != -1) return; // initialized already (test b, not a)
a = calculate_nominal_a();
b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
a = max(0, a);
b = max(a, b);
}
This still suffers from a race condition:
| Thread 1 | Thread 2 |
|---|---|
if (b != -1) [not taken] |
|
a = calculate_nominal_a(); // returns 2 |
|
b = calculate_nominal_b(); // returns 1 |
|
if (b != -1) return; // premature return! |
“I can fix that too.
I’ll just compute the values of a and b
in local variables, and update the globals only after the final
values have been computed.
That way, the second thread won’t see partially-calculated values.”
void LazyInitialize()
{
if (b != -1) return; // initialized already
// perform all calculations in temporary variables first
int temp_a = calculate_nominal_a();
int temp_b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
temp_a = max(0, temp_a);
temp_b = max(temp_a, temp_b);
// make the temporary values permanent
a = temp_a;
b = temp_b;
}
Nearly there, but there is still a race condition:
| Thread 1 | Thread 2 |
|---|---|
if (b != -1) [not taken] |
|
temp_a = calculate_nominal_a(); // returns 2 |
|
temp_b = calculate_nominal_b(); // returns 1 |
|
temp_a = max(0, temp_a); // temp_a = 2 |
|
temp_b = max(temp_a, temp_b); // temp_b = 2 |
|
a = temp_a; // store issued to memory |
|
b = temp_b; // store issued to memory |
|
store of b completes to memory |
|
if (b != -1) return; // premature return! |
|
store of a completes to memory |
There is no guarantee that the assignment b = 2 will
become visible to other processors after the assignment
a = 2.
Even though the store operations are issued in that order,
the memory management circuitry might
complete the memory operations in the opposite order,
resulting in Thread 2 seeing a = -1 and b = 2.
To prevent this from happening, the store to b must
be performed with
Release semantics,
indicating that all prior memory stores must complete before
the store to b can be made visible to other processors.
void LazyInitialize()
{
if (b != -1) return; // initialized already
// perform all calculations in temporary variables first
int temp_a = calculate_nominal_a();
int temp_b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
temp_a = max(0, temp_a);
temp_b = max(temp_a, temp_b);
// make the temporary values permanent
a = temp_a;
// since we use "b" as our indication that
// initialization is complete, we must store it last,
// and we must use release semantics.
InterlockedCompareExchangeRelease(
reinterpret_cast<LONG*>&b, temp_b, -1);
}
If you look at the final result,
you see that this is pretty much a re-derivation of the
singleton initialization pattern:
Do a bunch of calculations off to the side, then
publish the result with a single
InterlockedCompareExchangeRelease
operation.
The general pattern is therefore
void LazyInitializePattern()
{
if (global_signal_variable != sentinel_value) return;
... calculate values into local variables ...
globalvariable1 = temp_variable1;
globalvariable2 = temp_variable2;
...
globalvariableN = temp_variableN;
// publish the signal variable last, and with release
// semantics to ensure earlier values are visible as well
InterlockedCompareExchangeRelease(
reinterpret_cast<LONG*>&global_signal_variable,
temp_signal_variable, sentinel_value);
}
If this all is too much for you
(and given some of the subtlety of double-check-locking
that I messed up the first time through,
it’s clearly too much for me),
you can let the Windows kernel team do the thinking
and use the
one-time initialization functions,
which encapsulate all of this logic.
(My pal
Doron
called out the one-time initialization functions
a while back.)
Version 4 of the .NET Framework has corresponding functionality
in the
Lazy<T> class.
Exercise:
What hidden assumptions are being made about the functions
calculate_nominal_a and
calculate_nominal_b?
Exercise:
What are the consequences if we use
InterlockedExchange
instead of InterlockedCompareExchangeRelease?
Exercise:
In the final version of LazyInitialize, are the variables
temp_a and temp_b really necessary,
or are they just leftovers from previous attempts at fixing
the race condition?
Exercise: What changes (if any) are necessary to the above pattern if the global variables are pointers? Floating point variables?
Update: See discussion below between Niall and Anon regarding the need for acquire semantics on the initial read.
0 comments