The Old New Thing

# Lock-free algorithms: The one-time initialization

Raymond Chen

A special case of the singleton constructor is simply lazy-initializing a bunch of variables. In a single-threaded application you can do something like this:

```// suppose that any valid values for a and b stipulate that
// a ≥ 0 and b ≥ a. Therefore, -1 is never a valid value,
// and we use it to mean "not yet initialized".
int a = -1, b = -1;
void LazyInitialize()
{
if (a != -1) return; // initialized already
a = calculate_nominal_a();
b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
a = max(0, a);
b = max(a, b);
}
```

This works fine in a single-threaded program, but if the program is multi-threaded, then two threads might end up trying to lazy-initialize the variables, and there are race conditions which can result in one thread using values before they have been initialized:

`if (a != -1)` [not taken]
`a = calculate_nominal_a();` // returns 2
`if (a != -1) return;` // premature return!

Observe that if the first thread is pre-empted after the value for `a` is initially set, the second thread will think that everything is initialized and may end up using an uninitialized `b`.

“Aha,” you say, “that’s easy to fix. Instead of `a`, I’ll just use `b` to tell if initialization is complete.”

```void LazyInitialize()
{
if (b != -1) return; // initialized already (test b, not a)
a = calculate_nominal_a();
b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
a = max(0, a);
b = max(a, b);
}
```

This still suffers from a race condition:

`if (b != -1)` [not taken]
`a = calculate_nominal_a();` // returns 2
`b = calculate_nominal_b();` // returns 1
`if (b != -1) return;` // premature return!

“I can fix that too. I’ll just compute the values of `a` and `b` in local variables, and update the globals only after the final values have been computed. That way, the second thread won’t see partially-calculated values.”

```void LazyInitialize()
{
if (b != -1) return; // initialized already
// perform all calculations in temporary variables first
int temp_a = calculate_nominal_a();
int temp_b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
temp_a = max(0, temp_a);
temp_b = max(temp_a, temp_b);
// make the temporary values permanent
a = temp_a;
b = temp_b;
}
```

Nearly there, but there is still a race condition:

`if (b != -1)` [not taken]
`temp_a = calculate_nominal_a();` // returns 2
`temp_b = calculate_nominal_b();` // returns 1
`temp_a = max(0, temp_a);` // temp_a = 2
`temp_b = max(temp_a, temp_b);` // temp_b = 2
`a = temp_a;` // store issued to memory
`b = temp_b;` // store issued to memory
store of `b` completes to memory
`if (b != -1) return;` // premature return!
store of `a` completes to memory

There is no guarantee that the assignment `b = 2` will become visible to other processors after the assignment `a = 2`. Even though the store operations are issued in that order, the memory management circuitry might complete the memory operations in the opposite order, resulting in Thread 2 seeing `a = -1` and `b = 2`.

To prevent this from happening, the store to `b` must be performed with Release semantics, indicating that all prior memory stores must complete before the store to `b` can be made visible to other processors.

```void LazyInitialize()
{
if (b != -1) return; // initialized already
// perform all calculations in temporary variables first
int temp_a = calculate_nominal_a();
int temp_b = calculate_nominal_b();
// Adjust the values to conform to our constraint.
temp_a = max(0, temp_a);
temp_b = max(temp_a, temp_b);
// make the temporary values permanent
a = temp_a;
// since we use "b" as our indication that
// initialization is complete, we must store it last,
// and we must use release semantics.
InterlockedCompareExchangeRelease(
reinterpret_cast<LONG*>&b, temp_b, -1);
}
```

If you look at the final result, you see that this is pretty much a re-derivation of the singleton initialization pattern: Do a bunch of calculations off to the side, then publish the result with a single `Interlocked­Compare­Exchange­Release` operation.

The general pattern is therefore

```void LazyInitializePattern()
{
if (global_signal_variable != sentinel_value) return;
... calculate values into local variables ...
globalvariable1 = temp_variable1;
globalvariable2 = temp_variable2;
...
globalvariableN = temp_variableN;
// publish the signal variable last, and with release
// semantics to ensure earlier values are visible as well
InterlockedCompareExchangeRelease(
reinterpret_cast<LONG*>&global_signal_variable,
temp_signal_variable, sentinel_value);
}
```

If this all is too much for you (and given some of the subtlety of double-check-locking that I messed up the first time through, it’s clearly too much for me), you can let the Windows kernel team do the thinking and use the one-time initialization functions, which encapsulate all of this logic. (My pal Doron called out the one-time initialization functions a while back.) Version 4 of the .NET Framework has corresponding functionality in the `Lazy<T>` class.

Exercise: What hidden assumptions are being made about the functions `calculate_nominal_a` and `calculate_nominal_b`?

Exercise: What are the consequences if we use `Interlocked­Exchange` instead of `Interlocked­Compare­Exchange­Release`?

Exercise: In the final version of `Lazy­Initialize`, are the variables `temp_a` and `temp_b` really necessary, or are they just leftovers from previous attempts at fixing the race condition?

Exercise: What changes (if any) are necessary to the above pattern if the global variables are pointers? Floating point variables?

Update: See discussion below between Niall and Anon regarding the need for acquire semantics on the initial read.