Abstract
There is a dormant race condition in
.NET
which affects even single threaded code when finalizers are executed. The cause is primarily the fact that finalizers are called on a separate thread by.NET
and may access objects which have already been garbage collected due to aggressive lifetime determination by the.NET
JIT compiler in newer versions of the.NET
runtime. A solution for this problem, in the form of automatic generation of calls toSystem::GC::KeepAlive
, has been implemented in the Microsoft C++ compiler and is available in version 16.10 and later.
Introduction
C++/CLI is primarily meant to be an interop language bridging the native
and .NET
worlds efficiently. Consequently, a frequently occuring code
pattern is wrapping of native pointers in managed classes. E.g.
class NativeClass { ... };
ref class ManagedClass {
...
private:
NativeClass* ptr;
};
Often, the managed wrapper class will new
an instance of
NativeClass
, which controls and accesses a system resource (e.g. a
file), uses the resources and to make sure that the resource is properly
released back, delegates this task to the finalizer. Elaborating the
above example, we could have code like:
1 using Byte = System::Byte;
2 using String = System::String^;
3 using Char = System::Char;
4
5 class File {
6 FILE* fp;
7 public:
8 explicit File(const Char* path, const Char* mode)
9 {
10 fp = _wfopen(path, mode);
11 }
12 void Read() { ... }
13 void Write(const void*, size_t) { ... }
14 void Seek() { ... }
15 void Close()
16 {
17 if (fp) {
18 fclose(fp); fp = nullptr;
19 }
20 }
21 ~File() { Close(); }
22 };
26 ref class DataOnDisk
27 {
28 public:
29 DataOnDisk(String path, String mode)
30 {
31 cli::pin_ptr<const Char> path_ptr = PtrToStringChars(path);
32 cli::pin_ptr<const Char> mode_ptr = PtrToStringChars(mode);
33 ptr = new File(path_ptr, mode_ptr);
34 }
35 ~DataOnDisk() { this->!DataOnDisk(); }
36 !DataOnDisk()
37 {
38 if (ptr) {
39 delete ptr; ptr = nullptr;
40 }
41 }
42 void Close() { this->!DataOnDisk(); }
43 void WriteData(array<Byte>^ data) { ... }
44 private:
45 File* ptr; // Pointer to native implementation class.
46 };
In the above code, class File
controls the actual file via the native
C++ interface, while DataOnDisk
uses the native class to read/write
structured data to file (details have been omitted for clarity). While
Close
can be called explicitly when there is no more use for the file,
the finalizer is meant to do this when the DataOnDisk
object is
collected.
As we shall see in the following section, while the above code appears correct, there is a hidden race condition that can cause program errors.
Race Condition
Let us define the member WriteData
from the above code
49 void DataOnDisk::WriteData(array<Byte>^ buffer)
50 {
51 pin_ptr<Byte> buffer_ptr = &buffer[0];
52 this->ptr->Write(buffer_ptr, buffer->Length);
53 }
This function itself might be called in this context:
55 void test_write()
56 {
57 DataOnDisk^ dd = gcnew DataOnDisk(...);
58 array<Byte>^ buf = make_test_data();
59 dd->WriteData(buf);
60 }
So far, nothing catches the eye or looks remotely dangerous. Starting
from test_write
, let us examine what happens in detail.
- A
DataOnDisk
object is created (line 57), some test data is created andWriteData
is called to write this data to file (line 59). - The
WriteData
carefully pins the buffer array object (line 51) before taking the address of an element and calling theWrite
member function of the underlying nativeFile
object. The pinning is important because we don’t want.NET
 to move the buffer bytes while the write is happening. - However, since the
.NET
 garbage collector knows nothing about native types, theptr
field ofDataOnDisk
is just a bit pattern with no other meaning attached. The.NET
 JIT compiler has analyzed the code and determined that the last use of thedd
object is to accessptr
(line 52), before its value is passed as the implicit object parameter ofFile::Write
. Following this reasoning by the JIT compiler, once the value ofptr
is fetched from the object, the objectdd
is no longer needed and becomes eligible for garbage collection.The fact thatptr
points to a live native object is opaque to.NET
 because it does not track native pointers. - From here onward, things can go wrong. The object
dd
is scheduled for collection and as part of the process, the finalizer is run, typically on a second thread. Now, we have potentially two things happening at the same time without any ordering between them, a classic race condition: theWrite
member function is executing and the finalizer!DataOnDisk
is executing as well, the latter willdelete
the file object referenced byptr
whileFile::Write
is possibly still running, which can then result in a crash or other incorrect behavior.
Wait — Wha…?
Several questions immediately come to mind:
- Is this a new bug? Yes — and no. The issue has potentially been
around since
.NET
 2.0. - What changed? The
.NET
 JIT compiler started being aggressive with lifetime determination in.NET
 4.8. From the perspective of managed code, it is doing the right thing. - But, this affects a core C++/CLI native interop scenario. What can be done? Read on.
Solutions
It is easy to see that when the call to Write
happens (line 52), if
this
is kept alive, the race condition disappears since dd
will no
longer be collected before the call to Write
returns. This could be
done in several different ways:
- Treat the change in the behavior of the JIT compiler as a bug and
revert back to old behavior. Doing this requires a system update
for
.NET
 and potentially disables optimizations. Freezing the.NET
 framework at version 4.7 is also an option but not one that will work in the longer term, especially since the same JIT behavior can happen in.NET
ÂCore
as well. - Manually insert
System::GC::KeepAlive(this)
calls where needed. This works but is error prone and requires examining the user source and changing it, so this is not a viable solution for large source bases. - Have the compiler inject
System::GC::KeepAlive(this)
calls, when needed. This is the solution we have implemented in the Microsoft C++ compiler.
Details
We could brute-force a solution by issuing a call to KeepAlive
every
time we see a call to native function, but for performance reasons we
want to be more clever. We want to issue such calls where there is a
possibility of a race condition but nowhere else. The following is the
algorithm that the Microsoft C++ compiler follows to determine if an
implicit KeepAlive
call is to be issued at a point in the code where:
- We are at a return statement or implicit return from a member function of a managed class;
- The managed class has a member of type ‘reference or pointer to unmanaged type’, including members in its direct or indirect base classes, or embedded in members of class-types occuring anywhere in the class hierarchy;
- A call to a function
FUNC
is found in the current (managed member) function, which satisfies one or more of these conditions:FUNC
doesn’t have a__clrcall
calling convention, orFUNC
doesn’t takethis
either as an implicit or explicit argument, or- A reference to
this
doesn’t follow the call toFUNC
In essence, we are looking for indicators that show this
is in no
danger of getting garbage collected during the call to FUNC
. Hence, if
the above conditions are satisfied, we insert a
System::GC::KeepAlive(this)
call immediately following the call to
FUNC
. Even though a call to KeepAlive
looks very much like a
function call in the generated MSIL, the JIT compiler treats it as a
directive to consider the current object alive at that point.
How to get the fix
The above Microsoft C++ compiler behavior is on by default in Visual
Studio version 16.10 and up but in in cases where unforeseen
problems occur due to the new implicit emission of KeepAlive
calls,
the Microsoft C++ compiler provides two escape hatches:
- the driver switch
/clr:implicitKeepAlive-
, which turns off all such calls in the translation unit. This switch is not available in project system settings but must be added explicitly to the command-line option list (Property Pages > Command Line > Additional Options
). #pragma implicit_keepalive
, which provides fine-grained control over the emission of such calls at the function level.
A Final Nit
The astute reader will have noted that there is still a possible race condition at line 39. To see why, imagine that both the finalizer thread and user code call the finalizer at the same time. The possibility of a double-delete in this case is obvious. Fixing this requires a critical section but is beyond the scope of this article and left to the reader as an exercise.
I advise to wrap native pointers in SafeHandle and then continue working with solely managed objects.
DataOnDisk::Close() should not call the finalizer directly, but rather 'delete this' to result in a call to IDisposable::Dispose(true) and GC::SuppressFinalize(this). This ensures deterministic cleanup of base classes and members of ref class type (as in R, not R^).
With regard to your final nit, the only direct call to the finalizer should be in the destructor. The destructor is called in the context of IDisposable::Dispose, which holds a reference to 'this' and suppresses finalization....
If you inserted at the end of the finalizer, would that fix the possible race condition? I think it would partially serialize the finalization:
- If user code calls the finalizer first, then the GC would not be able to call the finalizer before the call from user code reaches System::GC::KeepAlive(this). At that point, ptr would already be null, so the finalizer called by GC would do nothing.
- If the GC calls the finalizer first,...