{"id":28733,"date":"2021-10-15T15:00:12","date_gmt":"2021-10-15T15:00:12","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=28733"},"modified":"2021-10-15T10:42:32","modified_gmt":"2021-10-15T10:42:32","slug":"a-race-condition-in-net-finalization-and-its-mitigation-for-cpp-cli","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/a-race-condition-in-net-finalization-and-its-mitigation-for-cpp-cli\/","title":{"rendered":"A Race Condition in .NET Finalization and its Mitigation for C++\/CLI"},"content":{"rendered":"<h2>Abstract<\/h2>\n<blockquote><p>There is a dormant race condition in <code>.NET<\/code> which affects even single-threaded code when finalizers are executed. The cause is primarily the fact that finalizers are called on a separate thread by <code>.NET<\/code> and may access objects which have already been garbage collected due to aggressive lifetime determination by the <code>.NET<\/code> JIT compiler in newer versions of the <code>.NET<\/code> runtime.\nA solution for this problem, in the form of automatic generation of calls to <code>System::GC::KeepAlive<\/code>, has been implemented in the Microsoft C++ compiler and is available in Visual Studio version 16.10 and later.<\/p><\/blockquote>\n<h2>Introduction<\/h2>\n<p>C++\/CLI is primarily meant to be an interop language bridging the native\nand <code>.NET<\/code> worlds efficiently. Consequently, a frequently occurring code\npattern is the wrapping of native pointers in managed classes. For example:<\/p>\n<pre><code class=\"language-cpp\">class NativeClass { ... };\r\nref class ManagedClass {\r\n    ...\r\nprivate:\r\n    NativeClass* ptr;\r\n};<\/code><\/pre>\n<p>Often, the managed wrapper class will <code>new<\/code> an instance of\n<code>NativeClass<\/code>, which controls and accesses a system resource (e.g. a\nfile), use that resource and, to make sure that the resource is properly\nreleased, delegate the cleanup to the finalizer. 
Elaborating the\nabove example, we could have code like:<\/p>\n<pre><code class=\"language-cpp\"> 1  using Byte = System::Byte;\r\n 2  using String = System::String^;\r\n 3  using Char = System::Char;\r\n 4\r\n 5  class File {\r\n 6      FILE*   fp;\r\n 7  public:\r\n 8      explicit File(const Char* path, const Char* mode)\r\n 9      {\r\n10          fp = _wfopen(path, mode);\r\n11      }\r\n12      void Read() { ... }\r\n13      void Write(const void*, size_t) { ... }\r\n14      void Seek() { ... }\r\n15      void Close()\r\n16      {\r\n17          if (fp) {\r\n18              fclose(fp); fp = nullptr;\r\n19          }\r\n20      }\r\n21      ~File() { Close(); }\r\n22  };\r\n\r\n26   ref class DataOnDisk\r\n27   {\r\n28   public:\r\n29       DataOnDisk(String path, String mode)\r\n30       {\r\n31           cli::pin_ptr&lt;const Char&gt; path_ptr = PtrToStringChars(path);\r\n32           cli::pin_ptr&lt;const Char&gt; mode_ptr = PtrToStringChars(mode);\r\n33           ptr = new File(path_ptr, mode_ptr);\r\n34       }\r\n35       ~DataOnDisk() { this-&gt;!DataOnDisk(); }\r\n36       !DataOnDisk()\r\n37       {\r\n38           if (ptr) {\r\n39               delete ptr; ptr = nullptr;\r\n40           }\r\n41       }\r\n42       void Close() { this-&gt;!DataOnDisk(); }\r\n43       void WriteData(array&lt;Byte&gt;^ data) { ... }\r\n44   private:\r\n45       File*           ptr;  \/\/ Pointer to native implementation class.\r\n46   };<\/code><\/pre>\n<p>In the above code, class <code>File<\/code> controls the actual file via the native\nC++ interface, while <code>DataOnDisk<\/code> uses the native class to read\/write\nstructured data to file (details have been omitted for clarity). 
While\n<code>Close<\/code> can be called explicitly when there is no more use for the file,\nthe finalizer is meant to do this when the <code>DataOnDisk<\/code> object is\ncollected.<\/p>\n<p>As we shall see in the following section, while the above code appears\ncorrect, there is a hidden race condition that can cause program errors.<\/p>\n<h2>Race Condition<\/h2>\n<p>Let us define the member <code>WriteData<\/code> from the above code:<\/p>\n<pre><code class=\"language-cpp\">49  void DataOnDisk::WriteData(array&lt;Byte&gt;^ buffer)\r\n50  {\r\n51      pin_ptr&lt;Byte&gt; buffer_ptr = &amp;buffer[0];\r\n52      this-&gt;ptr-&gt;Write(buffer_ptr, buffer-&gt;Length);\r\n53  } <\/code><\/pre>\n<p>This function itself might be called in this context:<\/p>\n<pre><code class=\"language-cpp\">55  void test_write()\r\n56  {\r\n57      DataOnDisk^ dd = gcnew DataOnDisk(...);\r\n58      array&lt;Byte&gt;^ buf = make_test_data();\r\n59      dd-&gt;WriteData(buf);\r\n60  } <\/code><\/pre>\n<p>So far, nothing catches the eye or looks remotely dangerous. Starting\nfrom <code>test_write<\/code>, let us examine what happens in detail.<\/p>\n<ol>\n<li>A <code>DataOnDisk<\/code> object is created (line 57), some test data is\ncreated (line 58), and <code>WriteData<\/code> is called to write this data to the file (line\n59).<\/li>\n<li><code>WriteData<\/code> carefully pins the buffer array object (line 51)\nbefore taking the address of an element and calling the <code>Write<\/code>\nmember function of the underlying native <code>File<\/code> object. The pinning\nis important because we don&#8217;t want <code>.NET<\/code>\u00a0to move the buffer bytes\nwhile the write is happening.<\/li>\n<li>However, since the <code>.NET<\/code>\u00a0garbage collector knows nothing about\nnative types, the <code>ptr<\/code> field of <code>DataOnDisk<\/code> is just a bit pattern\nwith no other meaning attached. 
The <code>.NET<\/code>\u00a0JIT compiler has analyzed\nthe code and determined that the last use of the <code>dd<\/code> object is to\naccess <code>ptr<\/code> (line 52), before its value is passed as the implicit\nobject parameter of <code>File::Write<\/code>. Following this reasoning by the\nJIT compiler, once the value of <code>ptr<\/code> is fetched from the object,\n<em>the object <code>dd<\/code> is no longer needed<\/em> and becomes eligible for\ngarbage collection. The fact that <code>ptr<\/code> points to a live native\nobject is opaque to <code>.NET<\/code>\u00a0because it does not track native\npointers.<\/li>\n<li>From here onward, things can go wrong. The object <code>dd<\/code> is scheduled\nfor collection and, as part of the process, the finalizer is run,\ntypically on a second thread. Now, we have potentially two things\nhappening at the same time without any ordering between them, a\nclassic race condition: the <code>Write<\/code> member function is executing and\nthe finalizer <code>!DataOnDisk<\/code> is executing as well. The latter will\n<code>delete<\/code> the file object referenced by <code>ptr<\/code> <em>while <code>File::Write<\/code> is\npossibly still running<\/em>, which can then result in a crash or other\nincorrect behavior.<\/li>\n<\/ol>\n<h2>Wait &#8212; Wha&#8230;?<\/h2>\n<p>Several questions immediately come to mind:<\/p>\n<ul>\n<li><em>Is this a new bug?<\/em> Yes &#8212; and no. The issue has potentially been\naround since <code>.NET<\/code>\u00a02.0.<\/li>\n<li><em>What changed?<\/em> The <code>.NET<\/code>\u00a0JIT compiler started being aggressive\nwith lifetime determination in <code>.NET<\/code>\u00a04.8. From the perspective of\nmanaged code, it is doing the right thing.<\/li>\n<li><em>But, this affects a core C++\/CLI native interop scenario. 
What can\nbe done?<\/em> Read on.<\/li>\n<\/ul>\n<h2>Solutions<\/h2>\n<p>It is easy to see that if <code>this<\/code> is kept alive while the call to\n<code>Write<\/code> (line 52) is in progress, the race condition disappears, since <code>dd<\/code> will no\nlonger be collected before the call to <code>Write<\/code> returns. This could be\ndone in several different ways:<\/p>\n<ul>\n<li><em>Treat the change in the behavior of the JIT compiler as a bug and\nrevert to the old behavior.<\/em> Doing this requires a system update\nfor <code>.NET<\/code>\u00a0and potentially disables optimizations. Freezing the\n<code>.NET<\/code>\u00a0framework at version 4.7 is also an option but not one that\nwill work in the longer term, especially since the same JIT behavior\ncan happen in <code>.NET<\/code>\u00a0<code>Core<\/code> as well.<\/li>\n<li><em>Manually insert <code>System::GC::KeepAlive(this)<\/code> calls where needed<\/em>.\nThis works but is error-prone and requires examining the user source\nand changing it, so this is not a viable solution for large source\nbases.<\/li>\n<li><em>Have the compiler inject <code>System::GC::KeepAlive(this)<\/code> calls when\nneeded<\/em>. This is the solution we have implemented in the Microsoft\nC++ compiler.<\/li>\n<\/ul>\n<h2>Details<\/h2>\n<p>We could brute-force a solution by issuing a call to <code>KeepAlive<\/code> every\ntime we see a call to a native function, but for performance reasons we\nwant to be more clever. We want to issue such calls where there is a\npossibility of a race condition but nowhere else. 
The following is the\nalgorithm that the Microsoft C++ compiler follows to determine whether an\nimplicit <code>KeepAlive<\/code> call is to be issued at a point in the code where:<\/p>\n<ul>\n<li>We are at a return statement or implicit return from a member\nfunction of a managed class;<\/li>\n<li>The managed class has a member of type &#8216;reference or pointer to\nunmanaged type&#8217;, including members in its direct or indirect base\nclasses, or embedded in members of class types occurring anywhere in\nthe class hierarchy;<\/li>\n<li>A call to a function <code>FUNC<\/code> is found in the current (managed member)\nfunction, which satisfies one or more of these conditions:<\/p>\n<ol>\n<li><code>FUNC<\/code> doesn&#8217;t have a <code>__clrcall<\/code> calling convention, or<\/li>\n<li><code>FUNC<\/code> doesn&#8217;t take <code>this<\/code> as either an implicit or an explicit\nargument, or<\/li>\n<li>A reference to <code>this<\/code> doesn&#8217;t follow the call to <code>FUNC<\/code><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<p>In essence, we are looking for indicators that show <code>this<\/code> is in no\ndanger of getting garbage collected during the call to <code>FUNC<\/code>. If no such\nindicator is present, that is, if the above conditions are satisfied, we insert a\n<code>System::GC::KeepAlive(this)<\/code> call immediately following the call to\n<code>FUNC<\/code>. 
Even though a call to <code>KeepAlive<\/code> looks very much like a\nfunction call in the generated MSIL, the JIT compiler treats it as a\ndirective to consider the current object alive at that point.<\/p>\n<h2>How to get the fix<\/h2>\n<p>The above Microsoft C++ compiler behavior is <strong>on by default<\/strong> in Visual\nStudio <strong>version 16.10<\/strong> and up. In cases where unforeseen\nproblems occur due to the new implicit emission of <code>KeepAlive<\/code> calls,\nthe Microsoft C++ compiler provides two escape hatches:<\/p>\n<ul>\n<li>the driver switch <code>\/clr:implicitKeepAlive-<\/code>, which turns off all\nsuch calls in the translation unit. This switch is not available in\nproject system settings but must be added explicitly to the\ncommand-line option list\n(<code>Property Pages &gt; Command Line &gt; Additional Options<\/code>).<\/li>\n<li><code>#pragma implicit_keepalive<\/code>, which provides fine-grained control\nover the emission of such calls at the function level.<\/li>\n<\/ul>\n<h2>A Final Nit<\/h2>\n<p>The astute reader will have noted that there is still a possible race\ncondition at line 39. To see why, imagine that both the finalizer thread\nand user code call the finalizer at the same time. The possibility of a\ndouble-delete in this case is obvious. Fixing this requires a critical\nsection but is beyond the scope of this article and left to the reader\nas an exercise.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Race conditions can occur in .NET finalization. 
This article explains the solution implemented in MSVC.<\/p>\n","protected":false},"author":67691,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-28733","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus"],"acf":[],"blog_post_summary":"<p>Race conditions can occur in .NET finalization. This article explains the solution implemented in MSVC.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/28733","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/67691"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=28733"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/28733\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=28733"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=28733"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=28733"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}