{"id":11613,"date":"2011-02-02T07:00:00","date_gmt":"2011-02-02T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2011\/02\/02\/ready-cancel-wait-for-it-part-1\/"},"modified":"2011-02-02T07:00:00","modified_gmt":"2011-02-02T07:00:00","slug":"ready-cancel-wait-for-it-part-1","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20110202-00\/?p=11613","title":{"rendered":"Ready&#8230; cancel&#8230; wait for it! (part 1)"},"content":{"rendered":"<p>\nOne of the cardinal rules of the <code>OVERLAPPED<\/code>\nstructure is <i>the <code>OVERLAPPED<\/code> structure\nmust remain valid until the I\/O completes<\/i>.\nThe reason is that the <code>OVERLAPPED<\/code> structure\nis\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2010\/12\/17\/10106259.aspx\">\nmanipulated by address rather than by value<\/a>.\n<\/p>\n<p>\nThe word <i>complete<\/i> here has a specific technical meaning.\nIt doesn&#8217;t mean &#8220;must remain valid until you are no longer interested\nin the result of the I\/O.&#8221;\nIt means that the structure must remain valid until\nthe I\/O subsystem has signaled that the I\/O operation\nis finally over, that there is nothing left to do,\nit has passed on:\nYou have an ex-I\/O operation.\n<\/p>\n<p>\nNote that\nan I\/O operation can complete successfully, or it can\ncomplete unsuccessfully.\nCompletion is not the same as success.\n<\/p>\n<p>\nA common mistake when performing overlapped I\/O\nis issuing a cancel and immediately freeing the <code>OVERLAPPED<\/code>\nstructure.\nFor example:\n<\/p>\n<pre>\n<i>\/\/ this code is wrong\n HANDLE h = ...; \/\/ handle to file opened as FILE_FLAG_OVERLAPPED\n OVERLAPPED o;\n BYTE buffer[1024];\n InitializeOverlapped(&amp;o); \/\/ creates the event etc\n if (ReadFile(h, buffer, sizeof(buffer), NULL, &amp;o) ||\n     GetLastError() == ERROR_IO_PENDING) {\n  if (WaitForSingleObject(o.hEvent, 1000) != WAIT_OBJECT_0) {\n   \/\/ took longer than 
1 second - cancel it and give up\n   CancelIo(h);\n   return WAIT_TIMEOUT;\n  }\n  ... use the results ...\n }\n ...<\/i>\n<\/pre>\n<p>\nThe bug here is that after calling <code>Cancel&shy;Io<\/code>,\nthe function returns without waiting for the <code>Read&shy;File<\/code>\nto complete.\nReturning from the function\nimplicitly frees the automatic variable <code>o<\/code>.\nWhen the <code>Read&shy;File<\/code> finally completes, the I\/O system\nis now writing to stack memory that has been freed and is probably\nbeing reused by another function.\nThe result is impossible to debug:\nFirst of all, it&#8217;s a race condition between your code and the I\/O\nsubsystem, and breaking into the debugger <i>doesn&#8217;t stop the\nI\/O subsystem<\/i>.\nIf you step through the code, you don&#8217;t see the corruption,\nbecause the I\/O completes <i>while you&#8217;re broken into the debugger<\/i>.\n<\/p>\n<p>\nHere&#8217;s what happens when the program is run outside the debugger:\n<\/p>\n<table BORDER=\"0\">\n<tbody>\n<tr>\n<td>ReadFile<\/td>\n<td>&rarr;<\/td>\n<td>I\/O begins<\/td>\n<\/tr>\n<tr>\n<td>WaitForSingleObject<\/td>\n<td><\/td>\n<td>I\/O still in progress<\/td>\n<\/tr>\n<tr>\n<td>WaitForSingleObject times out<\/td>\n<\/tr>\n<tr>\n<td>CancelIo<\/td>\n<td>&rarr;<\/td>\n<td>I\/O cancellation submitted to device driver<\/td>\n<\/tr>\n<tr>\n<td>return<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td>Device driver was busy reading from the hard drive<br \/>\n        Device driver receives the cancellation<br \/>\n        Device driver abandons the rest of the read operation<br \/>\n        Device driver reports that I\/O has been canceled<br \/>\n        I\/O subsystem writes <code>STATUS_CANCELED<\/code>\n        to <code>OVERLAPPED<\/code> structure<br \/>\n        I\/O subsystem queues the completion function (if applicable)<br \/>\n        I\/O subsystem signals the completion event (if applicable)<br \/>\n        I\/O operation is now 
complete<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nWhen the I\/O subsystem receives word from the device driver\nthat the cancellation has completed,\nit performs the usual operations when an I\/O operation completes:\nIt updates the <code>OVERLAPPED<\/code> structure with the results\nof the I\/O operation, and notifies whoever wanted to be notified\nthat the I\/O is finished.\n<\/p>\n<p>\nNotice that when it updates the <code>OVERLAPPED<\/code> structure,\nit&#8217;s updating memory that has already been freed back to the stack,\nwhich means that it&#8217;s corrupting the stack of whatever function\nhappens to be running right now.\n(It&#8217;s even worse if you happened to catch it while it was in the\nprocess of updating the <code>buffer<\/code>!)\nSince the precise timing of I\/O is unpredictable,\nthe program crashes with memory corruption that keeps changing\neach time it happens.\n<\/p>\n<p>\nIf you try to debug the program, you get this:\n<\/p>\n<table BORDER=\"0\">\n<tbody>\n<tr>\n<td>ReadFile<\/td>\n<td>&rarr;<\/td>\n<td>I\/O begins<\/td>\n<\/tr>\n<tr>\n<td>WaitForSingleObject<\/td>\n<td><\/td>\n<td>I\/O still in progress<\/td>\n<\/tr>\n<tr>\n<td>WaitForSingleObject times out<\/td>\n<\/tr>\n<tr>\n<td>Breakpoint hit on <code>Cancel&shy;Io<\/code> statement<br \/>\n        Stops in debugger<\/td>\n<\/tr>\n<tr>\n<td>Hit F10 to step over the CancelIo call<\/td>\n<td>&rarr;<\/td>\n<td>I\/O cancellation submitted to device driver<\/td>\n<\/tr>\n<tr>\n<td>Breakpoint hit on <code>return<\/code> statement<br \/>\n        Stops in debugger<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td>Device driver was busy reading from the hard drive<br \/>\n        Device driver receives the cancellation<br \/>\n        Device driver abandons the rest of the read operation<br \/>\n        Device driver reports that I\/O has been canceled<br \/>\n        I\/O subsystem writes <code>STATUS_CANCELED<\/code>\n        to <code>OVERLAPPED<\/code> structure<br \/>\n        I\/O 
subsystem queues the completion function (if applicable)<br \/>\n        I\/O subsystem signals the completion event (if applicable)<br \/>\n        I\/O operation is now complete<\/td>\n<\/tr>\n<tr>\n<td>Look at the <code>OVERLAPPED<\/code> structure in the debugger<br \/>\n        It says <code>STATUS_CANCELED<\/code><\/td>\n<\/tr>\n<tr>\n<td>Hit F5 to resume execution<br \/>\n        No memory corruption<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\nBreaking into the debugger changed the timing of the I\/O operation\nrelative to program execution.\nNow, the I\/O completes before the function returns,\nand consequently there is no memory corruption.\nYou look at the <code>OVERLAPPED<\/code> structure and say,\n&#8220;See? Immediately on return from the <code>Cancel&shy;Io<\/code> function,\nthe <code>OVERLAPPED<\/code> structure has been updated with the result,\nand the <code>buffer<\/code> contents are not being written to.\nIt&#8217;s safe to free them both now.\nTherefore, this can&#8217;t be the source of my memory corruption bug.&#8221;\n<\/p>\n<p>\nExcept, of course, that it is.\n<\/p>\n<p>\nThis is even more crazily insidious because the <code>OVERLAPPED<\/code>\nstructure and the <code>buffer<\/code> are\nupdated by the I\/O subsystem, which means that it happens\n<i>from kernel mode<\/i>.\nThis means that\n<a HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2008\/05\/09\/8475735.aspx\">\nwrite breakpoints set by your debugger won&#8217;t fire<\/a>.\nEven if you manage to narrow down the corruption to\n&#8220;it happens somewhere in this function&#8221;,\nyour breakpoints will never see it as it happens.\nYou&#8217;re going to see that the value was good,\nthen a little while later, the value was bad,\nand yet your write breakpoint never fired.\nYou&#8217;re then going to declare that the world has gone mad\nand seriously consider a different line of work.\n<\/p>\n<p>\nTo fix this race condition,\nyou have to delay freeing the <code>OVERLAPPED<\/code> 
structure\nand the associated <code>buffer<\/code>\nuntil the I\/O is complete and anything else that&#8217;s using them\nhas also given up its claim to them.\n<\/p>\n<pre>\n   \/\/ took longer than 1 second - cancel it and give up\n   CancelIo(h);\n   <font COLOR=\"blue\">WaitForSingleObject(o.hEvent, INFINITE); \/\/ added\n   \/\/ Alternatively: GetOverlappedResult(h, &amp;o, TRUE);<\/font>\n   return WAIT_TIMEOUT;\n<\/pre>\n<p>\nThe <code>Wait&shy;For&shy;Single&shy;Object<\/code> after the\n<code>Cancel&shy;Io<\/code>\nwaits for the I\/O to complete\nbefore finally returning (and implicitly freeing the <code>OVERLAPPED<\/code>\nstructure and the <code>buffer<\/code> on the stack).\nBetter would be to use\n<code>GetOverlapped&shy;Result<\/code>\nwith <code>bWait&nbsp;=&nbsp;TRUE<\/code>,\nbecause that also handles the case where the <code>hEvent<\/code>\nmember of the <code>OVERLAPPED<\/code> structure is <code>NULL<\/code>.\n<\/p>\n<p>\n<b>Exercise<\/b>:\nIf you retrieve the completion status\nafter canceling the I\/O\n(either by looking at the <code>OVERLAPPED<\/code> structure\ndirectly or by using <code>GetOverlapped&shy;Result<\/code>),\nthere&#8217;s a chance that the overlapped result\nwill be something other than <code>STATUS_CANCELED<\/code>\n(or <code>ERROR_OPERATION_ABORTED<\/code> if you prefer Win32 error codes).\nExplain.\n<\/p>\n<p>\n<b>Exercise<\/b>:\nIf this example had used <code>Read&shy;File&shy;Ex<\/code>,\nthe proposed fix would be incomplete.\nExplain and provide a fix.\nAnswer to come next time, and then we&#8217;ll look at another\nversion of this same principle.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the cardinal rules of the OVERLAPPED structure is the OVERLAPPED structure must remain valid until the I\/O completes. The reason is that the OVERLAPPED structure is manipulated by address rather than by value. The word complete here has a specific technical meaning. 
It doesn&#8217;t mean &#8220;must remain valid until you are no longer [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-11613","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>One of the cardinal rules of the OVERLAPPED structure is the OVERLAPPED structure must remain valid until the I\/O completes. The reason is that the OVERLAPPED structure is manipulated by address rather than by value. The word complete here has a specific technical meaning. It doesn&#8217;t mean &#8220;must remain valid until you are no longer [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/11613","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=11613"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/11613\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=11613"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=11613"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=11613"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}