{"id":11963,"date":"2010-12-20T07:00:01","date_gmt":"2010-12-20T07:00:01","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2010\/12\/20\/developing-the-method-for-taking-advantage-of-the-fact-that-the-overlapped-associated-with-asynchronous-io-is-passed-by-address\/"},"modified":"2010-12-20T07:00:01","modified_gmt":"2010-12-20T07:00:01","slug":"developing-the-method-for-taking-advantage-of-the-fact-that-the-overlapped-associated-with-asynchronous-io-is-passed-by-address","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20101220-01\/?p=11963","title":{"rendered":"Developing the method for taking advantage of the fact that the OVERLAPPED associated with asynchronous I\/O is passed by address"},"content":{"rendered":"<p>\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2010\/12\/17\/10106259.aspx\">\nYou can take advantage of the fact that the\n<code>OVERLAPPED<\/code> associated with asynchronous I\/O is\npassed by address<\/a>,\nbut there was some confusion about how this technique could\n&#8220;work&#8221; when kernel mode has no idea that you are playing this trick.\n<\/p>\n<p>\nWhether kernel mode is in on the trick is immaterial since it is not\npart of the trick.\n<\/p>\n<p>\nLet&#8217;s start with a version of the code which does not take advantage\nof the <code>OVERLAPPED<\/code> structure address in the way\ndescribed in the article.\nThis is a technique I found in a book on advanced Windows programming:\n<\/p>\n<pre>\n#define MAX_OVERLAPPED 10 \/\/ let's do 10 I\/O's at a time\n\/\/ data to associate with each OVERLAPPED\nstruct OTHERDATA { ... };\nOVERLAPPED MasterOverlapped[MAX_OVERLAPPED];\nOTHERDATA OtherData[MAX_OVERLAPPED];\nOTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)\n{\n ptrdiff_t index = lpOverlapped - MasterOverlapped;\n return &amp;OtherData[index];\n}\n\/\/ I\/O is issued via\n\/\/ ReadFileEx(hFile, lpBuffer, nNumberOfBytesToRead,\n\/\/            &amp;MasterOverlapped[i], CompletionRoutine);\nvoid CALLBACK CompletionRoutine(\n    DWORD dwErrorCode,\n    DWORD dwNumberOfBytesTransferred,\n    LPOVERLAPPED lpOverlapped)\n{\n OTHERDATA *lpOtherData =\n                       FindOtherDataFromOverlapped(lpOverlapped);\n ... do stuff with lpOverlapped and lpOtherData ...\n}\n<\/pre>\n<p>\nThis version of the code uses the address of the\n<code>OVERLAPPED<\/code> structure to determine the\nlocation in the <code>MasterOverlapped<\/code> table\nand uses the corresponding entry in the parallel array\nat <code>OtherData<\/code> to hold the other data.\n<\/p>\n<p>\nLet&#8217;s make this code worse before we make it better:\n<\/p>\n<pre>\nOTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)\n{\n for (int index = 0; index &lt; MAX_OVERLAPPED; index++) {\n  if (&amp;MasterOverlapped[index] == lpOverlapped) {\n   return &amp;OtherData[index];\n  }\n }\n FatalError(); \/\/ should never be reached\n}\n<\/pre>\n<p>\nInstead of doing simple pointer arithmetic to recover\nthe index, we walk the array testing the pointers.\nThis is naturally worse than doing pointer arithmetic, but\nwatch what this step allows us to do:\nFirst, we reorganize the data so that instead of two\nparallel arrays, we have a single array of a compound\nstructure.\n<\/p>\n<pre>\nstruct OVERLAPPEDEX\n{\n OVERLAPPED Overlapped;\n OTHERDATA OtherData;\n};\nOVERLAPPEDEX Master[MAX_OVERLAPPED];\nOTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped)\n{\n for (int index = 0; index &lt; MAX_OVERLAPPED; index++) {\n  if (&amp;Master[index].Overlapped == lpOverlapped) {\n   return &amp;Master[index].OtherData;\n  }\n }\n FatalError(); \/\/ should never be reached\n}\n\/\/ I\/O is issued via\n\/\/ ReadFileEx(hFile, lpBuffer, nNumberOfBytesToRead,\n\/\/            &amp;Master[i].Overlapped, CompletionRoutine);\n<\/pre>\n<p>\nAll we did was consolidate the parallel arrays into a single array.\n<\/p>\n<p>\nNow that it&#8217;s an array of compound structures, we don&#8217;t need\nto carry two pointers around (one to the <code>OVERLAPPED<\/code>\nand one to the <code>OTHERDATA<\/code>).\nWe can just use a single <code>OVERLAPPEDEX<\/code> pointer\nand dereference either the <code>Overlapped<\/code>\nor the <code>OtherData<\/code> part.\n<\/p>\n<pre>\nOVERLAPPEDEX* FindOverlappedExFromOverlapped(\n    OVERLAPPED *lpOverlapped)\n{\n for (int index = 0; index &lt; MAX_OVERLAPPED; index++) {\n  if (&amp;Master[index].Overlapped == lpOverlapped) {\n   return &amp;Master[index];\n  }\n }\n FatalError(); \/\/ should never be reached\n}\nvoid CALLBACK CompletionRoutine(\n    DWORD dwErrorCode,\n    DWORD dwNumberOfBytesTransferred,\n    LPOVERLAPPED lpOverlapped)\n{\n    OVELRAPPEDEX *lpOverlappedEx =\n                    FindOverlappedExFromOverlapped(lpOverlapped);\n    ... do stuff with lpOverlappedEx ...\n}\n<\/pre>\n<p>\nFinally, we can optimize the\n<code>FindOverlappedExFromOverlapped<\/code> function\nthat we de-optimized earlier.\nObserve that the de-optimized loop is an example of the\n&#8220;for\/if&#8221; anti-pattern.\n<\/p>\n<p>\nThe &#8220;for\/if&#8221; anti-pattern goes like this:\n<\/p>\n<pre>\nfor (int i = 0; i &lt; 100; i++) {\n if (i == 42) do_something(i);\n}\n<\/pre>\n<p>\nThis can naturally be simplified to\n<\/p>\n<pre>\ndo_something(42);\n<\/pre>\n<p>\nOur\n<code>FindOverlappedExFromOverlapped<\/code> function\nis a special case of this anti-pattern.\nIt becomes more evident if we do some rewriting.\nStart with\n<\/p>\n<pre>\n&amp;Master[index].Overlapped == lpOverlapped\n<\/pre>\n<p>\nApply <code>CONTAINING_RECORD<\/code> to both sides.\n<\/p>\n<pre>\nCONTAINING_RECORD(&amp;Master[index].Overlapped, OVERLAPPEDEX, Overlapped) ==\n    CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)\n<\/pre>\n<p>\nThe left-hand side of the comparison simplifies to\n<\/p>\n<pre>\n    &amp;Master[index]\n<\/pre>\n<p>\nresulting in\n<\/p>\n<pre>\n&amp;Master[index] ==\n   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)\n<\/pre>\n<p>\nRecall that <code>a[b]<\/code> is equivalent to <code>*(a+b)<\/code>,\nand therefore <code>&amp;a[b]<\/code> is equivalent to <code>a+b<\/code>.\n<\/p>\n<pre>\nMaster + index ==\n   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)\n<\/pre>\n<p>Now subtract <code>Master<\/code> from both sides:<\/p>\n<pre>\nindex == CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master\n<\/pre>\n<p>\nWe have transformed the test into a clear case of the\nfor\/if anti-pattern,\nand the function can be simplified to\n<\/p>\n<pre>\nOVERLAPPEDEX* FindOverlappedExFromOverlapped(\n    OVERLAPPED *lpOverlapped)\n{\n ptrdiff_t index =\n   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master;\n return &amp;Master[index];\n}\n<\/pre>\n<p>\nAgain, rewrite <code>&amp;a[b]<\/code> as <code>a+b<\/code>:\n<\/p>\n<pre>\n return Master + index;\n<\/pre>\n<p>\nSubstitute the value of <code>index<\/code> computed on the previous\nline:\n<\/p>\n<pre>\n return Master +\n   CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master;\n<\/pre>\n<p>\nThe two occurrences of <code>Master<\/code> cancel out, leaving\n<\/p>\n<pre>\nOVERLAPPEDEX* FindOverlappedExFromOverlapped(\n    OVERLAPPED *lpOverlapped)\n{\n return CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped);\n}\n<\/pre>\n<p>\nAnd there you have it.\nBy a series of purely mechanical transformations,\nwe have rediscovered the technique of\nextending the <code>OVERLAPPED<\/code> structure.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You can take advantage of the fact that the OVERLAPPED associated with asynchronous I\/O is passed by address, but there was some confusion about how this technique could &#8220;work&#8221; when kernel mode has no idea that you are playing this trick. Whether kernel mode is in on the trick is immaterial since it is not [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-11963","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>You can take advantage of the fact that the OVERLAPPED associated with asynchronous I\/O is passed by address, but there was some confusion about how this technique could &#8220;work&#8221; when kernel mode has no idea that you are playing this trick. Whether kernel mode is in on the trick is immaterial since it is not [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/11963","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=11963"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/11963\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=11963"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=11963"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=11963"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}