You can take advantage of the fact that the
OVERLAPPED
associated with asynchronous I/O is
passed by address,
but there was some confusion about how this technique could
“work” when kernel mode has no idea that you are playing this trick.
Whether kernel mode is in on the trick is immaterial since it is not part of the trick.
Let’s start with a version of the code which does not take advantage
of the OVERLAPPED
structure address in the way
described in the article.
This is a technique I found in a book on advanced Windows programming:
#define MAX_OVERLAPPED 10 // let's do 10 I/O's at a time // data to associate with each OVERLAPPED struct OTHERDATA { ... }; OVERLAPPED MasterOverlapped[MAX_OVERLAPPED]; OTHERDATA OtherData[MAX_OVERLAPPED]; OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped) { ptrdiff_t index = lpOverlapped - MasterOverlapped; return &OtherData[index]; } // I/O is issued via // ReadFileEx(hFile, lpBuffer, nNumberOfBytesToRead, // &MasterOverlapped[i], CompletionRoutine); void CALLBACK CompletionRoutine( DWORD dwErrorCode, DWORD dwNumberOfBytesTransferred, LPOVERLAPPED lpOverlapped) { OTHERDATA *lpOtherData = FindOtherDataFromOverlapped(lpOverlapped); ... do stuff with lpOverlapped and lpOtherData ... }
This version of the code uses the address of the
OVERLAPPED
structure to determine the
location in the MasterOverlapped
table
and uses the corresponding entry in the parallel array
at OtherData
to hold the other data.
Let’s make this code worse before we make it better:
OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped) { for (int index = 0; index < MAX_OVERLAPPED; index++) { if (&MasterOverlapped[index] == lpOverlapped) { return &OtherData[index]; } } FatalError(); // should never be reached }
Instead of doing simple pointer arithmetic to recover the index, we walk the array testing the pointers. This is naturally worse than doing pointer arithmetic, but watch what this step allows us to do: First, we reorganize the data so that instead of two parallel arrays, we have a single array of a compound structure.
struct OVERLAPPEDEX { OVERLAPPED Overlapped; OTHERDATA OtherData; }; OVERLAPPEDEX Master[MAX_OVERLAPPED]; OTHERDATA* FindOtherDataFromOverlapped(OVERLAPPED *lpOverlapped) { for (int index = 0; index < MAX_OVERLAPPED; index++) { if (&Master[index].Overlapped == lpOverlapped) { return &Master[index].OtherData; } } FatalError(); // should never be reached } // I/O is issued via // ReadFileEx(hFile, lpBuffer, nNumberOfBytesToRead, // &Master[i].Overlapped, CompletionRoutine);
All we did was consolidate the parallel arrays into a single array.
Now that it’s an array of compound structures, we don’t need
to carry two pointers around (one to the OVERLAPPED
and one to the OTHERDATA
).
We can just use a single OVERLAPPEDEX
pointer
and dereference either the Overlapped
or the OtherData
part.
OVERLAPPEDEX* FindOverlappedExFromOverlapped( OVERLAPPED *lpOverlapped) { for (int index = 0; index < MAX_OVERLAPPED; index++) { if (&Master[index].Overlapped == lpOverlapped) { return &Master[index]; } } FatalError(); // should never be reached } void CALLBACK CompletionRoutine( DWORD dwErrorCode, DWORD dwNumberOfBytesTransferred, LPOVERLAPPED lpOverlapped) { OVELRAPPEDEX *lpOverlappedEx = FindOverlappedExFromOverlapped(lpOverlapped); ... do stuff with lpOverlappedEx ... }
Finally, we can optimize the
FindOverlappedExFromOverlapped
function
that we de-optimized earlier.
Observe that the de-optimized loop is an example of the
“for/if” anti-pattern.
The “for/if” anti-pattern goes like this:
for (int i = 0; i < 100; i++) { if (i == 42) do_something(i); }
This can naturally be simplified to
do_something(42);
Our
FindOverlappedExFromOverlapped
function
is a special case of this anti-pattern.
It becomes more evident if we do some rewriting.
Start with
&Master[index].Overlapped == lpOverlapped
Apply CONTAINING_RECORD
to both sides.
CONTAINING_RECORD(&Master[index].Overlapped, OVERLAPPEDEX, Overlapped) == CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)
The left-hand side of the comparison simplifies to
&Master[index]
resulting in
&Master[index] == CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)
Recall that a[b]
is equivalent to *(a+b)
,
and therefore &a[b]
is equivalent to a+b
.
Master + index == CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped)
Now subtract Master
from both sides:
index == CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master
We have transformed the test into a clear case of the for/if anti-pattern, and the function can be simplified to
OVERLAPPEDEX* FindOverlappedExFromOverlapped( OVERLAPPED *lpOverlapped) { ptrdiff_t index = CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master; return &Master[index]; }
Again, rewrite &a[b]
as a+b
:
return Master + index;
Substitute the value of index
computed on the previous
line:
return Master + CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped) - Master;
The two occurrences of Master
cancel out, leaving
OVERLAPPEDEX* FindOverlappedExFromOverlapped( OVERLAPPED *lpOverlapped) { return CONTAINING_RECORD(lpOverlapped, OVERLAPPEDEX, Overlapped); }
And there you have it.
By a series of purely mechanical transformations,
we have rediscovered the technique of
extending the OVERLAPPED
structure.
0 comments