A little while ago, we wondered whether WriteProcessMemory was faster than shared memory for transferring data between two processes, and the conclusion is that it wasn’t. Shared memory, as its name implies, shares the memory between two processes: The two processes are accessing the same memory; there are no copies. On the other hand, the implementation of WriteProcessMemory allocates a transfer buffer, copies the data from the source to the transfer buffer, then changes memory context to the destination, and then copies the data from the transfer buffer to the destination. But could WriteProcessMemory be optimized to avoid this copy?
I mean, I guess you could do that in theory. I’m thinking, maybe create a memory descriptor list (MDL), lock and map the pages into kernel mode while in the context of the source, then change context to the destination and copy the memory to the destination. Repeat until all the memory has been copied. You don’t want to allocate a single MDL for the entire source block because the program might say that it wants to copy 100GB of memory, and if you didn’t cap the size of the transfer buffer, that would lock 100GB of RAM.
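To make the "copy in capped chunks" idea concrete, here is a user-mode sketch in C. It is only a simulation: the two memcpy calls stand in for the copy into and out of the transfer buffer, there is no actual context switch, and the name copy_via_transfer_buffer and the 4096-byte TRANSFER_CHUNK cap are made up for illustration, not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap on the transfer buffer. The real limit is not
   something this article states; 4096 is just an illustrative value. */
#define TRANSFER_CHUNK 4096

/* Simulates copying `size` bytes from src to dst through a bounded
   transfer buffer, one chunk at a time.
   Returns the number of chunks that were needed. */
size_t copy_via_transfer_buffer(void *dst, const void *src, size_t size)
{
    unsigned char transfer[TRANSFER_CHUNK];
    const unsigned char *s = src;
    unsigned char *d = dst;
    size_t chunks = 0;

    assert(size == 0 || (dst != NULL && src != NULL));

    while (size > 0) {
        size_t n = size < TRANSFER_CHUNK ? size : TRANSFER_CHUNK;

        /* Step 1: source -> transfer buffer (done in the source's context). */
        memcpy(transfer, s, n);

        /* A real implementation would change memory context here. */

        /* Step 2: transfer buffer -> destination. */
        memcpy(d, transfer, n);

        s += n;
        d += n;
        size -= n;
        chunks++;
    }
    return chunks;
}
```

No matter how large the request is, the amount of memory tied up at any one moment is bounded by TRANSFER_CHUNK. A 100GB copy just means more trips around the loop.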
But it seems overkill and unnecessary to lock the source pages. It’s fine for them to be pageable. We’re okay with them faulting in as necessary.
I don’t know if there’s a way to map memory from one process into another except by locking it. I don’t spend a lot of time in kernel mode. But you do have to be careful that the mapping goes into the kernel address space and not the user-mode address space. Putting it in the user-mode address space would be a security vulnerability because the destination process can see the bytes on the source page that are not part of the memory being copied.¹
But really, all of this effort is pointless. We saw that the purpose of the WriteProcessMemory function is not inter-process communication (IPC) but to be a tool for debuggers. Debuggers are typically writing just a few bytes at a time, say, to patch a breakpoint instruction, and the WriteProcessMemory function actually goes out of its way to write the memory, even in the face of incompatible memory protections, though it does so in a not-thread-safe way. But that’s okay because the destination process is presumably frozen by the debugger when it calls WriteProcessMemory. A debugger is not going to patch a process while it’s actively running. The lack of atomicity means that patching a running process could result in the process seeing torn state, like a partly-patched variable or even a partly-patched instruction.
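If "torn state" sounds abstract, here is a tiny C sketch of what a reader could observe if it looked at a 4-byte variable partway through a byte-wise patch. This is a single-threaded simulation, not a real race and not a call to WriteProcessMemory; the helper name observe_partial_patch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the value a reader would see if only the first
   `bytes_written` bytes of new_value had been copied over a
   variable that previously held old_value. */
uint32_t observe_partial_patch(uint32_t old_value, uint32_t new_value,
                               size_t bytes_written)
{
    uint32_t target = old_value;

    assert(bytes_written <= sizeof target);

    /* Overwrite only a prefix of the variable's bytes. */
    memcpy(&target, &new_value, bytes_written);
    return target;
}
```

With an old value of 0x11111111 and a new value of 0x22222222, stopping after two bytes gives a result that is neither of them. Which half has changed depends on the machine's byte order. A running process that reads the variable at that moment sees a value nobody ever intended to write.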
In summary, WriteProcessMemory was not intended to be used as an inter-process communication channel. Its intended client is a debugger that is using it to patch bytes in a process being debugged. The very high level of access required to call the function (PROCESS_) is not suitable for an inter-process communication channel, since it basically gives the writer full pwnage over the process being written to. In the case of a debugger, you want the debugger to have complete and total control of the process being debugged. But in the case of IPC, you don’t want to give your clients that high a level of access to your process. And even if you get past that, the lack of atomicity and lack of control over the order in which the bytes become visible in the target process means that WriteProcessMemory is not suitable as an IPC mechanism anyway. There’s no point trying to make a bad idea more efficient.
¹ Or you could try it the other way: Map the destination into the source. But now you are giving the source read access to the destination bytes that share the same page as the destination buffer, even though the source may not have PROCESS_ access.
I have used WriteProcessMemory for IPC, because it was the obvious way to do IPC when only one side had a message loop.
"The very high level of access required to call the function (PROCESS_VM_WRITE) is not suitable for an inter-process communication channel"
Really? Most code that does IPC is within the same user and session; so you have PROCESS_VM_WRITE any time you want it.
I suppose getting rid of the intermediate buffer would make it faster, but it doesn't matter. Captive standard IO handles are fast enough; it's just a code structure reason why they can't always be used.