Could WriteProcessMemory be made faster by avoiding the intermediate buffer?

A little while ago, we wondered whether WriteProcessMemory was faster than shared memory for transferring data between two processes, and the conclusion was that it wasn’t. Shared memory, as its name implies, shares the memory between two processes: The two processes are accessing the same memory; there are no copies. On the other hand, the implementation of WriteProcessMemory allocates a transfer buffer, copies the data from the source to the transfer buffer, then changes memory context to the destination, and then copies the data from the transfer buffer to the destination. But could WriteProcessMemory be optimized to avoid this copy?

I mean, I guess you could do that in theory. I’m thinking, maybe create a memory descriptor list (MDL), lock and map the pages into kernel mode while in the context of the source, then change context to the destination and copy the memory to the destination. Repeat until all the memory has been copied. You don’t want to allocate a single MDL for the entire source block because the program might say that it wants to copy 100GB of memory, and if you didn’t cap the size of the transfer buffer, that would lock 100GB of RAM.
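To make the "repeat until all the memory has been copied" loop concrete, here is a toy user-mode model in Python. It is only a sketch of the chunking idea: there are no MDLs, no page locking, and no context switches, and the name `copy_in_chunks` and the 4096-byte cap are assumptions made for illustration, not values taken from the kernel.

```python
TRANSFER_CHUNK_SIZE = 4096  # assumed cap; the real chunk size is not stated in the article


def copy_in_chunks(source: bytes, destination: bytearray, dest_offset: int,
                   chunk_size: int = TRANSFER_CHUNK_SIZE) -> int:
    """Copy `source` into `destination` through a bounded staging buffer.

    Returns the number of chunks that were needed. However large `source`
    is, at most `chunk_size` bytes are staged at any one time.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if dest_offset < 0 or dest_offset + len(source) > len(destination):
        raise ValueError("destination range is out of bounds")

    staging = bytearray(chunk_size)  # stands in for the capped buffer or mapping
    chunks = 0
    for start in range(0, len(source), chunk_size):
        n = min(chunk_size, len(source) - start)
        # Step 1 (conceptually "in the source context"): stage one chunk.
        staging[:n] = source[start:start + n]
        # Step 2 (conceptually "in the destination context"): copy it out.
        destination[dest_offset + start:dest_offset + start + n] = staging[:n]
        chunks += 1
    return chunks
```

The point of the cap is visible in the signature: a request to copy 100 GB would take many more trips around the loop, but it would never pin more than `chunk_size` bytes at once.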

But it seems overkill and unnecessary to lock the source pages. It’s fine for them to be pageable. We’re okay with them faulting in as necessary.

I don’t know if there’s a way to map memory from one process into another except by locking it. I don’t spend a lot of time in kernel mode. But you do have to be careful that the mapping goes into the kernel address space and not the user-mode address space. Putting it in the user-mode address space would be a security vulnerability because the destination process can see the bytes on the source page that are not part of the memory being copied.¹
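The exposure comes from page granularity: a mapping covers whole pages, but the buffer being copied rarely starts and ends on page boundaries. As a rough illustration, the helper below computes how many bytes outside the requested range would come along for the ride. The function name is hypothetical, and the 4096-byte page size is an assumption (it is the common case, but not the only one).

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def bytes_exposed_outside_range(address: int, length: int,
                                page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return (bytes_before, bytes_after) that share pages with the buffer.

    `bytes_before` is the gap between the start of the first page and the
    buffer; `bytes_after` is the gap between the end of the buffer and the
    end of the last page.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_page_start = address - (address % page_size)
    end = address + length
    # Round the end of the buffer up to the next page boundary.
    last_page_end = -(-end // page_size) * page_size
    return address - first_page_start, last_page_end - end


# A 32-byte buffer at 0x10010 drags in the rest of its page:
before, after = bytes_exposed_outside_range(0x10010, 0x20)
print(before, after)  # prints "16 4048"
```

In that example, copying 32 bytes would make 4064 unrelated bytes visible if the page were mapped somewhere the other process could read it.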

But really, all of this effort is pointless. We saw that the purpose of the WriteProcessMemory function is not inter-process communication (IPC) but to be a tool for debuggers. Debuggers are typically writing just a few bytes at a time, say, to patch a breakpoint instruction, and the WriteProcessMemory function actually goes out of its way to write the memory, even in the face of incompatible memory protections, though it does so in a not-thread-safe way. But that’s okay because the destination process is presumably frozen by the debugger when it calls WriteProcessMemory. A debugger is not going to patch a process while it’s actively running. The lack of atomicity means that patching a running process could result in the process seeing torn state, like a partly-patched variable or even a partly-patched instruction.
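Here is a small model of what torn state looks like. It patches a four-byte value one byte at a time and records what a reader would see if it happened to look after each byte. The byte-at-a-time granularity is an assumption chosen to make the effect easy to see; the article does not say in what units or in what order the real function makes bytes visible. The name `patch_bytewise` is made up for this sketch.

```python
import struct


def patch_bytewise(memory: bytearray, offset: int, new_bytes: bytes):
    """Apply a patch one byte at a time, yielding a snapshot after each byte.

    Each snapshot models what a concurrently running reader could observe
    if it looked at memory in the middle of the patch.
    """
    for i, value in enumerate(new_bytes):
        memory[offset + i] = value
        yield bytes(memory)


# The target holds the little-endian 32-bit value 0x11111111,
# and the "debugger" patches it to 0x22222222.
memory = bytearray(struct.pack("<I", 0x11111111))
patch = struct.pack("<I", 0x22222222)

observed = [struct.unpack_from("<I", snapshot, 0)[0]
            for snapshot in patch_bytewise(memory, 0, patch)]
print([hex(v) for v in observed])
# prints ['0x11111122', '0x11112222', '0x11222222', '0x22222222']
```

The first three observed values are neither the old value nor the new one. If the target is frozen for the duration of the patch, nobody is around to see them, which is why the debugger scenario gets away with it.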

In summary, WriteProcessMemory was not intended to be used as an inter-process communication channel. Its intended client is a debugger that is using it to patch bytes in a process being debugged. The very high level of access required to call the function (PROCESS_VM_WRITE) is not suitable for an inter-process communication channel, since it basically gives the writer full pwnage over the process being written to. In the case of a debugger, you want the debugger to have complete and total control of the process being debugged. But in the case of IPC, you don’t want to give your clients that high a level of access to your process. And even if you get past that, the lack of atomicity and lack of control over the order in which the bytes become visible in the target process means that WriteProcessMemory is not suitable as an IPC mechanism anyway. There’s no point trying to make a bad idea more efficient.

¹ Or you could try it the other way: Map the destination into the source. But now you are giving the source read access to the destination bytes that share the same page as the destination buffer, even though the source may not have PROCESS_VM_READ access.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

1 comment

Joshua Hudson 3 hours ago

I have used WriteProcessMemory for IPC, because it was the obvious way to do IPC when only one side had a message loop.

"The very high level of access required to call the function (PROCESS_VM_WRITE) is not suitable for an inter-process communication channel"

Really? Most code that does IPC is within the same user and session; so you have PROCESS_VM_WRITE any time you want it.

I suppose getting rid of the intermediate buffer would make it faster, but it doesn't matter. Captive standard IO handles are fast enough; it's just a code structure reason why they can't always be used.
