Why does the CreateProcess function modify its input command line?

Raymond Chen

One of the nasty gotchas of the CreateProcess function is that the lpCommandLine parameter must be a mutable pointer. If you pass a pointer to memory that cannot be written to (for example, a pointer to a page that is marked PAGE_READONLY), then you might crash. Commenter Ritchie wonders why this parameter is so weird.

Basically, somebody back in the 1980’s wanted to avoid allocating memory. (Another way of interpreting this is that somebody tried to be a bit too clever.)

The CreateProcess temporarily modifies the string you pass as the lpCommandLine in its attempt to figure out where the program name ends and the command line arguments begin. Now, it could have made a copy of the string and made its temporary modifications to the copy, but hey, if you modify the input string directly, then you save yourself an expensive memory allocation operation. Back in the old days, people worried about avoiding memory allocations, so this class of micro-optimization is the sort of thing people worried about as a matter of course. Of course, nowadays, it seems rather antiquated.

Now, there may also be good technical reasons (as opposed to merely performance considerations) for avoiding allocating memory on the heap. When a program crashes, the just in time debugger is launched with the CreateProcess function, and you don’t want to allocate memory on the heap if the reason the program crashed is that the heap is corrupted. Otherwise, you can get yourself into a recursive crash loop: While trying to launch the debugger, you crash, which means you try to launch the debugger to debug the new crash, which again crashes, and so on. The original authors of the CreateProcess function were careful to avoid allocating memory off the heap, so that in the case the function is being asked to launch the debugger, it won’t get waylaid by a corrupted heap.

Whether these concerns are still valid today I am not sure, but it was those concerns that influenced the original design and therefore the interface.

Why is it that only the Unicode version is affected? Well, because the ANSI version of the function just converts its strings to Unicode and the calls the Unicode version of the function. Consequently, the ANSI version of the function happens to implement the workaround as a side effect of its original goal: The string passed to the Unicode version of the function is a temporary string!

Exercise: Why is it okay for the ANSI version of the CreateProcess to allocate a temporary string from the heap when the Unicode function cannot?

0 comments

Discussion is closed.

Feedback usabilla icon