Matt Godbolt, probably best known for being the proprietor of Compiler Explorer, wrote a brief article on why x86 compilers love the xor eax, eax instruction.
The answer is that it is the most compact way to set a register to zero on x86. In particular, at two bytes it is three bytes shorter than the more obvious five-byte mov eax, 0, since it avoids having to encode the four-byte constant. The x86 architecture does not have a dedicated zero register, so if you need to zero out a register, you’ll have to do it ab initio.
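To make the size difference concrete, here is a small sketch that lists the machine-code bytes for the 32-bit forms of these instructions. The byte values are the standard encodings for EAX; the `ENCODINGS` table is a name made up for this illustration and does not come from Matt's article.

```python
# Machine-code bytes for three ways to zero EAX (32-bit operand size).
# ENCODINGS is an illustrative name, not something from the original article.
ENCODINGS = {
    "mov eax, 0":   bytes.fromhex("b800000000"),  # opcode B8 + four-byte immediate
    "xor eax, eax": bytes.fromhex("31c0"),        # opcode 31 + ModRM byte C0
    "sub eax, eax": bytes.fromhex("29c0"),        # opcode 29 + ModRM byte C0
}

for text, code in ENCODINGS.items():
    # Show each instruction, its length, and its bytes in hex.
    print(f"{text:<14} {len(code)} bytes  {code.hex(' ')}")
```

Note that the xor and sub forms differ only in the opcode byte, which is why they come out the same size.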
But Matt doesn’t explain why everyone chooses xor as opposed to some other mathematical operation that is guaranteed to result in a zero. In particular, what’s wrong with sub eax, eax? It encodes to the same number of bytes and executes in the same number of cycles. And its behavior with respect to flags is even better:
| Flag | xor eax, eax | sub eax, eax |
|---|---|---|
| OF | clear | clear |
| SF | clear | clear |
| ZF | set | set |
| AF | undefined | clear |
| PF | set | set |
| CF | clear | clear |
Observe that xor eax, eax leaves the AF flag undefined, whereas sub eax, eax clears it.
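The table can be reproduced with a rough software model of how the flags are derived from a 32-bit result. This is only a sketch: the function names are invented for illustration, the flag rules are the usual textbook definitions, and `None` stands in for "undefined" since real hardware will produce some value that you simply cannot rely on.

```python
MASK32 = 0xFFFFFFFF

def parity_even(value):
    """PF is set when the low byte of the result has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0

def flags_after_xor(a, b):
    """Model the flags after a 32-bit XOR (hypothetical helper)."""
    result = (a ^ b) & MASK32
    return {
        "OF": False,               # logical operations always clear OF
        "SF": bool(result >> 31),  # sign bit of the result
        "ZF": result == 0,
        "AF": None,                # architecturally undefined after XOR
        "PF": parity_even(result),
        "CF": False,               # logical operations always clear CF
    }

def flags_after_sub(a, b):
    """Model the flags after a 32-bit SUB computing a - b (hypothetical helper)."""
    result = (a - b) & MASK32
    return {
        # Signed overflow: operands have different signs and the result's
        # sign differs from the first operand's sign.
        "OF": bool(((a ^ b) & (a ^ result)) >> 31 & 1),
        "SF": bool(result >> 31),
        "ZF": result == 0,
        "AF": (a & 0xF) < (b & 0xF),  # borrow out of the low nibble
        "PF": parity_even(result),
        "CF": a < b,                  # unsigned borrow
    }

# Whatever the register held before, operating on it with itself gives
# the same flags every time.
for value in (0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF):
    print(hex(value), flags_after_xor(value, value), flags_after_sub(value, value))
```

In this model, the two instructions agree on every flag except AF, where sub produces a definite "clear" and xor promises nothing.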
I don’t know why xor won the battle, but I suspect it was just a case of swarming.
In my hypothetical history, xor and sub started out with roughly similar popularity, but xor took a slight lead due to some fluke, perhaps because it felt more “clever”.
When early compilers used xor to zero out a register, this started the snowball, because people would see the compiler generate xor and think, “Well, those compiler writers are smart, they must know something I don’t. Since I was on the fence between xor and sub, this tiny data point is enough to tip it toward xor.”
The predominance of these idioms as a way to zero out a register led Intel to add special detection of xor r, r and sub r, r in the instruction decoding front-end and to rename the destination to an internal zero register, bypassing the execution of the instruction entirely. You can imagine that the instruction, in some sense, “takes zero cycles to execute”. The front-end detection also breaks dependency chains: Normally, the output of an xor or sub is dependent on its inputs, but in this special case of xor’ing or sub’ing a register with itself, we know that the output is zero, independent of input.
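As a rough illustration of what "detecting the idiom" means at the byte level, here is a toy recognizer. It is a software sketch under simplifying assumptions, not a description of how any real decoder is built: it only looks at two-byte, 32-bit register-to-register encodings and ignores prefixes and other operand sizes. The names `ZEROING_OPCODES` and `is_zeroing_idiom` are made up for this example.

```python
# Opcodes for the two encodings each of 32-bit XOR and SUB that take a
# ModRM byte: 31/33 are XOR, 29/2B are SUB.
ZEROING_OPCODES = {0x31, 0x33, 0x29, 0x2B}

def is_zeroing_idiom(code: bytes) -> bool:
    """Return True if `code` is a two-byte xor r, r or sub r, r (toy model)."""
    if len(code) != 2 or code[0] not in ZEROING_OPCODES:
        return False
    modrm = code[1]
    mod = modrm >> 6         # 0b11 means both operands are registers
    reg = (modrm >> 3) & 7   # register named by the reg field
    rm = modrm & 7           # register named by the r/m field
    # The result is known to be zero only when both fields name the same register.
    return mod == 0b11 and reg == rm

print(is_zeroing_idiom(bytes.fromhex("31c0")))        # xor eax, eax
print(is_zeroing_idiom(bytes.fromhex("29c0")))        # sub eax, eax
print(is_zeroing_idiom(bytes.fromhex("31c8")))        # xor eax, ecx
print(is_zeroing_idiom(bytes.fromhex("b800000000")))  # mov eax, 0
```

The point of the sketch is that the check depends only on the instruction bytes, never on the current register contents, which is what lets a front-end treat the result as independent of its input.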
Even though Intel added support for both xor-detection and sub-detection, Stack Overflow worries that other CPU manufacturers may have special-cased xor but not sub, so that makes xor the winner in this ultimately meaningless battle.
Once an instruction has an edge, even if only extremely slight, that’s enough to tip the scales and rally everyone to that side.
Bonus chatter: One of my former colleagues was partial to using sub r, r to zero a register, and when I was reading assembly code, I could tell that he was the author due to the use of sub to zero a register rather than the more popular xor.
Bonus bonus chatter: The xor trick doesn’t work for Itanium because mathematical operations don’t reset the NaT bit. Fortunately, Itanium also has a dedicated zero register, so you don’t need this trick. You can just move zero into your desired destination.
I checked my copy of the Peter Norton assembly language book, the 1989 revision, and it mentions both XOR and SUB.
There is a lengthy explanation of why XOR won on Stack Overflow. The main tipping point is probably that it was apparently the way that Intel suggested it should be done.
Love your articles sir, thank you for writing them. Always an interesting read.
(Edited) There are indeed processors which recognized XOR but not SUB. Agner Fog’s manual has some details.
The VIA Nano 2000 is one (SUB dependency breaking is only supported starting with the Nano 3000).
Agner Fog insists that AMD processors starting with K10 support both XOR and SUB, but I checked the AMD optimization manuals from 2004 and 2014 and they don’t even mention SUB. They ask developers to use XOR only. Probably due to some chicken-and-egg situation as you described. The 2023 manual finally mentions SUB.
So if you were an asm programmer in 2004, and you read both Intel’s and AMD’s manual, the...