Sure, xor’ing a register with itself is the idiom for zeroing it out, but why not sub?

Raymond Chen

Matt Godbolt, probably best known for being the proprietor of Compiler Explorer, wrote a brief article on why x86 compilers love the xor eax, eax instruction.

The answer is that it is the most compact way to set a register to zero on x86. In particular, it is several bytes shorter than the more obvious mov eax, 0 since it avoids having to encode the four-byte constant. The x86 architecture does not have a dedicated zero register, so if you need to zero out a register, you’ll have to do it ab initio.
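To put numbers on "several bytes shorter", here is a small sketch that tabulates the encodings. The byte values are the forms an assembler commonly emits for these instructions (other equivalent encodings exist), and the names `ENCODINGS` and `encoded_length` are made up for this illustration. I have included `sub eax, eax` from the title for comparison.

```python
# Common machine-code encodings for three ways to zero EAX
# (valid in both 32-bit and 64-bit mode).
ENCODINGS = {
    "mov eax, 0": bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),  # opcode B8 + 4-byte immediate
    "xor eax, eax": bytes([0x31, 0xC0]),                  # opcode 31 + ModRM byte C0
    "sub eax, eax": bytes([0x29, 0xC0]),                  # opcode 29 + ModRM byte C0
}


def encoded_length(instruction):
    """Return the size in bytes of the encoding listed above."""
    return len(ENCODINGS[instruction])


if __name__ == "__main__":
    for instruction, encoding in ENCODINGS.items():
        print(f"{instruction:<14} {encoding.hex(' ')}  ({len(encoding)} bytes)")
```

The `mov` form spends four of its five bytes on the constant zero, which is exactly the part the two-byte forms avoid.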

But Matt doesn’t explain why everyone chooses xor as opposed to some other mathematical operation that is guaranteed to result in a zero. In particular, what’s wrong with sub eax, eax? It encodes to the same number of bytes and executes in the same number of cycles. And its behavior with respect to flags is even better:

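| Flag | `xor eax, eax` | `sub eax, eax` |
|------|----------------|----------------|
| OF | clear | clear |
| SF | clear | clear |
| ZF | set | set |
| AF | undefined | clear |
| PF | set | set |
| CF | clear | clear |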

Observe that xor eax, eax leaves the AF flag undefined, whereas sub eax, eax clears it.
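The flag results can be checked with a small model of how the two instructions compute flags. This is a sketch, not an emulator: the function names are made up for illustration, and `None` is my stand-in for "architecturally undefined".

```python
def parity_flag(result):
    # PF is set when the low byte of the result has an even number of 1 bits.
    return bin(result & 0xFF).count("1") % 2 == 0


def flags_after_sub(a, b, bits=32):
    """Flags produced by `sub a, b` on a register of the given width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "OF": bool((a ^ b) & (a ^ result) & sign),  # signed overflow
        "SF": bool(result & sign),
        "ZF": result == 0,
        "AF": bool((a ^ b ^ result) & 0x10),        # borrow out of bit 3
        "PF": parity_flag(result),
        "CF": a < b,                                # unsigned borrow
    }


def flags_after_xor(a, b, bits=32):
    """Flags produced by `xor a, b`; AF is left undefined (None here)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (a ^ b) & mask
    return {
        "OF": False,                                # always cleared by xor
        "SF": bool(result & sign),
        "ZF": result == 0,
        "AF": None,                                 # undefined
        "PF": parity_flag(result),
        "CF": False,                                # always cleared by xor
    }


if __name__ == "__main__":
    x = 0x12345678
    print("sub:", flags_after_sub(x, x))
    print("xor:", flags_after_xor(x, x))
```

Whatever value you feed in, subtracting it from itself or xor'ing it with itself gives the same answers as the table; the only difference is the AF entry.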

I don’t know why xor won the battle, but I suspect it was just a case of swarming.

In my hypothetical history, xor and sub started out with roughly similar popularity, but xor took a slight lead due to some fluke, perhaps because it felt more “clever”.

When early compilers used xor to zero out a register, this started the snowball, because people would see the compiler generate xor and think, “Well, those compiler writers are smart, they must know something I don’t. Since I was on the fence between xor and sub, this tiny data point is enough to tip it toward xor.”

The predominance of these idioms as a way to zero out a register led Intel to add special xor r, r-detection and sub r, r-detection in the instruction decoding front-end and rename the destination to an internal zero register, bypassing the execution of the instruction entirely. You can imagine that the instruction, in some sense, “takes zero cycles to execute”. The front-end detection also breaks dependency chains: Normally, the output of an xor or sub is dependent on its inputs, but in this special case of xor’ing or sub’ing a register with itself, we know that the output is zero, independent of input.
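The dependency-breaking effect can be seen in a toy model. To be clear about the assumptions: this is not how any real processor is built. Every instruction is pretended to take one cycle, a recognized zeroing idiom is pretended to take none, and the names `ZEROING_IDIOMS` and `ready_times` are invented for this sketch.

```python
ZEROING_IDIOMS = {"xor", "sub"}


def ready_times(program, recognize_idioms=True):
    """Return a dict mapping each register to the cycle its value is ready.

    `program` is a list of (mnemonic, dst, src) tuples with register operands.
    Toy assumption: every executed instruction has a latency of one cycle.
    """
    ready = {}  # registers never written are treated as ready at cycle 0
    for mnemonic, dst, src in program:
        if recognize_idioms and mnemonic in ZEROING_IDIOMS and dst == src:
            # Handled at rename: no inputs to wait for, nothing to execute.
            ready[dst] = 0
            continue
        # Otherwise the instruction waits for both operands, then runs.
        start = max(ready.get(dst, 0), ready.get(src, 0))
        ready[dst] = start + 1
    return ready


if __name__ == "__main__":
    program = [
        ("imul", "eax", "ebx"),  # some earlier work that produces eax
        ("xor", "eax", "eax"),   # zero eax
        ("add", "eax", "ecx"),   # start a new computation in eax
    ]
    print("with idiom detection:   ", ready_times(program))
    print("without idiom detection:", ready_times(program, recognize_idioms=False))
```

Without the special case, the `add` has to wait behind the `imul` even though the old value of eax is about to be thrown away. With it, the new computation starts immediately.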

Even though Intel added support for both xor-detection and sub-detection, Stack Overflow worries that other CPU manufacturers may have special-cased xor but not sub, so that makes xor the winner in this ultimately meaningless battle.

Once an instruction has an edge, even if only extremely slight, that’s enough to tip the scales and rally everyone to that side.

Bonus chatter: One of my former colleagues was partial to using sub r, r to zero a register, and when I was reading assembly code, I could tell that he was the author due to the use of sub to zero a register rather than the more popular xor.

Bonus bonus chatter: The xor trick doesn’t work for Itanium because mathematical operations don’t reset the NaT bit. Fortunately, Itanium also has a dedicated zero register, so you don’t need this trick. You can just move zero into your desired destination.


32 comments

Marco Brambilla April 27, 2026

Using XOR should cause less power consumption.
In order to do sub, the logic needs to complement the number, then run an adder, which in order to reach high performance cannot use a simple carry look ahead.
XOR requires literally just an XOR gate per bit, therefore the total energy per op is orders of magnitude lower.

Keep in mind, however, that this was probably important many eons ago. With modern technologies we’re probably talking less than a picojoule saved.
The energy needed for the instruction fetch and decode alone, especially in a CISC processor, is much higher, therefore the real energy difference is in the noise.
So besides looking fancy, it also probably produced a power benefit once upon a time.

- Jerry Coffin April 28, 2026 · Edited
  On most modern processors, both `sub reg, reg` and `xor reg, reg` are recognized as a register clearing operation. They’re normally handled by register renaming, so they don’t involve carrying out an operation on the contents of a register at all.
  - Marco Brambilla April 29, 2026
    
    Thanks for the clarification. The historical explanation still applies though!
Jerry Coffin April 23, 2026

The AF (auxiliary carry flag) is only used by the BCD instructions DAA, DAS, AAA, and AAS.

Few (if any) compilers ever emitted any of these, and they’re quite unusual even in hand-written assembly code. In long mode (64-bit code) they’re no longer available.

Bottom line: it’s exceedingly rare that leaving AF in a defined state matters (at all).
- Peter Cordes April 23, 2026
  
  AF is part of the state saved by PUSHF. If for some reason you’re trying to verify that something runs *identically* on different machines, memory could plausibly differ because of different AF results. (That kind of verification seems more likely to be done for a kernel than for most user-space programs.) Still exceedingly rare, but not depending on use of BCD instructions.
  
  But yeah, this is fortunately pretty much a non-problem even if you care about rare CPUs that set AF with XOR.
  - Jerry Coffin April 23, 2026
    
    There are reserved bits whose values aren’t guaranteed either, so if you do a pushf, there are already bits you should ignore. At least in 64-bit code, you almost certainly want to ignore AF in any case.
Peter Cordes April 23, 2026 · Edited

Another reason non-x86 ISAs (like Itanium, but also AArch64) can’t have dep-breaking zeroing idioms for XOR is memory dependency ordering (memory_order_consume). Only x86 treats every load as acquire. Others need architectural rules about carrying dependencies to guarantee that you can do things like ptr[tmp-tmp] (in asm load / sub / load), and still have that load ordered after an earlier tmp=data_ready.load(consume). (Typically with a branch in between on the data-ready flag, but then also using it for the pointed-to data.)

Also, there’s less reason to spend transistors on checking for zeroing idioms on RISCs or VLIWs where it doesn’t save code-size vs move-immediate which is automatically dep-breaking for any value. The back-end exec unit benefit is small.

BTW, Intel Silvermont only recognizes XOR as an idiom, not SUB. (Todo check current E cores like Gracemont / Crestmont.)
I mentioned that in https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and/33668295#33668295
I updated my “how many ways” answer you linked to not just say maybe.

Another commenter points out that Via Nano 2000 is the same, with only Nano 3000 adding recognition of SUB as a zeroing idiom.

Olivier Barthelemy April 23, 2026

Wasn’t xor better on Z80, and ppl just kept doing it on 8080 et seq mostly out of habit? I tried to confirm, but the world is much younger than me these days so I couldn’t find a quick answer.
- Simon Farnsworth April 30, 2026
  
  I went looking at datasheets – on the 8008, 8080, 8085 and Z80, both SUB and XOR are equivalent in terms of instruction byte code and clock cycles; additionally, it looks like all ALU operations (including both SUB and XOR) affect the flags, so you don’t even get that benefit from XOR. The 4004 and 4040 didn’t have XOR, and neither the MC6800 nor the 6502 have an XOR r,r instruction.
  
  I’m guessing, based on what I’ve found so far, that you’d have to leave the world of microprocessors and go to minicomputers to find a CPU where XOR is better than SUB – I can’t find a single-chip ALU where XOR has better timings than SUB, so I’d expect that you’d be looking at machines where an ALU was made from multiple chips (or that predate ICs) to find XOR beating SUB.
  