{"id":112247,"date":"2026-04-21T07:00:00","date_gmt":"2026-04-21T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=112247"},"modified":"2026-04-21T09:10:32","modified_gmt":"2026-04-21T16:10:32","slug":"20260421-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20260421-00\/?p=112247","title":{"rendered":"Sure, xor&#8217;ing a register with itself is the idiom for zeroing it out, but why not sub?"},"content":{"rendered":"<p><a href=\"https:\/\/xania.org\/MattGodbolt\">Matt Godbolt<\/a>, probably best known for being the proprietor of <a href=\"https:\/\/compiler-explorer.com\/\">Compiler Explorer<\/a>, wrote a brief article on <a title=\"Why xor eax, eax?\" href=\"https:\/\/xania.org\/202512\/01-xor-eax-eax\"> why x86 compilers love the <code>xor eax, eax<\/code> instruction<\/a>.<\/p>\n<p>The answer is that it is the most compact way to set a register to zero on x86. In particular, it is three bytes shorter than the more obvious <code>mov eax, 0<\/code> (two bytes instead of five) since it avoids having to encode the four-byte constant. The x86 architecture does not have a dedicated zero register, so if you need to zero out a register, you&#8217;ll have to do it <i>ab initio<\/i>.<\/p>\n<p>But Matt doesn&#8217;t explain why everyone chooses <code>xor<\/code> as opposed to some other mathematical operation that is guaranteed to result in zero. In particular, what&#8217;s wrong with <code>sub eax, eax<\/code>? It encodes to the same number of bytes and executes in the same number of cycles. 
And its behavior with respect to flags is even better:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse; text-align: center;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>\u00a0<\/th>\n<th><tt>xor eax, eax<\/tt><\/th>\n<th><tt>sub eax, eax<\/tt><\/th>\n<\/tr>\n<tr>\n<th>OF<\/th>\n<td>clear<\/td>\n<td>clear<\/td>\n<\/tr>\n<tr>\n<th>SF<\/th>\n<td>clear<\/td>\n<td>clear<\/td>\n<\/tr>\n<tr>\n<th>ZF<\/th>\n<td>set<\/td>\n<td>set<\/td>\n<\/tr>\n<tr>\n<th>AF<\/th>\n<td>undefined<\/td>\n<td>clear<\/td>\n<\/tr>\n<tr>\n<th>PF<\/th>\n<td>set<\/td>\n<td>set<\/td>\n<\/tr>\n<tr>\n<th>CF<\/th>\n<td>clear<\/td>\n<td>clear<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Observe that <code>xor eax, eax<\/code> leaves the AF flag undefined, whereas <code>sub eax, eax<\/code> clears it.<\/p>\n<p>I don&#8217;t know why <code>xor<\/code> won the battle, but I suspect it was just a case of swarming.<\/p>\n<p>In my hypothetical history, <code>xor<\/code> and <code>sub<\/code> started out with roughly similar popularity, but <code>xor<\/code> took a slight lead due to some fluke, perhaps because it felt more &#8220;clever&#8221;.<\/p>\n<p>When early compilers used <code>xor<\/code> to zero out a register, this started the snowball, because people would see the compiler generate <code>xor<\/code> and think, &#8220;Well, those compiler writers are smart, they must know something I don&#8217;t. Since I was on the fence between <code>xor<\/code> and <code>sub<\/code>, this tiny data point is enough to tip it toward <code>xor<\/code>.&#8221;<\/p>\n<p>The predominance of these idioms as a way to zero out a register led Intel to add special <code>xor r, r<\/code>-detection and <code>sub r, r<\/code>-detection in the instruction decoding front-end and rename the destination to an internal zero register, bypassing the execution of the instruction entirely. You can imagine that the instruction, in some sense, &#8220;takes zero cycles to execute&#8221;. 
The front-end detection also breaks dependency chains: Normally, the output of an <code>xor<\/code> or <code>sub<\/code> is dependent on its inputs, but in this special case of <code>xor<\/code>&#8217;ing or <code>sub<\/code>&#8217;ing a register with itself, we know that the output is zero, independent of input.<\/p>\n<p>Even though Intel added support for both <code>xor<\/code>-detection and <code>sub<\/code>-detection, <a title=\"How many ways to set a register to zero?\" href=\"https:\/\/stackoverflow.com\/questions\/4829937\/how-many-ways-to-set-a-register-to-zero\"> Stack Overflow worries that other CPU manufacturers may have special-cased <code>xor<\/code> but not <code>sub<\/code><\/a>, so that makes <code>xor<\/code> the winner in this ultimately meaningless battle.<\/p>\n<p>Once an instruction has an edge, even if only extremely slight, that&#8217;s enough to tip the scales and rally everyone to that side.<\/p>\n<p><b>Bonus chatter<\/b>: <a href=\"https:\/\/github.com\/jeffpar\"> One of my former colleagues<\/a> was partial to using <code>sub r, r<\/code> to zero a register, and when I was reading assembly code, I could tell that he was the author due to the use of <code>sub<\/code> to zero a register rather than the more popular <code>xor<\/code>.<\/p>\n<p><b>Bonus bonus chatter<\/b>: The <code>xor<\/code> trick doesn&#8217;t work for Itanium because mathematical operations <a title=\"The Itanium processor, part 7: Speculative loads\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150804-00\/?p=91181\"> don&#8217;t reset the NaT bit<\/a>. Fortunately, Itanium also <a title=\"The Itanium processor, part 1: Warming up\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150727-00\/?p=90821\"> has a dedicated zero register<\/a>, so you don&#8217;t need this trick. 
You can just move zero into your desired destination.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Somehow xor became the most popular version.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-112247","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Somehow xor became the most popular version.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112247","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=112247"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112247\/revisions"}],"predecessor-version":[{"id":112248,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112247\/revisions\/112248"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=112247"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=112247"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=112247"}],"curie
s":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}