{"id":106998,"date":"2022-08-17T07:00:00","date_gmt":"2022-08-17T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106998"},"modified":"2022-08-17T07:17:38","modified_gmt":"2022-08-17T14:17:38","slug":"20220817-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220817-00\/?p=106998","title":{"rendered":"The AArch64 processor (aka arm64), part 16: Conditional execution"},"content":{"rendered":"<p>The AArch64 provides a handful of branchless conditional instructions.<\/p>\n<p>First up are the conditional assignments.<\/p>\n<pre>    ; conditional select\r\n    ; Rd = cond ? Rn : Rm\r\n    csel    Rd\/zr, Rn\/zr, Rm\/zr, cond\r\n\r\n    ; conditional select invert\r\n    ; Rd = cond ? Rn : ~Rm\r\n    csinv   Rd\/zr, Rn\/zr, Rm\/zr, cond\r\n\r\n    ; conditional select negate\r\n    ; Rd = cond ? Rn : -Rm\r\n    csneg   Rd\/zr, Rn\/zr, Rm\/zr, cond\r\n\r\n    ; conditional select increment\r\n    ; Rd = cond ? Rn : (Rm + 1)\r\n    csinc   Rd\/zr, Rn\/zr, Rm\/zr, cond\r\n<\/pre>\n<p>These operations assign a value based on a condition. If the condition is met, then the first input operand is assigned to the destination. Otherwise, some function of the second input operand is assigned.<\/p>\n<p>The condition is any of the same condition codes used by the conditional branch instruction.<\/p>\n<p>By passing the same register as both input operands, you get some interesting pseudo-instructions:<\/p>\n<pre>    ; conditional invert\r\n    ; Rd = cond ? ~Rn : Rn\r\n    cinv    Rd\/zr, Rn\/zr, cond  ; csinv Rd, Rn, Rn, !cond\r\n\r\n    ; conditional increment\r\n    ; Rd = cond ? (Rn + 1) : Rn\r\n    cinc    Rd\/zr, Rn\/zr, cond  ; csinc Rd, Rn, Rn, !cond\r\n\r\n    ; conditional negate\r\n    ; Rd = cond ? -Rn : Rn\r\n    cneg    Rd\/zr, Rn\/zr, cond  ; csneg Rd, Rn, Rn, !cond\r\n<\/pre>\n<p>Since the interesting operation is applied to the second input operand, we have to reverse the sense of the condition. 
(The assembler doesn&#8217;t accept <code>!<\/code> to negate the condition. You&#8217;ll have to write it out by hand.)<\/p>\n<p>Finally, we get some interesting pseudo-instructions if we hard-code both input registers to zero.<\/p>\n<pre>    ; conditional set\r\n    ; Rd = cond ? 1 : 0\r\n    cset    Rd\/zr, cond         ; csinc Rd, zr, zr, !cond\r\n\r\n    ; conditional set mask\r\n    ; Rd = cond ? -1 : 0        ; -1 is all bits set\r\n    csetm   Rd\/zr, cond         ; csinv Rd, zr, zr, !cond\r\n<\/pre>\n<p>The next set of conditional operations is the conditional comparisons, which let you combine the results of multiple comparisons so you can perform a single test at the end.<\/p>\n<p>Recall that Itanium accomplished this by <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150803-00\/?p=91191\">predicating a comparison instruction<\/a>, which had the effect of accumulating (either by AND or OR) multiple predicates into a single predicate register. And PowerPC did this by having <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180807-00\/?p=99435\">eight sets of flags on which you can perform boolean operations<\/a>, so that you can combine the flags in the way you like to produce a single result bit at the end.<\/p>\n<p>AArch64 does it by letting you make a comparison instruction conditional and also specify the artificial result if the condition is not met.<\/p>\n<pre>    ; conditional compare\r\n    ; if (cond) then set flags as if \"cmp a, b\"\r\n    ;           else set flags to #nzcv\r\n    ccmp    Rn\/zr, #imm5, #nzcv, cond\r\n    ccmp    Rn\/zr, Rm\/zr, #nzcv, cond\r\n\r\n    ; conditional compare negative\r\n    ; if (cond) then set flags as if \"cmn a, b\"\r\n    ;           else set flags to #nzcv\r\n    ccmn    Rn\/zr, #imm5, #nzcv, cond\r\n    ccmn    Rn\/zr, Rm\/zr, #nzcv, cond\r\n<\/pre>\n<p>The immediate is an unsigned 5-bit value, so it can cover the range 0 \u2026 31.<\/p>\n<p>If the condition is met, then the flags are 
set according to the underlying comparison instruction. And if the condition is not met, then the flags are set to the bits you specify. The flags are expressed as a 4-bit value, with <var>N<\/var> as the most significant bit and <var>V<\/var> as the least significant:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td>N<\/td>\n<td>Z<\/td>\n<td>C<\/td>\n<td>V<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The pattern for combining two results via AND is<\/p>\n<pre>    ; branch if a1 op1 b1 &amp;&amp; a2 op2 b2\r\n\r\n    cmp     a1, b1\r\n    ccmp    a2, b2, #op2-fail, op1\r\n    bop2    both_true\r\n<\/pre>\n<p>You start with the first comparison. Then you follow up with a <code>CCMP<\/code> whose condition is the outcome you want from the first comparison. Its operands are the arguments to the second comparison. And the <code>nzcv<\/code> value is chosen so that it fails the <code>Bop2<\/code>.<\/p>\n<p>For example,<\/p>\n<pre>    ; branch if r0 ge 0 and r1 lt 5\r\n    cmp     r0, #0\r\n    ccmp    r1, #5, #0, ge\r\n    blt     both_true\r\n<\/pre>\n<p>Let&#8217;s walk through this code. The important aspect of the magic value <code>#0<\/code> is that it corresponds to <var>N<\/var> = 0 and <var>V<\/var> = 0. Since <code>GE<\/code> is true whenever <var>N<\/var> equals <var>V<\/var>, this is the flags result of a comparison that reports &#8220;greater than or equal to&#8221;. 
(You can consult the condition chart from last time to see what each condition tests.)<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse; text-align: center;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th rowspan=\"2\">Instruction<\/th>\n<th colspan=\"2\">Flags<\/th>\n<\/tr>\n<tr>\n<th>If <var>r0 \u2265 0<\/var><\/th>\n<th>If <var>r0 &lt; 0<\/var><\/th>\n<\/tr>\n<tr>\n<td style=\"text-align: left;\"><code>cmp r0, #0<\/code><\/td>\n<td>GE<\/td>\n<td>LT<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: left;\"><code>ccmp r1, #5, #0, ge<\/code><\/td>\n<td><code>cmp r1, #5<\/code><\/td>\n<td>GE<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If the first comparison results in <code>GE<\/code>, then we perform the second comparison, and if it results in <code>LT<\/code> then we branch, satisfied that both conditions were met.<\/p>\n<p>If the first comparison does not produce <code>GE<\/code>, then we force the <var>nzcv<\/var> to zero, which acts like <code>GE<\/code>, and do not perform the second comparison. We just force it to fail. The branch is not taken, because we forced the flags to the opposite of <code>LT<\/code>.<\/p>\n<p>Similarly, the pattern for combining two comparisons via OR is<\/p>\n<pre>    ; branch if a1 op1 b1 || a2 op2 b2\r\n\r\n    cmp     a1, b1\r\n    ccmp    a2, b2, #op2-succeed, !op1\r\n    bop2    either_true\r\n<\/pre>\n<p>If the first comparison does not satisfy the desired <code>op1<\/code>, then we try again with the second comparison. But if the first comparison was what we wanted, then we force the flags to be something that causes the conditional branch to be taken.<\/p>\n<p>This strikes me as a clever solution for allowing multiple conditions to be combined and tested with a single conditional branch at the end, and therefore consume only a single branch prediction slot. 
It gives you the results in a single flags register, rather than having to create multiple flags registers or predicates and then invent instructions that combine them. It works only for straight-line expressions (not things like <code>(a &amp;&amp; b) || (c &amp;&amp; d)<\/code>), but that&#8217;s probably good enough.<\/p>\n<p><b>Bonus chatter<\/b>: The Windows debugger disassembles these instructions differently from how they are listed in the ARM reference manual. Instead of putting the condition at the end of the instruction, the condition is appended to the opcode.<\/p>\n<pre>    csel    w0, w8, wzr, eq     ; ARM reference manual\r\n    cseleq  w0, w8, wzr         ; Windows debugger\r\n\r\n    ccmp    x0, #0x1c, #0, le   ; ARM reference manual\r\n    ccmple  x0, #0x1c, #0       ; Windows debugger\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Making decisions.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106998","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Making 
decisions.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106998","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106998"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106998\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106998"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106998"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106998"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}