{"id":100795,"date":"2019-01-24T23:00:00","date_gmt":"2019-01-25T14:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=100795"},"modified":"2019-03-18T11:10:34","modified_gmt":"2019-03-18T18:10:34","slug":"20190125-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190124-00\/?p=100795","title":{"rendered":"The Intel 80386, part 5: Logical operations"},"content":{"rendered":"<p>The next group of instructions we&#8217;ll look at is the bitwise logical operations. <\/p>\n<pre>\n    AND     r\/m, r\/m\/i  ; d &amp;= s, set flags\n    OR      r\/m, r\/m\/i  ; d |= s, set flags\n    XOR     r\/m, r\/m\/i  ; d ^= s, set flags\n\n    TEST    r\/m, r\/m\/i  ; calculate d &amp; s, set flags\n\n    NOT     r\/m         ; d = ~d, do <u>not<\/u> set flags\n<\/pre>\n<p>The <code>AND<\/code>, <code>OR<\/code>, and <code>XOR<\/code> instructions set flags based on the numeric value of the result; carry and overflow are always clear. <\/p>\n<p>The <code>TEST<\/code> instruction is the same as <code>AND<\/code>, except that the result is thrown away rather than being stored back into the destination. You can say that <code>AND<\/code> is to <code>TEST<\/code> as <code>SUB<\/code> is to <code>CMP<\/code>. <\/p>\n<p>A quirk of the <code>TEST<\/code> instruction is that it does not support an 8-bit immediate with sign extension. The immediate must be the same size as the other operand. This means that you can save instruction encoding space by using a smaller data size: <\/p>\n<pre>\n    TEST    DWORD PTR [eax+10h], 40000000h  ; 7-byte instruction\n    TEST    BYTE PTR [eax+13h], 40h         ; 4-byte instruction\n<\/pre>\n<p>If you do this, you will run afoul of the <a HREF=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/\">store-to-load forwarder<\/a>. Fortunately, the 80386 doesn&#8217;t have one. <\/p>\n<p>We will learn later that moving constants into registers requires a large instruction encoding. 
To avoid this, you may see two idioms for setting a register to zero: You can subtract it from itself, or you can exclusive-or it with itself. <\/p>\n<pre>\n    SUB     eax, eax        ; set eax = 0, set flags\n    XOR     eax, eax        ; set eax = 0, set flags\n<\/pre>\n<p>The 80386 doesn&#8217;t really care either way, but later versions of the processor recognize the &#8220;<code>XOR<\/code> a register with itself&#8221; idiom and special-case it to avoid the dependency on the previous value of the register. Therefore, you&#8217;ll see the <code>XOR<\/code> version in compiler-generated code. <\/p>\n<p>The next group of instructions is the bit-testing group. <\/p>\n<pre>\n    BT      r\/m, r\/i        ; copy bit s of d to CF\n    BTS     r\/m, r\/i        ; copy bit s of d to CF, then set it\n    BTR     r\/m, r\/i        ; copy bit s of d to CF, then reset it\n    BTC     r\/m, r\/i        ; copy bit s of d to CF, then complement it\n<\/pre>\n<p>The <code>BT<\/code> instruction copies a bit (lowest-order bit is bit zero) of the destination operand to the carry flag. If the destination is a register, then the bit number is taken mod <var>n<\/var>, where <var>n<\/var> is the register size in bits. If the destination is memory, then the memory is considered a packed bit array, and bit <var>s<\/var> % 8 of byte <var>m<\/var> + (<var>s<\/var> \/ 8) is copied.&sup1; For example, <\/p>\n<pre>\n    BT      eax, 17     ; copy bit 17 of eax to carry\n    SBB     ecx, -1     ; ecx -= -1 + CF\n<\/pre>\n<p>The effect of this sequence of operations is to increment the <var>ecx<\/var> register if bit 17 of <var>eax<\/var> is clear: If the bit is not set, then the <code>BT<\/code> results in carry clear, so the <code>SBB<\/code> instruction subtracts &minus;1 from <var>ecx<\/var>, which has the effect of adding 1. If the bit is set, then the <code>BT<\/code> results in carry set, so the <code>SBB<\/code> instruction subtracts &minus;1 from <var>ecx<\/var>, and then subtracts one more. 
Some algebra shows that <var>ecx<\/var> &minus; (&minus;1) &minus; 1 = <var>ecx<\/var> + 1 &minus; 1 = <var>ecx<\/var>, so there is no net change to the <var>ecx<\/var> register. <\/p>\n<p>The <code>BTS<\/code>, <code>BTR<\/code>, and <code>BTC<\/code> instructions copy the bit to the carry flag, and then set, reset, or toggle the bit that was tested. I haven&#8217;t seen the compiler generate these instructions, so you probably don&#8217;t need to know them. <\/p>\n<p>Next are the shift instructions. <\/p>\n<pre>\n    SHL     r\/m, CL\/i       ; d = d &lt;&lt; s,             set flags\n    SHR     r\/m, CL\/i       ; d = d &gt;&gt; s (zero-fill), set flags\n    SAR     r\/m, CL\/i       ; d = d &gt;&gt; s (sign-fill), set flags\n<\/pre>\n<p>The <code>SHL<\/code> instruction shifts left, the <code>SHR<\/code> instruction shifts right with zero fill (unsigned shift), and the <code>SAR<\/code> instruction shifts right with sign fill (signed shift). <\/p>\n<p>The shift amount can be a constant (the encoding with 1 is more compact than the encoding with other constants), or it can be a variable in the <var>cl<\/var> register. No other register can be used to specify the shift amount. The shift amount is taken mod 32. <\/p>\n<p>The last bit shifted out is placed in the carry flag. If the shift amount is the immediate 1, then the overflow flag is set if the sign bit changed. (If the shift amount is not the immediate 1, then the overflow flag is undefined.) The zero, sign, and parity flags are set based on the result. <\/p>\n<p>Next come the double shift instructions. 
<\/p>\n<pre>\n    SHLD    r\/m, r, CL\/i       ; d = d &lt;&lt; t, fill from s, set flags\n                               ; operand size n = 16, 32\n    SHRD    r\/m, r, CL\/i       ; d = d &gt;&gt; t, fill from s, set flags\n                               ; operand size n = 16, 32\n<\/pre>\n<p>The shift left double and shift right double instructions shift the destination by the amount specified by the third operand (which must be a constant or the <var>cl<\/var> register) and fill in the vacated bits from the second operand. The <code>SHLD<\/code> instruction fills with the high-order bits of <var>s<\/var>, and the <code>SHRD<\/code> instruction fills with the low-order bits of <var>s<\/var>. The last bit shifted out is copied to the carry flag. The shift amount is taken mod 32. <\/p>\n<p>Although <var>n<\/var> can be 16, you won&#8217;t see it in practice, so there&#8217;s no point mentioning that the behavior is undefined if the shift amount (mod 32) is greater than 16. <\/p>\n<p>Okay, so those were the logical operations. <a HREF=\"http:\/\/devblogs.microsoft.com\/oldnewthing\/20190128-00\/?p=100805\">Next time<\/a>, we&#8217;ll look at data transfer instructions. <\/p>\n<p>&sup1; Technically, it is bit <var>s<\/var> % <var>n<\/var> of <var>n<\/var>-bit unit <var>m<\/var> + (<var>s<\/var> \/ <var>n<\/var>). This means that <\/p>\n<pre>\n    MOV     ecx, 32\n    BT      DWORD PTR [eax], ecx\n<\/pre>\n<p>will read four bytes from <code>[eax+4]<\/code> to <code>[eax+7]<\/code> and then test bit 0 of the value. Note that the bytes from <code>[eax+5]<\/code> to <code>[eax+7]<\/code> do not participate in the bit test, but they must still be accessible, or you will take an access violation. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Fiddling with bits.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-100795","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Fiddling with bits.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/100795","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=100795"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/100795\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=100795"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=100795"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=100795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}