{"id":100825,"date":"2019-01-29T23:00:00","date_gmt":"2019-01-30T14:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=100825"},"modified":"2019-03-18T11:12:35","modified_gmt":"2019-03-18T18:12:35","slug":"20190130-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190129-00\/?p=100825","title":{"rendered":"The Intel 80386, part 8: Block operations"},"content":{"rendered":"<p>Most of the special-purpose operations that the 80386 inherited from the 8086 are largely obsolete. Although processors still support them, the implementations are not optimized, and compilers don&#8217;t generate them. <\/p>\n<p>Except for the block operations. Those are still important. <\/p>\n<p>The block operations (formally known as &#8220;string&#8221; instructions) operate on blocks of memory. They are another class of the unusual instructions that operate on two pieces of memory in a single instruction. <\/p>\n<p>The implied source memory is pointed to by the <var>esi<\/var> register, and the implied destination memory is pointed to by the <var>edi<\/var> register. You are not required to specify the implied operands in assembly language, but the Windows disassembler always shows them. I&#8217;ll show them as they are disassembled, since the focus of this series is on reading disassembly of compiler-generated code, not on writing assembly. <\/p>\n<p>Remember this table? <\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Operand size<\/th>\n<th>Hi<\/th>\n<th>Lo<\/th>\n<\/tr>\n<tr>\n<td>byte<\/td>\n<td><code>AH<\/code><\/td>\n<td><code>AL<\/code><\/td>\n<\/tr>\n<tr>\n<td>word<\/td>\n<td><code>DX<\/code><\/td>\n<td><code>AX<\/code><\/td>\n<\/tr>\n<tr>\n<td>dword<\/td>\n<td><code>EDX<\/code><\/td>\n<td><code>EAX<\/code><\/td>\n<\/tr>\n<\/table>\n<p>We saw this table when we studied multiplication and division. 
Well, we&#8217;re going to use the <code>lo<\/code> column again. <\/p>\n<p>Let&#8217;s also define this operation: <\/p>\n<pre>\nadvance reg {\n   if (direction flag is clear) reg += sizeof(size)\n   if (direction flag is set  ) reg -= sizeof(size)\n}\n<\/pre>\n<p>The <code>advance<\/code> operation performs a post-increment if the direction flag is clear, aka <var>up<\/var>, or a post-decrement if the direction flag is set, aka <var>dn<\/var> (down). <\/p>\n<p>The <var>DF<\/var> flag is required to be <var>up<\/var> at function call boundaries. A function is permitted to set it to <var>dn<\/var> temporarily, but it needs to set it back to <var>up<\/var> before allowing control to leave the function.&sup1; <\/p>\n<p>In practice, the direction flag is always <var>up<\/var>, except possibly for brief moments inside the <code>memmove<\/code> function when moving between overlapped memory blocks. <\/p>\n<pre>\n    MOVS    size PTR [edi], size PTR [esi] ; d = s\n                                           ; advance edi\n                                           ; advance esi\n\n    CMPS    size PTR [esi], size PTR [edi] ; set flags per s - d\n                                           ; advance edi\n                                           ; advance esi\n\n    SCAS    size PTR [edi]                 ; set flags per lo - d\n                                           ; advance edi\n\n    LODS    size PTR [esi]                 ; lo = s\n                                           ; advance esi\n\n    STOS    size PTR [edi]                 ; d = lo\n                                           ; advance edi\n<\/pre>\n<p>The &#8220;move string&#8221; instruction copies the specified unit of memory from the source address to the destination address, and then post-increments or post-decrements the <var>edi<\/var> and <var>esi<\/var> registers. 
For example, <\/p>\n<pre>\n    MOVS    DWORD PTR [edi], DWORD PTR [esi]\n                ; *(int32_t*)edi = *(int32_t*)esi\n                ; if up, then edi += 4, esi += 4\n                ; if dn, then edi -= 4, esi -= 4\n<\/pre>\n<p>The &#8220;compare string&#8221; instruction sets flags according to the calculation of <code>s - d<\/code> (source minus destination), the same as the <code>CMP<\/code> instruction would for those operands, and then post-increments\/post-decrements the <var>edi<\/var> and <var>esi<\/var> registers. <\/p>\n<p>The &#8220;scan string&#8221; instruction compares the <var>lo<\/var> register with the destination, setting flags according to <code>lo - d<\/code>, and then post-increments\/post-decrements the <var>edi<\/var> register. <\/p>\n<p>The &#8220;load string&#8221; instruction loads <var>lo<\/var> from the source and then post-increments\/post-decrements the <var>esi<\/var> register. <\/p>\n<p>The &#8220;store string&#8221; instruction stores <var>lo<\/var> to the destination and then post-increments\/post-decrements the <var>edi<\/var> register. <\/p>\n<p>These instructions are known as &#8220;string&#8221; operations because they can include a &#8220;repeat&#8221; prefix that indicates that the operation should be repeated the number of times specified by the <var>ecx<\/var> register, which is the length of the string. 
<\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Prefixed opcode<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><code>REP MOVS<\/code><\/td>\n<td>Move <var>ecx<\/var> units<\/td>\n<\/tr>\n<tr>\n<td><code>REPE CMPS<\/code><\/td>\n<td>Compare up to <var>ecx<\/var> units, as long as they are equal<\/td>\n<\/tr>\n<tr>\n<td><code>REPNE CMPS<\/code><\/td>\n<td>Compare up to <var>ecx<\/var> units, as long as they are different<\/td>\n<\/tr>\n<tr>\n<td><code>REPE SCAS<\/code><\/td>\n<td>Compare up to <var>ecx<\/var> units, as long as they are equal to <var>lo<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>REPNE SCAS<\/code><\/td>\n<td>Compare up to <var>ecx<\/var> units, as long as they are different from <var>lo<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>REP LODS<\/code><\/td>\n<td>Load <var>ecx<\/var> units into <var>lo<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>REP STOS<\/code><\/td>\n<td>Store <var>ecx<\/var> units from <var>lo<\/var><\/td>\n<\/tr>\n<\/table>\n<p>The <code>REP<\/code> prefix causes the operation to repeat for <var>ecx<\/var> iterations. <\/p>\n<p>The <code>REPE<\/code> prefix causes the operation to repeat for up to <var>ecx<\/var> iterations, stopping early as soon as the result of a comparison is not &#8220;equal&#8221;. <\/p>\n<p>The <code>REPNE<\/code> prefix causes the operation to repeat for up to <var>ecx<\/var> iterations, stopping early as soon as the result of a comparison is &#8220;equal&#8221;. <\/p>\n<p>In all cases, if <var>ecx<\/var> is zero, then the instruction is a nop: no memory is accessed and the flags are left unchanged. <\/p>\n<p>The assembler accepts <code>REPZ<\/code> and <code>REPNZ<\/code> as synonyms for <code>REPE<\/code> and <code>REPNE<\/code>, respectively. <\/p>\n<p>Although <code>REP LODS<\/code> is technically legal, it is of dubious utility because each iteration will overwrite <var>lo<\/var>, and only the last iteration&#8217;s result will remain. 
<\/p>\n<p>At the end of the instruction, the <var>ecx<\/var> register has been decremented by the number of elements operated upon, and the <var>esi<\/var> and\/or <var>edi<\/var> registers have been incremented or decremented by the number of bytes operated upon. <\/p>\n<p>These instructions are typically used only in the following idioms: <\/p>\n<pre>\n    ; copy ecx units from esi to edi\n    REP MOVS size PTR [edi], size PTR [esi]\n\n    ; look for lo in a buffer with ecx elements starting at edi\n    REPNE SCAS size PTR [edi]\n\n    ; store ecx copies of lo into the buffer starting at edi\n    REP STOS size PTR [edi]\n<\/pre>\n<p>For the cases where there are multiple termination conditions, you can inspect the flags and the <var>ecx<\/var> register to determine which condition terminated the loop and consequently how many iterations of the loop were performed. <\/p>\n<pre>\n    mov ecx, 100                ; search up to 100 characters\n    xor eax, eax                ; search for 0\n    mov edi, offset string      ; search this string\n    repne scas byte ptr [edi]   ; scan bytes looking for 0 (find end of string)\n    jnz toolong                 ; not found\n    sub edi, (offset string) + 1 ; calculate length\n<\/pre>\n<p>After setting up the preconditions for the <code>REPNE SCAS<\/code> instruction, we kick off the search. At the completion of the instruction, we know the following: <\/p>\n<ul>\n<li>If the zero byte was not found:<\/li>\n<ul>\n<li>The loop ran for 100 iterations.<\/li>\n<li><var>ZF<\/var> will be clear (<var>nz<\/var>).<\/li>\n<li><var>ecx<\/var> was decremented 100 times. Its value is now zero.<\/li>\n<li><var>edi<\/var> was incremented 100 times. 
It now points one past the end of the buffer.<\/li>\n<\/ul>\n<li>If the zero byte was found, at offset <var>n<\/var>:<\/li>\n<ul>\n<li>The loop ran for <var>n<\/var>+1 iterations.<\/li>\n<li><var>ZF<\/var> will be set (<var>zr<\/var>).<\/li>\n<li><var>ecx<\/var> was decremented <var>n<\/var>+1 times. Its value is now 99 &minus; <var>n<\/var>, the number of characters not scanned.<\/li>\n<li><var>edi<\/var> was incremented <var>n<\/var>+1 times. It now points one past the zero byte.<\/li>\n<\/ul>\n<\/ul>\n<p>After the <code>REPNE SCAS<\/code> instruction, we check the <var>ZF<\/var> flag to see whether the zero byte was found. If not, then we declare the string too long. <\/p>\n<p>Otherwise, the zero byte was found and we want to calculate the length. We have two choices: We could try to infer it from <var>ecx<\/var>, whose final value is 100 &minus; (<var>n<\/var> + 1), or we could try to infer it from <var>edi<\/var>, whose final value is <code>offset string<\/code> + <var>n<\/var> + 1. <\/p>\n<p>To infer it from <var>ecx<\/var>, we solve for <var>n<\/var> and get <var>n<\/var> = 99 &minus; <var>ecx<\/var>. However, the 80386 does not have a way to subtract a register from a constant in a single instruction, so this would require us to use two instructions, say <code>sub ecx, 99<\/code> followed by <code>neg ecx<\/code>. <\/p>\n<p>To infer it from <var>edi<\/var>, we solve for <var>n<\/var> and get <var>n<\/var> = <var>edi<\/var> &minus; <code>offset string<\/code> &minus; 1 = <var>edi<\/var> &minus; (<code>offset string<\/code> + 1). <\/p>\n<p>The second calculation needs only a single <code>sub<\/code> with a constant operand, so we go with that. <\/p>\n<p>These instructions are usually used with a repeat prefix, but for small numbers of iterations, they might be unrolled, to avoid the overhead of having to set up the <code>ecx<\/code> register. 
The <code>MOVS<\/code> instruction encodes in only one byte, so you can do four of them in fewer bytes than it takes to load a constant into a 32-bit register. <\/p>\n<pre>\n    ; move 16 bytes from esi to edi\n    MOVS    DWORD PTR [edi], DWORD PTR [esi]\n    MOVS    DWORD PTR [edi], DWORD PTR [esi]\n    MOVS    DWORD PTR [edi], DWORD PTR [esi]\n    MOVS    DWORD PTR [edi], DWORD PTR [esi]\n<\/pre>\n<p>The repeating instructions do not operate atomically. Rather, a single iteration is run, the registers are updated, and then the instruction pointer either advances to the next instruction if the loop termination condition is met, or it returns to the instruction if the loop should continue. This means that at each step, the <var>ecx<\/var> register decrements by one, the <var>edi<\/var> and\/or <var>esi<\/var> registers advance by one unit, the flags are set as necessary, and then the instruction pointer either moves to the next instruction or stays put. (This design permits interrupts to be serviced during long block operations.) <\/p>\n<p>You&#8217;ll notice this behavior if you try to single-step through a repeated block operation in the debugger. Each single-step will run one iteration, and it will look like nothing happened because the instruction pointer didn&#8217;t move. But something did happen: The <var>ecx<\/var> register was decremented, the <var>edi<\/var> and\/or <var>esi<\/var> registers advanced, and flags may have been updated. <\/p>\n<p><a HREF=\"http:\/\/devblogs.microsoft.com\/oldnewthing\/20190131-00\/?p=100835\">Next time<\/a>, we&#8217;ll look at the stack frame instructions. <\/p>\n<p>&sup1; Back in the days when assembly language was still commonly used, a frustrating source of bugs was forgetting to set the direction flag back to <var>up<\/var> when you were finished. 
This caused future string operations to walk backward through memory rather than forward, and the result of the error was often not manifested until much, much later, at which point the culprit was long gone. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the highly specialized groups of instructions.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-100825","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>One of the highly specialized groups of instructions.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/100825","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=100825"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/100825\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=100825"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=100825"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/
v2\/tags?post=100825"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}