{"id":100745,"date":"2019-01-20T23:00:00","date_gmt":"2019-01-21T14:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=100745"},"modified":"2019-03-18T11:02:43","modified_gmt":"2019-03-18T18:02:43","slug":"20190121-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190120-00\/?p=100745","title":{"rendered":"The Intel 80386, part 1: Introduction"},"content":{"rendered":"<p>Windows NT stopped supporting the Intel 80386 processor with Windows NT 4.0, which raised the minimum requirements to an Intel 80486. Therefore, the Intel 80386 technically falls into the category of &#8220;processor that Windows once supported but no longer does.&#8221; This series focuses on the portion of the x86 instruction set available on an 80386, although I will make notes about future extensions in a special chapter. <\/p>\n<p>The Intel 80386 is the next step in the evolution of the processor series that started with the Intel 8086 (which was itself inspired by the Intel 8080, which was in turn inspired by the Intel 8008). Even at this early stage, it had a long history, which helps to explain many of its strange corners. <\/p>\n<p>As with all the processor retrospective series, I&#8217;m going to focus on how Windows NT used the Intel 80386 in user mode because the original audience for all of these discussions was user-mode developers trying to get up to speed debugging their programs. Normally, this means that I omit instructions that you are unlikely to see in compiler-generated code. However, I&#8217;ll set aside a day to cover some of the legacy instructions that are functional but not used in practice. <\/p>\n<p>The Intel 80386 has eight integer registers, each 32 bits wide. 
<\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Meaning<\/th>\n<th>Preserved?<\/th>\n<\/tr>\n<tr>\n<td><var>eax<\/var><\/td>\n<td>accumulator<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td><var>ebx<\/var><\/td>\n<td>base register<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td><var>ecx<\/var><\/td>\n<td>count register<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td><var>edx<\/var><\/td>\n<td>data register<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td><var>esi<\/var><\/td>\n<td>source index<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td><var>edi<\/var><\/td>\n<td>destination index<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td><var>ebp<\/var><\/td>\n<td>base pointer<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td><var>esp<\/var><\/td>\n<td>stack pointer<\/td>\n<td>Sort of<\/td>\n<\/tr>\n<\/table>\n<p>The register names are rather unusual due to <a HREF=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/\">the history of the processor line<\/a>. That history also explains why the instruction encoding uses the non-alphabetical-order <var>eax<\/var>, <var>ecx<\/var>, <var>edx<\/var>, <var>ebx<\/var>. <\/p>\n<p>Also for historical reasons, there are names for selected partial registers. 
<\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><var>ax<\/var><\/td>\n<td>Lower 16 bits of <var>eax<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>bx<\/var><\/td>\n<td>Lower 16 bits of <var>ebx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>cx<\/var><\/td>\n<td>Lower 16 bits of <var>ecx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>dx<\/var><\/td>\n<td>Lower 16 bits of <var>edx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>si<\/var><\/td>\n<td>Lower 16 bits of <var>esi<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>di<\/var><\/td>\n<td>Lower 16 bits of <var>edi<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>bp<\/var><\/td>\n<td>Lower 16 bits of <var>ebp<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>sp<\/var><\/td>\n<td>Lower 16 bits of <var>esp<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>ah<\/var><\/td>\n<td>Upper 8 bits of <var>ax<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>al<\/var><\/td>\n<td>Lower 8 bits of <var>ax<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>bh<\/var><\/td>\n<td>Upper 8 bits of <var>bx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>bl<\/var><\/td>\n<td>Lower 8 bits of <var>bx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>ch<\/var><\/td>\n<td>Upper 8 bits of <var>cx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>cl<\/var><\/td>\n<td>Lower 8 bits of <var>cx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>dh<\/var><\/td>\n<td>Upper 8 bits of <var>dx<\/var><\/td>\n<\/tr>\n<tr>\n<td><var>dl<\/var><\/td>\n<td>Lower 8 bits of <var>dx<\/var><\/td>\n<\/tr>\n<\/table>\n<p>Operations on these register fragments affect only the indicated bits; the other bits of the 32-bit register remain unaffected. For example, storing a value into the <var>ax<\/var> register leaves the most-significant 16 bits of the <var>eax<\/var> register unchanged.&sup1; <\/p>\n<p>Windows NT requires that the stack be kept on a 4-byte boundary. There is no red zone. 
<\/p>\n<p>The 80386 also has eight 80-bit extended precision floating point registers named <var>st0<\/var> through <var>st7<\/var>. The floating point system is rather unusual: In addition to the fact that the registers are extended precision, the programming model for the floating point registers is as a stack. Values are pushed onto the floating point stack, operations are performed on the stack, and results are popped off. <\/p>\n<p>Floating point support is optional and is provided by the 80387 coprocessor chip, which runs concurrently with the main CPU. If a floating point instruction is executed on a system that lacks a floating point coprocessor, the floating point instruction traps, and the kernel emulates the instruction. <\/p>\n<p>There are also some non-integer registers which are difficult or impossible to access directly, but which still participate in user-mode instructions. <\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Meaning<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>eip<\/var><\/td>\n<td>instruction pointer<\/td>\n<td>program counter<\/td>\n<\/tr>\n<tr>\n<td><var>eflags<\/var><\/td>\n<td>flags<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td><var>cs<\/var><\/td>\n<td>code segment<\/td>\n<td>Don&#8217;t worry about it<\/td>\n<\/tr>\n<tr>\n<td><var>ds<\/var><\/td>\n<td>data segment<\/td>\n<td>Don&#8217;t worry about it<\/td>\n<\/tr>\n<tr>\n<td><var>es<\/var><\/td>\n<td>extra segment<\/td>\n<td>Don&#8217;t worry about it<\/td>\n<\/tr>\n<tr>\n<td><var>fs<\/var><\/td>\n<td>F segment<\/td>\n<td>For TEB access<\/td>\n<\/tr>\n<tr>\n<td><var>gs<\/var><\/td>\n<td>G segment<\/td>\n<td>Not used<\/td>\n<\/tr>\n<\/table>\n<p>Windows NT uses the 80386 in flat mode, which means that applications see a contiguous 32-bit address space. 
The segment registers largely don&#8217;t come into play when in flat mode, with the exception of the <var>fs<\/var> register, which we&#8217;ll learn more about when we get to the TEB. <\/p>\n<p>The flags register is updated by many instructions. We&#8217;ll learn more about flags when we study conditionals. <\/p>\n<p>The 80386 is unusual in that it supports multiple calling conventions. Common to all the calling conventions are the register preservation rules and the return value rules: The function return value is placed in <var>eax<\/var>. If the return value is a 64-bit value, then the least significant 32 bits are returned in <var>eax<\/var> and the most significant 32 bits in <var>edx<\/var>. If the return value is a floating point value, it is returned in <var>st0<\/var>, and possibly <var>st1<\/var> (for complex numbers). <\/p>\n<p>Furthermore, link-time code generation is permitted to manufacture ad hoc calling conventions which may not even follow the register preservation rules. <i>It&#8217;s crazy free-for-all time<\/i>. <\/p>\n<p>The architectural names for data sizes are as follows: <\/p>\n<ul>\n<li><b>byte<\/b>: 8-bit value<\/li>\n<li><b>word<\/b>: 16-bit value<\/li>\n<li><b>dword<\/b> (doubleword): 32-bit value<\/li>\n<li><b>qword<\/b> (quadword): 64-bit value<\/li>\n<li><b>tword<\/b> (ten-byte word): 80-bit value<\/li>\n<\/ul>\n<p>Instruction encoding is highly irregular. Instructions are variable-length, and instructions can begin at any byte boundary. <\/p>\n<p>The general pattern for multi-operand opcodes is <\/p>\n<pre>\n    opcode  destination, source\n<\/pre>\n<p>Note that the destination is on the left. Note also that three-operand instructions are rare. This will become interesting when we get to arithmetic. 
<\/p>\n<p>Here&#8217;s the notation I will use when introducing instructions: <\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Notation<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td>r<var>n<\/var><\/td>\n<td><var>n<\/var>-bit register<\/td>\n<\/tr>\n<tr>\n<td>m<var>n<\/var><\/td>\n<td><var>n<\/var>-bit memory<\/td>\n<\/tr>\n<tr>\n<td>i<var>n<\/var><\/td>\n<td><var>n<\/var>-bit immediate<\/td>\n<\/tr>\n<tr>\n<td>r\/m<var>n<\/var><\/td>\n<td><var>n<\/var>-bit register or <var>n<\/var>-bit memory<\/td>\n<\/tr>\n<tr>\n<td>r\/m\/i<var>n<\/var><\/td>\n<td><var>n<\/var>-bit register, <var>n<\/var>-bit memory, <var>n<\/var>-bit immediate,<br>or 8-bit immediate sign-extended to <var>n<\/var> bits<\/td>\n<\/tr>\n<\/table>\n<ul>\n<li>If <var>n<\/var> is omitted, then 8, 16, and 32 are permitted.     For example, &#8220;r\/m&#8221; means &#8220;r\/m8, r\/m16, or r\/m32&#8221;.<\/li>\n<li>Immediates are sign-extended as necessary.<\/li>\n<li>The first operand is called &#8220;d&#8221; (destination).<\/li>\n<li>The second operand (if any) is called &#8220;s&#8221; (source).<\/li>\n<li>The third operand (if any) is called &#8220;t&#8221; (second source).<\/li>\n<li>At most one of the operands can be a memory operand.<\/li>\n<li>All operands must have the same size.<\/li>\n<\/ul>\n<p>Exceptions to the above rules will be called out as necessary. <\/p>\n<p>For example: <\/p>\n<pre>\n    ADD     r\/m, r\/m\/i          ; d += s,      set flags\n<\/pre>\n<p>The <code>ADD<\/code> instruction takes two operands. The first is a register or memory, and the second is a register or memory or immediate or single-byte immediate. They cannot both be memory operands. They must be the same size. <\/p>\n<p>Many instructions have a more compact encoding if the destination register is <var>al<\/var>, <var>ax<\/var>, or <var>eax<\/var>. 
<\/p>\n<p>The assembly language overloads multiple variations of instructions into a single opcode. This is different from most other processors, where each opcode maps to an instruction template, where all that&#8217;s left to fill in are the registers and immediates. For example, the MIPS R4000 <a HREF=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/\">has two different shift opcodes<\/a> depending on whether the shift amount is specified by an immediate or a register. But the 80386 assembly language uses the same opcode for both, and it&#8217;s the assembler&#8217;s job to figure out which variant you intended. <\/p>\n<p>The 80386 does not perform speculation, does not have an on-chip cache, does not have a branch predictor, and does not reorder memory accesses. Life was simpler then. <\/p>\n<p>Okay, that&#8217;s enough background. We&#8217;ll dig in <a HREF=\"http:\/\/devblogs.microsoft.com\/oldnewthing\/20190122-00\/?p=100755\">next time<\/a> by looking at memory addressing modes. <\/p>\n<p> &sup1; This partial register behavior wasn&#8217;t a big deal at the time, but it ended up creating register dependencies that made it much harder to add out-of-order execution to later versions of the processor. It even created a register version of the <a HREF=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20170428-00\/?p=96065\">store-to-load forwarding<\/a> problem. <\/p>\n<p>The x86-64 architecture took a different approach when it extended the 32-bit registers to 64-bit registers: If the destination register is encoded as a 32-bit subset of a 64-bit register, the upper 32 bits of the destination register are zeroed. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hitting a bit closer to home.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-100745","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Hitting a bit closer to home.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/100745","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=100745"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/100745\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=100745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=100745"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=100745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}