{"id":99425,"date":"2018-08-06T07:00:00","date_gmt":"2018-08-06T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99425"},"modified":"2019-03-13T00:37:53","modified_gmt":"2019-03-13T07:37:53","slug":"20180806-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180806-00\/?p=99425","title":{"rendered":"The PowerPC 600 series, part 1: Introduction"},"content":{"rendered":"<p>The PowerPC is a RISC processor architecture which grew out of IBM&#8217;s <a HREF=\"https:\/\/en.wikipedia.org\/wiki\/IBM_POWER_Instruction_Set_Architecture\">POWER<\/a> architecture. Windows NT support was introduced in Windows NT 3.51, and it didn&#8217;t last long; the last version to support it was Windows NT 4.0. Despite not being supported by the flagship operating system, it continued to be supported by Windows CE, and a later version of the PowerPC was chosen as the processor for the Xbox 360. <\/p>\n<p>As with all the processor retrospective series, I&#8217;m going to focus on how Windows NT used the PowerPC in user mode because the original audience for all of these discussions was user-mode developers trying to get up to speed debugging their programs on PowerPC. <\/p>\n<p>The PowerPC 600 series started out as a 32-bit processor, with 64-bit support arriving in the 620. The earliest record I can find (not that I looked very hard) shows Windows NT supporting the 603 and 604 processors. I guess this makes sense, because Wikipedia says that the 603 was <a HREF=\"https:\/\/en.wikipedia.org\/wiki\/PowerPC_600#PowerPC_603\">the first processor to support the full PowerPC instruction set<\/a>. The 603 could complete a maximum of two instructions per cycle; the 604 could do up to four. The 603 did not have a dynamic branch predictor, but the 604 did. 
Both could forward arithmetic operations into the next arithmetic operation, so consecutive integer arithmetic operations did not stall, even if the second depended on the result of the first. <\/p>\n<p>The PowerPC 600 series processors are natively big-endian, with an option for little-endian operation. Windows NT uses the processor in 32-bit little-endian mode.&sup1; Even though the processor can be put into little-endian mode, this affects only how bytes are swapped when they are read from or written to memory; the instructions themselves still operate in a big-endian way. Among other things, the bits in a register are numbered from most-significant to least-significant: Bit 0 is the high-order bit, and bit 31 is the low-order bit. <\/p>\n<p>The PowerPC has 32 integer registers, each 32 bits wide. They are officially named <var>GPR0<\/var> through <var>GPR31<\/var>, but the assembler just calls them <var>0<\/var> through <var>31<\/var>. This is ridiculously confusing,&sup2; so nobody uses the purely numeric names. People call them <var>r0<\/var> through <var>r31<\/var>. (Some assemblers call them <var>r.0<\/var> through <var>r.31<\/var>.) 
<\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Mnemonic<\/th>\n<th>Meaning<\/th>\n<th>Preserved?<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>gpr0<\/var><\/td>\n<td><var>r0<\/var><\/td>\n<td><\/td>\n<td>No<\/td>\n<td>Of limited use<\/td>\n<\/tr>\n<tr>\n<td><var>gpr1<\/var><\/td>\n<td><var>r1<\/var><\/td>\n<td>stack pointer<\/td>\n<td>Yes<\/td>\n<td>Includes 232-byte negative red zone<\/td>\n<\/tr>\n<tr>\n<td><var>gpr2<\/var><\/td>\n<td><var>r2<\/var><\/td>\n<td>table of contents<\/td>\n<td>Yes, mostly<\/td>\n<td>Access to global variables<\/td>\n<\/tr>\n<tr>\n<td><var>gpr3<\/var>&hellip;<var>gpr10<\/var><\/td>\n<td><var>r3<\/var>&hellip;<var>r10<\/var><\/td>\n<td>argument<\/td>\n<td>No<\/td>\n<td>On function entry, contains function parameters<\/td>\n<\/tr>\n<tr>\n<td><var>gpr11<\/var><\/td>\n<td><var>r11<\/var><\/td>\n<td>temporary<\/td>\n<td>No<\/td>\n<td>For function glue<\/td>\n<\/tr>\n<tr>\n<td><var>gpr12<\/var><\/td>\n<td><var>r12<\/var><\/td>\n<td>temporary<\/td>\n<td>No<\/td>\n<td>prologue and epilogue helper<\/td>\n<\/tr>\n<tr>\n<td><var>gpr13<\/var><\/td>\n<td><var>r13<\/var><\/td>\n<td>read-only<\/td>\n<td>Yes<\/td>\n<td>TEB<\/td>\n<\/tr>\n<tr>\n<td><var>gpr14<\/var>&hellip;<var>gpr31<\/var><\/td>\n<td><var>r14<\/var>&hellip;<var>r31<\/var><\/td>\n<td>saved<\/td>\n<td>Yes<\/td>\n<td><\/td>\n<\/tr>\n<\/table>\n<p>Note that this does not exactly line up with the PowerPC register conventions for other platforms. (Many other platforms assign special meanings to <var>gpr11<\/var> through <var>gpr13<\/var>.) <\/p>\n<p>The stack must be kept on an 8-byte boundary. There is a large red zone of 232 bytes at negative offsets from the stack pointer. We&#8217;ll see the importance of this when we look at function prologues. <\/p>\n<p>The function return value is placed in <var>r3<\/var>. 
<\/p>\n<p>The <var>r0<\/var> register is of limited use because many instructions cannot use a source of <var>r0<\/var>. We&#8217;ll see more about that later. <\/p>\n<p>We&#8217;ll learn about the table of contents, function glue, and epilogue\/prologue helpers later when we cover Windows NT software conventions. <\/p>\n<p>In addition to the general-purpose integer registers, there are a number of special-purpose 32-bit integer registers. There are only nineteen of these special-purpose registers, but the numbers range from <var>spr1<\/var> to <var>spr1013<\/var>. (The number space is very sparsely populated, but I guess they reserved room for adding more registers in the future.) These are the ones you&#8217;re likely to see in user-mode code: <\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Mnemonic<\/th>\n<th>Meaning<\/th>\n<th>Preserved?<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>spr1<\/var><\/td>\n<td><var>xer<\/var><\/td>\n<td>Status bits<\/td>\n<td>No<\/td>\n<td>Integer exception register<\/td>\n<\/tr>\n<tr>\n<td><var>spr8<\/var><\/td>\n<td><var>lr<\/var><\/td>\n<td>link register<\/td>\n<td>No<\/td>\n<td>On function entry, contains return address<\/td>\n<\/tr>\n<tr>\n<td><var>spr9<\/var><\/td>\n<td><var>ctr<\/var><\/td>\n<td>counter<\/td>\n<td>No<\/td>\n<td>Dedicated counter or jump target<\/td>\n<\/tr>\n<tr>\n<td><var>fpscr<\/var><\/td>\n<td><var>fpscr<\/var><\/td>\n<td>Status bits<\/td>\n<td>?<\/td>\n<td>Floating point status and control register<\/td>\n<\/tr>\n<\/table>\n<p>I&#8217;ve never had to deal with floating point on the PowerPC, so I don&#8217;t know what parts of <var>fpscr<\/var> need to be preserved and what parts don&#8217;t. <\/p>\n<p>We&#8217;ll learn more about the other special registers as the need arises. <\/p>\n<p>Remember how the Itanium, MIPS, and Alpha don&#8217;t have a flags register? 
Well, the PowerPC scoffs at them. &#8220;Flags register? You say you want a flags register? I&#8217;ve got your flags register right here. In fact, I&#8217;ve got <i>eight sets<\/i> of flags registers.&#8221; They are named <var>cr0<\/var> through <var>cr7<\/var>, each four bits wide. (The &#8220;cr&#8221; stands for <i>condition register<\/i>.) The pseudo-register <var>cr<\/var> can be used to treat them as one giant 32-bit register.&sup3; Remember that the PowerPC is a big-endian processor, so <var>cr0<\/var> occupies the most significant bits of <var>cr<\/var>, and so <var>cr7<\/var> occupies the least significant bits. <\/p>\n<p>Condition register <var>cr0<\/var> is the implicit target of integer operations, and condition register <var>cr1<\/var> is the implicit target of floating point operations. I don&#8217;t know which condition registers must be preserved across calls, because I&#8217;ve never found any code that needed to. <\/p>\n<p>The PowerPC also has 32 floating-point double-precision registers, officially named <var>FPR0<\/var> through <var>FPR31<\/var>. <\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Mnemonic<\/th>\n<th>Preserved?<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>fpr0<\/var><\/td>\n<td><var>f0<\/var><\/td>\n<td>No<\/td>\n<td>temporary<\/td>\n<\/tr>\n<tr>\n<td><var>fpr1<\/var>&hellip;<var>fpr13<\/var><\/td>\n<td><var>f1<\/var>&hellip;<var>f13<\/var><\/td>\n<td>No<\/td>\n<td>Function parameters<\/td>\n<\/tr>\n<tr>\n<td><var>fpr14<\/var>&hellip;<var>fpr31<\/var><\/td>\n<td><var>f14<\/var>&hellip;<var>f31<\/var><\/td>\n<td>Yes<\/td>\n<td><\/td>\n<\/tr>\n<\/table>\n<p>As for instruction encoding, each instruction is 32 bits wide and must be aligned on a four-byte boundary. 
The instruction whose encoding is <code>0x00000000<\/code> is reserved as an invalid instruction, so trying to execute a page of zeros will instantly fault. <\/p>\n<p>The general syntax for multi-operand opcodes is <\/p>\n<pre>\n    opcode  destination, source1, source2, source3...\n<\/pre>\n<p>with the notable exception of store instructions, which put the source register on the left and the address destination on the right. <\/p>\n<p>The architectural terms for operand sizes are <i>byte<\/i>, <i>halfword<\/i> (2 bytes), <i>word<\/i> (4 bytes), <i>doubleword<\/i> (8 bytes), and <i>quadword<\/i> (16 bytes). In 32-bit operation, the largest unit that can be operated on directly is the word. <\/p>\n<p>In opcode names, the word <i>arithmetic<\/i> is used to emphasize that the operands are treated as signed (usually abbreviated <code>a<\/code>), and the words <i>logical<\/i> (<code>l<\/code>) and <i>unsigned<\/i> (<code>u<\/code>) or sometimes <i>zero-extended<\/i> (<code>z<\/code>) are used to emphasize that the operands are treated as unsigned. I guess they couldn&#8217;t make up their mind what to call unsigned operations, so they chose one at random each time they needed one. Note further that these conventions are not uniformly applied, so stay alert. <\/p>\n<p>The processor maintains the fiction that every instruction is retired completely before the next one starts. Consequently, there are no architectural branch delay slots or load delay slots. It also means that when an exception is raised, all instructions preceding the exception have run to completion, and no instructions after the exception will appear to have started. <\/p>\n<p>Internally, the processor may perform operations out of order or in parallel or speculatively, and it may introduce stalls if your dependencies are too close together, but the processor does its best to hide this from the code being executed. 
<\/p>\n<p>There are two notable exceptions to the principle of sequential operation: <\/p>\n<ul>\n<li>Floating point exceptions in imprecise mode     can be delayed beyond the instruction that triggered the exception. <\/li>\n<li>Self-modifying code requires special instructions     to evict the old instructions out of the I-cache. <\/li>\n<\/ul>\n<p>Both reads and writes to memory <a HREF=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/ee418650(v=vs.85).aspx\">can be reordered<\/a>, and reads can be speculated. Storing a value may partly succeed before raising an exception. (For example, an unaligned store that crosses into an invalid page may write to the valid page and then take an exception on the invalid page.) <\/p>\n<p>Okay, that&#8217;s enough background. We&#8217;ll pick up <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180807-00\/?p=99435\">next time<\/a> by taking a closer look at those condition registers. <\/p>\n<p>&sup1; When the processor is in 32-bit mode, you can still execute 64-bit instructions. However, since Windows NT did not require a 64-bit capable version of the PowerPC processor, PowerPC programs for Windows NT had to perform runtime detection of 64-bit support and run either a 32-bit friendly version of the code or a 64-bit version of the code. In practice, nobody did this. They just stuck to 32-bit code. (Even though you could use 64-bit instructions in 32-bit mode, the ABI preserves only the least-significant 32 bits of saved registers.) <\/p>\n<p>&sup2; The designers of the PowerPC assembly language appear to be dedicated to making their instruction set as confusing as possible by making the assembly language be just barely more readable than machine code. 
For example, to say &#8220;Decrement the counter, and branch if the result is zero and the <var>eq<\/var> flag is set in <var>cr3<\/var>&#8221;, they want you to write <\/p>\n<pre>\n    bc  10, 14, destination\n<\/pre>\n<p>Because obviously 10 means &#8220;decrement counter and branch if the result is zero and the specific flag is set&#8221;, and naturally 14 means &#8220;the <var>eq<\/var> flag in <var>cr3<\/var>.&#8221; <\/p>\n<p>The Windows disassembler substitutes names for some (but not all) of these magic numbers at disassembly so you don&#8217;t have to remember all the codes. <\/p>\n<p>&sup3; You might think, &#8220;Who&#8217;s to say which is the real register and which is the pseudo-register? You could equivalently think of <var>cr<\/var> as the real register, and the <var>cr#<\/var> registers as pseudo-registers!&#8221; Perhaps so, but the processor can execute operations on different <var>cr#<\/var> registers in parallel. If <var>cr<\/var> were the real register, then you would expect multiple operations on different <var>cr#<\/var> registers to be dependent on each other since they are all operating on <var>cr<\/var>. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here we go again.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99425","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Here we go again.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99425"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99425\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}