{"id":90821,"date":"2015-07-27T07:00:00","date_gmt":"2015-07-27T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20150727-00\/?p=90821\/"},"modified":"2019-03-13T12:17:44","modified_gmt":"2019-03-13T19:17:44","slug":"20150727-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150727-00\/?p=90821","title":{"rendered":"The Itanium processor, part 1: Warming up"},"content":{"rendered":"<p>The Itanium may not have been much of a commercial success, but it is interesting as a processor architecture because it is different from anything else commonly seen today. It&#8217;s like learning a foreign language: It gives you an insight into how others view the world. <\/p>\n<p>The next two weeks will be devoted to an introduction to the Itanium processor architecture, as employed by Win32. (Depending on the reaction to this series, I might also do a series on the Alpha AXP.) <\/p>\n<p>I originally learned this information in order to be able to debug user-mode code as part of the <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2012\/12\/18\/10378851.aspx\">massive port of several million lines of code from 32-bit to 64-bit Windows<\/a>, so the focus will be on being able to read, understand, and debug user-mode code. I won&#8217;t cover kernel-mode features since I never had to learn them. <\/p>\n<p><b>Introduction<\/b> <\/p>\n<p>The Itanium is a 64-bit EPIC architecture. EPIC stands for Explicitly Parallel Instruction Computing, a design in which work is offloaded from the processor to the compiler. For example, the compiler decides which operations can be safely performed in parallel and which memory fetches can be productively speculated. This relieves the processor from having to make these decisions on the fly, thereby allowing it to focus on the real work of processing. <\/p>\n<p><b>Registers overview<\/b> <\/p>\n<p>There are a lot of registers. 
<\/p>\n<ul>\n<li>128 general-purpose integer registers <var>r0<\/var> through <var>r127<\/var>,     each carrying 64 value bits and a trap bit.     We&#8217;ll learn more about the trap bit later. <\/li>\n<li>128 floating point registers <var>f0<\/var> through <var>f127<\/var>. <\/li>\n<li>64 predicate registers <var>p0<\/var> through <var>p63<\/var>. <\/li>\n<li>8 branch registers <var>b0<\/var> through <var>b7<\/var>. <\/li>\n<li>An instruction pointer, which the     <a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/windows\/hardware\/gg463009.aspx\">    Windows debugging engine<\/a>     for some reason calls <var>iip<\/var>.     (The extra &#8220;i&#8221; is for &#8220;insane&#8221;?) <\/li>\n<li>128 special-purpose registers, not all of which have been given meanings.     These are called &#8220;application registers&#8221; (<var>ar<\/var>) for some reason.     I will cover selected registers as they arise during the discussion. <\/li>\n<li>Other miscellaneous registers we will not cover in this series. <\/li>\n<\/ul>\n<p>Some of these registers are further subdivided into categories like <i>static<\/i>, <i>stacked<\/i>, and <i>rotating<\/i>. <\/p>\n<p>Note that if you want to retrieve the value of a register with the Windows debugging engine, you need to prefix it with an at-sign. For example, <code>? @r32<\/code> will print the contents of the <var>r32<\/var> register. If you omit the at-sign, then the debugger will look for a variable called <var>r32<\/var>. <\/p>\n<p>A notational note: I am using the register names assigned by the Windows debugging engine. The formal names for the registers are <var>gr#<\/var> for integer registers, <var>fr#<\/var> for floating point registers, <var>pr#<\/var> for predicate registers, and <var>br#<\/var> for branch registers. <\/p>\n<p><b>Static, stacked, and rotating registers<\/b> <\/p>\n<p>These terms describe how the registers participate in register renumbering. <\/p>\n<p><i>Static<\/i> registers are never renumbered. 
<\/p>\n<p><i>Stacked<\/i> registers are pushed onto a register stack when control transfers into a function, and they pop off the register stack when control transfers out. We&#8217;ll see more about this when we study the calling convention. <\/p>\n<p><i>Rotating<\/i> registers can be cyclically renumbered during the execution of a function. They revert to being stacked when the function ends (and are then popped off the register stack). We&#8217;ll see more about this when we study register rotation. <\/p>\n<p><b>Integer registers<\/b> <\/p>\n<p>Of the 128 integer registers, registers <var>r0<\/var> through <var>r31<\/var> are static, and <var>r32<\/var> through <var>r127<\/var> are stacked (but they can be converted to rotating). <\/p>\n<p>Win32 assigns some of the static registers the following mnemonics, which correspond to their use in the Win32 calling convention. <\/p>\n<table CLASS=\"cp3\" BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Mnemonic<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><var>r0<\/var><\/td>\n<td><\/td>\n<td>Reads as zero (writes will fault)<\/td>\n<\/tr>\n<tr>\n<td><var>r1<\/var><\/td>\n<td><var>gp<\/var><\/td>\n<td>Global pointer<\/td>\n<\/tr>\n<tr>\n<td><var>r8<\/var>&hellip;<var>r11<\/var><\/td>\n<td><var>ret0<\/var>&hellip;<var>ret3<\/var><\/td>\n<td>Return values<\/td>\n<\/tr>\n<tr>\n<td><var>r12<\/var><\/td>\n<td><var>sp<\/var><\/td>\n<td>Stack pointer<\/td>\n<\/tr>\n<tr>\n<td><var>r13<\/var><\/td>\n<td><\/td>\n<td>TEB<\/td>\n<\/tr>\n<\/table>\n<p>Registers <var>r4<\/var> through <var>r7<\/var> are preserved across function calls. Well, okay, you should also preserve the stack pointer and the TEB if you know what&#8217;s good for you, and there are special rules for <var>gp<\/var> which we will discuss later. The other static registers are scratch (may be modified by the function). 
<\/p>\n<p>Register <var>r0<\/var> always contains the value zero. Writes to <var>r0<\/var> trigger a processor exception. <\/p>\n<p>The <var>gp<\/var> register points to the current function&#8217;s global variables. The Itanium has no absolute addressing mode. In order to access a global variable, you need to load it indirectly through a register, and the <var>gp<\/var> register points to the global variables associated with the current function. The <var>gp<\/var> register is kept up to date when code transfers between DLLs by means we&#8217;ll discuss later. (This is sort of a throwback to the old days of <code>MAKEPROCINSTANCE<\/code>.) <\/p>\n<p>Every integer register contains 64 value bits and one trap bit, known as not-a-thing, or <i>NaT<\/i>. The NaT bit is used by speculative execution to indicate that the register&#8217;s value is not valid. <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2004\/01\/19\/60162.aspx\">We learned a little about NaT some time ago<\/a>; we&#8217;ll discuss it further when we reach the topic of control speculation. The important thing to know about NaT right now is that if you take a register which is tagged as NaT and try to do arithmetic with it, then the NaT bit is set on the output register. Most other operations on registers tagged as NaT will raise an exception. <\/p>\n<p>The NaT bit means that accessing an uninitialized variable can <i>crash<\/i>. <\/p>\n<pre>\nvoid bad_idea(int *p)\n{\n int uninitialized;\n *p = uninitialized; \/\/ can crash here!\n}\n<\/pre>\n<p>Since the variable <var>uninitialized<\/var> is uninitialized, the register assigned to it might happen to have the NaT bit set, left over from previous execution, at which point trying to save it into memory raises an exception. <\/p>\n<p>You may have noticed that there are four return value registers, which means that you can return up to 32 bytes of data in registers. 
<\/p>\n<p><b>Floating point registers<\/b> <\/p>\n<table CLASS=\"cp3\" BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><var>f0<\/var><\/td>\n<td>Reads as 0.0 (writes will fault)<\/td>\n<\/tr>\n<tr>\n<td><var>f1<\/var><\/td>\n<td>Reads as 1.0 (writes will fault)<\/td>\n<\/tr>\n<\/table>\n<p>Registers <var>f0<\/var> through <var>f31<\/var> are static, and <var>f32<\/var> through <var>f127<\/var> are rotating. <\/p>\n<p>By convention, registers <var>f0<\/var> through <var>f5<\/var> and <var>f16<\/var> through <var>f31<\/var> are preserved across calls. The others are scratch. <\/p>\n<p>That&#8217;s about all I&#8217;m going to say about floating point registers, since they aren&#8217;t really where the Itanium architecture is exciting. <\/p>\n<p><b>Predicate registers<\/b> <\/p>\n<p>Instead of a flags register, the Itanium records the state of previous comparison operations in dedicated registers known as <i>predicates<\/i>. Each comparison operation indicates which predicates should hold the comparison result, and future instructions can test the predicate. <\/p>\n<table CLASS=\"cp3\" BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><var>p0<\/var><\/td>\n<td>Reads as <var>true<\/var> (writes are ignored)<\/td>\n<\/tr>\n<\/table>\n<p>Predicate registers <var>p0<\/var> through <var>p15<\/var> are static, and <var>p16<\/var> through <var>p63<\/var> are rotating. <\/p>\n<p>You can predicate almost any instruction, and the instruction will execute only if the predicate register is <var>true<\/var>. For example: <\/p>\n<pre>\n(p1) add ret0 = r32, r33\n<\/pre>\n<p>means, &#8220;If predicate <var>p1<\/var> is <var>true<\/var>, then set register <var>ret0<\/var> equal to the sum of <var>r32<\/var> and <var>r33<\/var>. 
If not, then do nothing.&#8221; The thing inside the parentheses is called the <i>qualifying predicate<\/i> (abbreviated <i>qp<\/i>). <\/p>\n<p>Instructions which execute unconditionally are internally represented as being conditional upon predicate register <var>p0<\/var>, since that register is always <var>true<\/var>. <\/p>\n<p>Actually, I lied when I said that the instruction will execute only if the qualifying predicate is <var>true<\/var>. There is one class of instructions which execute regardless of the state of the qualifying predicate; more on that later. <\/p>\n<p>The Win32 calling convention specifies that predicate registers <var>p0<\/var> through <var>p5<\/var> are preserved across calls, and <var>p6<\/var> through <var>p63<\/var> are scratch. <\/p>\n<p>There is a special pseudo-register called <var>preds<\/var> by the Windows debugging engine which consists of the 64 predicate registers combined into a single 64-bit value. This pseudo-register is used when code needs to save and restore the state of the predicate registers. <\/p>\n<p><b>Branch registers<\/b> <\/p>\n<p>The branch registers are used for indirect jump instructions. The only things you can do with branch registers are load them from an integer register, copy them to an integer register, and jump to them. In particular, you cannot load them directly from memory or do arithmetic on them. If you want to do any of those things, you need to do it with an integer register, then transfer it to a branch register. 
<\/p>\n<p>The Win32 calling convention assigns the following meanings to the branch registers: <\/p>\n<table CLASS=\"cp3\" BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Register<\/th>\n<th>Mnemonic<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><var>b0<\/var><\/td>\n<td><var>rp<\/var><\/td>\n<td>Return address<\/td>\n<\/tr>\n<\/table>\n<p>The return address register is sometimes called <var>br<\/var>, but the disassembler calls it <var>rp<\/var>, so that&#8217;s what we&#8217;ll call it. <\/p>\n<p>The return address register is set automatically by the processor when a <code>br.call<\/code> instruction is executed. <\/p>\n<p>By convention, registers <var>b1<\/var> through <var>b5<\/var> are preserved across calls, while <var>b6<\/var> and <var>b7<\/var> are scratch. (Exercise: Is <var>b0<\/var> preserved across calls?) <\/p>\n<p><b>Application registers<\/b> <\/p>\n<p>There are a large number of application registers, most of which are not useful to user-mode code. We&#8217;ll introduce the interesting ones as they arise. I&#8217;ve already mentioned one of them: <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2005\/04\/21\/410397.aspx\"><code>bsp<\/code> is the ia64&#8217;s second stack pointer<\/a>. <\/p>\n<p><b>Break<\/b> <\/p>\n<p>Okay, this was a whirlwind tour of the Itanium register set. I bet your head hurts already, and we haven&#8217;t even started coding yet! <\/p>\n<p>In fact, we&#8217;re not going to be coding for quite some time. <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20150728-00\/?p=90811\">Next time<\/a>, we&#8217;ll look at the instruction format. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>All those registers.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-90821","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>All those registers.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/90821","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=90821"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/90821\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=90821"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=90821"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=90821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}