{"id":98465,"date":"2018-04-09T07:00:00","date_gmt":"2018-04-09T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=98465"},"modified":"2019-03-13T00:45:24","modified_gmt":"2019-03-13T07:45:24","slug":"20180409-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180409-00\/?p=98465","title":{"rendered":"The MIPS R4000, part 6: Memory access (unaligned)"},"content":{"rendered":"<p>Unaligned memory access on the MIPS R4000 is performed with pairs of instructions. <\/p>\n<pre>\n    LWL     rd, n+3(rs)     ; load word left\n    LWR     rd, n(rs)       ; load word right\n<\/pre>\n<p>This is easier to explain with a diagram rather than with a formula. <\/p>\n<table STYLE=\"border-collapse: collapse;text-align: center\" CLASS=\"cp3\" CELLPADDING=\"3\">\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 4pc\" COLSPAN=\"4\" ALIGN=\"left\">n+3(rs)<\/td>\n<td STYLE=\"width: 4pc\" COLSPAN=\"4\" ALIGN=\"left\">n(rs)<\/td>\n<td STYLE=\"width: 1px\"><\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1pc\">&darr;<\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1px\"><\/td>\n<td STYLE=\"width: 1pc\">&darr;<\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<\/tr>\n<tr>\n<td STYLE=\"border: solid black 1px;width: 1pc\">AA<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">BB<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">CC<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">DD<\/td>\n<td STYLE=\"width: 1px\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">EE<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">FF<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">GG<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">HH<\/td>\n<\/tr>\n<tr>\n<td 
STYLE=\"width: 9pc\" COLSPAN=\"9\">&nbsp;<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">11<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">22<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">33<\/td>\n<td STYLE=\"width: 1px\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">44<\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 2pc\" COLSPAN=\"2\" ALIGN=\"left\">rd<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 8pc\" COLSPAN=\"8\" ALIGN=\"left\"><code>LWL rd, n+3(rs)<\/code><\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">BB<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">CC<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">DD<\/td>\n<td STYLE=\"width: 1px\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">44<\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 2pc\" COLSPAN=\"2\" ALIGN=\"left\">rd<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 8pc\" COLSPAN=\"8\" ALIGN=\"left\"><code>LWR rd, n(rs)<\/code><\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">BB<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">CC<\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">DD<\/td>\n<td STYLE=\"width: 1px\"><\/td>\n<td STYLE=\"border: solid black 1px;width: 1pc\">EE<\/td>\n<td STYLE=\"width: 1pc\"><\/td>\n<td STYLE=\"width: 2pc\" COLSPAN=\"2\" ALIGN=\"left\">rd<\/td>\n<\/tr>\n<\/table>\n<p>You give the &#8220;load word left&#8221; instruction the effective address of the most significant byte of the unaligned word you want to load, and it picks out the correct bytes from the enclosing word and merges them into the upper bytes of the destination register. 
<\/p>\n<p>The &#8220;load word right&#8221; works analogously: You give it the effective address of the least significant byte of the unaligned word you want to load, and it picks out the correct bytes from the enclosing word and merges them into the lower bytes of the destination register. <\/p>\n<p>Since the results are combined via merging, you can issue the <code>LWL<\/code> and <code>LWR<\/code> instructions in either order, and together they will load the complete four-byte value.&sup1; (If the address happens to be aligned, then each instruction loads the complete word.) <\/p>\n<p>There are corresponding left\/right instructions for storing an unaligned word: <\/p>\n<pre>\n    SWL     rd, n+3(rs)     ; store word left\n    SWR     rd, n(rs)       ; store word right\n<\/pre>\n<p>These are the counterparts to the load versions. They store the upper and lower parts of the word to the corresponding parts of memory. <\/p>\n<p>For unaligned halfword access, you might be tempted to do this: <\/p>\n<pre>\n    ; Try to load unaligned halfword unsigned from rs to rd\n    ; Does this work?\n    LWL     rd, n+3(rs)     ; load word left\n    LWR     rd, n(rs)       ; load word right\n    ANDI    rd, rd, 0xFFFF  ; keep the lower 16 bits\n<\/pre>\n<p>Unfortunately, this doesn&#8217;t work because the access at <code>n+3(rs)<\/code> might cross into an invalid page. Consider the case where the halfword is the very last halfword on its page: If you tried to load it as a word, you would need to load the first halfword on the next page (to fill the top 16 bits), and that could crash if the next page were invalid. 
<\/p>\n<p>Instead, you need to perform unaligned halfword access by loading two bytes and combining them: <\/p>\n<pre>\n    ; Load unaligned halfword signed from rs to rd\n    LB      at, n+1(rs)     ; load high byte\n    LBU     rd, n(rs)       ; load low byte\n    SLL     at, at, 8       ; shift high byte into position\n    OR      rd, rd, at      ; combine the bytes\n<\/pre>\n<p>If you want to load an unaligned halfword unsigned, you would change the first instruction from <code>LB<\/code> to <code>LBU<\/code>. <\/p>\n<p>For the same reason as loading, storing an unaligned halfword is done by storing the bytes separately. <\/p>\n<pre>\n    ; Store unaligned halfword to rd from rs\n    SRL     at, rs, 8       ; shift high byte into position\n    SB      at, n+1(rd)     ; store high byte\n    SB      rs, n(rd)       ; store low byte\n<\/pre>\n<p>The assembler provides pseudo-instructions for these unaligned memory operations: <\/p>\n<pre>\n    ULW     rs, disp16(rd)  ; unaligned load word\n    USW     rs, disp16(rd)  ; unaligned store word\n    ULH     rs, disp16(rd)  ; unaligned load halfword signed\n    ULHU    rs, disp16(rd)  ; unaligned load halfword unsigned\n    USH     rs, disp16(rd)  ; unaligned store halfword\n\n    ; and again for absolute addressing\n    ULW     rs, global_var  ; unaligned load word\n    USW     rs, global_var  ; unaligned store word\n    ULH     rs, global_var  ; unaligned load halfword signed\n    ULHU    rs, global_var  ; unaligned load halfword unsigned\n    USH     rs, global_var  ; unaligned store halfword\n<\/pre>\n<p>Mind you, these pseudo-instructions don&#8217;t help you when debugging. The debugger shows the underlying real instructions. 
<\/p>\n<p>If you&#8217;ve been paying attention, you may have noticed that the <code>ULW rd, disp16(rs)<\/code> pseudo-instruction fails if <var>rs<\/var> and <var>rd<\/var> happen to be the same register, because the <code>LWL<\/code> will damage the base register before it can be used to load the right half. In that case, the assembler uses this alternate version: <\/p>\n<pre>\n    LWL     at, n+3(rs)     ; load word left into temporary\n    LWR     at, n(rs)       ; load word right into temporary\n    OR      rs, at, at      ; move to final destination\n<\/pre>\n<p>Okay, next time we&#8217;ll look at atomic memory operations. <\/p>\n<p>&sup1; In versions of the MIPS architecture with load delay slots, there was a special exception for <code>LWL<\/code> and <code>LWR<\/code>: You were allowed to issue them one directly after the other, and they would merge correctly, provided they targeted different bytes of the same destination register or updated the entire destination. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Split &#8217;em up.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-98465","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Split &#8217;em 
up.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/98465","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=98465"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/98465\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=98465"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=98465"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=98465"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}