Unaligned memory access on the MIPS R4000 is performed with pairs of instructions.
LWL rd, n+3(rs) ; load word left LWR rd, n(rs) ; load word right
This is easier to explain with a diagram rather than with a formula.
n+3(rs) | n(rs) | |||||||||||
↓ | ↓ | |||||||||||
AA | BB | CC | DD | EE | FF | GG | HH | |||||
11 | 22 | 33 | 44 | rd | ||||||||
LWL rd, n+3(rs) |
||||||||||||
BB | CC | DD | 44 | rd | ||||||||
LWR rd, n(rs) |
||||||||||||
BB | CC | DD | EE | rd |
You give the “load word left” instruction the effective address of the most significant byte of the unaligned word you want to load, and it picks out the correct bytes from the enclosing word and merges them into the upper bytes of the destination register.
The “load word right” works analogously: You give it the effective address of the least significant byte of the unaligned word you want to load, and it picks out the correct bytes from the enclosing word and merges them into the lower bytes of the destination register.
Since the results are combined via merging, you can issue the LWL
and LWR
instructions in either order, and together they will load the complete four-byte value.¹ (If the address happened to be aligned, then both instructions will load the complete word.)
There are corresponding left/right instructions for storing an unaligned word:
SWL rd, n+3(rs) ; store word left SWR rd, n(rs) ; store word right
These are the counterparts to the load versions. They store the upper and lower part of the word to the corresponding parts of memory.
For unaligned halfword access, you might be tempted to do this:
; Try to load unaligned word unsigned from rs to rd ; Does this work? LWL rd, n+3(rs) ; load word left LWR rd, n(rs) ; load word right ANDI rd, rd, 0xFFFF ; keep the lower 16 bits
Unfortunately, this doesn’t work because the n+3(rs)
might cross into an invalid page. Consider the case where the halfword is the very last halfword on its page: If you tried to load it as a word, you would need to load the first halfword on the next page (to fill the top 16 bits), and that could crash if the next page were invalid.
Instead, you need to perform unaligned halfword access by loading two bytes and combining them:
; Load unaligned word signed from rs to rd LB at, n+1(rs) ; load high byte LBU rd, n(rs) ; load low byte SLL at, at, 8 ; shift high byte into position OR rd, rd, at ; combine the bytes
If you want to load an unaligned word unsigned, you would change the first instruction from LB
to LBU
.
For the same reason as loading, storing an unaligned word is done by storing the bytes separately.
; Store unaligned word to rd from rs SRL at, rs, 8 ; shift high byte into position SB at, n+1(rd) ; store high byte SB rs, n(rd) ; store low byte
The assembler provides pseudo-instructions for these unaligned memory operations:
ULW rs, disp16(rd) ; unaligned load word USW rs, disp16(rd) ; unaligned store word ULH rs, disp16(rd) ; unaligned load halfword signed ULHU rs, disp16(rd) ; unaligned load halfword unsigned USH rs, disp16(rd) ; unaligned store halfword ; and again for absolute addressing ULW rs, global_var ; unaligned load word USW rs, global_var ; unaligned store word ULH rs, global_var ; unaligned load halfword signed ULHU rs, global_var ; unaligned load halfword unsigned USH rs, global_var ; unaligned store halfword
Mind you, these pseudo-instructions don’t help you when debugging. The debugger shows the underlying real instructions.
If you’ve been paying attention, you may have noticed that the ULW rd, disp16(rs)
pseudo-instruction fails if rs and rd happen to be the same register, because the LWL
will damage the base register before it can be used to load the right half. In that case, the assembler uses this alternate version:
LWL at, n+3(rs) ; load word left into temporary LWR at, n(rs) ; load word right into temporary OR rs, at, at ; move to final destination
Okay, next time we’ll look at atomic memory operations.
¹ In versions of the MIPS architecture with load delay slots, there was a special exception for LWL
and LWR
: You were allowed to issue them directly after the other, and they would merge correctly, provided they target different bytes of the same destination register or update the entire destination.
0 comments