The SuperH-3, part 13: Misaligned data, and converting between signed vs unsigned values
When going through compiler-generated assembly language, there are some patterns you’ll see over and over again. Note that the code you see may not look exactly like this due to compiler instruction scheduling. In particular, the sequences for misaligned memory access may bring additional registers into play in order to avoid register dependencies.
First, is the unsigned memory access. Bytes and words loaded from memory are sign-extended by default. If you want to load an unsigned value, you need to perform an explicit zero-extension.
; load unsigned byte from address in r0 MOV.B @r0, r1 ; loads sign-extended byte EXTU.B r1, r1 ; zero-extend the byte to a longword ; load unsigned word from address in r0 MOV.W @r0, r1 ; loads sign-extended word EXTU.W r1, r1 ; zero-extend the word to a longword
Next up is misaligned data. The SH-3 does not support unaligned memory access. Not only that, but the kernel doesn’t even emulate unaligned memory access. If you access memory from a misaligned address, you take an access violation and your process crashes. So don’t mess up!
There are no special instructions for accessing misaligned data. You are on your own to take individual bytes and combine them into the desired final value, or to take the starting value and decompose it into bytes.
; store 16-bit value in r1 to possibly unaligned address in r0 ; destroys r1 ; r1 @r0 ; xxxxAABB xx xx MOV.B r1, @r0 ; xxxxAABB BB xx SHLR8 r1 ; 00xxxxAA BB xx MOV.B r1, @(1, r0) ; 00xxxxAA BB AA ; store 32-bit value in r1 to possibly unaligned address in r0 ; destroys r1 ; r1 @r0 ; AABBCCDD xx xx xx xx MOV.B r1, @r0 ; AABBCCDD DD xx xx xx SHLR8 r1 ; 00AABBCC DD xx xx xx MOV.B r1, @(1, r0) ; 00AABBCC DD CC xx xx SHLR8 r1 ; 0000AABB DD CC xx xx MOV.B r5, @(2, r0) ; 0000AABB DD CC BB xx SHLR8 r1 ; 000000AA DD CC BB xx MOV.B r1, @(3, r0) ; 000000AA DD CC BB AA ; read 16-bit value from possibly unaligned address in r0 ; r1 r2 @r0 ; xxxxxxxx xxxxxxxx BB AA MOV.B @(1, r0), r1 ; SSSSSSAA xxxxxxxx SHLL8 r1 ; SSSSAA00 xxxxxxxx MOV.B @r0, r2 ; SSSSAA00 SSSSSSBB EXTU.B r2, r2 ; SSSSAA00 000000BB OR r1, r2 ; SSSSAA00 SSSSAABB ; r2 contains signed 16-bit value EXTU.W r2, r2 ; SSSSAA00 0000AABB ; r2 contains unsigned 16-bit value ; read 32-bit value from possibly unaligned address in r0 ; r1 r2 @r0 ; xxxxxxxx xxxxxxxx DD CC BB AA MOV.B @(3, r0), r1 ; SSSSSSAA xxxxxxxx SHLL8 r1 ; SSSSAA00 xxxxxxxx MOV.B @(2, r0), r2 ; SSSSAA00 SSSSSSBB EXTU.B r2, r2 ; SSSSAA00 000000BB OR r2, r1 ; SSSSAABB 000000BB SHLL8 r1 ; SSAABB00 000000BB MOV.B @(1, r0), r2 ; SSAABB00 SSSSSSCC EXTU.B r2, r2 ; SSAABB00 000000CC OR r2, r1 ; SSAABBCC 000000CC SHLL8 r1 ; AABBCC00 000000CC MOV.B @r0, r2 ; AABBCC00 SSSSSSDD EXTU.B r2, r2 ; AABBCC00 000000DD OR r1, r2 ; AABBCC00 AABBCCDD
Less often, you will see code that sign-extends a 32-bit value to a 64-bit value.
; sign-extend 32-bit value in r0 to 64-bit value in r1:r0 MOV r0, r1 ; copy value to r1 SHLL r1 ; T contains high bit of value SUBC r1, r1 ; if T=0, then r1 = 00000000 ; if T=1, then r1 = FFFFFFFF
If you happen to have the value 0 lying around in a register, you could accomplish the task in two instructions:
; sign-extend 32-bit value in r0 to 64-bit value in r1:r0 ; assumes that r2 already contains the value zero CMP/GT r0, r2 ; T = (0 > r0) ; in other words, T=0 if r0 is positive or zero ; T=1 if r0 is negative SUBC r1, r1 ; if T=0, then r1 = 00000000 ; if T=1, then r1 = FFFFFFFF
That is just code golf on my part. I haven’t seen the compiler use this trick, or the next one.
; sign-extend 32-bit value in r0 to 64-bit value in r1:r0 ; preserves flags ROTCL r0 ; rotate r0 left, copying high bit into T ; and saving old T in low bit of r0 SUBC r1, r1 ; if T=0, then r1 = 00000000, T stays 0 ; if T=1, then r1 = FFFFFFFF, T stays 1 ROTCR r0 ; rotate r0 right to restore original value ; and recover original value of T
In general, you’ll see that SH-3 assembly code is somewhat verbose, even more so because compiler technology back in this time period was not as advanced as it is today, but you have to realize that each of these instructions is only half the size of the instructions of its RISC-style contemporaries, so even though you plowed through 2000 instructions, that’s only 4KB of code.
Okay, next time, we’re returning to reality and looking at function call patterns.