The ARM processor (Thumb-2), part 9: Sign and zero extension
I noted last time that you could use the bitfield extraction instructions to do zero- and sign-extension of bytes and halfwords to words. But there are dedicated instructions for these operations which have smaller encodings if the source and destination registers are low.
; unsigned extend byte to word uxtb Rd, Rm ; Rd = (uint8_t)Rm ; signed extend byte to word sxtb Rd, Rm ; Rd = (int8_t)Rm ; unsigned extend halfword to word uxth Rd, Rm ; Rd = (uint16_t)Rm ; signed extend halfword to word sxth Rd, Rm ; Rd = (int16_t)Rm
You can optionally apply a rotation to the second register so that you can extract a 8-bit or 16-bit value that sits along a byte boundary.
; unsigned/signed extend byte to word with rotation ; rotation must be a multiple of 8 uxtb Rd, Rm, #rot ; Rd = (uint8_t)(Rm ROR #rot) sxtb Rd, Rm, #rot ; Rd = ( int8_t)(Rm ROR #rot) ; unsigned/signed extend halfword to word with rotation ; rotation must be a multiple of 8 uxth Rd, Rm, #rot ; Rd = (uint16_t)(Rm ROR #rot) sxth Rd, Rm, #rot ; Rd = ( int16_t)(Rm ROR #rot)
It’s kind of weird to apply a 24-bit rotation to extract a halfword, but you can do it if you want to.
You can also zero-extend or sign-extend a word to a doubleword using instructions you already have available:
; zero-extend Rd to Rd/R(d+1) mov R(d+1), #0 ; set to 0 ; sign-extend Rd to Rd/R(d+1) asrs R(d+1), Rd, #31 ; copy sign bit to all bits
The trick is that a signed right-shift by 31 positions ends up filling the entire word with the sign bit. We use the S-version
ASRS because it allows a compact 16-bit encoding if both the source and destination registers are low.
ASR #31 trick can also be used in the
op2 of arithmetic or logical instructions.
; set r0 to zero if r1 is positive or zero and r0, r1, ASR #31
The trick here is that
r1, ASR #31 produces
0xFFFFFFFF if r1 is negative, but
0x00000000 if r1 is positive or zero.
In addition to the straight zero- and sign-extension operations, there are other instructions that combine the extension with another operation. Most of them are focused on multimedia scenarios, but the extend-and-add instructions are more general-purpose, and I have seen the compiler generate the versions with no rotation.
; zero/sign extend and add byte with optional rotation ; rotation must be a multiple of 8 uxtab Rd, Rn, #rot ; Rd = Rd + (uint8_t)(Rn ROR #rot) sxtab Rd, Rn, #rot ; Rd = Rd + ( int8_t)(Rn ROR #rot) ; zero/sign extend and add halfword with optional rotation ; rotation must be a multiple of 8 sxtah Rd, Rn, #rot ; Rd = Rd + ( int16_t)(Rn ROR #rot) uxtah Rd, Rn, #rot ; Rd = Rd + (uint16_t)(Rn ROR #rot)
There’s another instruction that looks like it’d come in handy, particularly in Win32 user interface code that has to pack two 16-bit coordinates into a 32-bit integer, but I haven’t seen any compiler generate it:
; pack halfword bottom-and-top, or top-and-bottom ; shift is optional pkhbt Rd, Rn, Rm, LSL #imm ; Rd = ((Rm LSL #imm) << 16) | (uint16_t)Rn pkhtb Rd, Rn, Rm, ASR #imm ; Rd = (Rn << 16) | (uint16_t)(Rm ASR #imm)
The bottom-and-top version puts the first input register in the bottom part of the output, and the second input parameter goes into the top part. The top-and-bottom version does it the other way. (The top-and-bottom instruction is not redundant because the barrel shifter can be applied only to the second input parameter.)
When the compiler needs to do this, it generates two instructions:
; pack halfword bottom-and-top uxth r12, Rn ; r12 = (uint16_t)Rn orr Rd, r12, Rm, LSL #16 ; Rd = r12 | (Rm << 16) ; = (uint16_t)Rn | (Rm << 16)
Even if it didn’t want to use
PKHBT, it could have used
BFI to pack the values in a single instruction:
; pack halfword bottom-and-top (in place) bfi Rd, Rm, #16, #16 ; Rd[31:16] = Rm[15:0]
Maybe there’s some dirty secret about the
BFI instructions that the compiler knows but I don’t.