I noted last time that you could use the bitfield extraction instructions to do zero- and sign-extension of bytes and halfwords to words. But there are dedicated instructions for these operations, and they have compact 16-bit encodings when the source and destination registers are both low.
; unsigned extend byte to word
uxtb    Rd, Rm              ; Rd = (uint8_t)Rm

; signed extend byte to word
sxtb    Rd, Rm              ; Rd = (int8_t)Rm

; unsigned extend halfword to word
uxth    Rd, Rm              ; Rd = (uint16_t)Rm

; signed extend halfword to word
sxth    Rd, Rm              ; Rd = (int16_t)Rm
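As a quick sanity check on those C comments, here is a sketch that models each instruction as a plain C function operating on a 32-bit register value. The `model_*` names are made up for illustration. The sign-extending versions use the `(v ^ 0x80) - 0x80` idiom instead of a cast to `int8_t`, because converting an out-of-range value to a signed type is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C models of the basic extension instructions.
   Each one keeps the low byte or halfword of Rm and widens it to 32 bits. */
uint32_t model_uxtb(uint32_t rm) { return rm & 0xFFu; }      /* (uint8_t)Rm  */
uint32_t model_uxth(uint32_t rm) { return rm & 0xFFFFu; }    /* (uint16_t)Rm */

/* Flip the sign bit, then subtract it back out. Values whose sign bit is set
   wrap around to large unsigned values with all the upper bits set
   (unsigned wraparound is well defined in C). */
uint32_t model_sxtb(uint32_t rm) { return ((rm & 0xFFu) ^ 0x80u) - 0x80u; }       /* (int8_t)Rm  */
uint32_t model_sxth(uint32_t rm) { return ((rm & 0xFFFFu) ^ 0x8000u) - 0x8000u; } /* (int16_t)Rm */

/* Usage: one register value, four different extensions. */
void extend_examples(void)
{
    uint32_t r = 0x1234ABCDu;
    assert(model_uxtb(r) == 0x000000CDu);
    assert(model_sxtb(r) == 0xFFFFFFCDu);
    assert(model_uxth(r) == 0x0000ABCDu);
    assert(model_sxth(r) == 0xFFFFABCDu);
}
```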
You can optionally apply a rotation to the second register so that you can extract an 8-bit or 16-bit value that starts on a byte boundary.
; unsigned/signed extend byte to word with rotation
; rotation must be 0, 8, 16, or 24
uxtb    Rd, Rm, ROR #rot    ; Rd = (uint8_t)(Rm ROR #rot)
sxtb    Rd, Rm, ROR #rot    ; Rd = ( int8_t)(Rm ROR #rot)

; unsigned/signed extend halfword to word with rotation
; rotation must be 0, 8, 16, or 24
uxth    Rd, Rm, ROR #rot    ; Rd = (uint16_t)(Rm ROR #rot)
sxth    Rd, Rm, ROR #rot    ; Rd = ( int16_t)(Rm ROR #rot)
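It’s kind of weird to apply a 24-bit rotation to extract a halfword (the low byte of the result comes from bits 24 through 31 of Rm, and the high byte comes from bits 0 through 7), but you can do it if you want to.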
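Here is a similar sketch with the rotation added. The `ror32` helper and the `model_*_ror` names are hypothetical; the assertion in `ror32` reflects the rule that the rotation can only be 0, 8, 16, or 24. The `rot == 24` halfword checks show the odd case: the extracted halfword takes its low byte from the top of the register and its high byte from the bottom.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by one of the allowed amounts: 0, 8, 16, or 24. */
uint32_t ror32(uint32_t x, unsigned rot)
{
    assert(rot == 0 || rot == 8 || rot == 16 || rot == 24);
    /* rot == 0 is special-cased: shifting a 32-bit value by 32 is undefined in C. */
    return rot == 0 ? x : (x >> rot) | (x << (32 - rot));
}

/* Hypothetical models of the rotating forms: rotate first, then extend. */
uint32_t model_uxtb_ror(uint32_t rm, unsigned rot) { return ror32(rm, rot) & 0xFFu; }
uint32_t model_uxth_ror(uint32_t rm, unsigned rot) { return ror32(rm, rot) & 0xFFFFu; }

uint32_t model_sxtb_ror(uint32_t rm, unsigned rot)
{
    return ((ror32(rm, rot) & 0xFFu) ^ 0x80u) - 0x80u;
}

uint32_t model_sxth_ror(uint32_t rm, unsigned rot)
{
    return ((ror32(rm, rot) & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

void rotate_examples(void)
{
    uint32_t r = 0x1234ABCDu;
    assert(model_uxtb_ror(r, 8)  == 0xABu);        /* byte from bits 8..15       */
    assert(model_sxtb_ror(r, 8)  == 0xFFFFFFABu);
    assert(model_uxth_ror(r, 16) == 0x1234u);      /* halfword from bits 16..31  */
    assert(model_uxth_ror(r, 24) == 0xCD12u);      /* low byte from bits 24..31,
                                                      high byte from bits 0..7   */
    assert(model_sxth_ror(r, 24) == 0xFFFFCD12u);
}
```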
You can also zero-extend or sign-extend a word to a doubleword using instructions you already have available:
; zero-extend Rd to Rd/R(d+1)
mov     R(d+1), #0          ; set to 0

; sign-extend Rd to Rd/R(d+1)
asrs    R(d+1), Rd, #31     ; copy sign bit to all bits
The trick is that a signed right shift by 31 positions fills the entire word with copies of the sign bit. We use the flags-setting version ASRS because it has a compact 16-bit encoding when both the source and destination registers are low.
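In C terms, the pair Rd/R(d+1) can be thought of as a 64-bit value with Rd as the low word. Here is a sketch of both extensions under that assumption; `asr31` models `asrs R(d+1), Rd, #31` with unsigned arithmetic, since right-shifting a negative signed integer is implementation-defined in C. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* asrs hi, lo, #31: every result bit becomes a copy of bit 31.
   In unsigned arithmetic, 0 - 0 = 0x00000000 and 0 - 1 = 0xFFFFFFFF. */
uint32_t asr31(uint32_t x) { return (uint32_t)(0u - (x >> 31)); }

/* Combine a register pair: lo is Rd, hi is R(d+1). */
uint64_t make_pair(uint32_t lo, uint32_t hi) { return ((uint64_t)hi << 32) | lo; }

uint64_t zero_extend_pair(uint32_t lo) { return make_pair(lo, 0); }          /* mov  R(d+1), #0      */
uint64_t sign_extend_pair(uint32_t lo) { return make_pair(lo, asr31(lo)); }  /* asrs R(d+1), Rd, #31 */

void pair_examples(void)
{
    assert(zero_extend_pair(0x80000000u) == 0x0000000080000000ull);
    assert(sign_extend_pair(0x80000000u) == 0xFFFFFFFF80000000ull);
    assert(sign_extend_pair(0x7FFFFFFFu) == 0x000000007FFFFFFFull);
}
```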
The ASR #31 trick can also be used in the op2 (the flexible second operand) of arithmetic or logical instructions.

; set r0 to zero if r1 is positive or zero; otherwise leave r0 unchanged
and     r0, r1, ASR #31     ; r0 = r0 & (r1 ASR #31)

The trick here is that r1, ASR #31 produces 0xFFFFFFFF if r1 is negative, but 0x00000000 if r1 is positive or zero.
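The same idea written as C, assuming 32-bit registers; `and_asr31` and the other names are hypothetical. It computes the branch-free equivalent of "keep r0 if r1, viewed as signed, is negative, otherwise zero", and the branchy version is included only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* r1 ASR #31: all ones if r1 is negative, all zeros otherwise. */
uint32_t asr31(uint32_t x) { return (uint32_t)(0u - (x >> 31)); }

/* and r0, r1, ASR #31: keep r0 if r1 is negative, otherwise clear it. */
uint32_t and_asr31(uint32_t r0, uint32_t r1) { return r0 & asr31(r1); }

/* The branchy version it replaces (bit 31 set means r1 is negative). */
uint32_t and_asr31_branchy(uint32_t r0, uint32_t r1)
{
    return (r1 & 0x80000000u) ? r0 : 0u;
}

void and_examples(void)
{
    assert(and_asr31(0x1234u, 0xFFFFFFFFu) == 0x1234u);  /* r1 = -1: r0 kept    */
    assert(and_asr31(0x1234u, 0u) == 0u);                /* r1 = 0: r0 cleared  */
}
```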
In addition to the straight zero- and sign-extension operations, there are other instructions that combine the extension with another operation. Most of them are focused on multimedia scenarios, but the extend-and-add instructions are more general-purpose, and I have seen the compiler generate the versions with no rotation.
; zero/sign extend byte and add, with optional rotation
; rotation must be 0, 8, 16, or 24
uxtab   Rd, Rn, Rm, ROR #rot    ; Rd = Rn + (uint8_t)(Rm ROR #rot)
sxtab   Rd, Rn, Rm, ROR #rot    ; Rd = Rn + ( int8_t)(Rm ROR #rot)

; zero/sign extend halfword and add, with optional rotation
; rotation must be 0, 8, 16, or 24
uxtah   Rd, Rn, Rm, ROR #rot    ; Rd = Rn + (uint16_t)(Rm ROR #rot)
sxtah   Rd, Rn, Rm, ROR #rot    ; Rd = Rn + ( int16_t)(Rm ROR #rot)
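A sketch of the four extend-and-add operations in C. The instructions themselves take three registers: Rn supplies the value being added to, and Rm supplies the value being rotated and extended. The `ror32` helper and the `model_*` names are hypothetical. With no rotation, `model_uxtab(rn, rm, 0)` is just `rn + (uint8_t)rm`, the kind of C expression (adding a byte to a running total, say) that corresponds to the no-rotation form.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by one of the allowed amounts: 0, 8, 16, or 24. */
uint32_t ror32(uint32_t x, unsigned rot)
{
    assert(rot == 0 || rot == 8 || rot == 16 || rot == 24);
    return rot == 0 ? x : (x >> rot) | (x << (32 - rot));
}

/* Sign-extend the low 8 or 16 bits without implementation-defined casts. */
static uint32_t sext8(uint32_t v)  { return ((v & 0xFFu) ^ 0x80u) - 0x80u; }
static uint32_t sext16(uint32_t v) { return ((v & 0xFFFFu) ^ 0x8000u) - 0x8000u; }

/* Rd = Rn + extend(Rm ROR #rot); the addition wraps modulo 2^32. */
uint32_t model_uxtab(uint32_t rn, uint32_t rm, unsigned rot) { return rn + (ror32(rm, rot) & 0xFFu); }
uint32_t model_sxtab(uint32_t rn, uint32_t rm, unsigned rot) { return rn + sext8(ror32(rm, rot)); }
uint32_t model_uxtah(uint32_t rn, uint32_t rm, unsigned rot) { return rn + (ror32(rm, rot) & 0xFFFFu); }
uint32_t model_sxtah(uint32_t rn, uint32_t rm, unsigned rot) { return rn + sext16(ror32(rm, rot)); }

void extend_add_examples(void)
{
    assert(model_uxtab(100u, 0xFFu, 0) == 355u);          /* 100 + 255    */
    assert(model_sxtab(100u, 0xFFu, 0) == 99u);           /* 100 + (-1)   */
    assert(model_uxtah(1u, 0xABCD1234u, 16) == 0xABCEu);  /* 1 + 0xABCD   */
}
```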
There’s another instruction that looks like it’d come in handy, particularly in Win32 user interface code that has to pack two 16-bit coordinates into a 32-bit integer, but I haven’t seen any compiler generate it:
; pack halfword bottom-and-top, or top-and-bottom
; shift is optional
pkhbt   Rd, Rn, Rm, LSL #imm    ; Rd = ((Rm LSL #imm) & 0xFFFF0000) | (uint16_t)Rn
pkhtb   Rd, Rn, Rm, ASR #imm    ; Rd = (Rn & 0xFFFF0000) | (uint16_t)(Rm ASR #imm)
The bottom-and-top version puts the bottom half of the first input register into the bottom half of the output, and the top half of the (shifted) second input register into the top half. The top-and-bottom version does it the other way. (The top-and-bottom instruction is not redundant, because the barrel shifter can be applied only to the second input register.)
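A sketch of the two packing instructions as C functions. The function names are hypothetical, and the shift amounts are restricted to 0 through 31 to keep the model simple (the real encodings have their own rules about which shift amounts are permitted). Packing a 16-bit x into the bottom and a 16-bit y into the top, the way Win32 code combines two coordinates, corresponds to `pkhbt Rd, Rx, Ry, LSL #16`.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right without relying on signed >> (implementation-defined). */
static uint32_t asr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* pkhbt Rd, Rn, Rm, LSL #lsl: bottom half from Rn, top half from shifted Rm. */
uint32_t model_pkhbt(uint32_t rn, uint32_t rm, unsigned lsl)
{
    assert(lsl < 32);
    return ((rm << lsl) & 0xFFFF0000u) | (rn & 0x0000FFFFu);
}

/* pkhtb Rd, Rn, Rm, ASR #asr: top half from Rn, bottom half from shifted Rm. */
uint32_t model_pkhtb(uint32_t rn, uint32_t rm, unsigned asr)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, asr) & 0x0000FFFFu);
}

void pack_examples(void)
{
    /* x = 0xAAAA in the low half of Rn, y = 0xBBBB in the low half of Rm. */
    assert(model_pkhbt(0x1111AAAAu, 0x2222BBBBu, 16) == 0xBBBBAAAAu);
    /* No shift: the top half of Rm is kept as is. */
    assert(model_pkhbt(0x1111AAAAu, 0x2222BBBBu, 0) == 0x2222AAAAu);
    /* Top half of Rn, bottom half from Rm ASR #16. */
    assert(model_pkhtb(0xAAAA1111u, 0xBBBB2222u, 16) == 0xAAAABBBBu);
}
```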
When the compiler needs to do this, it generates two instructions:
; pack halfword bottom-and-top
uxth    r12, Rn                 ; r12 = (uint16_t)Rn
orr     Rd, r12, Rm, LSL #16    ; Rd = r12 | (Rm << 16)
                                ;    = (uint16_t)Rn | (Rm << 16)
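The C that leads to this sequence is presumably something along the lines of the hypothetical `pack_xy` below. The second function spells out the compiler's two instructions one statement at a time, to show that they compute the same thing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing helper: x in the bottom 16 bits, y in the top 16. */
uint32_t pack_xy(uint32_t x, uint32_t y)
{
    return (x & 0xFFFFu) | (y << 16);
}

/* The compiler's two-instruction version, one statement per instruction. */
uint32_t pack_xy_two_step(uint32_t rn, uint32_t rm)
{
    uint32_t r12 = rn & 0xFFFFu;   /* uxth r12, Rn              */
    return r12 | (rm << 16);       /* orr  Rd, r12, Rm, LSL #16 */
}

void two_step_examples(void)
{
    assert(pack_xy(0xFFFF0005u, 7u) == 0x00070005u);
    assert(pack_xy_two_step(0xFFFF0005u, 7u) == pack_xy(0xFFFF0005u, 7u));
}
```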
Even if it didn’t want to use PKHBT, it could have used BFI to pack the values in a single instruction:
; pack halfword bottom-and-top (in place: Rd already holds Rn)
bfi     Rd, Rm, #16, #16        ; Rd[31:16] = Rm[15:0]
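To make the in-place caveat concrete, here is a sketch of BFI as a C function (the name `model_bfi` is hypothetical). Inserting a 16-bit field at bit 16 of a register that already holds Rn produces the same packed value as the two-instruction sequence, but it overwrites that register rather than leaving Rn intact.

```c
#include <assert.h>
#include <stdint.h>

/* bfi Rd, Rn, #lsb, #width: copy the low `width` bits of Rn into Rd
   starting at bit `lsb`, leaving the other bits of Rd alone. */
uint32_t model_bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask = field << lsb;
    return (rd & ~mask) | ((rn << lsb) & mask);
}

void bfi_examples(void)
{
    uint32_t x = 0xFFFF0005u, y = 7u;
    /* Rd starts out holding x, so the result replaces x's top half with y. */
    assert(model_bfi(x, y, 16, 16) == 0x00070005u);
}
```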
Maybe there’s some dirty secret about the PKHBT and BFI instructions that the compiler knows but I don’t.
I think you are cheating a little saying BFI is a single instruction pack, since the compiler is packing Rn and Rm into Rd, and you are just putting Rm into Rd (if Rd is Rn, you are modifying Rn when the compiler version didn’t).
Even when the compiler can clobber Rn, it doesn’t use BFI.