The ARM processor (Thumb-2), part 9: Sign and zero extension

I noted last time that you could use the bitfield extraction instructions to do zero- and sign-extension of bytes and halfwords to words. But there are dedicated instructions for these operations which have smaller encodings if the source and destination registers are low.

    ; unsigned extend byte to word
    uxtb    Rd, Rm      ; Rd = (uint8_t)Rm

    ; signed extend byte to word
    sxtb    Rd, Rm      ; Rd = (int8_t)Rm

    ; unsigned extend halfword to word
    uxth    Rd, Rm      ; Rd = (uint16_t)Rm

    ; signed extend halfword to word
    sxth    Rd, Rm      ; Rd = (int16_t)Rm

You can optionally apply a rotation to the second register so that you can extract a 8-bit or 16-bit value that sits along a byte boundary.

    ; unsigned/signed extend byte to word with rotation
    ; rotation must be a multiple of 8
    uxtb    Rd, Rm, #rot ; Rd = (uint8_t)(Rm ROR #rot)
    sxtb    Rd, Rm, #rot ; Rd = ( int8_t)(Rm ROR #rot)

    ; unsigned/signed extend halfword to word with rotation
    ; rotation must be a multiple of 8
    uxth    Rd, Rm, #rot ; Rd = (uint16_t)(Rm ROR #rot)
    sxth    Rd, Rm, #rot ; Rd = ( int16_t)(Rm ROR #rot)

It’s kind of weird to apply a 24-bit rotation to extract a halfword, but you can do it if you want to.

You can also zero-extend or sign-extend a word to a doubleword using instructions you already have available:

    ; zero-extend Rd to Rd/R(d+1)
    mov     R(d+1), #0          ; set to 0

    ; sign-extend Rd to Rd/R(d+1)
    asrs    R(d+1), Rd, #31     ; copy sign bit to all bits

The trick is that a signed right-shift by 31 positions ends up filling the entire word with the sign bit. We use the S-version ASRS because it allows a compact 16-bit encoding if both the source and destination registers are low.

The ASR #31 trick can also be used in the op2 of arithmetic or logical instructions.

    ; set r0 to zero if r1 is positive or zero
    and     r0, r1, ASR #31

The trick here is that r1, ASR #31 produces 0xFFFFFFFF if r1 is negative, but 0x00000000 if r1 is positive or zero.

In addition to the straight zero- and sign-extension operations, there are other instructions that combine the extension with another operation. Most of them are focused on multimedia scenarios, but the extend-and-add instructions are more general-purpose, and I have seen the compiler generate the versions with no rotation.

    ; zero/sign extend and add byte with optional rotation
    ; rotation must be a multiple of 8
    uxtab   Rd, Rn, #rot        ; Rd = Rd + (uint8_t)(Rn ROR #rot)
    sxtab   Rd, Rn, #rot        ; Rd = Rd + ( int8_t)(Rn ROR #rot)

    ; zero/sign extend and add halfword with optional rotation
    ; rotation must be a multiple of 8
    sxtah   Rd, Rn, #rot        ; Rd = Rd + ( int16_t)(Rn ROR #rot)
    uxtah   Rd, Rn, #rot        ; Rd = Rd + (uint16_t)(Rn ROR #rot)

There’s another instruction that looks like it’d come in handy, particularly in Win32 user interface code that has to pack two 16-bit coordinates into a 32-bit integer, but I haven’t seen any compiler generate it:

    ; pack halfword bottom-and-top, or top-and-bottom
    ; shift is optional
    pkhbt   Rd, Rn, Rm, LSL #imm ; Rd = ((Rm LSL #imm) << 16) | (uint16_t)Rn
    pkhtb   Rd, Rn, Rm, ASR #imm ; Rd = (Rn << 16) | (uint16_t)(Rm ASR #imm)

The bottom-and-top version puts the first input register in the bottom part of the output, and the second input parameter goes into the top part. The top-and-bottom version does it the other way. (The top-and-bottom instruction is not redundant because the barrel shifter can be applied only to the second input parameter.)

When the compiler needs to do this, it generates two instructions:

    ; pack halfword bottom-and-top
    uxth    r12, Rn                 ; r12 = (uint16_t)Rn
    orr     Rd, r12, Rm, LSL #16    ; Rd = r12 | (Rm << 16)
                                    ;    = (uint16_t)Rn | (Rm << 16)

Even if it didn’t want to use PKHBT, it could have used BFI to pack the values in a single instruction:

    ; pack halfword bottom-and-top (in place)
    bfi     Rd, Rm, #16, #16        ; Rd[31:16] = Rm[15:0]

Maybe there’s some dirty secret about the PKHBT and BFI instructions that the compiler knows but I don’t.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

2 comments

Pete Wilson June 10, 2021

I think you are cheating a little saying BFI is a single instruction pack, since the compiler is packing Rn and Rm into Rd, and you are just putting Rm into Rd (if Rd is Rn, you are modifying Rn when the compiler version didn’t).

Raymond Chen Author June 10, 2021

Even when the compiler can clobber Rn, it doesn’t use BFI.

The ARM processor (Thumb-2), part 9: Sign and zero extension

Author

2 comments

Read next

The ARM processor (Thumb-2), part 10: Memory access and alignment

The ARM processor (Thumb-2), part 11: Atomic access and barriers

Author

2 comments

Read next

The ARM processor (Thumb-2), part 10: Memory access and alignment

The ARM processor (Thumb-2), part 11: Atomic access and barriers

Stay informed