The SH-3 does not have a simple “divide two integers please” instruction. Rather, it has a collection of instructions that let you build a division operation yourself.
DIV0U ; prepare for unsigned division DIV0S Rm, Rn ; prepare for signed division Rn ÷ Rm DIV1 Rm, Rn ; generate 1 bit of the quotient Rn ÷ Rm
To begin an integer division operation, you execute either a DIV0U
or DIV0S
instruction, depending on whether you want a signed or unsigned division.
You then perform a number of DIV1
instructions equal to the number of bits of quotient you need, mixed in with other instructions are outlined in the programmer’s manual. After running the desired number of iterations, the result is in either the Rn register or in the register you accumulated the results into, depending on the specific algorithm you used.
These instructions do not attempt to handle division by zero or division overflow. They will simply generate nonsense results. If preventing division by zero or overflow is important to you, you will have to check for them yourself explicitly.
I’m not going to go into the fine details of how these instructions operate. They use Rm and Rn to record the state of the division, with three additional bits of state recorded in the M, Q, and T flags.
It’s basically magic.
In practice, you won’t see the compiler generate these instructions anyway. Instead, the compiler is going to do one of the following:
- If dividing by a constant power of 2, use a shift instruction.
- If dividing by a small constant, multiply by 2³²÷n and extract the high 32 bits of the result.
- Otherwise, call a helper function in the runtime library.
Phew, that was crazy.
Next time, we’ll return to the relative sanity of bitwise logical operations.
Bonus chatter: If you want to see how one compiler implements division on the SH-3 (and if you are okay with being exposed to GPL source code), you can take a look at how GCC implements division in its runtime library.
I spent a while wondering why this seemed familiar, then I rememberd it is like the GreenArrays F18A's multiply-step instruction. It does a conditional add based on the bottom bit of a register and a shift of two registers joined together, so that you can multiply numbers by repeating the multiply-step instruction (though with NOPs in-between, because the multiply-step instruction has a preceding delay-slot, as does the addition instruction).
The F18A is a weird processor;...
That F18A thing is interesting. Whatever happened to it?
I’m just suprised there is a hardware divide at all. Thats quite a luxury!
How did it compare to a pure software integer divide in terms of cycles?
From the description of the instructions, there isn't much of one! Getting one bit of quotient at a time is a tell-tale sign of the type of algorithm used: namely, a basic digit recurrence algorithm (either restoring or non-restoring). The main step in either variant is a single integer subtraction and a 1-bit shift per iteration. The subtraction is just a regular ALU operation; the shift register to record the quotient bits and the "glue...