Some time ago, I noted that the 8086 was designed so that existing 8080 code could be machine-translated instruction by instruction into 8086 code. The 8086 BX register stood in for the HL register pair on the 8080, and it was also the only one of the general-purpose registers (AX, BX, CX, DX) that you could indirect through, mirroring the corresponding limitation on the 8080.
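To make the translation idea concrete, here is a toy Python sketch of how an 8080 register-to-register MOV might be rewritten. It assumes the commonly cited register correspondence (A to AL, BC to CX, DE to DX, HL to BX). The names `REG8_MAP` and `translate_mov` are hypothetical. A real translator would also have to handle flags, immediates, and the rest of the instruction set.

```python
# Assumed 8080 -> 8086 register correspondence (illustrative, not a real tool's table).
REG8_MAP = {
    "A": "AL",
    "B": "CH", "C": "CL",   # BC pair -> CX
    "D": "DH", "E": "DL",   # DE pair -> DX
    "H": "BH", "L": "BL",   # HL pair -> BX
    "M": "BYTE PTR [BX]",   # M is the pseudo-register [HL], so it becomes [BX]
}


def translate_mov(dst: str, src: str) -> str:
    """Translate an 8080 'MOV dst,src' into the equivalent 8086 MOV."""
    if dst == "M" and src == "M":
        # The opcode that would be MOV M,M is HLT on the 8080.
        raise ValueError("MOV M,M is not a valid 8080 instruction")
    return f"MOV {REG8_MAP[dst]}, {REG8_MAP[src]}"


print(translate_mov("A", "M"))
print(translate_mov("M", "E"))
```

Because M always means [HL], every 8080 instruction that touches memory through M lands on [BX], which is why the 8086 had to support indirection through BX.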
But that explains only part of the story. Yes, the 8086 had to let you indirect through BX so that 8080 instructions which operate on M (the pseudo-register that represented [HL]) could be translated into operations on [BX]. But that doesn’t mean that the 8086 had to forbid indirection through the other registers. After all, the 8086 had plenty of other instructions that didn’t exist on the 8080.
So you can’t take away BX, but more is better, right? Why didn’t the 8086 let you indirect through AX, CX, or DX, as well as BX?
Basically, because there was no room.
The encoding of two-operand instructions on the 8086 went like this:
| Field | op | d | w | mod | reg | r/m |
|---|---|---|---|---|---|---|
| Byte | 1 | 1 | 1 | 2 | 2 | 2 |
| Bits | 7:2 | 1 | 0 | 7:6 | 5:3 | 2:0 |
The op determines the operation to be performed.
The d is the direction (reg to r/m or r/m to reg).¹
The w indicates whether it is a byte operation or a word operation.
The mod is the mode and describes how the r/m is to be interpreted.
The reg is the first operand, always a register (although the d bit can reverse the first and second operands).
The r/m is the second operand, either a register or a memory location, depending on mod.
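A minimal Python sketch of pulling the six fields out of the two instruction bytes, following the bit positions in the layout above. The function name `split_fields` is made up for illustration; the bytes 8B C3 encode MOV AX, BX with d = 1.

```python
def split_fields(byte1: int, byte2: int) -> dict:
    """Split a two-byte op/d/w + mod/reg/r/m encoding into its fields."""
    return {
        "op": byte1 >> 2,             # bits 7..2 of the first byte
        "d": (byte1 >> 1) & 1,        # bit 1: 1 means reg is the destination
        "w": byte1 & 1,               # bit 0: 0 = byte operation, 1 = word operation
        "mod": byte2 >> 6,            # bits 7..6 of the second byte
        "reg": (byte2 >> 3) & 0b111,  # bits 5..3
        "rm": byte2 & 0b111,          # bits 2..0
    }


# 8B C3 is MOV AX, BX: op = 100010, d = 1, w = 1, mod = 11, reg = 000 (AX), r/m = 011 (BX)
print(split_fields(0x8B, 0xC3))
```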
The interesting thing here is the mod + r/m combination, since those capture the possible memory operands.
| r/m | mod=00 | mod=01 | mod=10 | mod=11, w=0 | mod=11, w=1 |
|---|---|---|---|---|---|
| 000 | * PTR [BX+SI] | * PTR [BX+SI+imm8] | * PTR [BX+SI+imm16] | AL | AX |
| 001 | * PTR [BX+DI] | * PTR [BX+DI+imm8] | * PTR [BX+DI+imm16] | CL | CX |
| 010 | * PTR [BP+SI] | * PTR [BP+SI+imm8] | * PTR [BP+SI+imm16] | DL | DX |
| 011 | * PTR [BP+DI] | * PTR [BP+DI+imm8] | * PTR [BP+DI+imm16] | BL | BX |
| 100 | * PTR [SI] | * PTR [SI+imm8] | * PTR [SI+imm16] | AH | SP |
| 101 | * PTR [DI] | * PTR [DI+imm8] | * PTR [DI+imm16] | CH | BP |
| 110 | * PTR [imm16] | * PTR [BP+imm8] | * PTR [BP+imm16] | DH | SI |
| 111 | * PTR [BX] | * PTR [BX+imm8] | * PTR [BX+imm16] | BH | DI |
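The table can also be read as a small lookup. This hypothetical `rm_operand` helper (a sketch, not taken from any real disassembler) turns mod, r/m, and w into the operand text, with BYTE or WORD standing in for the `*` size. It also shows why there are only eight memory addressing modes: mod values 00, 01, and 10 just select the displacement size, so all 24 memory encodings reuse the same eight register combinations.

```python
# Register names indexed by the 3-bit reg or r/m field.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# The eight memory register combinations, indexed by r/m (mod = 00, 01, 10).
MEM_BASES = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def rm_operand(mod: int, rm: int, w: int) -> str:
    """Return the operand text for a mod/r/m pair, following the table."""
    if mod == 0b11:
        # mod = 11: r/m names a register; w picks the byte or word register set.
        return (REG16 if w else REG8)[rm]
    size = "WORD" if w else "BYTE"
    if mod == 0b00:
        # mod = 00, r/m = 110 is the direct-address form, not plain [BP].
        address = "imm16" if rm == 0b110 else MEM_BASES[rm]
    else:
        # mod = 01 adds a sign-extended 8-bit displacement, mod = 10 a 16-bit one.
        address = MEM_BASES[rm] + ("+imm8" if mod == 0b01 else "+imm16")
    return f"{size} PTR [{address}]"


# Print the word-sized (w = 1) rows of the table.
for rm in range(8):
    print(f"{rm:03b}", [rm_operand(mod, rm, 1) for mod in range(4)])
```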
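The encoding leaves room for 8 memory addressing modes: mod values 00, 01, and 10 differ only in the size of the displacement, so there are just eight register combinations to go around. We are forced to have [BX] for compatibility, but we can choose the other seven. You need to be able to indirect through the base pointer so that you can access your local variables and parameters. (The mod=00 slot for plain [BP] is taken by direct addressing, so [BP] is actually encoded as [BP+imm8] with a zero displacement.) And it’s expected that you can indirect through SI and DI, since those are the registers used for block memory operations.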
That leaves four more addressing modes, and the architects decided to use the four ways of combining BX/BP with SI/DI. The BP+x addressing modes let you access arrays on the stack, and the BX+x addressing modes let you access arrays on the heap, where SI and DI serve as the index registers.
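To see how the base+index modes reach into an array, here is a Python sketch of the 16-bit offset calculation. It computes only the offset within the segment and ignores the segment itself (on the 8086, the BP-based forms default to the stack segment SS and the others to DS, which is what makes BP+x natural for arrays on the stack). The helper name and the register values in the example are hypothetical.

```python
# Registers summed for each r/m value when mod is 00, 01, or 10.
EA_REGS = [("BX", "SI"), ("BX", "DI"), ("BP", "SI"), ("BP", "DI"),
           ("SI",), ("DI",), ("BP",), ("BX",)]


def effective_address(regs: dict[str, int], mod: int, rm: int, disp: int = 0) -> int:
    """Compute the 16-bit offset of a memory operand (segment not included)."""
    if mod == 0b11:
        raise ValueError("mod = 11 selects a register, not memory")
    if mod == 0b00 and rm == 0b110:
        return disp & 0xFFFF              # direct address: no registers involved
    if mod == 0b00:
        disp = 0                          # mod = 00 carries no displacement
    elif mod == 0b01:
        disp &= 0xFF                      # imm8 is sign-extended to 16 bits
        if disp & 0x80:
            disp -= 0x100
    offset = sum(regs[name] for name in EA_REGS[rm]) + disp
    return offset & 0xFFFF                # wraps around within the 64 KiB segment


# Hypothetical register values: BP is a frame pointer, SI indexes a local array.
regs = {"BX": 0x2000, "BP": 0xFFF0, "SI": 4, "DI": 0}
print(hex(effective_address(regs, 0b01, 0b010, -2)))  # [BP+SI-2]
print(hex(effective_address(regs, 0b00, 0b000)))      # [BX+SI]
```

The same SI value indexes a stack array through BP or a heap array through BX; only the base register changes.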
Now, the architects could have chosen to allow indirection through the other three 16-bit registers, but that would have left room for only one array indexing mode. Giving the encodings to the array indexing modes means that you lose [AX], [CX], and [DX], but that’s less of a loss because you can still indirect through SI and DI (and BP, but that’s intended to be the frame pointer, not a general-purpose pointer register).
The other choice would be to increase the number of addressing modes by going to a three-byte instruction encoding, thereby picking up eight more bits. But that seems like quite an excessive step, seeing as the original 8080 consisted only of one-byte instructions. (I’m not counting immediate bytes toward encoding counts for the purpose of this comparison.)
It was a game of trade-offs, and the trade-off was to pick up indexed addressing, and give up on supporting indirection through all of the 16-bit registers.
¹ Note that this means that register-to-register operations can be encoded two ways:
| op | d | w | mod | reg | r/m |
|---|---|---|---|---|---|
| op | 0 | w | 11 | reg1 | reg2 |
| op | 1 | w | 11 | reg2 | reg1 |
These redundant encodings are used by some assemblers to “fingerprint” their output.
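Here is a Python sketch that decodes both encodings of MOV AX, BX (8B C3 with d = 1, and 89 D8 with d = 0) and shows they mean the same thing. `decode_mov_rr` is a hypothetical helper limited to word-sized register-to-register MOV.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_mov_rr(byte1: int, byte2: int) -> str:
    """Decode a word-sized register-to-register MOV (opcode 100010dw, mod = 11)."""
    if (byte1 >> 2) != 0b100010 or (byte1 & 1) != 1 or (byte2 >> 6) != 0b11:
        raise ValueError("not a word-sized register-to-register MOV")
    d = (byte1 >> 1) & 1                  # direction bit
    reg = REG16[(byte2 >> 3) & 0b111]
    rm = REG16[byte2 & 0b111]
    # d = 1: reg is the destination; d = 0: r/m is the destination.
    dst, src = (reg, rm) if d else (rm, reg)
    return f"MOV {dst}, {src}"


print(decode_mov_rr(0x8B, 0xC3))  # d = 1, reg = AX, r/m = BX
print(decode_mov_rr(0x89, 0xD8))  # d = 0, reg = BX, r/m = AX
```

Both calls produce MOV AX, BX. An assembler is free to emit either form, and that freedom is what the fingerprinting trick exploits.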
Could they have made the bit fields variable, depending on “op”? For example, as you said, “d” wasn’t required for register-to-register copies.
Theoretically, it could have been possible. But in practice, there would have been two problems.
First, encoding parameters in the same way for several opcodes allows you to use the same decoding/fetching circuitry for most move and ALU instructions. In those days, a lower transistor count meant higher yields and lower prices. That’s why the 6502 totally eclipsed Motorola’s 6800: it had the lowest transistor count, and so at launch it was also the cheapest processor, by a wide margin.
Second, having different addressing modes available in different instructions is a very bad idea when you hand-write assembly code, which was usual at the time. The 6502 suffered from this, but it was almost bearable because its instruction set was small and there weren’t many exceptions, so you learned them after a while. The 8086 had a much larger instruction set, which would have made it painful. In fact, when the 68000 arrived, it was widely praised for the orthogonality of its instruction set (compared with the 8086’s, that is, which was itself much more uniform than that of the 8080 it replaced).
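As a software analogy for the shared-decoding point (not a description of the actual 8086 circuitry), the eight classic ALU operations ADD, OR, ADC, SBB, AND, SUB, XOR, and CMP all use the opcode pattern 00ooo0dw, so one field decoder serves all of them, with three bits selecting the operation. The helper name `decode_alu_rr` is hypothetical, and the sketch handles only the register-to-register (mod = 11) form.

```python
ALU_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_alu_rr(byte1: int, byte2: int) -> str:
    """Decode a register-to-register ALU instruction of the form 00ooo0dw, mod = 11."""
    if (byte1 & 0b11000100) != 0 or (byte2 >> 6) != 0b11:
        raise ValueError("expected 00ooo0dw followed by a mod = 11 byte")
    mnemonic = ALU_OPS[(byte1 >> 3) & 0b111]  # bits 5..3 pick the operation
    d, w = (byte1 >> 1) & 1, byte1 & 1        # same d and w bits as MOV
    regs = REG16 if w else REG8
    reg, rm = regs[(byte2 >> 3) & 0b111], regs[byte2 & 0b111]
    dst, src = (reg, rm) if d else (rm, reg)
    return f"{mnemonic} {dst}, {src}"


print(decode_alu_rr(0x01, 0xD8))
print(decode_alu_rr(0x30, 0xC0))
print(decode_alu_rr(0x3B, 0xC1))
```

Only the three operation bits differ between these instructions; the d, w, mod, reg, and r/m handling is identical, which is the uniformity the comment argues made cheap decoding possible.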
As a historical note, it’s worth remembering that the iAPX 432 was the design that was getting all the attention at Intel; it started before the 8086 did.
The 8086 was meant as a stop-gap to allow existing 8080 and 8085 customers to move into the 16-bit world easily, while the iAPX 432 was going to be the next big leap. And the iAPX 432 had variable-length instructions (6 bits for the shortest, 321 bits for the longest) that didn’t need to be byte-aligned, so it didn’t have this specific issue.
So one of the obstacles to giving the 8086 more complex decoding is that it was the stop-gap: the thing you release not because it’s meant to take over the world (that’s the iAPX 432’s job), but so that the world will wait for the next big thing.
Well, the iAPX 432 failed precisely because it was too complex for the technology of the time. It had to be split across two ICs (ALU and control unit) simply because they couldn’t fit enough transistors on a single state-of-the-art die. That drove costs through the ceiling and slowed communication between the two units, which resulted in a system that was incompatible with existing software, very expensive, and just a hair more powerful than the 8086 (and slower than Motorola’s 68000). The iAPX 432 was, in many ways, decades ahead of its time, and also decades ahead of the technology needed to build it.
And with 20/20 hindsight, if Intel had known the iAPX 432 was going to fail, they might have paid more attention to making the 8086 a better chip.
That might well have gone badly for the 8086/8088. One of the reasons the 8086/8088 succeeded is that the final decision on what stayed in the architecture and what didn’t was down to a single person, who could keep the shape of the entire design in their head. In contrast, the 8800 (the original name of the iAPX 432 project) had design committees responsible for it, and a badly chosen feature could survive for months until the committee could be persuaded to ditch it.
I also question whether the iAPX 432 was ahead of its time. It could have been, but it could also have been like Itanium. Itanium failed in large part because its designers assumed that out-of-order execution (OoOE) would not scale well, but that memory throughput, cache size, and cache throughput would. Instead, OoOE scaled nicely, so all the compiler improvements done for Itanium benefited OoOE CPUs just as much as they did EPIC CPUs, while both memory and caches stalled out relative to the expectations of the Itanium design committee.
I believe the iAPX 432 was similar to Itanium in this regard: it made assumptions about the future of technology that turned out to be wrong (among other things, it assumed that we’d rewrite all software in Ada instead of continuing with Pascal or C), and as a result it would have failed even with better manufacturing technology.
Kind of like how IA-64 was going to take over the 64-bit world, but AMD64 came in with an easier migration path and ate IA-64’s lunch…