Jumping into the middle of an instruction is not as strange as it sounds

Raymond Chen

Reuben Harris and Monte Davidoff spent time disassembling Bill Gates’s original Altair BASIC. In an interview with The Register, Harris was impressed with the code, noting with some admiration, “I found a jump instruction that jumped to the middle of another instruction.”¹

You can find the targets of those jumps in the error handling code: Search for “Three common errors.”

The trick here is that the 8080 uses variable-length instructions. The instruction sequence in question goes like this:

01CD    1E0C    OutOfMemory:    MVI E,0C
01CF    01                      LXI B,....

01D0    1E02    SyntaxError:    MVI E,02
01D2    01                      LXI B,....

01D3    1E14    DivideByZero:   MVI E,14

The 8080 processor has 8-bit registers named A, B, C, D, E, H, and L. Six of these registers can be paired up to create 16-bit pseudo-registers: BC, DE and HL.

The load extended immediate LXI instruction is a three-byte instruction which loads a 16-bit immediate value into a register pair. The first byte specifies the opcode and the destination register pair (in the above example, the BC register pair), and the second and third bytes form the 16-bit immediate.

The move immediate MVI instruction is a two-byte instruction which loads an 8-bit immediate value into a single 8-bit register. The first byte specifies the opcode and the destination register (in the above example, the E register), and the second byte is the 8-bit immediate.

Let’s write out the byte stream that results from jumping to the three labels:

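| Address | Code byte | `JMP OutOfMemory` | `JMP SyntaxError` | `JMP DivideByZero` |
|---|---|---|---|---|
| `01CD` | `1E` | `MVI E,0C` | | |
| `01CE` | `0C` | `MVI E,0C` | | |
| `01CF` | `01` | `LXI B,021E` | | |
| `01D0` | `1E` | `LXI B,021E` | `MVI E,02` | |
| `01D1` | `02` | `LXI B,021E` | `MVI E,02` | |
| `01D2` | `01` | `LXI B,141E` | `LXI B,141E` | |
| `01D3` | `1E` | `LXI B,141E` | `LXI B,141E` | `MVI E,14` |
| `01D4` | `14` | `LXI B,141E` | `LXI B,141E` | `MVI E,14` |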

If you jump to 01CD, then the CPU performs an MVI E,0C, and then it interprets the 01 as the start of an LXI B instruction, and the next two bytes are treated as the 16-bit immediate operand. On the other hand, if you jump to 01D0, then the bytes that used to be the 16-bit immediate operand of the LXI B instruction are now treated as an MVI E,02 instruction.

You see the same thing happen at 01D3, which hides a two-byte instruction inside the 16-bit immediate operand of another LXI B instruction. If execution falls through from above, then the CPU executes an LXI B,141E, but if you jump directly to 01D3, then the CPU executes an MVI E,14.

In both cases, the LXI B is just a garbage instruction. It loads some nonsense value into the BC register pair. The code doesn’t care; that register wasn’t holding anything useful anyway. The purpose of the instruction is to soak up the next two bytes and prevent them from being treated as another instruction.
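
To make the three decodings concrete, here is a small Python sketch that walks the eight bytes at 01CD through 01D4 from each entry point. It is an illustration only: the `decode` helper and the `ENTRY_POINTS` table are made-up names, it understands just the two opcodes that appear in the listing (1E for MVI E and 01 for LXI B), and it stops at the end of the listing because the bytes that follow aren't shown.

```python
# Bytes at 01CD..01D4, taken from the listing above.
BASE = 0x01CD
CODE = bytes([0x1E, 0x0C, 0x01, 0x1E, 0x02, 0x01, 0x1E, 0x14])

# Hypothetical table of entry points, named after the labels in the listing.
ENTRY_POINTS = {
    "OutOfMemory": 0x01CD,
    "SyntaxError": 0x01D0,
    "DivideByZero": 0x01D3,
}


def decode(start):
    """Decode from `start` to the end of CODE.

    Returns (instructions, e), where `e` is the value left in register E.
    Only the two opcodes used in the listing are understood.
    """
    pc = start - BASE
    e = None
    instructions = []
    while pc < len(CODE):
        opcode = CODE[pc]
        if opcode == 0x1E:  # MVI E,d8: two bytes
            e = CODE[pc + 1]
            instructions.append(f"MVI E,{e:02X}")
            pc += 2
        elif opcode == 0x01:  # LXI B,d16: three bytes, operand stored low byte first
            value = CODE[pc + 1] | (CODE[pc + 2] << 8)
            instructions.append(f"LXI B,{value:04X}")
            pc += 3
        else:
            raise ValueError(f"opcode {opcode:02X} is not handled by this sketch")
    return instructions, e


for label, address in ENTRY_POINTS.items():
    instructions, e = decode(address)
    print(f"{label:13} {'; '.join(instructions)}  (E = {e:02X})")
```

Each entry point leaves its own error code in E (0C, 02, or 14), because the intervening LXI B instructions write only to the BC register pair.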

Harris expressed some surprise at finding this, but really, it is a pretty common trick when hand-writing assembly for processors with variable-length instructions: If you want to hide a 1-byte instruction, look for another instruction with a 1-byte immediate, and hide the instruction in the immediate. If you want to hide a 2-byte instruction, hide it inside an instruction with a 2-byte immediate.

The “cloaking” instruction should do something harmless. Instructions like “compare with immediate” work great, since they typically affect only flags, and most of the time, there’s nothing interesting in the flags anyway. However, the 8080 does not have a “compare with 16-bit immediate” instruction, so we have to make do with “load 16-bit immediate” into a register we don’t care about.

On the 6502, the typical instruction for soaking up one or two bytes is the bit test BIT instruction. The argument is the address of the memory to test (either a 1-byte zero page address or a 2-byte absolute address), and the result of the test goes into the flags register. Executing a garbage BIT instruction therefore reads a byte from some garbage memory location and then sets flags according to the value read. If the flags are subsequently ignored, then this is basically a two-byte or three-byte NOP.

Microsoft 6502 BASIC had a special macro SKIP2 for generating the first byte of the BIT instruction.

This hacky usage of the BIT instruction is arguably more popular than its designed purpose as a bit-testing instruction!² (Related: The hunt for a faster syscall trap.)

One thing to watch out for is that the CPU does perform a load from the memory address that is the argument to the BIT instruction, so make sure that the two bytes, when reinterpreted as an address, don’t produce an address in an I/O-mapped region. Otherwise, you’ll be issuing inadvertent hardware commands. (The 6502 has no memory manager, so you don’t have to worry about access violations.)
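
One way to guard against that is to work out, ahead of time, which address a skipping BIT (opcode 2C) would read. The Python sketch below does this for a hidden two-byte instruction. The helper names are invented for illustration, and the I/O range is an assumption modeled on the Apple II, where $C000 through $C0FF is hardware I/O; substitute the memory map of your own machine.

```python
# Assumed I/O window (Apple II style); adjust for the target machine.
IO_REGIONS = [range(0xC000, 0xC100)]


def skipped_read_address(hidden):
    """Address that BIT reads when `hidden` (two bytes) is its operand.

    The 6502 stores addresses low byte first.
    """
    if len(hidden) != 2:
        raise ValueError("BIT absolute hides exactly two bytes")
    return hidden[0] | (hidden[1] << 8)


def is_safe_to_hide(hidden):
    """True if the stray read lands outside every assumed I/O region."""
    address = skipped_read_address(hidden)
    return not any(address in region for region in IO_REGIONS)


# LDX #$02 assembles to A2 02, so the stray read hits $02A2,
# which is outside the assumed I/O window.
print(is_safe_to_hide(bytes([0xA2, 0x02])))  # True

# LDX #$C0 assembles to A2 C0, so the stray read hits $C0A2,
# which is inside the assumed I/O window.
print(is_safe_to_hide(bytes([0xA2, 0xC0])))  # False
```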

The trick of “soaking up” bytes to generate multiple entry points to a function was employed in 16-bit Windows. For example, you had this sequence:

DelAtom:
    mov     cl, 2
    db      0BBh        ; mov bx, imm16
AddAtom:
    mov     cl, 1
    db      0BBh        ; mov bx, imm16
FindAtom:
    mov     cl, 0
    db      0BBh        ; mov bx, imm16

The three functions all have the same parameters, and they share a lot of code, so the entry points merely set up a function code in the cl register and all fall through to a common implementation.
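
The fall-through can be traced for the x86 bytes as well. The Python sketch below assumes the standard encodings B1 ib for mov cl, imm8 and BB iw for mov bx, imm16, and models only those two opcodes. The names `STUBS`, `ENTRY_OFFSETS`, and `run_entry` are hypothetical, and the byte stream ends at mov cl, 0 because the excerpt doesn't show what follows the final db 0BBh.

```python
# Entry stubs as they would be assembled, up to and including "mov cl, 0".
STUBS = bytes([
    0xB1, 0x02,  # DelAtom:  mov cl, 2
    0xBB,        #           db 0BBh (opcode of mov bx, imm16)
    0xB1, 0x01,  # AddAtom:  mov cl, 1
    0xBB,        #           db 0BBh
    0xB1, 0x00,  # FindAtom: mov cl, 0
])

# Offsets of the three labels within STUBS.
ENTRY_OFFSETS = {"DelAtom": 0, "AddAtom": 3, "FindAtom": 6}


def run_entry(offset):
    """Execute the stubs from `offset`; return (cl, bx) at the end of STUBS."""
    cl = bx = None
    pc = offset
    while pc < len(STUBS):
        opcode = STUBS[pc]
        if opcode == 0xB1:  # mov cl, imm8
            cl = STUBS[pc + 1]
            pc += 2
        elif opcode == 0xBB:  # mov bx, imm16: swallows the next two bytes
            bx = STUBS[pc + 1] | (STUBS[pc + 2] << 8)
            pc += 3
        else:
            raise ValueError(f"opcode {opcode:02X} is not modeled")
    return cl, bx


for name, offset in ENTRY_OFFSETS.items():
    cl, bx = run_entry(offset)
    print(name, "cl =", cl, "bx =", bx)
```

Entering at DelAtom or AddAtom leaves a junk value in bx (the bytes of a later mov cl instruction), but cl still holds the function code chosen at the entry point.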

So, yeah, jumping into the middle of an instruction. It’s a cool trick, but it’s not novel. It was rather commonly employed in the early days of personal computing.

¹ For some reason, that quotation has made its way into online dictionaries as a citation for jump instruction.

² If you’ve done significant work on the 6502, the machine code for this instruction (2C) is probably burned into your brain.

11 comments

Juan Castro January 12, 2022

Reading the “faster syscall” article:

“I was reminded of a meeting that took place between Intel and Microsoft over fifteen years ago. (Sadly, I was not myself at this meeting, so the story is second-hand.)”

Took me some ten seconds to understand that you were not personally at the meeting. At first glance I thought you were there but were immediately possessed by a supernatural entity. Or high.
Steve P January 12, 2022

Reminds me of an anti-disassembly trick I encountered walking through some commercial code with a debugger on Apple II (sorry, don’t remember what it was or why I was doing this, it was a long time ago).

The disassembly was clean at the entry point and went on for a few lines of sensible and expected code until it hit a short jmp instruction, after which the code was complete gibberish. That short jmp landed in the middle of a multi-byte instruction above it. Re-aligning the disassembly to this mid-instruction address revealed the code for the remainder of the function. More of a speed-bump than a protection scheme … but cool 🙂

Neil Rashbrook January 12, 2022

An alternative approach I saw in Sinclair BASIC is to replace the JMP instruction with a CALL (actually an RST) instruction, put the error number as a byte after the instruction, and then extract the return address from the stack and read the error number that way.
- Juan Castro January 12, 2022
  
  Documentation for a good number of processors actually recommends that as a (pardon me) routine way to call routines. Embed immediate parameters in the code (or even non-immediate ones if the code is in RAM), then get them from inside the procedure through the saved instruction pointer (whether in the stack or in a link register).
  - Antonio Rodríguez January 13, 2022 · Edited
    Stack space was scarce in those days. The 6502 supported a single stack of 256 bytes (S, the stack pointer register, was 8 bits wide). The 8080/Z80 supported bigger stacks, but you wouldn’t want to reserve a 4 KB stack if you had just 16 KB of RAM (from which you usually had to subtract video memory and system globals). In fact, the standard calling convention for Apple ProDOS was to place the call parameters after the JSR (the 6502’s equivalent of CALL):
    
        JSR $BF00       ;ProDOS MLI's entry point
        DB xx xx xx     ;Function number and parameters
        {code continued here}
    
    Each function had a fixed number of parameters, so the address of the first instruction after the call could be calculated by the MLI (the kernel’s call dispatcher).
    
  - Juan Castro January 13, 2022
    
    That’s actually due not to stack scarcity, but register scarcity. In CP/M (an OS of somewhat similar complexity) all parameters to system calls are passed in registers, since the 8080 has three 16-bit pointer registers, doubling as 6 8-bit registers, plus the accumulator. And a proper 16-bit stack, of course.
    
    Being experienced in the 8080, Z80, and 6809, I took a look at the 6502 and my reaction was, “You GOTTA be kidding me. No thanks.”
    
    Shame. Maybe if alternate-universe me had been exposed to the 6502 first, I’d actually enjoy programming for it.
  - word merchant January 14, 2022
    
    If you want to see how beautiful 6502 code could be, a good place to start would be a disassembly of the BBC Micro’s OS 1.2 and Basic 2.
    
    Both written by people right at the top of their game (sadly not me).
  - Raymond Chen Author January 13, 2022
    
    The joke is that the 6502 is a 256-register processor (the zero page). The catch is that you are programming in microcode.
  - Henke37 January 12, 2022
    
    Just don’t get in the way of the return predictor when doing that or you will suffer a performance penalty.
  - Raymond Chen Author January 12, 2022
    
    The types of microprocessors where these tricks were common didn’t even have instruction prefetch, much less a return predictor.
Antonio Rodríguez January 11, 2022 · Edited
On the Apple II, at least, the BIT instruction was used a lot for its original purpose: testing I/O status. All indicators for hardware on the motherboard (and on many expansion cards) used bit 7, which the BIT instruction transfers into the N (sign) flag. For example, testing the keyboard and branching if there is a pending keypress was done this way in just two instructions:
```
    2C 00 C0    BIT $C000
    30 xx       BMI {target}
```
An infinite loop which waited for a keypress could be written in a tight loop of just two instructions, too:
```
wait:
    2C 00 C0    BIT $C000
    10 FB       BPL wait
```
In those days, most programs skipped the firmware services for I/O and accessed hardware directly, so this use was pretty frequent.