April 1st, 2024

Subroutine calls in the ancient world, before computers had stacks or heaps

We take stacks and heaps for granted nowadays, but back in the very old days of computing, computers operated without a stack or a heap.

Tell a recent college graduate this, and you may as well tell them that there was a time when you didn’t have instant access to millions of cat videos.

It’s not too hard to imagine computing without dynamic memory allocation. You just used fixed-size memory buffers for everything. If you had to operate on variable-sized data, you reserved a fixed-size buffer of some capacity large enough to accommodate any data you would reasonably be expected to process, and if somebody asked for more, you just exited the program with a fatal error. If you were really nice, you provided a compile-time configuration so your clients could adjust the maximum capacity to suit their datasets. And if you were really fancy, you wrote a custom allocator that operated on that fixed-size buffer so people could “allocate” and “free” memory from the buffer.
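
If you were curious what that really fancy version might look like, here is a minimal sketch in C of the simplest such allocator, a “bump” allocator over a fixed buffer. (The names and the capacity are made up for illustration, and a real one would also round sizes up for alignment.)

#define POOL_CAPACITY 4096       /* compile-time configurable maximum */

static char pool[POOL_CAPACITY]; /* the one fixed-size buffer */
static unsigned pool_used;       /* bytes handed out so far */

/* "Allocate" by handing out the next unused chunk of the buffer. */
void* pool_alloc(unsigned size)
{
    if (size > POOL_CAPACITY - pool_used) {
        return 0;                /* out of room: time for that fatal error */
    }
    void* result = &pool[pool_used];
    pool_used += size;
    return result;
}

/* "Free" everything at once by resetting the high-water mark. */
void pool_reset(void)
{
    pool_used = 0;
}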

But operating without a stack? How did you call a function if you didn’t have a stack for the return address or local variables?

Here’s how it worked.

First, the compiler defined a secret global variable for each inbound function parameter, plus another secret global variable for each function to hold the return address. It also defined a secret global variable for each of the function’s local variables.

To generate a function call, the compiler assigned the parameter values to the corresponding secret global variables, assigned the return address to the function’s secret “return address variable”, and then jumped to the start of the function.

The function read its parameters from its secret global variables, and used the pre-defined secret global variables that corresponded to its logically local variables. When the function was finished, it jumped to the address held in the function’s secret “return address variable.”

For example, suppose you had code like this, written in a C-like language:

int add_two_values(int a, int b)
{
    int c = a + b;
    return c;
}

void sample()
{
    int x = add_two_values(31415, 2718);
}

This would be transformed by the compiler into something like this:

int a2v_a;            // secret global for parameter a
int a2v_b;            // secret global for parameter b
int a2v_c;            // secret global for local variable c
void* a2v_retaddr;    // secret global for the return address

int add_two_values()
{
    a2v_c = a2v_a + a2v_b;

    return_value_register = a2v_c;
    goto a2v_retaddr;    // indirect jump back to the caller
}

int sample_x;
void sample()
{
    a2v_a = 31415;
    a2v_b = 2718;
    a2v_retaddr = &resume;
    goto add_two_values;
resume:
    sample_x = return_value_register;
}

Check it out: We did a function call and return without a stack!

Now, you can optimize the ABI by, say, passing some of these values in registers rather than in globals. For example, most processors had a special “link” register and a special “branch with link” instruction that automatically set the link register to the address of the instruction after the “branch with link” instruction. And maybe you optimize the calling convention to pass the first two parameters in registers, resulting in this:

int a2v_a;
int a2v_b;
int a2v_c;
void* a2v_retaddr;

int add_two_values()
{
    a2v_a = argument_register_1;    // copy arguments from registers into the globals
    a2v_b = argument_register_2;
    a2v_retaddr = link_register;    // save the return address set by branch_with_link

    a2v_c = a2v_a + a2v_b;

    return_value_register = a2v_c;
    goto a2v_retaddr;
}

int sample_x;
void sample()
{
    argument_register_1 = 31415;
    argument_register_2 = 2718;
    branch_with_link add_two_values;
    sample_x = return_value_register;
}

There was just one catch: You can’t do recursion.

Recursion doesn’t work because a recursive call would overwrite the return-address variable with the return address of the recursive call, and when the outer call completed, it would jump to the wrong place.
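
To see the problem concretely, imagine transforming a recursive factorial function the same way. (This example is my own illustration, following the same pattern as add_two_values above.)

int fact_n;           // secret global for the parameter n
void* fact_retaddr;   // the one and only return-address variable

int fact()
{
    if (fact_n <= 1) {
        return_value_register = 1;
        goto fact_retaddr;
    }
    fact_n = fact_n - 1;       // clobbers the outer call's parameter
    fact_retaddr = &resume;    // clobbers the outer call's return address
    goto fact;
resume:
    // fact_retaddr still points at "resume", so the goto below comes
    // right back here instead of returning to the original caller.
    return_value_register = (fact_n + 1) * return_value_register;
    goto fact_retaddr;
}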

The programming languages of the day solved this problem by simply declaring it illegal: They didn’t support recursion.¹

Bonus chatter: Some compilers were even sneakier and used self-modifying code: The special return-address variable was really the address field of the jump instruction at the end of the function!

This was occasionally not so much a sneaky trick as a practical necessity: The processor might not support indirect jumps either!
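
In the same spirit, the self-modifying version might look something like this, using a made-up assembly language (the “+1” assumes the jump’s address field is the second word of the instruction, purely for illustration):

add_two_values:
    add   r1 = r1, r2            ; actual subroutine body
a2v_jump:
    jmp   0                      ; address field is patched by the caller

sample:
    mov   r1 = 31415             ; first parameter
    mov   r2 = 2718              ; second parameter
    st    a2v_jump+1 = resume    ; patch the jump's address field
    jmp   add_two_values         ; call the subroutine
resume:
    st    sample_x = r1          ; save return value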

After the practical value of subroutines was recognized, quite a few processors added a subroutine call instruction that worked by storing the return address at the first word of the subroutine, and beginning execution at the second word of the subroutine. To return from a subroutine, you execute an indirect jump through the subroutine start label. (As I recall, some processors stored the return address at the word before the first instruction of the subroutine.) Here’s what it looked like using a made-up assembly language:

add_two_values:
    nop                     ; return address goes here
    add   r1 = r1, r2       ; actual subroutine begins here
    jmp   @add_two_values   ; indirect jump to return address

sample:
    mov   r1 = 31415        ; first parameter
    mov   r2 = 2718         ; second parameter
    bsr   add_two_values    ; call subroutine
    st    sample_x = r1     ; save return value

When the CPU executed the bsr branch-to-subroutine instruction, it stored the return address into the first word of add_two_values (overwriting the sacrificial nop) and began execution at the following instruction, the add r1 = r1, r2.

¹ FORTRAN initially didn’t even support subroutines! Those were added in 1958. And support in FORTRAN for recursion didn’t become standard until 1991, and even then, you had to explicitly declare your subroutine as RECURSIVE.

Topics
History

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

19 comments

  • Nick

    Since I can’t reply to an unapproved comment, just wondering why my earlier comment from April 1 was never approved. I’m sure you’re busy, but otherwise, did I do something wrong to cause it to get stuck (the YouTube link)?

  • Stephan Leclercq

    And then there was COBOL… You can’t pretend you understand spaghetti code unless you’ve written code like this at least once in your lifetime:

        PERFORM PARAGRAPH-A THROUGH PARAGRAPH-Z.
        ALTER PARAGRAPH-B TO PROCEED TO PARAGRAPH-D.
  • Xuejun Yang (Microsoft employee)

    Hey, this is exactly what we did for this paper: https://users.cs.utah.edu/~regehr/papers/lctes062-yang.pdf. We translated function calls into a bunch of goto statements. Our research shows this makes sense for embedded applications where memory is a scarce resource.

  • Richard Deeming

    My first thought on reading the title was of using GOSUB and RETURN on a ZX Spectrum. 🙂

    • Yuri Khan

      Yes and no. Early BASICs had a return-address stack but no argument/locals stack. (There were no subroutine arguments or local variables as such.) I once did a recursive Sierpinski triangle under those restrictions, simulating a locals stack with a few integer arrays. Luckily, a maximum recursion depth of about 6 or 8 was quite sufficient for the screen pixel counts of the time.

  • Mark Yagnatinsky

    Aside from not supporting recursion, this sounds like it would not cope well with function pointers, right?

    • Raymond Chen (Microsoft employee, Author)

      Programming languages of the day generally didn’t have the concept of “function pointers”. But if they did, you can make function pointers be, say, a pointer to a structure that has the code address (read-only), the return address (read-write), and the parameters (read-write).
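
      In the article’s C-like pseudocode, such a structure might be sketched like this (the field names are illustrative):

          struct function_pointer {
              void* code_address;    // read-only: start of the function's code
              void* return_address;  // read-write: set by each caller
              int   param1;          // read-write: the parameters
              int   param2;
          };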

      • Joshua Hudson

        Register calling conventions also date back pretty far; in the case of the return address being stored at the start of the function, it would work. However, with old processors you would find yourself limited to functions that take one or two arguments.

  • aidtopia

    It's not just the ancient world! There are still domains today where dynamic memory allocation is not allowed and you have to guarantee an upper bound on the stack size (which often implies no recursion). If your embedded system is a critical safety component, you probably have to conform to guidelines, like MISRA, that prohibit such perilous luxuries.

    "[J]ust [exit] the program with a fatal error." Ha! not unless you can guarantee the program...

  • 紅樓鍮 · Edited

    Stackless coroutines in C++ support inline allocation if the compiler can prove lifetime shenanigans, can see the definition of the callee coroutine (so it knows how many bytes its coroutine frame consumes), and, of course, if the call isn't recursive. This process can then continue on for the transitive callees of the callee, and the end result is a big, fixed size contiguous buffer that's just big enough to provide memory for the execution of...

  • Neil Rashbrook

    Ooh, FORTRAN eventually got recursion? I remember it didn’t back in the day when I had a course on it. (I think my year was the last year where FORTRAN was a mandatory course for first-year students. It was also the first year that they actually taught C to first year students.)

  • Dave Gzorple

    You didn’t really need to make up some assembly language; you could have used the most famous instruction set that did this, and one which is still in use today: the IBM 360, with balr et al.

    • Mark out West · Edited

      Yup. For reentrancy/recursion/multithreading IBM utilized a "dynamic storage area" protocol in lieu of a hardware stack. A main memory save area was dynamically allocated and chained together in essentially a doubly linked list that lengthened as the calls deepened. Each DSA saved the entire register set and the set of passed parameters. Because the save area was heap allocated, the executable code was considered read-only and thus thread-safe.

      Simpler routines simply dumped...

  • Ian Boyd · Edited

    It's still amazing to me that even as recently as Windows 95, which had to run in 4 MB of RAM, we had to scrimp and save every byte, and every cycle. Where every page fault was a nightmare.

    Whereas now the CPU is executing two or three *hundred* instructions ahead, across multiple loop iterations, across multiple function calls, and returns. Where a 2 GB address limit now feels confining, and our PCs regularly have...
