December 11th, 2023

The mysterious second parameter to the x86 ENTER instruction

The x86 instruction set has an ENTER instruction which builds a stack frame. It is almost always used with a zero as the second parameter.

        enter   n, 0

This is functionally equivalent to

        push    ebp
        mov     ebp, esp
        sub     esp, n

But what happens if you increase that second parameter beyond zero?

Values greater than zero for the second parameter are intended for languages like Pascal which support nested functions that can access the local variables of their lexical parents. We learned about these functions a short time ago. But the designers of the x86 instruction set had a different design in mind for how a function can access the variables of its lexical parent: Instead of receiving a pointer to the start of a linked list of lexical parent frames, they receive an array of pointers to lexical parent frames.

In its full generality, the

    enter n, k + 1

instruction goes like this:

    push    ebp
    mov     internal_register, esp
    sub     ebp, 4 k times
    push    [ebp]   ⎰
    push    internal_register
    mov     ebp, internal_register
    sub     esp, n

If you ignore the order of operations and worry just about the final state, then you can reinterpret it like this, which I think captures the essence of the instruction better:

    push    ebp
    push    [ebp-4]   
    push    [ebp-8]    k pushes
    :                 
    push    [ebp-4*k] 
    lea     ebp, [esp + 4*k]    ; where we pushed the previous ebp
    push    ebp		        ; add our own frame to the array
    sub     esp, n

Let’s look at our example function again.

function Outer(n: integer) : integer;
    var i: integer;

    procedure Update(j: integer);
    begin
        i := i + j
    end;

    procedure Inner(m: integer);

        procedure MoreInner;
        begin
            Update(m)
        end;

    (* Inner body begins here *)
    begin
        MoreInner
    end;

(* Outer body begins here *)
begin
    i := 0;
    Inner(n);
    Outer := i
end;

On entry to Outer, the stack looks like this:

n parameter  
return address esp

The Outer function establishes its stack frame by performing an enter 4, 1. The extra 1 at the end means that this is the outermost of a chain of nested functions. In our cookbook, k is zero, so the functional equivalent is

    push    ebp
                                ; no pointers copied from parent
    lea     ebp, [esp+0]        ; equivalently, "mov ebp, esp"
    push    ebp                 ; pointer to our own frame
    sub     esp, 4

and we wind up with this stack frame for Outer:

    Outer frame  
    n parameter  
    return address  
  ▶︎ previous ebp ebp
 
    Outer frame pointer  
 
    i esp
 

That extra ,1 caused us to push the address of where we saved the previous ebp, which I’ve called the Outer frame pointer. That value isn’t really useful to us right now, since we already have that value in the ebp register. But it comes in handy when we call Inner.

On entry to Inner, the stack looks like this:

    m parameter  
    return address esp
 

The Inner function performs an enter 0, 2. The 0 means that Inner has no local variables, and the 2 means that we are now the second level in a chain of nested functions.

The functional equivalent now has one extra memory push before we push a pointer to our own frame:

    push    ebp
    push    [ebp-4]             ; one pointer copied from parent
    lea     ebp, [esp+4]
    push    ebp                 ; pointer to our own frame
    sub     esp, 4

Before pushing the address of its own frame, the enter instruction also copies one pointer from the parent’s frame, namely the Outer frame pointer.

    Inner frame  
    m parameter       Outer frame
    return address       n parameter
  ▶︎ previous ebp ebp     return address
 
    Outer frame pointer     ▶︎ previous ebp
     
    Inner frame pointer esp     Outer frame pointer
 
            i
 

Now things are interesting.

The Inner function has access to its own frame, via the ebp register (and redundantly via the Inner frame pointer on its stack). It also has access to the Outer frame through its local copy of the Outer frame pointer.

The next thing that happens is that Inner calls More­Inner with no parameters. This time More­Inner uses enter 0, 3 where the 0 means that More­Inner has no local variables, and the 3 means that it is a nested function three levels deep, so it should copy two frame pointers from its parent.

    MoreInner frame  
    return address       Inner frame    
  ▶︎ previous ebp       m parameter       Outer frame
 
    Outer frame pointer       return address       n parameter
 
    Inner frame pointer     ▶︎ previous ebp       return address
       
    MoreInner frame pointer       Outer frame pointer     ▶︎ previous ebp
           
            Inner frame pointer       Outer frame pointer
         
                    i
         

The frame for MoreInner contains its own parameters and local variables (nothing), plus pointers to both parent frames, plus a pointer to its own frame (which More­Inner doesn’t use, but which is ready for any nested function to use).

The code generation for MoreInner therefore reads the value of m by following the Inner frame pointer and then reading the m parameter from the Inner frame’s parameter space.

After More­Inner calls Update, the Update function starts with an enter 0, 2 because it is a level-2 nested function. This copies only the Outer frame pointer to Update‘s frame, resulting in this:

    Update frame  
    j parameter       Outer frame
    return address       n parameter
  ▶︎ previous ebp ebp     return address
 
    Outer frame pointer     ▶︎ previous ebp
     
    Update frame pointer esp     Outer frame pointer
 
            i
 

I didn’t draw it, but the “previous ebp” in the Update frame points to the More­Inner frame.

The Update function reads j from its own parameter space and uses to update the i variable in Outer‘s frame by following the Outer frame pointer.

The result is the same as the System V Application Binary Interface static chain pointer, but it’s done in a different way. Instead of passing the head of a linked list of frames, the enter instruction copies an entire array of pointers to frames. This reduces the number of instructions required in order to access faraway frames, but it increases the cost of a function call due to the extra copying.

I wonder if anybody uses the Intel design for nested functions. I suspect it’s silicon on the CPU that is completely wasted.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

12 comments

Discussion is closed. Login to edit/delete existing comments.

  • Bernd Oppolzer · Edited

    In normal Pascal procedure calls, such a vector of stack frame addresses is not needed. A standard Pascal runtime knows all the time about the current stack frame address of - say - the procedure which is currently active at static level n. This information is called the DISPLAY VECTOR and there is no need to copy the display vector on procedure calls, because it is stored at a well-known location inside the runtime. You...

    Read more
    • Kalle Niemitalo

      Does any current ABI specification define the static chain pointer or display vector in sufficient detail, to allow lexically nested procedures or functions to be passed as parameters between programming languages? On the Itanium, one could perhaps construct a temporary function descriptor and place the static chain pointer in its global pointer field, but I don’t know if this is permissible.

  • Melissa P

    well, no silicon wasted… the instructions only exist in microcode

    but you can’t get rid of those, as a lot of “positive malware” is still using them, aka “software protectors”.

    • Fabian Giesen

      Microcode ROM still takes up silicon real estate!

  • Nikolay Pavlov

    I wonder why C# and CLR opted for what are essentially global variables for closures, instead of going for this option. It is way more efficient: no allocations and less pressure on GC, perhaps even more cache-friendly due to locality.

    • 紅樓鍮 · Edited

      And if your question was why C# doesn't keep the closure state on the stack: this won't work if the reachability of the closure escapes the function that created the closure. For example, a function that returns a closure can't allocate that closure on its own stack frame:

      <code>

      (My C# is very rusty; please correct me if my code doesn't compile.)

      If just referred to the on the stack, then once returns, that ...

      Read more
    • 紅樓鍮

      IIRC, each instantiation of a C# closure puts captured variables on a heap-allocated structure, not "global variables" as you say.

      And to my knowledge, all languages that implement closures (other than C++, Rust and maybe all those emerging languages that are inspired by them) allocate closure state on the heap (and if they support objects, they also allocate objects on the heap, and objects and closures are almost certainly unified in one way or another at...

      Read more
      • Solomon Ucko

        Rust stores the captured variables (which could be copies or references) inside the closure object, and uses lifetimes to make sure references don’t escape. The closure is stored on the stack by default, but can be put on the heap using a smart pointer. I think Swift does something similar, but with more magic, by using `@noescape` and `@escaping`.

  • Rob Herbert

    It's been a long time since I programmed in pure Assembler, but I remember thinking the second parameter was basically useless.

    Yes, there was a theoretical use for it (and LEAVE) but given the number of times it was ever likely to be used, it was, as you say, wasted silicon.

    Worth mentioning, though, that it was one of those instructions introduced with the 80186 along with PUSHA and POPA, INSB and OUTSB, the moderately useful IMUL...

    Read more
    • skSdnW

      MSVC will generate LEAVE sometimes. ENTER is supposedly too slow even when optimizing for size.

  • Kalle Niemitalo

    Perhaps Intel was inspired by display registers in Burroughs Large Systems.

    • Alex Shpilkin · Edited

      The “display” in the name of those refers to the display technique introduced at the very end of Dijkstra’s “Recursive programming” (1960, DOI 10.1007/BF01386232). The variation used by Intel seems to have been introduced in Gries’s compilers book (1971, ISBN 0-471-32776-X).