December 11th, 2023

The mysterious second parameter to the x86 `ENTER` instruction

Raymond Chen

The x86 instruction set has an ENTER instruction which builds a stack frame. It is almost always used with a zero as the second parameter.

        enter   n, 0

This is functionally equivalent to

        push    ebp
        mov     ebp, esp
        sub     esp, n

But what happens if you increase that second parameter beyond zero?

Values greater than zero for the second parameter are intended for languages like Pascal which support nested functions that can access the local variables of their lexical parents. We learned about these functions a short time ago. But the designers of the x86 instruction set had a different design in mind for how a function can access the variables of its lexical parent: Instead of receiving a pointer to the start of a linked list of lexical parent frames, they receive an array of pointers to lexical parent frames.

In its full generality, the

    enter n, k + 1

instruction goes like this:

    push    ebp
    mov     internal_register, esp
    sub     ebp, 4  ⎱ k times
    push    [ebp]   ⎰
    push    internal_register
    mov     ebp, internal_register
    sub     esp, n

If you ignore the order of operations and worry just about the final state, then you can reinterpret it like this, which I think captures the essence of the instruction better:

    push    ebp
    push    [ebp-4]   
    push    [ebp-8]    k pushes
    :                 
    push    [ebp-4*k] 
    lea     ebp, [esp + 4*k]    ; where we pushed the previous ebp
    push    ebp		        ; add our own frame to the array
    sub     esp, n

Let’s look at our example function again.

function Outer(n: integer) : integer;
    var i: integer;

    procedure Update(j: integer);
    begin
        i := i + j
    end;

    procedure Inner(m: integer);

        procedure MoreInner;
        begin
            Update(m)
        end;

    (* Inner body begins here *)
    begin
        MoreInner
    end;

(* Outer body begins here *)
begin
    i := 0;
    Inner(n);
    Outer := i
end;

On entry to Outer, the stack looks like this:

`n` parameter
return address	← `esp`

The Outer function establishes its stack frame by performing an enter 4, 1. The extra 1 at the end means that this is the outermost of a chain of nested functions. In our cookbook, k is zero, so the functional equivalent is

    push    ebp
                                ; no pointers copied from parent
    lea     ebp, [esp+0]        ; equivalently, "mov ebp, esp"
    push    ebp                 ; pointer to our own frame
    sub     esp, 4

and we wind up with this stack frame for Outer:

	`Outer` frame
	`n` parameter
	return address
▶︎	previous `ebp`	← `ebp`
▶︎	previous `ebp`	← `ebp`
	`Outer` frame pointer
	`Outer` frame pointer
	`i`	← `esp`

That extra ,1 caused us to push the address of where we saved the previous ebp, which I’ve called the Outer frame pointer. That value isn’t really useful to us right now, since we already have that value in the ebp register. But it comes in handy when we call Inner.

On entry to Inner, the stack looks like this:

		`m` parameter
		return address	← `esp`

The Inner function performs an enter 0, 2. The 0 means that Inner has no local variables, and the 2 means that we are now the second level in a chain of nested functions.

The functional equivalent now has one extra memory push before we push a pointer to our own frame:

    push    ebp
    push    [ebp-4]             ; one pointer copied from parent
    lea     ebp, [esp+4]
    push    ebp                 ; pointer to our own frame
    sub     esp, 4

Before pushing the address of its own frame, the enter instruction also copies one pointer from the parent’s frame, namely the Outer frame pointer.

	`Inner` frame
	`m` parameter			`Outer` frame
	return address			`n` parameter
▶︎	previous `ebp`	← `ebp`		return address
▶︎	previous `ebp`	← `ebp`		return address
	`Outer` frame pointer		▶︎	previous `ebp`
	`Outer` frame pointer		▶︎	previous `ebp`
	`Inner` frame pointer	← `esp`		`Outer` frame pointer
	`Inner` frame pointer	← `esp`		`Outer` frame pointer
				`i`

Now things are interesting.

The Inner function has access to its own frame, via the ebp register (and redundantly via the Inner frame pointer on its stack). It also has access to the Outer frame through its local copy of the Outer frame pointer.

The next thing that happens is that Inner calls MoreInner with no parameters. This time MoreInner uses enter 0, 3 where the 0 means that MoreInner has no local variables, and the 3 means that it is a nested function three levels deep, so it should copy two frame pointers from its parent.

	`MoreInner` frame
	return address		`Inner` frame
▶︎	previous `ebp`		`m` parameter		`Outer` frame

	`Outer` frame pointer		return address		`n` parameter

	`Inner` frame pointer	▶︎	previous `ebp`		return address

	`MoreInner` frame pointer		`Outer` frame pointer	▶︎	previous `ebp`

			`Inner` frame pointer		`Outer` frame pointer

					`i`

The frame for MoreInner contains its own parameters and local variables (nothing), plus pointers to both parent frames, plus a pointer to its own frame (which MoreInner doesn’t use, but which is ready for any nested function to use).

The code generation for MoreInner therefore reads the value of m by following the Inner frame pointer and then reading the m parameter from the Inner frame’s parameter space.

After MoreInner calls Update, the Update function starts with an enter 0, 2 because it is a level-2 nested function. This copies only the Outer frame pointer to Update‘s frame, resulting in this:

	`Update` frame
	`j` parameter			`Outer` frame
	return address			`n` parameter
▶︎	previous `ebp`	← `ebp`		return address
▶︎	previous `ebp`	← `ebp`		return address
	`Outer` frame pointer		▶︎	previous `ebp`
	`Outer` frame pointer		▶︎	previous `ebp`
	`Update` frame pointer	← `esp`		`Outer` frame pointer
	`Update` frame pointer	← `esp`		`Outer` frame pointer
				`i`

I didn’t draw it, but the “previous ebp” in the Update frame points to the MoreInner frame.

The Update function reads j from its own parameter space and uses to update the i variable in Outer‘s frame by following the Outer frame pointer.

The result is the same as the System V Application Binary Interface static chain pointer, but it’s done in a different way. Instead of passing the head of a linked list of frames, the enter instruction copies an entire array of pointers to frames. This reduces the number of instructions required in order to access faraway frames, but it increases the cost of a function call due to the extra copying.

I wonder if anybody uses the Intel design for nested functions. I suspect it’s silicon on the CPU that is completely wasted.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

12 comments

Discussion is closed. Login to edit/delete existing comments.

Sort by :

Newest

Newest Popular Oldest

Bernd Oppolzer December 12, 2023 · Edited 0

In normal Pascal procedure calls, such a vector of stack frame addresses is not needed. A standard Pascal runtime knows all the time about the current stack frame address of - say - the procedure which is currently active at static level n. This information is called the DISPLAY VECTOR and there is no need to copy the display vector on procedure calls, because it is stored at a well-known location inside the runtime. You...
Read more
In normal Pascal procedure calls, such a vector of stack frame addresses is not needed. A standard Pascal runtime knows all the time about the current stack frame address of – say – the procedure which is currently active at static level n. This information is called the DISPLAY VECTOR and there is no need to copy the display vector on procedure calls, because it is stored at a well-known location inside the runtime. You only have to replace the stack frame addresses of the current static level, when you enter or leave a procedure (and maybe set the new current static level).

What makes things more complicated, are procedure and function PARAMETERS (in Pascal), that is: procedures that are passed as parameters to other procedures. In this case, it is indeed necessary to COPY THE COMPLETE DISPLAY VECTOR, because it is not possible to predict what static level the procedure (which is passed as a parameter) has. So maybe the ENTER instruction is meant for such use cases.

Some of the old Pascal compilers didn’t allow procedure parameters (or implemented them badly) due to these difficulties.
To see, if your (Pascal or Algol) compiler implemented procedure parameters correctly, you can use the “Man or Boy” test: https://en.wikipedia.org/wiki/Man_or_boy_test

Read less
- Kalle Niemitalo December 18, 2023 0
  
  Does any current ABI specification define the static chain pointer or display vector in sufficient detail, to allow lexically nested procedures or functions to be passed as parameters between programming languages? On the Itanium, one could perhaps construct a temporary function descriptor and place the static chain pointer in its global pointer field, but I don’t know if this is permissible.
Melissa P December 12, 2023 0

well, no silicon wasted… the instructions only exist in microcode

but you can’t get rid of those, as a lot of “positive malware” is still using them, aka “software protectors”.
- Fabian Giesen December 12, 2023 0
  
  Microcode ROM still takes up silicon real estate!
Nikolay Pavlov December 12, 2023 0

I wonder why C# and CLR opted for what are essentially global variables for closures, instead of going for this option. It is way more efficient: no allocations and less pressure on GC, perhaps even more cache-friendly due to locality.
- 紅樓鍮 December 12, 2023 · Edited 0
  And if your question was why C# doesn't keep the closure state on the stack: this won't work if the reachability of the closure escapes the function that created the closure. For example, a function that returns a closure can't allocate that closure on its own stack frame:
  
  <code>
  
  (My C# is very rusty; please correct me if my code doesn't compile.)
  
  If just referred to the on the stack, then once returns, that ...
  Read more
  And if your question was why C# doesn’t keep the closure state on the stack: this won’t work if the reachability of the closure escapes the function that created the closure. For example, a function that returns a closure can’t allocate that closure on its own stack frame:
  
  Func<int> CreateClosure() { int counter = 0; return () => ++counter; } var f = CreateClosure(); Debug.Assert(f() == 1); Debug.Assert(f() == 2);
  
  (My C# is very rusty; please correct me if my code doesn’t compile.)
  
  If () => ++counter just referred to the counter on the stack, then once CreateClosure returns, that counter would no longer exist, and the closure would try to access a nonexistent stack variable.
  
  Similarly, the compiler can’t know statically whether it’s safe to allocate the closure on the stack if the closure will be passed to another function:
  
  void PrintCounter() { int counter = 0; new Timer(_ => { var value = Interlocked.Increment(ref counter); Console.WriteLine($"{value}\n"); }, null, 0, 1000); }
  
  Passing a closure to Timer‘s constructor causes it to escape to another thread; that thread must be able to safely invoke the closure even after PrintCounter has returned.
  
  Non-heap-based languages like Rust allow you to create closures that are stored on the stack, but they (except C++, which wasn’t designed to be strictly safe) track your usage of said closures to ensure they don’t escape the parent function’s stack frame in the fashions shown above (if the parent function passes the closure to another function by reference, the callee must assert in its type signature that it doesn’t retain that reference beyond the closure’s lifetime). It’s also possible for heap-based languages to do such escape analysis dynamically, and elide the heap allocation in case they prove that escaping doesn’t happen; if my memory serves, Java does that optimization, but I don’t hear .NET does it.
  Read less
- 紅樓鍮 December 12, 2023 0
  
  IIRC, each instantiation of a C# closure puts captured variables on a heap-allocated structure, not "global variables" as you say.
  
  And to my knowledge, all languages that implement closures (other than C++, Rust and maybe all those emerging languages that are inspired by them) allocate closure state on the heap (and if they support objects, they also allocate objects on the heap, and objects and closures are almost certainly unified in one way or another at...
  Read more
  IIRC, each instantiation of a C# closure puts captured variables on a heap-allocated structure, not “global variables” as you say.
  
  And to my knowledge, all languages that implement closures (other than C++, Rust and maybe all those emerging languages that are inspired by them) allocate closure state on the heap (and if they support objects, they also allocate objects on the heap, and objects and closures are almost certainly unified in one way or another at the type system level), and these heap-based languages also tend to feature efficient garbage-collected heaps.
  
  Read less
  - Solomon Ucko December 12, 2023 0
    
    Rust stores the captured variables (which could be copies or references) inside the closure object, and uses lifetimes to make sure references don’t escape. The closure is stored on the stack by default, but can be put on the heap using a smart pointer. I think Swift does something similar, but with more magic, by using `@noescape` and `@escaping`.
Rob Herbert December 11, 2023 0

It's been a long time since I programmed in pure Assembler, but I remember thinking the second parameter was basically useless.

Yes, there was a theoretical use for it (and LEAVE) but given the number of times it was ever likely to be used, it was, as you say, wasted silicon.

Worth mentioning, though, that it was one of those instructions introduced with the 80186 along with PUSHA and POPA, INSB and OUTSB, the moderately useful IMUL...
Read more
It’s been a long time since I programmed in pure Assembler, but I remember thinking the second parameter was basically useless.

Yes, there was a theoretical use for it (and LEAVE) but given the number of times it was ever likely to be used, it was, as you say, wasted silicon.

Worth mentioning, though, that it was one of those instructions introduced with the 80186 along with PUSHA and POPA, INSB and OUTSB, the moderately useful IMUL byte, and the possibly never-used BOUND.

For real nerds, PUSHA and POPA on the 80186 worked differently from later CPUs and was a way to detect the CPU in use.

Read less
- skSdnW December 12, 2023 0
  
  MSVC will generate LEAVE sometimes. ENTER is supposedly too slow even when optimizing for size.
Kalle Niemitalo December 11, 2023 0

Perhaps Intel was inspired by display registers in Burroughs Large Systems.
- Alex Shpilkin December 12, 2023 · Edited 0
  
  The “display” in the name of those refers to the display technique introduced at the very end of Dijkstra’s “Recursive programming” (1960, DOI 10.1007/BF01386232). The variation used by Intel seems to have been introduced in Gries’s compilers book (1971, ISBN 0-471-32776-X).