Why does the Windows Portable Executable (PE) format have separate tables for import names and import addresses?, part 1

Raymond Chen

In the Windows Portable Executable (PE) format, each entry in the image import descriptor table describes the functions imported from a specific target DLL.

    DWORD   OriginalFirstThunk;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;

The OriginalFirstThunk points to an array of pointer-sized IMAGE_THUNK_DATA structures which describe the functions being imported. The FirstThunk points to an array of pointers, whose initial values are a copy of the values pointed to by OriginalFirstThunk. When the DLL is loaded, those initial values in the FirstThunk table are replaced by the actual function pointers determined at runtime.
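
As a rough illustration of the two parallel arrays, here is a toy model in C. It is a sketch, not the real loader: the names toy_thunk, toy_resolver, toy_sample_resolver, and toy_snap_imports are made up for this example, and the thunk entries are flattened to plain integers instead of the real IMAGE_THUNK_DATA union. It assumes both arrays end with a zero entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one pointer-sized IMAGE_THUNK_DATA entry (hypothetical). */
typedef uintptr_t toy_thunk;

/* Hypothetical callback: turns an import description into a function address. */
typedef uintptr_t (*toy_resolver)(toy_thunk description);

/* Made-up resolver for demonstration only: pretends every function lives
 * at 0x70000000 plus its description value. */
uintptr_t toy_sample_resolver(toy_thunk description)
{
    return (uintptr_t)0x70000000u + description;
}

/* Walk the zero-terminated OriginalFirstThunk array (the descriptions) and
 * overwrite the matching FirstThunk slots with resolved addresses.
 * The OriginalFirstThunk array is only read, never modified.
 * Returns the number of imports processed. */
size_t toy_snap_imports(const toy_thunk *original_first_thunk,
                        toy_thunk *first_thunk,
                        toy_resolver resolve)
{
    assert(original_first_thunk != NULL && first_thunk != NULL);
    size_t count = 0;
    while (original_first_thunk[count] != 0) {
        first_thunk[count] = resolve(original_first_thunk[count]);
        count++;
    }
    return count;
}
```

Calling toy_snap_imports with a FirstThunk array that starts out as a copy of the OriginalFirstThunk array leaves the descriptions untouched and replaces only the FirstThunk entries.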

But why are there two copies of the table? The two tables are never needed at the same time, so why not reuse the memory? When the DLL is initially loaded, the entries describe the functions being imported, and after the function addresses are located, they could be written back into the same table.

The answer is DLL binding.

As a load-time optimization, you can bind your DLL to its targets. If the target DLL has 0x20304000 as its preferred base address, then if the DLL gets loaded at that preferred base address, you know what all the function addresses are going to be, and binding records those precalculated function addresses into the FirstThunk table. After binding is performed, the FirstThunk table now holds the precalculated function addresses and is not a copy of the OriginalFirstThunk table. The module timestamp of the DLL that was used to calculate the bindings is recorded in the image import descriptor.¹
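
The arithmetic behind binding can be sketched as follows. This is a simplified model with 32-bit addresses, not the actual binding tool: toy_bind_imports and its parameters are hypothetical, and export_rvas is assumed to already hold, for each import, the offset of the matching function from the start of the target DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Precalculate each import's address as "preferred base + offset within the
 * target DLL", store it in the FirstThunk array, and remember which build
 * of the target DLL the addresses were calculated against.
 * All names here are hypothetical. */
void toy_bind_imports(uint32_t *first_thunk,
                      const uint32_t *export_rvas,
                      size_t count,
                      uint32_t preferred_base,
                      uint32_t target_timestamp,
                      uint32_t *recorded_timestamp)
{
    assert(first_thunk != NULL && export_rvas != NULL);
    assert(recorded_timestamp != NULL);
    for (size_t i = 0; i < count; i++) {
        first_thunk[i] = preferred_base + export_rvas[i];
    }
    *recorded_timestamp = target_timestamp;
}
```

With the preferred base address 0x20304000 from the example above, a function at offset 0x1000 in the target DLL would be bound to 0x20305000.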

When the DLL is loaded, the loader checks whether the module timestamp recorded in the image import descriptor matches the timestamp of the actual module found at runtime. If so, then it just uses the precalculated values in the FirstThunk table. And if not, then the loader uses the OriginalFirstThunk table to look up the functions at runtime.
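
The loader's decision might look roughly like this. It is a toy sketch under stated assumptions, not the real loader code: toy_load_imports, toy_lookup, and toy_sample_lookup are invented names, a recorded timestamp of zero is treated as "not bound" as a convention of this sketch, and the loaded_at_preferred_base flag models the condition from the description of binding that the target DLL must land at its preferred base address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: looks up a function address at runtime. */
typedef uintptr_t (*toy_lookup)(uintptr_t description);

/* Made-up lookup for demonstration only. */
uintptr_t toy_sample_lookup(uintptr_t description)
{
    return (uintptr_t)0x50000000u + description;
}

/* Returns true if the precalculated FirstThunk values were kept as-is,
 * false if the imports had to be resolved from OriginalFirstThunk. */
bool toy_load_imports(uint32_t recorded_timestamp,
                      uint32_t actual_timestamp,
                      bool loaded_at_preferred_base,
                      const uintptr_t *original_first_thunk,
                      uintptr_t *first_thunk,
                      toy_lookup lookup)
{
    assert(original_first_thunk != NULL && first_thunk != NULL);
    if (recorded_timestamp != 0 &&
        recorded_timestamp == actual_timestamp &&
        loaded_at_preferred_base) {
        return true; /* bindings are still valid; nothing to do */
    }
    /* Bindings are missing or stale: the FirstThunk values are useless,
     * so fall back to the descriptions in OriginalFirstThunk. */
    for (size_t i = 0; original_first_thunk[i] != 0; i++) {
        first_thunk[i] = lookup(original_first_thunk[i]);
    }
    return false;
}
```

The fallback loop only works because the OriginalFirstThunk array still holds the descriptions; the stale addresses in the FirstThunk array carry no information about which functions were wanted.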

Therefore, you can’t combine the OriginalFirstThunk and FirstThunk tables: If the precalculated values in the FirstThunk table cannot be used, you need to go back to the original values in OriginalFirstThunk to resolve the imports the old-fashioned way.

Bonus chatter: Binding is of relatively little value nowadays due to address space layout randomization.

¹ And the module timestamp is often not really a timestamp.



  • Joshua Hudson 1

    Other fun facts about PE file format:

    1) Everything is relative to the load address. This made writing position-independent code quite reasonable all the way back in 1993, but only if you were coding in assembly; the build toolchain wasn’t up to snuff yet. With the x64 instruction set’s RIP-relative addressing, making binaries with no relocations is easy now.

    2) The minimum Win32 subsystem version is 3.10 (not 3.1; it appears the binary version number of Windows NT 3.1 is 3.10). Setting the operating system version to 1.0 works just fine, however.

    3) You can stuff readonly data between the DOS stub and the PE Header. I’m not sure if code works or not after the NX bit support was added. I used to think scandisk.exe in Windows 9x took advantage of this to reuse the scan code between DOS scanner and Windows scanner; however this is not the case.

    4) Stuffing the import address translation table between the DOS stub and the PE header works. (Why on earth? Somebody did it on stackoverflow and found out it works. I kind of think it shouldn’t work.)

    5) You cannot start a section at zero because the loader decides that a section has no data in the image by checking whether its address in the image is zero rather than whether its size in the image is zero. I think this is a bug. With modern disks having 4K blocks, it actually makes sense now to load the whole image as only two sections (three if there are any resources), .text and .data, where .text is RX and .data is RW. We can avoid every single page-in being misaligned by starting the first section at 0 bytes rather than 512 bytes.

    6) IMAGE_DLLCHARACTERISTICS_NO_SEH is valid on an x64 executable. I don’t want to find out what this does when you pass function pointers to Windows DLLs.

    The specification needs a minor fix. It’s not clear if putting the import address translation table in readonly memory is intended to work or not.

    • Joshua Hudson 0

      lol what timing. Somebody finally ran into the 4GB PE file size limitation: https://github.com/Mozilla-Ocho/llamafile

      “Unfortunately, Windows users cannot make use of these example llamafiles because Windows has a maximum executable file size of 4GB, and all of these examples exceed that size. (The LLaVA llamafile works on Windows because it is 30MB shy of the size limit.)”

    • skSdnW 0

      NO_SEH is a hardening flag that tells the OS that the module has no exception handlers that can be called. This is mostly relevant for 32-bit, where the chain starts in the TEB, and it prevents an exploit from adding itself to the chain and ROPing into said module.

      If an exception happens in the callback function itself then I assume it just crashes. If the exception happens somewhere else then I would guess the callback function frame is just skipped when unwinding?

  • Melissa P 0

    I actually knew about this.

    But the next question is, why was ordinal (aka hint) linkage “hacked” into the name field, and furthermore why is it limited to 16-bit? I assume that’s part 2.

  • 紅樓鍮 0

    Is there any reason why Windows (the OS, ABI and toolchain) doesn’t attempt to take advantage of position-independent code? The most popular Unix-like OSes support it on x86-64, and I can’t find any information on why Windows doesn’t (other than “Windows supports relocation”, and it’s not like the Unixes don’t support relocation).
