If each thread’s TEB is referenced by the fs selector, does that mean that the 80386 is limited to 1024 threads?

Raymond Chen

May 20th, 2019

Commenter Waleri Todorov recalled that the global descriptor table (GDT), which is one of the places that selectors are defined, is limited to 1024 selectors. Does that mean that there is a hard limit of 1024 threads?

The question was in the context of how Windows NT for the 80386 managed the thread environment block (TEB), namely, by using the fs register to point to the per-thread data. The point is that there are at most 1024 possible distinct values for the fs register to have, so does this implicitly limit the number of threads to 1024?

No, it doesn’t, because nobody said that the distinct values had to be different simultaneously.

Let’s start with a single-processor system. That single processor is executing only one thread at a time, so there needs to be only one valid value for fs at a time. When the processor changes threads, the definition of that selector is updated to refer to the TEB for the incoming thread. Using selectors to access another thread’s TEB is not part of the ABI; all that is required is that you can use fs to access your own TEB.
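
To make the single-processor trick concrete, here’s a toy Python model. It is not actual Windows kernel code. It has a single GDT whose one fs descriptor is repointed at each context switch. The names `GDT_SIZE`, `FS_INDEX`, `Thread`, `context_switch`, and `fs_base` are hypothetical. Each descriptor is reduced to just its base address, and the TEB addresses are made up. The model shows that the number of threads can vastly exceed the number of descriptor slots, because only the running thread’s TEB needs to be reachable through fs at any given moment.

```python
# Toy model only: one processor, one GDT, one selector shared by all threads.
# All names and addresses here are hypothetical, not taken from Windows.

GDT_SIZE = 16   # descriptor slots in this toy GDT
FS_INDEX = 10   # arbitrary slot that the fs selector refers to

class Thread:
    def __init__(self, tid: int, teb_address: int) -> None:
        self.tid = tid
        self.teb_address = teb_address

# Each toy descriptor is reduced to its segment base; limit and access bits are omitted.
gdt = [0] * GDT_SIZE

def context_switch(incoming: Thread) -> None:
    """Repoint the one fs descriptor at the incoming thread's TEB."""
    gdt[FS_INDEX] = incoming.teb_address

def fs_base() -> int:
    """The base address a memory access through fs would use right now."""
    return gdt[FS_INDEX]

# Many more threads than there are descriptor slots.
threads = [Thread(tid, 0x7FFD0000 - tid * 0x1000) for tid in range(5000)]

for t in threads:
    context_switch(t)
    # The running thread always finds its own TEB through fs.
    assert fs_base() == t.teb_address

print(f"{len(threads)} threads, {GDT_SIZE} GDT slots, 1 fs descriptor")
```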

You can see this in the debugger. Break into a multithreaded program and look at the value of the fs register. On my system it’s 0x0053. Switch to another thread and look at the value of the fs register. It’s the same value: 0x0053. Every thread has the same selector in fs. What happens is that each time the processor changes threads, the GDT entry for 0x0053 is updated to refer to the TEB of the thread that is being scheduled.¹
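
The value 0x0053 can be picked apart by hand. An x86 segment selector packs three fields into 16 bits. Bits 0 and 1 are the requested privilege level (RPL), bit 2 is the table indicator (GDT or LDT), and bits 3 through 15 are the descriptor index. Each descriptor is 8 bytes, so the index times 8 gives the descriptor’s byte offset in its table. The helper names below are made up for illustration.

```python
def decode_selector(selector: int) -> dict:
    """Split an x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,                         # bits 3-15: descriptor index
        "table": "LDT" if selector & 0b100 else "GDT",  # bit 2: table indicator
        "rpl": selector & 0b11,                         # bits 0-1: requested privilege level
    }

def descriptor_offset(selector: int) -> int:
    """Byte offset of the descriptor within its table (8 bytes per descriptor)."""
    return (selector >> 3) * 8

print(decode_selector(0x0053))          # {'index': 10, 'table': 'GDT', 'rpl': 3}
print(hex(descriptor_offset(0x0053)))   # 0x50
```

An RPL of 3 is what you’d expect for a selector used by user-mode code.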

This trick works even on multiprocessor systems. Each processor has its own GDTR internal register, so instead of sharing a single GDT for all processors, you can give each processor its own GDT.
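
Here’s a sketch of the multiprocessor version. Again, it is a toy and not real kernel code. Each processor gets its own GDT, which is what that processor’s GDTR would point at. The fs descriptor in each table holds the base of the TEB for whatever thread that processor is running. The base field uses the real 8-byte x86 segment descriptor layout: bits 0 to 15 in bytes 2 and 3, bits 16 to 23 in byte 4, and bits 24 to 31 in byte 7. The limit and access fields are ignored. `NUM_CPUS`, the TEB addresses, and the helper names are hypothetical.

```python
import struct

def set_descriptor_base(gdt: bytearray, index: int, base: int) -> None:
    """Write a 32-bit segment base into the 8-byte descriptor at `index`.

    Base bits 0-15 go in bytes 2-3, bits 16-23 in byte 4, bits 24-31 in byte 7.
    The other descriptor fields are left untouched.
    """
    off = index * 8
    struct.pack_into("<H", gdt, off + 2, base & 0xFFFF)
    gdt[off + 4] = (base >> 16) & 0xFF
    gdt[off + 7] = (base >> 24) & 0xFF

def get_descriptor_base(gdt: bytearray, index: int) -> int:
    """Reassemble the 32-bit base from the descriptor at `index`."""
    off = index * 8
    low = struct.unpack_from("<H", gdt, off + 2)[0]
    return low | (gdt[off + 4] << 16) | (gdt[off + 7] << 24)

FS_INDEX = 0x53 >> 3   # the same selector value on every processor
NUM_CPUS = 4           # hypothetical machine

# One GDT per processor, just big enough to hold descriptor FS_INDEX.
gdts = [bytearray(0x58) for _ in range(NUM_CPUS)]

# Hypothetical TEB addresses of the thread currently running on each processor.
running_teb = [0x7FFDE000, 0x7FFDD000, 0x7FFDC000, 0x7FFDB000]

for cpu, teb in enumerate(running_teb):
    set_descriptor_base(gdts[cpu], FS_INDEX, teb)

# Same selector, different TEB, depending on which processor resolves it.
for cpu in range(NUM_CPUS):
    print(cpu, hex(get_descriptor_base(gdts[cpu], FS_INDEX)))
```

Every processor resolves the same selector to a different TEB. That is all the ABI asks for.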

So I guess this puts the theoretical maximum number of processors supported by an x86-based system at around twenty-four million, because that would exhaust all of kernel mode address space just for GDTs.²
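
The twenty-four million figure is just arithmetic. It assumes 2GB (2³¹ bytes) of kernel-mode address space and a GDT no larger than it needs to be to hold the descriptor that selector 0x0053 refers to. That descriptor is index 10, so the table needs 11 descriptors of 8 bytes each.

```python
KERNEL_ADDRESS_SPACE = 2 ** 31   # 2GB: the kernel half of a 32-bit address space
SELECTOR = 0x0053

# The table must extend through descriptor index (SELECTOR >> 3), 8 bytes each.
min_gdt_size = ((SELECTOR >> 3) + 1) * 8
max_cpus = KERNEL_ADDRESS_SPACE // min_gdt_size

print(hex(min_gdt_size))   # 0x58
print(max_cpus)            # 24403223
```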

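No, wait, that’s still not the limit, because each processor also gets its own page table. After all, that’s how two processors can be executing threads from different processes (and therefore in different address spaces). So each processor’s GDT could be mapped at the same virtual address, backed by different physical memory, and the GDTs wouldn’t consume kernel address space at all. The theoretical limit is basically how much physical memory you have.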
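
Here’s a toy illustration of why per-processor page tables change the math. If every processor maps its GDT at the same virtual address, each mapping can still land on a different physical page. Adding processors then costs physical memory rather than kernel address space. Several things here are simplifying assumptions and do not reflect real x86 paging or how Windows lays out memory: the one-level dictionary "page table", the 4KB page size, and the hypothetical address `GDT_VIRTUAL`.

```python
# Toy model: a "page table" is a dict from virtual page number to physical page number.
PAGE_SIZE = 0x1000          # 4KB pages
GDT_VIRTUAL = 0x80003000    # hypothetical kernel virtual address used by every processor

def translate(page_table: dict, vaddr: int) -> int:
    """Map a virtual address to a physical address through a one-level table."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

NUM_CPUS = 8
# Each processor's table maps the same virtual GDT page to its own physical page.
page_tables = [{GDT_VIRTUAL // PAGE_SIZE: 0x100 + cpu} for cpu in range(NUM_CPUS)]

physical_gdts = {translate(pt, GDT_VIRTUAL) for pt in page_tables}
assert len(physical_gdts) == NUM_CPUS   # one virtual address, eight physical GDTs
print(sorted(hex(p) for p in physical_gdts))
```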

But I suspect you’ll run into other problems long before you add that twenty-four-millionth processor.

¹ Bit 2 is clear for GDT selectors and set for LDT selectors, so you can infer that 0x0053 is a GDT selector.

² I calculated this by dividing 2³¹ by 0x58, which is my presumed minimum size for a GDT. Selector 0x0053 refers to descriptor index 10 (0x53 shifted right by 3). That descriptor occupies bytes 0x50 through 0x57 of the table, so the GDT must be at least 0x58 bytes in size.

6 comments

George Gonzalez May 20, 2019 7:11 am

In the very old days, while still programming in real mode, with a Pentium, we would often be rather short of registers inside an inner loop while doing VGA graphics with DPMI. Memory is fuzzy, but I think we would use FS and GS as temporary registers. That probably didn’t translate to any CPU mode where those registers act as true protected-mode segment descriptors. Did Windows 95/98/Me allow you to muck with FS and GS?
- Julien Oster May 20, 2019 10:54 am
  
  In vm86 mode, which is what those Windows versions used to provide 16bit Windows (and DOS) application support, as far as I remember there wasn’t even a way to trap segment register loads, so that would make it almost certainly possible. Unless the kernel maybe looked at the segment registers of the vm86 task at inopportune times and then expected them to be sensible, but that sounds unlikely. But that’s probably not what you meant. In protected mode tasks (i.e. 32bit Windows), segment descriptors are looked up and cached in hidden registers on segment register load, and any problem doing so would cause a fault, so you cannot use them as general purpose registers. My memory is also fuzzy, so take this with a grain of salt.
  - Michael Getz May 20, 2019 11:49 am
    
    It would depend on whether the application was intended for 16-bit real mode or 16-bit protected mode, IIRC. In protected mode, the 16-bit code runs in a 32-bit space (Windows 98 or later). The main difference is that calls are thunked so that it can make the jumps to 32-bit code. Windows 95 did much the same but in reverse, IIRC: since the user mode was all 16-bit, 32-bit pointers would be thunked to 16-bit system DLLs, etc. With real-mode apps, of course, all bets were off, as they were not designed to multitask.
    - Julien Oster May 20, 2019 12:55 pm
      
      Yeah, I was specifically talking about vm86 mode, which emulates real mode. I’m surprised to hear that there was support for 16bit protected mode applications at all in Windows. What was it used for?
      - Michael Getz May 20, 2019 2:45 pm
        
        Mostly for supporting Windows 95/3.1 natively, as they ran a 16-bit user mode unless you used Win32s. 16-bit applications after that still technically ran in a VDM, but it was protected mode the whole time. The Win16 API was pretty extensive and used by quite a lot of apps, including InstallShield, which was the most common installer until x64 came along. People forget that in x86 Windows you can mix 16-bit code and 32-bit code in the same application. Raymond gave an explanation of the whole thing in "Why can’t you thunk between 32-bit and 64-bit Windows?"
W S May 20, 2019 3:07 pm

I believe Windows 95 uses one selector per thread “TEB” (TIB) and the limit would apply.
