April 4th, 2019

Why do we need atomic operations on the 80386, if it doesn’t support symmetric multiprocessing anyway?

The 80386 processor did not support symmetric multiprocessing, yet we discussed atomic operations in our overview of the processor. If the processor doesn’t even support symmetric multiprocessing, why do atomic operations matter?

Well, one reason is that the 80386 processor does support asymmetric multiprocessing. Floating point operations are performed by a coprocessor, and the main processor and coprocessor are both accessing the same memory. Another source of competing memory access is from hardware devices that are using Direct Memory Access (DMA).

Even within the processor, you have to worry about races, because you might be racing with yourself.

The 80386 did not support symmetric multiprocessing, but it did support pre-emptive multitasking, which means that any multi-instruction sequence is at risk of being interrupted, and at the worst possible time.

    ; decrement the variable and check against zero
    mov     eax, [var]
    dec     eax
    mov     [var], eax
    je      zero

If the thread gets pre-empted between the first and third instructions, then the contents of the variable may be changed by another thread, and that change is then overwritten when the stale value is stored back. The decrement operation is non-atomic. To ensure atomicity, you need to force the compiler to generate a single dec instruction, and then to test the flags directly from the decrement.

    ; decrement the variable and check against zero
    dec     [var]
    jz      zero

There was no way to express this level of detail to compilers of that era, so you had to hide it behind a function call.
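For comparison, here is a minimal sketch of how the same decrement-and-test might be written today, assuming a C11 compiler with `<stdatomic.h>`, which postdates the compilers discussed here. The helper name `decrement_and_test_zero` is made up for illustration. Note that it is still hidden behind a function call.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper, not part of any real API: atomically decrement
   *var and report whether the new value is zero. */
static bool decrement_and_test_zero(atomic_int *var)
{
    /* atomic_fetch_sub returns the value held before the subtraction,
       so a previous value of 1 means the variable has just reached zero. */
    return atomic_fetch_sub(var, 1) == 1;
}
```

A caller would write something like `if (decrement_and_test_zero(&refcount)) cleanup();`, where `refcount` and `cleanup` are likewise placeholders.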

And if your operation cannot be expressed in a single instruction, then you’re out of luck. Increment and compare against 10? Compare and exchange if equal? Nope, you can’t do those things, at least not without some help from the operating system.
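To make the "increment and compare against 10" case concrete, here is a rough sketch of how it is usually built once a compare-and-exchange primitive is available. It assumes a C11 compiler and a target that provides such a primitive, which the 80386 does not. The function name and the `limit` parameter are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically increment *var and report whether
   the new value equals limit. */
static bool increment_and_check_limit(atomic_int *var, int limit)
{
    int old = atomic_load(var);

    /* If another thread changed *var after we read it, the exchange
       fails, old is refreshed with the current value, and we retry. */
    while (!atomic_compare_exchange_weak(var, &old, old + 1)) {
        /* nothing to do; old was updated by the failed exchange */
    }

    /* The exchange succeeded, so old + 1 is the value we stored. */
    return old + 1 == limit;
}
```

The retry loop is what makes the multi-step operation behave as if it were atomic: the store happens only if nobody else touched the variable in the meantime.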


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

12 comments


  • David Walker

    Do newer processors have a (single) "decrement and jump if zero" atomic instruction, built into the hardware?  Or, a decrement like you mention that can decrement a value at a memory location and set a flag, all interlocked at the CPU level (and multiprocessor-safe)?
    I realize that decrementing from a memory location usually involves reading the value into a register, decrementing the register, and writing the value back out.  But, if you can specify a...

    • cheong00

      Do you mean “LOOPZ”? (decrements CX and jump to label if zero and ZF is set, although the jump range is limited)

      • David Walker

        I was actually looking for an atomic increment or decrement of a memory location, not a register.

      • Raymond Chen (Microsoft employee, Author)

        Yup, it’s right there in the article. But all you get is the sign, not the value. See also the link at the end of the article.

      • David Walker

        Oh, right.  Thanks.

      • Yuhong Bao

        FYI, if you code in x86 assembly, you can use any flags generated by the LOCK DEC/SUB/ADD/INC instructions after it is executed.

  • Alex Cohn

    The self-imposed race conditions between two threads running on a single CPU can be handled without atomics. E.g. given i that another thread can change, `int local_copy_of_i = i+1; i = local_copy_of_i; if (local_copy_of_i > 10) do_something(); else do_something_else();` I believe it will even resolve DMA race conditions.

    • Murray Colpman

      This wouldn't work if two increments of i are expected to increment i twice. Say i is 0, and your local thread takes local_copy_of_i to be 1. Then you're preempted and another thread does the same, taking its local_copy_of_i to be 1. The other thread writes back the incremented value 1, and then your thread also writes back the local_copy_of_i which is still 1. Oops - you've incremented i twice from 0 and got 1!

  • Yuhong Bao

    Actually, there is nothing preventing the 80386 from supporting SMP and NT 3.1 did support it with Compaq SystemPro I think. It is not common though.

    • Jernej Simončič

      Wasn’t SystemPro asymmetric?

      • Piotr Gliźniewicz

        Yes, the SystemPro was asymmetric, it used the 2nd CPU for I/O. I think “80386 processor did not support symmetric multiprocessing” means more “80386 processor did not support symmetric multiprocessing out of the box”. You could build a multiprocessor using 8080s with enough external logic. I’m not an expert on the topic, but I think if a CPU supports external bus masters it should also be possible to build an SMP system with it.

      • Yuhong Bao

        AFAIK the 80386 and 80486 bus was very similar