October 11th, 2019

Fibers aren’t useful for much any more; there’s just one corner of it that remains useful for a reason unrelated to fibers

Raymond Chen

Fibers were the new hotness back in 1996, but the initial excitement gradually gave way to the realization that fibers are awful. Gor Nishanov has a fantastic write-up of the history of fibers and why they suck. Of particular note is that nearly all of the original proponents of fibers subsequently abandoned them.

Fibers make asynchronous functions appear to be synchronous. Depending on what color glasses you are wearing, this is either a cool trick or a hidden gotcha. Over time, the consensus of most of the computing community has settled on the side of “hidden gotcha”.

But there’s one part of Windows fibers that is still useful: The fiber destruction callback.

Okay, let’s step back and look at thread-local storage first. Windows thread-local storage works like this:

You allocate a thread-local storage slot.
You learn that a thread has been created via the DLL_THREAD_ATTACH notification. You can initialize the thread-local storage in response to this notification.
You learn that a thread has been destroyed via the DLL_THREAD_DETACH notification. You can clean up the thread-local storage in response to this notification.
You free the thread-local storage slot.

There are a few problems here.

One problem is the case of the thread that existed prior to the allocation of the thread-local storage slot. It’s possible that you never received any DLL_THREAD_ATTACH notification for these pre-existing threads, because your DLL wasn’t even loaded at the time. Even if your DLL did receive those notifications, you couldn’t do anything to initialize a thread-local storage slot that didn’t exist.

In practice, what happens is that thread-local storage is lazily allocated. The DLL ignores the DLL_THREAD_ATTACH notification and allocates the storage on demand the first time a thread does something that requires it.

Another problem is with threads that remain in existence at the time the thread-local storage slot is deallocated. You have to free the memory at deallocation time because even if you get a subsequent DLL_THREAD_DETACH, the slot is already gone, so you have lost track of the memory you wanted to free.

A common workaround for this is to keep your own data structure that remembers all the data that has been allocated for thread-local storage, and free that data structure when the slot is deallocated. But if you’re going to do this, then you really didn’t need the thread-local storage slot in the first place. When a thread needs to access per-thread storage, you can just look it up in that data structure you’re using to keep track of them!
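To make the point concrete, here is a rough sketch of such a data structure in portable C++. It is one possible shape, not the way any particular library does it, and the names `PerThreadRegistry` and `PerThreadData` are invented for the example. Notice that no thread-local storage slot appears anywhere.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical per-thread state, for illustration only.
struct PerThreadData {
    int counter = 0;
};

class PerThreadRegistry {
public:
    // Returns the calling thread's data, creating it on first use.
    PerThreadData& Get()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<PerThreadData>& slot = table_[std::this_thread::get_id()];
        if (!slot) {
            slot = std::make_unique<PerThreadData>();
        }
        return *slot;
    }

    // Drops the calling thread's entry. A thread should call this before it
    // exits, because thread IDs can be reused by later threads.
    void ReleaseCurrentThread()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        table_.erase(std::this_thread::get_id());
    }

    // Number of threads that currently have data.
    std::size_t Size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return table_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::thread::id, std::unique_ptr<PerThreadData>> table_;
};
```

Destroying the registry frees whatever entries are left, which is the cleanup you wanted at slot-deallocation time. The price is a lock and a hash lookup on every access.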

A third problem is with code that doesn’t have access to the DllMain function. Executables do not receive DLL notifications, so they never learn about thread creation or destruction. Static libraries do not have access to the DllMain function of their host.

A common workaround for this is to hire a lackey. Executables may have a lackey DLL whose purpose is to forward the notification back to the executable. Static libraries may require the host to forward the notifications into a special function in the static library.

Even though workarounds for all these problems exist, they’re still annoying problems.

An alternate workaround is to abandon thread-local storage and use fiber-local storage instead. But not because you care about fibers. Rather, it’s because fiber-local storage has this nifty callback.

DWORD WINAPI FlsAlloc(
    _In_ PFLS_CALLBACK_FUNCTION lpCallback
);

The callback function is called when a fiber is destroyed. This is the fiber equivalent of DLL_THREAD_DETACH, except that it’s not just for DLLs. Executables can use it too.

Even better: When you deallocate the fiber-local storage slot, the callback is invoked for every extant fiber. This lets you clean up all your per-fiber data before it’s too late.

As a bonus, you support fibers, in case anybody uses your code with fibers.

The new fancy pattern is now this:

You allocate a fiber-local storage slot with a callback function.
The fiber-local storage is allocated on demand the first time a fiber needs access to it.
The callback function frees the fiber-local storage, if it had been allocated.
When finished, you free the fiber-local storage slot. This will call the callback for each fiber still in existence, so you can clean up the data.

Even though fibers are basically dead, you can use fiber-local storage to get an improved version of thread-local storage.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

13 comments

Jordan Chavez October 14, 2019 1

This is a very misleading title. *Windows* fibers aren’t useful for much; fibers/stackful/symmetric coroutines are in general extremely useful, just look at the success of Go. Here are more examples:

https://unity.com/dots
https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine
https://ourmachinery.com/post/fiber-based-job-system/
http://cbloomrants.blogspot.com/2012/12/12-21-12-coroutine-centric-architecture.html

And before anyone gets into a discussion of semantics, what I mean by “fiber” is a user-mode object with its own stack+cpu context and associated switching API. All goroutines/stackful coroutines/fibers boil down to this.

The Nishanov paper makes the particularly silly assertion that multiplexing M coroutines over N threads is a bad idea (with M being large and N typically being the number of cores)…this is exactly how every commercial game engine does concurrency to maximize CPU utilization. It’s even more lopsided on the server-side, evidenced by the growing adoption of Go for highly parallel servers.

Part of the problem with Windows fibers is that you can’t control stack allocation, so can’t re-use fiber instances easily and such. This has blocked their adoption more than anything else IMO.

- Yukkuri Reimu October 14, 2019 0
  
  So that’s why games crash so much 😉
Gunnar Dalsnes October 13, 2019 0

Does this fiber stuff really belong in the OS API when it is only user mode? Could it just as easily have been a library?
- Raymond Chen Author October 13, 2019 0
  
  Moving the stack requires coordination with the OS, since the stack is part of the ABI.
Piotr Siódmak October 12, 2019 0

The linked document says “they do have a capability to switch from fiber to fiber without involving kernel transition”. Wouldn’t that make them better after the spectre/meltdown exploit fixes (not really fixes but disabled optimizations if I remember correctly) that slowed down user/kernel transitions?
‪ ‪ October 12, 2019 0

If you don’t need to call a destructor and only have a memory allocation problem, you can solve it with __declspec(thread).
Ron Parker October 11, 2019 0

“When you deallocate the fiber-local storage slot, the callback is invoked for every extant fiber.”

Does this mean that you have to ConvertThreadToFiber for any thread that wants to use the FLS slot? Or do non-fiber threads get called too?
- Ron Parker October 11, 2019 0
  
  Answering my own questions: it looks like calling FlsSetValue sets a thread/fiber up to be called, and it looks like the callbacks get called on the thread/fiber that deallocated the FLS slot, so you shouldn’t do anything with thread affinity (e.g. STA COM calls) in your callback, and you shouldn’t try to lock anything that might be locked by one of the other threads, meaning you probably shouldn’t try to lock anything.
- Ron Parker October 11, 2019 0
  
  Relatedly, I assume that the thread/fiber doing the deallocation blocks for the other threads to handle their callbacks, so it’s a bad idea to deallocate an FLS slot while holding any locks?
Alex Martin October 11, 2019 0

What are your thoughts on coroutines in general? Do you dislike the whole concept, or is it just fibers specifically that are problematic?
- Joe Beans October 11, 2019 0
  
  Coroutines are the final nail in the coffin for fibers. The team that came up with the C# async spec should get the Nobel Prize.
  - Alex Martin October 14, 2019 0
    
    The coroutine concept has existed for decades. Fibers are essentially just OS-supported coroutines, but they aren’t the best-implemented coroutines.
  - Me Gusta October 11, 2019 0
    
    Why not Simula, Smalltalk or Modula-2?