Fibers were the new hotness back in 1996, but the initial excitement gradually gave way to the realization that fibers are awful. Gor Nishanov has a fantastic write-up of the history of fibers and why they suck. Of particular note is that nearly all of the original proponents of fibers subsequently abandoned them.
Fibers make asynchronous functions appear to be synchronous. Depending on what color glasses you are wearing, this is either a cool trick or a hidden gotcha. Over time, the consensus of most of the computing community has settled on the side of “hidden gotcha”.
But one part of Windows fibers is still useful: the fiber destruction callback.
Okay, let’s step back and look at thread-local storage first. Windows thread-local storage works like this:
- You allocate a thread-local storage slot.
- You learn that a thread has been created via the DLL_THREAD_ATTACH notification. You can initialize the thread-local storage in response to this notification.
- You learn that a thread has been destroyed via the DLL_THREAD_DETACH notification. You can clean up the thread-local storage in response to this notification.
- You free the thread-local storage slot.
There are a few problems here.
One problem is the case of the thread that existed prior to the allocation of the thread-local storage slot. It’s possible that you never received any DLL_THREAD_ATTACH notification for these pre-existing threads, because your DLL wasn’t even loaded at the time. Even if your DLL did receive those notifications, you couldn’t do anything to initialize a thread-local storage slot that didn’t exist.
In practice, what happens is that thread-local storage is lazily allocated. The DLL ignores the DLL_THREAD_ATTACH notification and allocates the storage on demand the first time a thread does something that requires it.
Another problem is with threads that remain in existence at the time the thread-local storage slot is deallocated. You have to free the memory at deallocation time because even if you get a subsequent DLL_THREAD_DETACH, the slot is already gone, so you have lost track of the memory you wanted to free.
A common workaround for this is to keep your own data structure that remembers all the data that has been allocated for thread-local storage, and free that data structure when the slot is deallocated. But if you’re going to do this, then you really didn’t need the thread-local storage slot in the first place. When a thread needs to access per-thread storage, you can just look it up in that data structure you’re using to keep track of them!
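To make that workaround concrete, here is a minimal portable C++ sketch of such a data structure. The names PerThreadData and PerThreadRegistry are hypothetical and invented for illustration; nothing here is a Windows API, and the locking strategy is just one reasonable assumption.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical per-thread payload; the field is purely illustrative.
struct PerThreadData {
    int counter = 0;
};

// Hypothetical registry that owns every thread's data, keyed by thread ID.
// Once this exists, no thread-local storage slot is needed to find the data.
class PerThreadRegistry {
public:
    // Returns the calling thread's data, creating it on first use.
    PerThreadData& get() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::unique_ptr<PerThreadData>& slot = entries_[std::this_thread::get_id()];
        if (!slot) {
            slot.reset(new PerThreadData());
        }
        return *slot;
    }

    // Frees all remembered data at once, including data that belongs to
    // threads that are still running.
    void release_all() {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.clear();
    }

    // Number of threads that currently have data in the registry.
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::thread::id, std::unique_ptr<PerThreadData>> entries_;
};
```

Note what this sketch does not solve: an entry lingers after its thread exits, and thread IDs can be reused, so a real version would still want some notification of thread exit.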
A third problem is with code that doesn’t have access to the DllMain function. Executables do not receive DLL notifications, so they never learn about thread creation or destruction. Static libraries do not have access to the DllMain function of their host.
A common workaround for this is to hire a lackey. Executables may have a lackey DLL whose purpose is to forward the notification back to the executable. Static libraries may require the host to forward the notifications into a special function in the static library.
Even though workarounds for all these problems exist, they’re still annoying problems.
An alternate workaround is to abandon thread-local storage and use fiber-local storage instead. But not because you care about fibers. Rather, it’s because fiber-local storage has this nifty callback.
DWORD WINAPI FlsAlloc( _In_ PFLS_CALLBACK_FUNCTION lpCallback );
The callback function is called when a fiber is destroyed. This is the fiber equivalent of DLL_THREAD_DETACH, except that it’s not just for DLLs. Executables can use it too.
Even better: When you deallocate the fiber-local storage slot, the callback is invoked for every extant fiber. This lets you clean up all your per-fiber data before it’s too late.
As a bonus, you support fibers, in case anybody uses your code with fibers.
The new fancy pattern is now this:
- You allocate a fiber-local storage slot with a callback function.
- The fiber-local storage is allocated on demand the first time a fiber needs access to it.
- The callback function frees the fiber-local storage, if it had been allocated.
- When finished, you free the fiber-local storage slot. This will call the callback for each fiber still in existence, so you can clean up the data.
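The four steps above might look roughly like the following C++ sketch. FlsAlloc, FlsGetValue, FlsSetValue, and FlsFree are the real Win32 functions; everything else (PerFiberData, OnFiberDataDestroyed, InitPerFiberStorage, GetPerFiberData, ShutdownPerFiberStorage, and the g_cleanupCount counter) is a hypothetical name made up for illustration. The non-Windows branch is not the real API: it is a single-threaded stand-in, included only as an assumption so the sketch can be compiled and exercised elsewhere.

```cpp
#include <atomic>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#else
// NOT the real API: a single-threaded stand-in that mimics only the
// behavior this sketch relies on, so it also compiles off Windows.
typedef unsigned long DWORD;
typedef void* PVOID;
typedef int BOOL;
#define WINAPI
#define FLS_OUT_OF_INDEXES ((DWORD)0xFFFFFFFF)
typedef void (WINAPI *PFLS_CALLBACK_FUNCTION)(PVOID);
static PFLS_CALLBACK_FUNCTION g_fakeCallback = nullptr;
static PVOID g_fakeValue = nullptr;
static DWORD FlsAlloc(PFLS_CALLBACK_FUNCTION cb) { g_fakeCallback = cb; return 0; }
static PVOID FlsGetValue(DWORD) { return g_fakeValue; }
static BOOL FlsSetValue(DWORD, PVOID v) { g_fakeValue = v; return 1; }
static BOOL FlsFree(DWORD) {
    if (g_fakeValue && g_fakeCallback) g_fakeCallback(g_fakeValue);
    g_fakeValue = nullptr;
    return 1;
}
#endif

// Hypothetical per-fiber payload.
struct PerFiberData {
    int counter;
};

static DWORD g_flsIndex = FLS_OUT_OF_INDEXES;
static std::atomic<int> g_cleanupCount{0};  // illustration only: counts callback runs

// Step 3: the callback frees the data. The system runs it when a fiber is
// deleted, when a thread exits, and when the slot itself is freed.
static void WINAPI OnFiberDataDestroyed(PVOID data) {
    std::free(data);
    ++g_cleanupCount;
}

// Step 1: allocate the slot together with its cleanup callback.
bool InitPerFiberStorage() {
    g_flsIndex = FlsAlloc(OnFiberDataDestroyed);
    return g_flsIndex != FLS_OUT_OF_INDEXES;
}

// Step 2: create the data on demand, the first time a fiber asks for it.
PerFiberData* GetPerFiberData() {
    void* p = FlsGetValue(g_flsIndex);
    if (!p) {
        p = std::calloc(1, sizeof(PerFiberData));
        if (p && !FlsSetValue(g_flsIndex, p)) {
            std::free(p);
            p = nullptr;
        }
    }
    return static_cast<PerFiberData*>(p);
}

// Step 4: free the slot, which triggers the callback for data still alive.
void ShutdownPerFiberStorage() {
    if (g_flsIndex != FLS_OUT_OF_INDEXES) {
        FlsFree(g_flsIndex);
        g_flsIndex = FLS_OUT_OF_INDEXES;
    }
}
```

A caller would run InitPerFiberStorage once at startup, call GetPerFiberData wherever per-thread state is needed, and call ShutdownPerFiberStorage at teardown. No DllMain is involved, so the same code could live in an executable or a static library.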
Even though fibers are basically dead, you can use fiber-local storage to get an improved version of thread-local storage.
This is a very misleading title. *Windows* fibers aren't useful for much; fibers/stackful/symmetric coroutines are in general extremely useful, just look at the success of Go. Here are more examples:
https://unity.com/dots
https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine
https://ourmachinery.com/post/fiber-based-job-system/
http://cbloomrants.blogspot.com/2012/12/12-21-12-coroutine-centric-architecture.html
And before anyone gets into a discussion of semantics, what I mean by "fiber" is a user-mode object with its own stack+cpu context and associated switching API. All goroutines/stackful coroutines/fibers boil down to this.
The Nishanov paper makes the particularly silly assertion that multiplexing...
So that’s why games crash so much 😉
Does this fiber stuff really belong in the OS API when it is only user mode? Could it just as easily have been a library?
Moving the stack requires coordination with the OS, since the stack is part of the ABI.
The linked document says “they do have a capability to switch from fiber to fiber without involving kernel transition”. Wouldn’t that make them better after the spectre/meltdown exploit fixes (not really fixes but disabled optimizations if I remember correctly) that slowed down user/kernel transitions?
If you don’t need to call a destructor and only have a memory allocation problem, you can solve it with __declspec(thread).
“When you deallocate the fiber-local storage slot, the callback is invoked for every extant fiber.”
Does this mean that you have to ConvertThreadToFiber for any thread that wants to use the FLS slot? Or do non-fiber threads get called too?
Answering my own questions: it looks like calling FlsSetValue sets a thread/fiber up to be called, and it looks like the callbacks get called on the thread/fiber that deallocated the FLS slot, so you shouldn’t do anything with thread affinity (e.g. STA COM calls) in your callback, and you shouldn’t try to lock anything that might be locked by one of the other threads, meaning you probably shouldn’t try to lock anything.
Relatedly, I assume that the thread/fiber doing the deallocation blocks for the other threads to handle their callbacks, so it’s a bad idea to deallocate an FLS slot while holding any locks?
What are your thoughts on coroutines in general? Do you dislike the whole concept, or is it just fibers specifically that are problematic?
Coroutines are the final nail in the coffin for fibers. The team that came up with the C# async spec should get the Nobel Prize.
The coroutine concept has existed for decades. Fibers are essentially just OS-supported coroutines, but they aren’t the best-implemented coroutines.
Why not Simula, Smalltalk or Modula-2?