Using fibers to expand a thread's stack at runtime, part 5

Now that we figured out how to probe whether there is sufficient remaining stack, we can use that to avoid all the work of creating and managing a fiber if we don’t need it.

template<typename Lambda>
auto RunOnFiberIfNeeded(
    size_t minimumStack,
    Lambda&& lambda)
{
  if (is_stack_available(minimumStack)) {
    return lambda();
  } else {
    return RunOnFiber(std::forward<Lambda>(lambda));
  }
}

If there is sufficient stack available, then we just run the lambda immediately. Otherwise, we ask RunOnFiber to create a fiber and run the lambda on that temporary fiber. When the lambda returns, we destroy the temporary fiber.

We basically reinvented the segmented stack.

One place you can use this technique is if your function is highly recursive. In that case, you may not know how much stack is needed for the entire operation, but you do know how much stack you need for each step of the recursion. You can kick off the next level of the recursion with RunOnFiberIfNeeded, and the function will consume the calling thread’s stack until it runs low, and then overflow onto a fiber.

Here’s the original function:

void FrobAllNodes(Tree& tree)
{
  FrobTheNode(tree.Node());
  for (auto&& child : tree.Children()) {
    FrobAllNodes(child);
  }
}

And here’s a revised version:

void FrobAllNodes(Tree& tree)
{
  RunOnFiberIfNeeded(20480, [&]()
  {
    FrobTheNode(tree.Node());
    for (auto&& child : tree.Children()) {
      FrobAllNodes(child);
    }
  });
}

This code checks whether the stack is running low, and if so switches to a fiber.

Problem solved?

Not quite.

What can happen is that when a thread’s stack gets too low, the code creates a fiber for the next level of the recursion. That level returns, and the temporary fiber is destroyed. And then the code moves to the next child and starts a new level of recursion. That new level also creates a fiber, and the process repeats. Each time we cross the boundary, we either create a fiber (when recursing) or destroy one (when returning).

If the recursion crosses the boundary a lot, then you end up spending a lot of time managing fibers and not a lot of time doing work. For example, consider a tree that is four levels deep, with a single child at the first three levels, and ten children at level four.

w	┐
	x	┐
		y	┬	┬	┬	┬	┬	┬	┬	┬	┬	┐
			0	1	2	3	4	5	6	7	8	9

If the original stack had enough space to accommodate three levels of the recursion, then what’ll happen is that we’ll quickly get through the first three levels, and then each of the ten leaf nodes will create and destroy its own fiber:

Process w.
Recurse for x.
Process x.
Recurse for y.
Process y.
Recurse for 0 — create a fiber.
Process 0.
Return from 0 — destroy a fiber.
Recurse for 1 — create a fiber.
Process 1.
Return from 1 — destroy a fiber.
Recurse for 2 — create a fiber.
Process 2.
Return from 2 — destroy a fiber.
⋮
Recurse for 9 — create a fiber.
Process 9.
Return from 9 — destroy a fiber.
Return from z.
Return from y.
Return from x.

This is known as the “hot-split” problem: If the stack is almost depleted, then the next call will trigger the creation of a new stack segment (which for us is a fiber). When that call returns, the segment is freed. If the same call is repeated in a loop, you end up incurring significant overhead creating and destroying stack segments.

We’ll look at addressing this problem next time.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

5 comments

Ted Spence June 11, 2020

Are there any situations where it is preferred to actually run stack-based recursion rather than unrolling the recursion into a growable queue that can be processed in a loop?

I’ve always used trivial recursion when I’m lazy, and something like this pseudocode whenever the queue could potentially get large:

void FrobAllNodes(tree: Tree)
{
  const queue = new Queue();
  queue.push(tree.Node());
  while (!queue.isEmpty()) {
    const node = queue.pop();
    frobOneNode(node);
    queue.push(...node.Children());
  }
}

Perhaps there’s a C++ reason that I’m not familiar with since my last C++ project was many years ago…

Raymond Chen Author June 11, 2020

The queue approach works well for preorder traversal, but postorder and inorder traversal get complicated. Also, the recursive version requires only O(depth) storage, whereas the queue requires O(depth + breadth) storage.
- Ted Spence June 15, 2020
  
  > postorder and inorder traversal get complicated
  
  > whereas the queue requires O(depth + breadth) storage.
  
  When I read this, my first response was “O(depth+breadth) * size of a pointer?” … but then I realized: we may not have pointers to all our objects in memory. Constructing them and throwing them into a queue, for example, if I were traversing an extremely large virtual hard drive to find an object deterministically would be rough, and would depend heavily on the order of traversal.
  
  Thanks for the context!

Kalle Niemitalo June 11, 2020

Can debuggers show stack traces across stack switches implemented as fibers? If not automatically, then perhaps with some assistance from the program itself?

Raymond Chen Author June 11, 2020

I don’t know of any way to tell the debugger to continue a stack trace into another fiber.

Stay informed

Get notified when new posts are published.

Discussion is closed. Login to edit/delete existing comments.

Ted Spence June 11, 2020
Are there any situations where it is preferred to actually run stack-based recursion rather than unrolling the recursion into a growable queue that can be processed in a loop?

I’ve always used trivial recursion when I’m lazy, and something like this pseudocode whenever the queue could potentially get large:
```
void FrobAllNodes(tree: Tree)
{
  const queue = new Queue();
  queue.push(tree.Node());
  while (!queue.isEmpty()) {
    const node = queue.pop();
    frobOneNode(node);
    queue.push(...node.Children());
  }
}
```
Perhaps there’s a C++ reason that I’m not familiar with since my last C++ project was many years ago…
- Raymond Chen Author June 11, 2020
  
  The queue approach works well for preorder traversal, but postorder and inorder traversal get complicated. Also, the recursive version requires only O(depth) storage, whereas the queue requires O(depth + breadth) storage.
  - Ted Spence June 15, 2020
    
    > postorder and inorder traversal get complicated
    
    > whereas the queue requires O(depth + breadth) storage.
    
    When I read this, my first response was “O(depth+breadth) * size of a pointer?” … but then I realized: we may not have pointers to all our objects in memory. Constructing them and throwing them into a queue, for example, if I were traversing an extremely large virtual hard drive to find an object deterministically would be rough, and would depend heavily on the order of traversal.
    
    Thanks for the context!
Kalle Niemitalo June 11, 2020

Can debuggers show stack traces across stack switches implemented as fibers? If not automatically, then perhaps with some assistance from the program itself?
- Raymond Chen Author June 11, 2020
  
  I don’t know of any way to tell the debugger to continue a stack trace into another fiber.

Using fibers to expand a thread’s stack at runtime, part 5

Author

5 comments

Read next

Using fibers to expand a thread’s stack at runtime, part 6

Error 0x80131040 “The located assembly’s manifest definition does not match the assembly reference” when I use a Windows Runtime class written in C# from my C++/WinRT project