February 24th, 2025

std::generator: Standard Library Coroutine Support

Sy Brand
C++ Developer Advocate

std::generator is a C++23 feature that enables you to write concise, straightforward functions that generate sequences of values on-demand without manually managing state. It builds upon C++20’s coroutines, providing some standard library support for this powerful, but complex, language feature. As of Visual Studio 2022 version 17.13, we ship an implementation of std::generator in our standard library.

This blog post will walk through an example of how to use the feature, compare it to implementing custom ranges, consider some of the design decisions made, and briefly look at the performance implications.

Motivating Example

Here’s a short motivating example from P2502, which proposed the feature:

std::generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield std::exchange(a, std::exchange(b, a + b));
    }
}

int answer_to_the_universe() {
    auto rng = fibonacci() | std::views::drop(6) | std::views::take(3);
    return std::ranges::fold_left(std::move(rng), 0, std::plus{});
}

Here, fibonacci is a coroutine, because it uses the co_yield keyword. Coroutines are a generalization of a subroutine. A subroutine begins its execution when it is called and completes its execution when it returns to the caller (forgetting about things like exceptions for simplicity). A coroutine’s execution similarly begins when it is called, but it need not complete execution to return control flow back to the caller: it can suspend its execution and be resumed later.

Note that, in the above example, fibonacci contains an infinite loop. This is not a problem, because when the co_yield statement is executed, the coroutine yields both control flow and a value back to the caller, suspending the coroutine’s execution. std::exchange is a small helper utility that assigns a new value to a variable and returns the old one.

The behavior of operations like co_yield in a coroutine are determined by the coroutine’s return type, in this case std::generator.

The answer_to_the_universe function uses the std::generator object returned by fibonacci in tandem with range adaptors. It pipes the generator into two range adaptors. The first says to drop the first 6 elements from the range, while the second says to take the next 3 elements. Note that these range adaptors are not executed eagerly; elements are only dropped or taken when a value is requested from the resulting range.

After building a range to operate on, the answer_to_the_universe function sums the values in the range using the std::ranges::fold_left algorithm. This is a range-based version of std::accumulate. Folding over the values of the range will continually request values from the range, which will generate those values by resuming execution of the coroutine and forwarding on the yielded results. At the end, we end up with the obvious answer: 42.

Comparison With Custom Ranges

Let’s look at how std::generator makes it much easier to write code that generates sequences of values that are compatible with the ranges algorithms and views in the standard library.

Say we want to represent an infinite range of random numbers using the xorshift pseudorandom number generator (PRNG), which looks like this:

std::uint64_t xorshift(std::uint64_t state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

Given the current state of the PRNG, which is a 64-bit integer, xorshift(state) generates the next random number in the sequence. Don’t ask me where those numbers come from, they’re magic.

Writing a custom range type for this is achievable, but it is quite verbose and requires a somewhat in-depth understanding of ranges. Let’s give it a shot.

We’ll write a type called xorshift_range, which we should be able to use like this:

int main() {
    auto is_even = [](auto i) { return i % 2 == 0; };
    auto ten_even_numbers = xorshift_generator(std::random_device{}())
                          | std::views::filter(is_even)
                          | std::views::take(10);
    for (auto i : ten_even_numbers) {
        std::cout << i << ' ';
    }
}

The constructor should take a seed for the PRNG, and we should be able to pipe the result into existing range adaptors to, for example, retrieve ten even random numbers.

Let’s start with a definition for the type:

class xorshift_range {
public:
    explicit xorshift_range(std::uint64_t seed)
        : seed_(seed) {
    }
    class iterator;

    iterator begin() const;
    std::unreachable_sentinel_t end() const {
        return {};
    }
private:
    std::uint64_t seed_;
};

Reasonably straightforward for now. The constructor takes a seed and stores it in a member. We declare an iterator type, which will implement the generation logic, and a begin function that will return an instance of the iterator type. The iterator will be infinite, so it will always compare not equal to end(). As such, end returns a std::unreachable_sentinel_t, which compares not equal to everything.

The implementation of the iterator type is more complex, so we’ll take it a bit at a time.

class xorshift_range::iterator {
public:
    using difference_type = std::ptrdiff_t;
    using value_type = std::uint64_t;
    using iterator_concept = std::forward_iterator_tag;

We first provide some member types to indicate some information about our iterator. We won’t use difference_type in the implementation of the iterator, but it’s required for compatibility with ranges, so we pick a reasonable default of std::ptrdiff_t. value_type is std::uint64_t becuause we produce unsigned 64-bit integers. Our iterator cannot move backwards, but you can copy instances of it and increment them independently, which corresponds to std::forward_iterator, so we set iterator_concept to std::forward_iterator_tag.

Next come the constructors:

    iterator() = default;
    explicit iterator(std::uint64_t seed)
        : state_(seed) {
        ++(*this);
    }

C++20 iterators must be default-constructible, so we define a zero-argument constructor. We also define a constructor that takes a seed, stores it in a state_ member (which we’ll define soon), and drives the iterator forward to generate the first random number.

The operator overloads deal with generating the next number and returning it to client code:

    std::uint64_t operator*() const {
        return state_;
    }
    iterator& operator++() {
        state_ = xorshift(state_);
        return *this;
    }
    iterator operator++(int) {
        auto copy = *this;
        ++(*this);
        return copy;
    }

The dereference operator returns the current number. The preincrement operator generates the next random number in the sequence using the xorshift function we saw earlier. We implement the postincrement operator in terms of the prefix one.

We also need equality operators and the state_ member:

    friend bool operator==(const iterator&, const iterator&) = default;
private:
    std::uint64_t state_ = 0;
};

The default equality operator will check equality based on the current state.

Finally, we need the implementation of begin and one more piece of ranges magic:

xorshift_range::iterator xorshift_range::begin() const {
    return iterator{seed_};
}
namespace std::ranges {
    template<>
    constexpr bool enable_borrowed_range<xorshift_range> = true;
}

The begin function just returns an iterator with the supplied seed. The specialization of std::ranges::enable_borrowed_range allows us to use xorshift_range as an rvalue in some contexts, because the iterators can safely outlive the parent xorshift_range object without any dangling references.

Phew, that was a lot of work. Want to see the std::generator version?

std::generator<std::uint64_t> xorshift_generator(std::uint64_t seed) {
    while (true) {
        seed = xorshift(seed);
        co_yield seed;
    }
}

That’s it. That’s the entire code.

It’s not entirely the same as the range version, but is clearly a lot easier to implement and maintain. Here are a few differences:

We’ll look at the reasons behind these shortly. Whether these tradeoffs are worth it for you depends on your use cases.

Design Decisions Taken

There are several potential designs for a std::generator-like type, all of which have different tradeoffs and knock-on effects on how the type can be used.

Copyability

Specializations of std::generator and their iterators are move-only. This is because a coroutine’s state is a unique resource: you cannot perform a deep copy of a coroutine, and resuming a coroutine through one handle to it will be visible through all other handles.

Range and Iterator Category

As another consequence of a coroutine’s state being a unique resource, specializations of std::generator model std::ranges::input_range, and their iterators model std::input_iterator. This means that std::generators are single-pass ranges, which limits the set of range adaptors and algorithms that you can use with them. For example, you cannot pass a generator to std::ranges::max_element, because it requires a forward range.

Synchronicity

std::generator is synchronous. This means that you cannot co_await an asynchronous operation inside the body of the coroutine that returns a std::generator.

Recursiveness

std::generator is recursive in that a coroutine that returns a specialization of std::generator may yield a std::generator of the same type directly rather than having to yield its values one-by-one. It’s easier to understand with an example.

Say we have a binary tree that stores integers, and we want to yield all of its elements in order. We might write code like this:

struct tree {
    tree* left;
    tree* right;
    int value;
};
std::generator<int> yield_all_loop(const tree& t) {
    if (t.left) {
        for (auto l : yield_all_loop(*t.left)) {
            co_yield l;
        }
    }

    co_yield t.value;
    
    if (t.right) {
        for (auto r : yield_all_loop(*t.right)) {
            co_yield r;
        }
    }
}

We recursively call yield_all_loop on the left and right subtrees and loop over the elements of each, yielding their values. This results in the number of coroutine resumptions/suspensions growing linearly with the depth of the tree. That is, if all branches of the tree are the same size and the tree has a depth of 16, then 16 coroutines will be resumed and suspended every time a value is requested. That’s a lot of overhead.

std::generator allows you to yield the generators for the subtrees directly rather than having to manually yield their values all the way up the call stack. This is supported by the std::ranges::elements_of type, which doesn’t do anything other than indicate that we want to yield the elements of a generator rather than the generator itself. The code looks like this:

std::generator<int> yield_all_direct(const tree& t) {
    if (t.left) {
         co_yield std::ranges::elements_of(yield_all_direct(*t.left));
    }
    co_yield t.value;
    if (t.right) {
        co_yield std::ranges::elements_of(yield_all_direct(*t.right));
    }
}

This eliminates the linear growth issue. On my laptop, with a full tree with a depth of 16, this version runs twice as fast as the loop-based version.

Reference/Value Type Shenanigans

The design space for deciding what std::ranges::range_reference_t and std::ranges::range_value_t should be for a generator type and how these can be customized is surprisingly complex. This is because we want to support yielding and dereferencing to both reference and value types, while avoiding both dangling references and unnecessary copies. Ideally, we also shouldn’t need to always specify the first template parameter as a reference type, i.e. std::generator<std::string> should compile, and iterators into it should avoid copying the yielded strings when dereferenced.

The design that the committee landed on for std::generator is that you can supply two template arguments to tune the reference and value types. If you only supply one, let’s call it Ref, then the reference type is Ref&& and the value type is std::remove_cvref_t<Ref>. What this means in practice is that dereferencing the generator’s iterators gives references, avoiding copies, and machinery that relies on std::ranges::range_value_t will act on values of the given type. Again, an example with code:

std::generator<std::string> gen() {
    std::string hello = "hello";
    co_yield hello; // 0 copies
    co_yield "Hello"; // 1 copy (conversion from const char* to std::string)
}
int main() {
    for (auto&& s : gen()) {} // 0 copies
    //copies all strings (no dangling refs)
    auto vec = std::ranges::to<std::vector>(gen());
}

This all acts as we expect. It compiles, we avoid all the copies we can (except for the string literal), and we don’t avoid copies we shouldn’t. If we want to avoid copying that string literal, we could choose to instead generate std::string_views, but this has a problem:

std::generator<std::string_view> gen() {
    std::string hello = "hello";
    co_yield hello; // 0 copies of string data
    co_yield "Hello"; // 0 copies
}

int main() {
    for (auto s : gen()) {} // 0 copies
    // uh oh, dangling references
    auto vec = std::ranges::to<std::vector>(gen());
}

In this case, the value type of the generator is std::string_view, so std::ranges::to copies everything into a std::vector<std::string_view>, which holds a bunch of dangling references. Not ideal.

The solution is to manually specify the value type of the generator with the second template argument:

std::generator<std::string_view, std::string> gen() {
    std::string hello = "hello";
    co_yield hello; // 0 copies of string data
    co_yield "Hello"; // 0 copies
}

int main() {
    for (auto s : gen()) {} // 0 copies
    // copies all strings, no dangling ref
    auto vec = std::ranges::to<std::vector>(gen());
}

This is the best of all worlds.

The moral is: if the template parameter you’re supplying to std::generator has reference semantics, think about manually supplying a reasonable value type to avoid dangling references.

Allocator Support

Coroutines require the compiler to generate code that tracks the state of the coroutine and its local variables. This is called a coroutine frame or activation record. In general, the compiler allocates the coroutine frame on the heap unless it can detect that it never outlives its caller, in which case it instead allocates the frame on the caller’s stack. This optimization is called the coroutine Heap Allocation eLision Optimization (HALO). By default, std::generator uses std::allocator for allocating its coroutine frame. In case you want more fine-grained control over where the frame for your std::generator is allocated, it supports supplying an allocator in a few different ways.

First, you can add an allocator to the coroutine’s parameter list:

template <class Allocator>
std::generator<int> f(std::allocator_arg_t, Allocator alloc);

// Usage
f(std::allocator_arg, MyAlloc{});

The first parameter must be std::allocator_arg_t, then the second parameter defines the allocator to use.

You can also supply the allocator to use as the third template argument for std::generator:

std::generator<int, void, StatelessAllocator<int>> f();

// Usage
f();

Finally, you can combine these two methods if you have a stateful allocator that you want to specify the type of statically:

std::generator<int, void, StatefulAllocator<int>>
f(std::allocator_arg_t, StatefulAllocator<int> alloc);

// Usage
f(std::allocator_arg, some_allocator); // must be convertible to StatefulAllocator<int>

A Short Note on Performance

Current implementations of std::generator for both MSVC and libstdc++/GCC introduce some overhead. In the above xorshift sample, the generator version shows around a 3x slowdown with MSVC and 2x slowdown with GCC. We hope to improve on this in future versions of the compiler by, for example, enabling HALO for this use case.

Send Us Your Feedback

We are very much interested in your feedback to continue to improve our standard library and compiler. The comments below are open. You can also share feedback through GitHub Issues for the standard library and Visual Studio Developer Community for the compiler. You can also reach us via email at visualcpp@microsoft.com.

Author

Sy Brand
C++ Developer Advocate

Sy Brand is Microsoft’s C++ Developer Advocate. Their background is in compilers and debuggers for embedded accelerators, but they’re also interested in generic library design, metaprogramming, functional-style C++, undefined behaviour, and making our communities more inclusive and welcoming.

0 comments