{"id":32015,"date":"2023-04-13T14:55:40","date_gmt":"2023-04-13T14:55:40","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=32015"},"modified":"2023-04-14T15:53:25","modified_gmt":"2023-04-14T15:53:25","slug":"cpp23s-new-fold-algorithms","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/cpp23s-new-fold-algorithms\/","title":{"rendered":"C++23&#8217;s New Fold Algorithms"},"content":{"rendered":"<p>C++20 added new versions of the standard library algorithms which take ranges as their first argument rather than iterator pairs, alongside other improvements. However, key algorithms like <code>std::accumulate<\/code> were not updated. This has been done in C++23, with the new <code>std::ranges::fold_*<\/code> family of algorithms. The standards paper for this is <a href=\"https:\/\/wg21.link\/p2322\">P2322<\/a> and was written by Barry Revzin. It been implemented in <a href=\"https:\/\/learn.microsoft.com\/visualstudio\/releases\/2022\/release-notes#summary-of-whats-new-in-this-release-of-visual-studio-2022-version-175\">Visual Studio 2022 version 17.5<\/a>. In this post I\u2019ll explain the benefits of the new \u201crangified\u201d algorithms, talk you through the new C++23 additions, and explore some of the design space for fold algorithms in C++.<\/p>\n<h2>Background: Rangified Algorithms<\/h2>\n<p>C++20\u2019s algorithms make several improvements to the old iterator-based ones. The most obvious is that they now can take a range instead of requiring you to pass iterator pairs. But they also allow passing a \u201cprojection function\u201d to be called on elements of the range before being processed, and the use of C++20 concepts for constraining their interfaces more strictly defines what valid uses of these algorithms are. These changes allow you to make refactors like:<\/p>\n<pre><code class=\"language-c++\">\/\/ C++17 algorithm\r\ncat find_kitten(const std::vector&lt;cat&gt;&amp; cats) {\r\n    return *std::find_if(cats.begin(), cats.end(), \r\n        [](cat const&amp; c) { return c.age == 0; });\r\n}\r\n\r\n\/\/ C++20 algorithm\r\ncat find_kitten(std::span&lt;cat&gt; cats) {\r\n    return *std::ranges::find(cats, 0, &amp;cat::age);\r\n}<\/code><\/pre>\n<p>The differences here are:<\/p>\n<ol>\n<li>Instead of having to pass <code>cats.begin()<\/code> and <code>cats.end()<\/code>, we just pass <code>cats<\/code> itself.<\/li>\n<li>Since we are comparing a member variable of the <code>cat<\/code> to <code>0<\/code>, in C++17 we need to use <code>std::find_if<\/code> and pass a closure which accesses that member and does the comparison. Since the rangified algorithms support projections, in C++20 we can use <code>std::ranges::find<\/code> and pass <code>&amp;cat::age<\/code> as a projection, getting rid of the need for the lambda completely.<\/li>\n<\/ol>\n<p>These improvements can greatly clean up code which makes heavy use of the standard library algorithms.<\/p>\n<p>Unfortunately, alongside the algorithms which reside in the <code>&lt;algorithm&gt;<\/code> header, there are also several important ones in the <code>&lt;numeric&gt;<\/code> header, and these were not rangified in C++20<a href=\"#foot1\"><sup>1<\/sup><\/a>. In this post we\u2019re particularly interested in <code>std::accumulate<\/code> and <code>std::reduce<\/code>.<\/p>\n<h2><code>accumulate<\/code> and <code>reduce<\/code><\/h2>\n<p><code>std::accumulate<\/code> and <code>std::reduce<\/code> are both <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fold_(higher-order_function)\">fold<\/a> operations. They &#8220;fold&#8221; or &#8220;reduce&#8221; or &#8220;combine&#8221; multiple values into a single value. Both take two iterators, an initial value, and a binary operator (which defaults to <code>+<\/code>). They then run the given operator over the range of values given by the iterators, collecting a result as they go. For instance, given <code>std::array&lt;int,3&gt; arr = {1,2,3}<\/code>, <code>std::accumulate(begin(arr), end(arr), 0, std::plus())<\/code> will run <code>(((0 + 1) + 2) + 3)<\/code>. Or <code>std::accumulate(begin(arr), end(arr), 0, f)<\/code> will run <code>f(f(f(0, 1), 2), 3)<\/code>.<\/p>\n<p>These functions are both what are called <em>left folds<\/em> because they run from left to right. There&#8217;s also <em>right folds<\/em>, which as you may guess, run from right to left. For the last example a right fold would look like <code>f(1, f(2, f(3, 0)))<\/code>. For some operations, like <code>+<\/code>, these would give the same result, but for operations which are not <a href=\"https:\/\/en.wikipedia.org\/wiki\/Associative_property\">associative<\/a> (like <code>-<\/code>), it could make a difference.<\/p>\n<p>So why do we have both <code>std::accumulate<\/code> and <code>std::reduce<\/code>? <code>std::reduce<\/code> was added in C++17 as one of the many <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/using-c17-parallel-algorithms-for-better-performance\/\">parallel algorithms<\/a> which let you take advantage of parallel execution for improved performance. The reason it has a different name than <code>std::accumulate<\/code> is because it has different constraints on what types and operations you can use: namely the operation used must be both associative and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Commutative_property\">commutative<\/a>.<\/p>\n<p>To understand why, consider the following code:<\/p>\n<pre><code class=\"language-cpp\">std::array&lt;int,8&gt; a {0,1,2,3,4,5,6,7};\r\nauto result = std::reduce(std::execution::par, \/\/execute in parallel \r\n    begin(a), end(a), 0, f);<\/code><\/pre>\n<p>If <code>f<\/code> is associative, then <code>std::reduce<\/code> could reduce the first half of <code>a<\/code> on one CPU core, reduce the second half of <code>a<\/code> on another core, then call <code>f<\/code> on the result. However, if <code>f<\/code> is <em>not<\/em> associative, this could result in a different answer than carrying out a pure left fold. As such, requiring associativity lets <code>std::reduce<\/code> distribute the reduction across multiple units of execution.<\/p>\n<p>Requiring commutativity likewise enables some potential optimizations. One is that the implementation is free to reorder operations as it likes. If, for example, some calls to <code>f<\/code> take longer than others, <code>reduce<\/code> could use whichever intermediate results are available without having to wait around for the &#8220;right&#8221; result. Commutativity also enables vectorization when using the <code>std::execution::par_unseq<\/code> policy. For example, if <code>f<\/code> is addition, the first half of <code>a<\/code> could be loaded into one vector register, the second half loaded into another, and a vector addition executed on them. This would result in <code>(0 + 4) + (1 + 5) + (2 + 6) + (3 + 7)<\/code>. Notice that the operands have been interleaved: this requires commutativity.<\/p>\n<h2><code>std::ranges::fold_*<\/code><\/h2>\n<p>C++23 comes with six fold functions which fulfil different important use cases. The one you&#8217;ll reach for most is <code>std::ranges::fold_left<\/code>.<\/p>\n<h3><code>fold_left<\/code><\/h3>\n<p>You can use <code>fold_left<\/code> in place of calls to <code>std::accumulate<\/code>. For instance, I have three cats, and when I brush them, I collect all the loose fur as I go so I can throw it away:<\/p>\n<pre><code class=\"language-cpp\">fur brush(fur existing_fur, cat&amp; c);\r\nstd::vector&lt;cat&gt; cats = get_cats(); \r\n\r\n\/\/ C++20\r\nauto loose_fur = std::accumulate(begin(cats), end(cats), fur{}, brush);\r\n\/\/ C++23\r\nauto loose_fur = std::ranges::fold_left(cats, fur{}, brush);\r\n\r\nthrow_away(loose_fur);<\/code><\/pre>\n<p><code>std::accumulate<\/code> is really a generic left fold, but its name suggests summation, and the defaulting of the binary operator to addition further contributes to this. This makes uses of <code>accumulate<\/code> for non-summation purposes look a little clunky. This is why the new version is instead called <code>fold_left<\/code>, and does not have a default operator.<\/p>\n<h3><code>fold_right<\/code><\/h3>\n<p>As you can probably guess, since there&#8217;s a <code>fold_left<\/code> function, there&#8217;s also a <code>fold_right<\/code> function. For associative operations like <code>brush<\/code>, there&#8217;s no real difference in behaviour. But say we have a function which takes some amount of food and feeds half of it to a cat, returning the leftovers:<\/p>\n<pre><code class=\"language-cpp\">food feed_half(cat&amp; c, food f) {\r\n    auto to_feed = f \/ 2;\r\n    c.eaten += to_feed;\r\n    return f - to_feed;\r\n}<\/code><\/pre>\n<p>This operation is not associative, therefore using <code>fold_left<\/code> or <code>fold_right<\/code> would result in feeding different cats different amounts of food. We could call <code>fold_right<\/code> like this:<\/p>\n<pre><code class=\"language-cpp\">std::vector&lt;cat&gt; cats = get_cats();\r\n\/\/feed cats from right to left, starting with 100 food\r\nauto leftovers = std::ranges::fold_right(cats, 100, feed_half); <\/code><\/pre>\n<p>Note that for <code>fold_right<\/code>, the order of arguments to the operator are flipped from <code>fold_left<\/code>: the accumulator is on the right rather than the left.<\/p>\n<p>In these examples we&#8217;ve been able to pick a reasonable initial value for our folds (<code>fur{}<\/code> in the first, <code>100<\/code> in the second). But how do we pick our initial element? What if there is no good initial value to pick?<\/p>\n<h3>Aside on initial element<\/h3>\n<p><code>std::reduce<\/code> allows omitting the initial element for the fold, in which case it uses a value-initialized object of the given iterator&#8217;s value type (e.g. <code>0<\/code> for <code>int<\/code>). This can be handy in some cases, such as summing integers, but doesn&#8217;t work so well in others, such as taking the product of integers. Usually what we want for the initial element is some <em>identity element<\/em> for the value type of the range with respect to the given binary operator. Given any object <code>x<\/code> of type <code>T<\/code> and operation <code>f<\/code>, the identity element <code>id<\/code> is one for which <code>f(id,x) == x<\/code>. For example, the identity element for the pair <code>int, operator+<\/code> is <code>0<\/code>. For <code>int, operator*<\/code> it&#8217;s <code>1<\/code>. For <code>std::string, operator+<\/code> it&#8217;s <code>\"\"<\/code>.<\/p>\n<p>These pairs of types and associative binary operators which have an identity element turn out to be surprisingly common in programming, they&#8217;re called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Monoid\">monoids<\/a>. Ben Deane has several great talks on monoids in C++, I&#8217;d highly recommend <a href=\"https:\/\/www.youtube.com\/watch?v=INnattuluiM\">watching this one<\/a>.<\/p>\n<p>We don&#8217;t have a way to easily get at the identity element of a given monoid in C++. You could imagine implementing some <code>monoid_traits<\/code> class template which could be specialised by users to support custom types. This could let <code>fold_left<\/code> be implemented something like (omitting constraints for brevity):<\/p>\n<pre><code class=\"language-cpp\">template &lt;class Rng, class F, class T = std::ranges::range_value_t&lt;Rng&gt;&gt;\r\nconstexpr T fold_left (Rng&amp;&amp; rng, F&amp;&amp; op, T init = monoid_traits&lt;std::ranges::range_value_t&lt;Rng&gt;, F&gt;::identity_element());<\/code><\/pre>\n<p>Maybe you think that&#8217;s horrifying and too much work. I might agree for many cases. I do think it&#8217;s an interesting design area though, and I know <a href=\"https:\/\/graphblas.org\/\">GraphBLAS<\/a> does something like this; there&#8217;s a <a href=\"https:\/\/www.youtube.com\/watch?v=xMBNCtFV8sI\">CppCon talk<\/a> which shows the kinds of things they do with monoid traits. <a href=\"https:\/\/wg21.link\/p1813\">P1813<\/a> explored a similar idea for a concepts design for numeric algorithms.<\/p>\n<p>I think the C++23 design makes the right decision in simply not defaulting the initial element at all. This forces you to think about it and not accidentally supply a value-initialized value in cases where it&#8217;s not correct.<\/p>\n<h3><code>fold_left_first<\/code> and <code>fold_right_last<\/code><\/h3>\n<p>But what if you don&#8217;t have an identity element for your type? Then you need to pull out the first element by-hand, which is quite unwieldly and a bit tricky to get right for <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/iterator\/input_iterator\">input iterators<\/a>:<\/p>\n<pre><code class=\"language-cpp\">auto b = std::ranges::begin(rng);\r\nauto e = std::ranges::end(rng);\r\nauto init = *b;\r\nfold_left(std::ranges::next(b), e, std::move(init), f);<\/code><\/pre>\n<p>As such, many languages provide an alternative <code>fold<\/code> function which uses the first element of the range as the initial element (thus additionally requiring the range to be non-empty).<\/p>\n<p>The version we have in C++23 has this too, it calls them <code>fold_left_first<\/code> and <code>fold_right_last<\/code>. This lets you simply write:<\/p>\n<pre><code class=\"language-cpp\">std::ranges::fold_left_first(rng, f);<\/code><\/pre>\n<p>Much better.<\/p>\n<h3><code>fold_left_with_iter<\/code> and <code>fold_left_first_with_iter<\/code><\/h3>\n<p>The final two versions of <code>fold<\/code> which are in C++23 are ones which expose an additional result computed by the fold: the end iterator. Recall that for some ranges, the type returned by <code>std::ranges::end<\/code> is not an iterator, but a <em>sentinel<\/em>: some rule for when to finish iteration. For some ranges, computing the iterator for the range which is equal to what <code>std::ranges::end<\/code> returns may actually be quite expensive. Since carrying out the fold necessarily requires computing this iterator, C++23 provides functions which return this iterator alongside the value computed.<\/p>\n<p>For example, say we have a collection of cats sorted by age, and we have some food which is specially formulated for younger cats. We could split the food between the younger cats like so:<\/p>\n<pre><code class=\"language-cpp\">std::vector&lt;cat&gt; cats = get_sorted_cats();\r\nauto young_cats = cats | std::views::take_while([](auto c) { return c.age &lt; 7; });\r\nauto leftover_food = std::ranges::fold_left(young_cats, food_for_young_cats, feed_half);<\/code><\/pre>\n<p>However, now if we want to give some other food to the older cats, we need to recompute the point in <code>cats<\/code> where the young cats stop and the older cats begin:<\/p>\n<pre><code class=\"language-cpp\">auto first_old_cat = std::ranges::find_if(cats, [](auto c) { return c.age &gt;= 7; });\r\ngive_some_other_food(first_old_cat, std::ranges::end(cats));<\/code><\/pre>\n<p><code>fold_left_with_iter<\/code> lets you avoid the recomputation:<\/p>\n<pre><code class=\"language-cpp\">std::vector&lt;cat&gt; cats = get_sorted_cats();\r\nauto young_cats = cats | std::views::take_while([](auto c) { return c.age &lt; 7; });\r\nauto [first_old_cat, leftover_food] = std::ranges::fold_left_with_iter(young_cats, food_for_young_cats, feed_half);\r\n\r\ngive_some_other_food(first_old_cat, std::ranges::end(cats));<\/code><\/pre>\n<h2>What about <code>reduce<\/code>?<\/h2>\n<p>The <code>fold<\/code> family of functions extend and replace <code>std::accumulate<\/code>. But what about <code>std::reduce<\/code>? <code>std::ranges::reduce<\/code> is planned, but no one has written the necessary standards proposal for it yet. It wouldn&#8217;t be <em>that<\/em> different from <code>std::ranges::fold_*<\/code>, but there&#8217;s some subtle design points, like different constraints on the binary operator to allow <code>op(*it, acc)<\/code>, <code>op(acc, *it)<\/code>, and <code>op(*it, *it)<\/code>.<\/p>\n<h2>What about projections?<\/h2>\n<p>I said at the start that the rangified algorithms have 3 main benefits:<\/p>\n<ol>\n<li>Can pass a range rather than iterator pair<\/li>\n<li>Are constrained by concepts<\/li>\n<li>Support projection functions<\/li>\n<\/ol>\n<p>Only the first two apply for <code>fold_*<\/code>, however: projection functions aren&#8217;t supported for a rather subtle reason. You can see <a href=\"https:\/\/wg21.link\/p2322r6\">P2322r6<\/a> for all the details, but essentially, for the <code>fold_left_first*<\/code> and <code>fold_right_last*<\/code> overloads, allowing projections would incur an extra copy even in cases where it shouldn&#8217;t be required. As such, projections were removed from those overloads, and then from the remaining three for consistency.<\/p>\n<p>If you want to use a projection function with <code>fold_left<\/code>, you could do something like:<\/p>\n<pre><code class=\"language-cpp\">auto res = fold_left(rng | std::views::transform(projection), init, f);<\/code><\/pre>\n<h2>Feedback<\/h2>\n<p>You can try out the feature in <a href=\"https:\/\/visualstudio.microsoft.com\/vs\/features\/cplusplus\/\">Visual Studio 2022 version 17.5<\/a>. If you have any questions, comments, or issues with the feature, you can comment below, or reach us via email at <a href=\"mailto:visualcpp@microsoft.com\">visualcpp@microsoft.com<\/a> or via Twitter at <a href=\"https:\/\/twitter.com\/visualc\">@VisualC<\/a>.<\/p>\n<h2>Acknowledgements<\/h2>\n<p>Thanks to Christopher Di Bella, Barry Revzin, and Stephan T. Lavavej for feedback and information. Thanks to <span style=\"font-size: 1rem; text-align: var(--bs-body-text-align);\">Jakub Mazurkiewicz for implementing this feature in our standard library.<\/span><\/p>\n<hr \/>\n<p><sup><a id=\"foot1\"><\/a>1<\/sup>Why? Quoting Casey Carter: &#8220;Every time someone asks why we didn\u2019t cover <code>&lt;numeric&gt;<\/code> and <code>&lt;memory&gt;<\/code> algorithms: We thought 187 pages of Technical Specification was enough.&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>New fold algorithms in C++23, what they do, how to use them.<\/p>\n","protected":false},"author":706,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-32015","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus"],"acf":[],"blog_post_summary":"<p>New fold algorithms in C++23, what they do, how to use them.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/32015","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/706"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=32015"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/32015\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=32015"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=32015"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=32015"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}