{"id":108233,"date":"2023-05-23T07:00:00","date_gmt":"2023-05-23T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=108233"},"modified":"2023-05-23T06:19:00","modified_gmt":"2023-05-23T13:19:00","slug":"20230523-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230523-00\/?p=108233","title":{"rendered":"On creating (and using) a transforming iterator"},"content":{"rendered":"<p>The C++20 ranges library come with the ability to transform a view; that is, to provide a unary function that is applied to each element of a view, producing a new view.<\/p>\n<pre>std::array&lt;int, 2&gt; a = { 99, 42 };\r\n\r\n\/\/ This range reports values that are one larger than the values\r\n\/\/ in an array. The array values are unchanged.\r\nauto r = a |\r\n    std::ranges::views::transform([](int v) { return v + 1; });\r\n\r\n\/\/ This creates a vector from the range\r\nstd::vector v(r.begin(), r.end());\r\n\r\n\/\/ The resulting vector is { 100, 43 }\r\n<\/pre>\n<p>But what if your code base isn&#8217;t ready to move to C++20 yet? For example, you might be a library that wants to support C++17 or even C++14.<\/p>\n<p>You can take the original iterator and wrap it inside another iterator that overrides the <code>*<\/code> operator so that it applies a transformation to the value before returning it. Note that our transforming iterator cannot support the <code>-&gt;<\/code> operator since there is no value object we can return a pointer to; our value is generated on demand. Fortunately, the standard permits us to omit <code>-&gt;<\/code> support, provided we define <code>pointer<\/code> as <code>void<\/code>.<\/p>\n<pre>template&lt;typename Inner&gt;\r\nstruct Wrap\r\n{\r\n    Wrap(Inner const&amp; inner) :\r\n        m_inner(inner) {}\r\n    Inner m_inner;\r\n};\r\n\r\ntemplate&lt;typename It, typename Transformer&gt;\r\nclass transform_iterator : Wrap&lt;Transformer&gt;\r\n{\r\n    It m_it;\r\npublic:\r\n    transform_iterator(It const&amp; it,\r\n        Transformer const&amp; transformer) :\r\n        m_it(it),\r\n        Wrap&lt;Transformer&gt;(transformer) {}\r\n\r\n    \/\/ copy constructors and assignment operators defaulted\r\n\r\n    using difference_type = typename\r\n        std::iterator_traits&lt;It&gt;::difference_type;\r\n    using value_type = typename std::invoke_result&lt;\r\n        Transformer, It&gt;::type;\r\n    using pointer = void;\r\n    using reference = void;\r\n    using iterator_category = std::input_iterator_tag;\r\n\r\n    bool operator==(transform_iterator const&amp; other)\r\n    { return m_it == other.m_it; }\r\n    bool operator!=(transform_iterator const&amp; other)\r\n    { return m_it != other.m_it; }\r\n\r\n    auto operator*() const { return (*this)(*m_it); }\r\n\r\n    auto operator++() { ++m_it; return *this; }\r\n    auto operator++(int)\r\n    { auto prev = *this; ++m_it; return prev; }\r\n};\r\n\r\n\/\/ For C++14 (no CTAD)\r\ntemplate&lt;typename It, typename Transformer&gt;\r\nauto make_transform_iterator(\r\n    It const&amp; it, Transformer const&amp; transformer)\r\n{\r\n    return transform_iterator&lt;It, Transformer&gt;\r\n        (it, transformer);\r\n}\r\n<\/pre>\n<p>We can use this transforming iterator like this:<\/p>\n<pre>std::array&lt;int, 2&gt; a = { 99, 42 };\r\n\r\nauto transformer = [](int v) { return v + 1; };\r\n\r\n\/\/ This creates a vector from the array, applying\r\n\/\/ a transformation to each value\r\nstd::vector v(\r\n    transform_iterator(a.begin(), transformer),\r\n    transform_iterator(a.end(), transformer));\r\n\r\n\/\/ If C++14 (no CTAD)\r\nstd::vector&lt;int&gt; v(\r\n    transform_iterator(a.begin(), transformer),\r\n    transform_iterator(a.end(), transformer));\r\n\r\n\/\/ The resulting vector is { 100, 43 }\r\n<\/pre>\n<p>The transformation can even change the type:<\/p>\n<pre>std::array&lt;int, 2&gt; a = { 99, 42 };\r\n\r\nauto transformer = [](int v)\r\n    { return std::make_pair(v + 1, v); };\r\n\r\n\/\/ This creates a map from the array\r\nstd::map m(\r\n    transform_iterator(a.begin(), transformer),\r\n    transform_iterator(a.end(), transformer));\r\n\r\n\/\/ If C++14 (no CTAD)\r\nstd::map&lt;int, int&gt; m(\r\n    transform_iterator(a.begin(), transformer),\r\n    transform_iterator(a.end(), transformer));\r\n\r\n\/\/ The resulting map is\r\n\/\/  m[100] = 99\r\n\/\/  m[43] = 42\r\n<\/pre>\n<p>There are some non-obvious pieces of the above <code>transform_<wbr \/>iterator<\/code>.<\/p>\n<p>We want to accept not just lambdas as transformers, but any Callable. That means that we use <code>std::invoke<\/code> to invoke the transformer on the wrapped iterator. This allows transformers to be function pointers, member function pointers, pointers to member variables, lambdas, or any other class with a public <code>operator()<\/code>. For example:<\/p>\n<pre>struct S\r\n{\r\n    int value;\r\n    int ValuePlusOne() { return value + 1; };\r\n};\r\n\r\nint ValueMinusOne(S const&amp; s)\r\n{\r\n    return s.value - 1;\r\n}\r\n\r\nvoid example()\r\n{\r\n    std::array&lt;S, 2&gt; a { 99, 42 };\r\n\r\n    \/\/ v1 = { 99, 42 }; - pointer to data member\r\n    std::vector&lt;int&gt; v1(\r\n    transform_iterator(a.begin(), &amp;S::value),\r\n    transform_iterator(a.end(), &amp;S::value));\r\n\r\n    \/\/ v2 = { 100, 43 }; - pointer to member function\r\n    std::vector&lt;int&gt; v2(\r\n    transform_iterator(a.begin(), &amp;S::ValuePlusOne),\r\n    transform_iterator(a.end(), &amp;S::ValuePlusOne));\r\n\r\n    \/\/ v3 = { 98, 41 }; - pointer to free function\r\n    std::vector&lt;int&gt; v3(\r\n    transform_iterator(a.begin(), &amp;ValueMinusOne),\r\n    transform_iterator(a.end(), &amp;ValueMinusOne));\r\n}\r\n<\/pre>\n<p>The <code>transform_<wbr \/>iterator<\/code> derives from a wrapped <code>Transformer<\/code>. This is a space optimization that takes advantage of empty base optimization (EBO): If the <code>Transformer<\/code> is an empty class, then the <code>Wrapped&lt;Transformer&gt;<\/code> will also be an empty class, and empty base classes are permitted to occupy zero bytes.\u00b9 (Normally, objects cannot be of size zero.) This means that a <code>transform_<wbr \/>iterator<\/code> that has an empty class as a transformer (such as a captureless lambda) is the same size as the original iterator.<\/p>\n<p>We wrap the <code>Transformer<\/code> inside a class because base classes must be classes, but the <code>Transformer<\/code> might not be a class, as we noted above.<\/p>\n<p>A transforming iterator is handy for populating a <code>std::map<\/code> because all three of the major implementations of the C++ standard library optimize the two-iterator <code>insert()<\/code> overload for the case where the items are inserted in increasing key order at the end of map.\u00b2<\/p>\n<p>In the case where you have two versions of a function, one of which takes a range and another of which takes items one at a time, you can avoid the need for a transforming iterator by turning the problem around: Instead of producing a range of transformed iterators to pass to function, you produce an output iterator that calls the single-parameter version of the function.<\/p>\n<pre>std::vector&lt;int&gt; v;\r\nstd::transform(a.begin(), a.end(), std::back_inserter(v),\r\n               transformer);\r\n\r\nstd::map&lt;int, int&gt; m;\r\nstd::transform(a.begin(), a.end(), std::inserter(m, m.end()),\r\n               transformer);\r\n<\/pre>\n<p>This is simpler but has its downsides:<\/p>\n<ul>\n<li>You lose CTAD, since the compiler cannot infer the template type parameters from the constructor.<\/li>\n<li>Repeated single-element function calls may be less efficient than a bulk operation.<\/li>\n<\/ul>\n<p>For example, inserting a transformed range into a vector is linear in the number of elements inserted plus the number of elements after the insertion point. In other words, inserting <var>n<\/var> new elements in front of <var>k<\/var> existing elements is <var>O<\/var>(<var>n<\/var> + <var>k<\/var>) if you do it in a single call to <code>insert<\/code>:<\/p>\n<pre>\/\/ Bulk insert after the first element\r\n\/\/ This takes O(a.size() + v.size() - 1) =\r\n\/\/ O(a.size() + v.size())\r\nv.insert(v.begin() + 1,\r\n    transform_iterator(a.begin(), transformer),\r\n    transform_iterator(a.end(), transformer));\r\n<\/pre>\n<p>If you insert one element at a time, then you pay the <var>k<\/var> each time, which results in a running time of <var>n<\/var>\u2009<var>O<\/var>(1 + <var>k<\/var>) = <var>O<\/var>(<var>n<\/var> + <var>nk<\/var>), an extra cost of <var>O<\/var>(<var>nk<\/var>).<\/p>\n<pre>\/\/ One-at-a-time insert after the first element\r\n\/\/ This takes O(a.size() + a.size() * (v.size() - 1)) =\r\n\/\/ O(a.size() * v.size())\r\nv.insert(v.begin() + 1,\r\n    transform_iterator(a.begin(), transformer),\r\n    transform_iterator(a.end(), transformer));\r\n<\/pre>\n<p><b>Bonus chatter<\/b>: The Boost library comes with an implementation of <code>transform_<wbr \/>iterator<\/code>, so you can use that one instead of the custom one here. But at least you got to see a number of techniques that are commonly seen in library code.<\/p>\n<p>\u00b9 There are other things that could prevent the empty base optimization, but they do not apply here.<\/p>\n<p>\u00b2 In other words, it&#8217;s as if the ranged insertion method were written as<\/p>\n<pre>template&amp;typename Iterator&gt;\r\nvoid insert(Iterator first, Iterator last)\r\n{\r\n    for (; first != last; ++first) {\r\n        insert(end(), *first);\r\n    }\r\n}\r\n<\/pre>\n<p>Last time, we looked at <a title=\"Speeding up the insertion of a sorted (or mostly-sorted) key list into a std::map or other ordered associative container\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230522-00\/?p=108226\"> other ways of doing efficient bulk insertions<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It lets you change the thing being iterated over, on the fly.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-108233","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>It lets you change the thing being iterated over, on the fly.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=108233"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108233\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=108233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=108233"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=108233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}