# Standard Library Algorithms: Changes and Additions in C++17

Visual C++ Team

#### Today we have a guest post from Marc Gregoire, Software Architect at Nikon Metrology and Microsoft MVP since 2007.

The C++14 standard already contains a wealth of different kinds of algorithms. C++17 adds a couple more algorithms and updates some existing ones. This article explains what’s new and what has changed in the C++17 Standard Library.

## New Algorithms

### Sampling

C++17 includes the following new sampling algorithm:

• `sample(first, last, out, n, gen)`

It uses the given random number generator (`gen`) to pick `n` random elements from a given range [`first`, `last`) and writes them to the given output iterator (`out`).

Here is a simple piece of code that constructs a vector containing the integers 1 to 20. It then sets up a random number generator, and finally generates 10 sequences of 5 values, in which each value is randomly sampled from the `data` vector:

```cpp
using namespace std;

vector<int> data(20);
iota(begin(data), end(data), 1);
copy(cbegin(data), cend(data), ostream_iterator<int>(cout, " "));
cout << '\n';

random_device seeder;
const auto seed = seeder.entropy() ? seeder() : time(nullptr);
default_random_engine generator(
    static_cast<default_random_engine::result_type>(seed));

const size_t numberOfSamples = 5;
vector<int> sampledData(numberOfSamples);

for (size_t i = 0; i < 10; ++i)
{
    sample(cbegin(data), cend(data), begin(sampledData),
        numberOfSamples, generator);
    copy(cbegin(sampledData), cend(sampledData),
        ostream_iterator<int>(cout, " "));
    cout << '\n';
}

```

Here is an example of a possible output:

```
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
4 8 9 17 19
4 7 12 13 18
5 7 8 14 18
1 4 5 10 20
2 4 8 13 17
2 3 4 5 20
4 7 8 9 13
1 7 8 10 15
4 5 8 12 13
1 3 8 10 19
```

### Iterating

The C++ Standard Library already included `for_each()` to process each element in a given range. C++17 adds a `for_each_n(first, n, func)` algorithm. It calls the given function object (`func`) for each element in the range given by a first iterator (`first`) and a number of elements (`n`). As such, it is very similar to `for_each()`, but `for_each_n()` only processes the first `n` elements of the range.

Here is a simple example that generates a vector of 20 values, then uses `for_each_n()` to print the first 5 values to the console:

```cpp
using namespace std;

vector<int> data(20);
iota(begin(data), end(data), 1);

for_each_n(begin(data), 5,
    [](const auto& value) { cout << value << '\n'; });
```

### Searching

C++17 includes a couple of specialized searchers, all defined in `<functional>`:

• `default_searcher`
• `boyer_moore_searcher`
• `boyer_moore_horspool_searcher`

The Boyer-Moore searchers are often used to find a piece of text in a large block of text, and are usually more efficient than the default searcher. In practice, the two Boyer-Moore searchers are able to skip certain characters instead of having to compare each individual character. This gives these algorithms a sublinear complexity, making them much faster than the default searcher. See the Wikipedia article for more details of the algorithm.

To use these specialized searchers, you create an instance of one of them and pass that instance as the last parameter to `std::search()`, for example:

```cpp
using namespace std;

const string haystack = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
const string needle = "consectetur";

const auto result = search(cbegin(haystack), cend(haystack),
    boyer_moore_searcher(cbegin(needle), cend(needle)));

if (result != cend(haystack))
    cout << "Found it.\n";
else
    cout << "Did not find it.\n";
```

If you want to carry out multiple searches on the same range, you can construct a single instance of `std::boyer_moore_searcher` and reuse it, rather than creating a new one for each `std::search()` call.

### Generalized Sum Algorithms

#### Scan

The following generalized sum algorithms have been added to the C++17 Standard Library:

• `exclusive_scan(first, last, out, init[, bin_op])`
• `inclusive_scan(first, last, out[, bin_op[, init]])`
• `transform_exclusive_scan(first, last, out, init, bin_op, un_op)`
• `transform_inclusive_scan(first, last, out, bin_op, un_op[, init])`

Where `bin_op` is a binary operator (`std::plus<>()` by default), and `un_op` is a unary operator.

All these algorithms calculate a sequence of sums of the elements in a given range [`first`, `last`), denoted as [e0, en). The calculated sums are written to [`out`, `out + (last-first)`), denoted as [s0, sn). Suppose further that we denote the binary operator (`bin_op`) as ⊕. The `exclusive_scan()` algorithm then calculates the following sequence of sums:

```
s0 = init
s1 = init ⊕ e0
s2 = init ⊕ e0 ⊕ e1
...
sn−1 = init ⊕ e0 ⊕ e1 ⊕ ... ⊕ en−2
```

While `inclusive_scan()` calculates the following sums:

```
s0 = init ⊕ e0
s1 = init ⊕ e0 ⊕ e1
...
sn−1 = init ⊕ e0 ⊕ e1 ⊕ ... ⊕ en−1
```

The only difference is that `inclusive_scan()` includes the ith element in the ith sum, while `exclusive_scan()` does not include the ith element in the ith sum.

`exclusive_scan()` and `inclusive_scan()` are similar to `partial_sum()`. However, `partial_sum()` evaluates everything from left to right, while `exclusive_scan()` and `inclusive_scan()` evaluate the elements in a non-deterministic order. As a result, their output is non-deterministic if the binary operator is not associative. This freedom in evaluation order is also what allows these algorithms to be executed in parallel by specifying a parallel execution policy, discussed later in this article.

The sums calculated by `inclusive_scan()` with `init` equal to 0 and an associative binary operator are exactly the same as the sums calculated by `partial_sum()`.

`transform_exclusive_scan()` and `transform_inclusive_scan()` are very similar. The only difference is that they apply the given unary operator before calculating the sum. Suppose the unary operator is denoted as a function call f(). The `transform_exclusive_scan()` algorithm then calculates the following sums sequence:

```
s0 = init
s1 = init ⊕ f(e0)
s2 = init ⊕ f(e0) ⊕ f(e1)
...
sn−1 = init ⊕ f(e0) ⊕ f(e1) ⊕ ... ⊕ f(en−2)
```

And the `transform_inclusive_scan()` algorithm calculates the following sums:

```
s0 = init ⊕ f(e0)
s1 = init ⊕ f(e0) ⊕ f(e1)
...
sn−1 = init ⊕ f(e0) ⊕ f(e1) ⊕ ... ⊕ f(en−1)
```

#### Reduce

• `reduce(first, last[, init[, bin_op]])`
• `transform_reduce(first, last, init, bin_op, un_op)`

Where `bin_op` is a binary operator, `std::plus<>()` by default. These algorithms result in a single value, similar to `accumulate()`.

Suppose again that the range [`first`, `last`) is denoted as [e0, en). `reduce()` then calculates the following sum:

`init ⊕ e0 ⊕ e1 ⊕ ... ⊕ en−1`

While `transform_reduce()` results in the following sum, assuming the unary operator is denoted as a function call f():

`init ⊕ f(e0) ⊕ f(e1) ⊕ ... ⊕ f(en−1)`

Unlike `accumulate()`, `reduce()` supports parallel execution. The `accumulate()` algorithm always evaluates everything deterministically from left to right, while the evaluation order is non-deterministic for `reduce()`. A consequence is that the result of `reduce()` will be non-deterministic in case the binary operator is not associative or not commutative.

The sum calculated by `reduce()` with `init` equal to 0 is exactly the same as the result of calling `accumulate()` as long as the binary operator that is used is associative and commutative.

Finally, there is another set of overloads for `transform_reduce()`:

• `transform_reduce(first1, last1, first2, init[, bin_op1, bin_op2])`

It requires two ranges: a range [`first1`, `last1`), denoted as [a0, an), and a range starting at `first2`, denoted as [b0, bn). Suppose `bin_op1` (`std::plus<>()` by default) is denoted as ⊕, and `bin_op2` (`std::multiplies<>()` by default) is denoted as ⊖, then it calculates the following sum:

`init ⊕ (a0 ⊖ b0) ⊕ (a1 ⊖ b1) ⊕ ... ⊕ (an-1 ⊖ bn-1)`

## Parallel Algorithms

A major addition to the C++17 Standard Library is support for parallel execution of more than 60 of its algorithms, such as `sort()`, `all_of()`, `find()`, `transform()`, …

If a Standard Library algorithm supports parallel execution, then it accepts an execution policy as its first parameter. This policy determines to what degree the algorithm may parallelize or vectorize its execution. Currently, the following policy types and instances are defined in the `std::execution` namespace in the `<execution>` header:

| Execution Policy Type | Global Instance | Description |
| --- | --- | --- |
| `sequenced_policy` | `seq` | No parallel execution is allowed. |
| `parallel_policy` | `par` | Parallel execution is allowed. |
| `parallel_unsequenced_policy` | `par_unseq` | Parallel and vectorized execution is allowed. Execution is also allowed to switch between different threads. |

The `parallel_unsequenced_policy` imposes severe restrictions on what the algorithm’s function callbacks are allowed to do. Under that policy, the calls to the callbacks are unsequenced, so the callbacks may not allocate or deallocate memory, acquire mutexes, and so on. The other policies do not have such restrictions and in fact guarantee that their callback calls are sequenced, although possibly in a non-deterministic order. In any case, you are responsible for preventing data races and deadlocks.

Using these parallel policies is straightforward. Here is a quick example that generates a `vector` of 1 billion double values, then uses the `std::transform()` algorithm to calculate the square root of each value in parallel:

```cpp
using namespace std;

vector<double> data(1'000'000'000);
iota(begin(data), end(data), 1);

transform(execution::par_unseq, begin(data), end(data), begin(data),
    [](const auto& value) { return sqrt(value); });
```

If you run this piece of code on an 8-core machine, the CPU load will peak on all eight cores; that peak is the parallel execution of the call to `std::transform()`.

## Utility Functions

C++17 also includes a couple of handy utility functions that are not really algorithms, but still useful to know.

### clamp()

`std::clamp(value, low, high)` is defined in the `<algorithm>` header. It ensures that a given value is within a given range [`low`, `high`]. The result of calling `clamp()` is:

• a reference to `low` if `value` < `low`
• a reference to `high` if `value` > `high`
• otherwise, a reference to the given `value`

One use-case is to clamp audio samples to a 16-bit range:

```cpp
using namespace std;

const int low = -32'768;
const int high = 32'767;

cout << clamp(12'000, low, high) << '\n';
cout << clamp(-36'000, low, high) << '\n';
cout << clamp(40'000, low, high) << '\n';
```

The output of this code snippet is as follows:

```
12000
-32768
32767
```

### gcd() and lcm()

`std::gcd()` returns the greatest common divisor of two integers, while `std::lcm()` returns the least common multiple of two integers. Both algorithms are defined in the `<numeric>` header.

Using these algorithms is straightforward, for example:

```cpp
cout << gcd(24, 44) << '\n';
cout << lcm(24, 44) << '\n';
```

The output is as follows:

```
4
264
```

## Removed Algorithms

C++17 has removed one algorithm: `std::random_shuffle()`. This algorithm was already deprecated in C++14. You should use `std::shuffle()` instead.