# Standard Library Algorithms: Changes and Additions in C++17

Visual C++ Team

#### Today we have a guest post from Marc Gregoire, Software Architect at Nikon Metrology and Microsoft MVP since 2007.

The C++14 standard already contains a wealth of different kinds of algorithms. C++17 adds a couple more algorithms and updates some existing ones. This article explains what’s new and what has changed in the C++17 Standard Library.

## New Algorithms

### Sampling

C++17 includes the following new sampling algorithm:

• `sample(first, last, out, n, gen)`

It uses the given random number generator (`gen`) to pick `n` random elements from a given range [`first`, `last`) and writes them to the given output iterator (`out`).

Here is a simple piece of code that constructs a vector containing the integers 1 to 20. It then sets up a random number generator, and finally generates 10 sequences of 5 values, in which each value is randomly sampled from the `data` vector:

```cpp
using namespace std;

vector<int> data(20);
iota(begin(data), end(data), 1);
copy(cbegin(data), cend(data), ostream_iterator<int>(cout, " "));
cout << '\n';

random_device seeder;
const auto seed = seeder.entropy() ? seeder() : time(nullptr);
default_random_engine generator(
    static_cast<default_random_engine::result_type>(seed));

const size_t numberOfSamples = 5;
vector<int> sampledData(numberOfSamples);

for (size_t i = 0; i < 10; ++i)
{
    sample(cbegin(data), cend(data), begin(sampledData),
        numberOfSamples, generator);
    copy(cbegin(sampledData), cend(sampledData),
        ostream_iterator<int>(cout, " "));
    cout << '\n';
}

```

Here is an example of a possible output:

```
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
4 8 9 17 19
4 7 12 13 18
5 7 8 14 18
1 4 5 10 20
2 4 8 13 17
2 3 4 5 20
4 7 8 9 13
1 7 8 10 15
4 5 8 12 13
1 3 8 10 19
```

### Iterating

The C++ Standard Library already included `for_each()` to process each element in a given range. C++17 adds a `for_each_n(first, n, func)` algorithm. It calls the given function object (`func`) for each element in the range given by a first iterator (`first`) and a number of elements (`n`). As such, it is very similar to `for_each()`, but `for_each_n()` only processes the first `n` elements of the range.

Here is a simple example that generates a vector of 20 values, then uses `for_each_n()` to print the first 5 values to the console:

```cpp
using namespace std;

vector<int> data(20);
iota(begin(data), end(data), 1);

for_each_n(begin(data), 5,
    [](const auto& value) { cout << value << '\n'; });
```

### Searching

C++17 includes a couple of specialized searchers, all defined in `<functional>`:

• `default_searcher`
• `boyer_moore_searcher`
• `boyer_moore_horspool_searcher`

The Boyer-Moore searchers are often used to find a piece of text in a large block of text, and are usually more efficient than the default searcher. In practice, the two Boyer-Moore searchers are able to skip certain characters instead of having to compare each individual character. This gives these algorithms a sublinear complexity, making them much faster than the default searcher. See the Wikipedia article for more details of the algorithm.

To use these specialized searchers, you create an instance of one of them and pass that instance as the last parameter to `std::search()`, for example:

```cpp
using namespace std;

const string haystack = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
const string needle = "consectetur";

const auto result = search(cbegin(haystack), cend(haystack),
    boyer_moore_searcher(cbegin(needle), cend(needle)));

if (result != cend(haystack))
    cout << "Found it.\n";
else
    cout << "Did not find it.\n";
```

If you want to carry out multiple searches on the same range, you can construct a single instance of `std::boyer_moore_searcher` and reuse it, rather than creating a new one for each `std::search()` call.

### Generalized Sum Algorithms

#### Scan

The following generalized sum algorithms have been added to the C++17 Standard Library:

• `exclusive_scan(first, last, out, init[, bin_op])`
• `inclusive_scan(first, last, out[, bin_op[, init]])`
• `transform_exclusive_scan(first, last, out, init, bin_op, un_op)`
• `transform_inclusive_scan(first, last, out, bin_op, un_op[, init])`

Where `bin_op` is a binary operator (`std::plus<>()` by default), and `un_op` is a unary operator.

All these algorithms calculate a sequence of sums of the elements in a given range [`first`, `last`), denoted as [e0, en). The calculated sums are written to [`out`, `out + (last-first)`), denoted as [s0, sn). Suppose further that we denote the binary operator (`bin_op`) as ⊕. The `exclusive_scan()` algorithm then calculates the following sequence of sums:

```
s0 = init
s1 = init ⊕ e0
s2 = init ⊕ e0 ⊕ e1
...
sn−1 = init ⊕ e0 ⊕ e1 ⊕ ... ⊕ en−2
```

While `inclusive_scan()` calculates the following sums:

```
s0 = init ⊕ e0
s1 = init ⊕ e0 ⊕ e1
...
sn−1 = init ⊕ e0 ⊕ e1 ⊕ ... ⊕ en−1
```

The only difference is that `inclusive_scan()` includes the ith element in the ith sum, while `exclusive_scan()` does not include the ith element in the ith sum.

`exclusive_scan()` and `inclusive_scan()` are similar to `partial_sum()`. However, `partial_sum()` evaluates everything from left to right, while `exclusive_scan()` and `inclusive_scan()` evaluate the elements in a non-deterministic order. As a result, their output is non-deterministic if the binary operator is not associative. This freedom in evaluation order is also what allows these algorithms to be executed in parallel by specifying a parallel execution policy, discussed later in this article.

The sums calculated by `inclusive_scan()` with `init` equal to 0 and an associative binary operator are exactly the same as the sums calculated by `partial_sum()`.

`transform_exclusive_scan()` and `transform_inclusive_scan()` are very similar. The only difference is that they apply the given unary operator before calculating the sum. Suppose the unary operator is denoted as a function call f(). The `transform_exclusive_scan()` algorithm then calculates the following sums sequence:

```
s0 = init
s1 = init ⊕ f(e0)
s2 = init ⊕ f(e0) ⊕ f(e1)
...
sn−1 = init ⊕ f(e0) ⊕ f(e1) ⊕ ... ⊕ f(en−2)
```

And the `transform_inclusive_scan()` algorithm calculates the following sums:

```
s0 = init ⊕ f(e0)
s1 = init ⊕ f(e0) ⊕ f(e1)
...
sn−1 = init ⊕ f(e0) ⊕ f(e1) ⊕ ... ⊕ f(en−1)
```

#### Reduce

• `reduce(first, last[, init[, bin_op]])`
• `transform_reduce(first, last, init, bin_op, un_op)`

Where `bin_op` is a binary operator, `std::plus<>()` by default. These algorithms result in a single value, similar to `accumulate()`.

Suppose again that the range [`first`, `last`) is denoted as [e0, en). `reduce()` then calculates the following sum:

`init ⊕ e0 ⊕ e1 ⊕ ... ⊕ en−1`

While `transform_reduce()` results in the following sum, assuming the unary operator is denoted as a function call f():

`init ⊕ f(e0) ⊕ f(e1) ⊕ ... ⊕ f(en−1)`

Unlike `accumulate()`, `reduce()` supports parallel execution. The `accumulate()` algorithm always evaluates everything deterministically from left to right, while the evaluation order is non-deterministic for `reduce()`. A consequence is that the result of `reduce()` will be non-deterministic in case the binary operator is not associative or not commutative.

The sum calculated by `reduce()` with `init` equal to 0 is exactly the same as the result of calling `accumulate()` as long as the binary operator that is used is associative and commutative.

Finally, there is another set of overloads for `transform_reduce()`:

• `transform_reduce(first1, last1, first2, init[, bin_op1, bin_op2])`

It requires two ranges: a range [`first1`, `last1`), denoted as [a0, an), and a range starting at `first2`, denoted as [b0, bn). Suppose `bin_op1` (`std::plus<>()` by default) is denoted as ⊕, and `bin_op2` (`std::multiplies<>()` by default) is denoted as ⊖, then it calculates the following sum:

`init ⊕ (a0 ⊖ b0) ⊕ (a1 ⊖ b1) ⊕ ... ⊕ (an-1 ⊖ bn-1)`

## Parallel Algorithms

A major addition to the C++17 Standard Library is support for parallel execution of more than 60 of its algorithms, such as `sort()`, `all_of()`, `find()`, `transform()`, …

If a Standard Library algorithm supports parallel execution, then it accepts an execution policy as its first parameter. This policy determines to what degree the algorithm may parallelize or vectorize its execution. Currently, the following policy types and instances are defined in the `std::execution` namespace in the `<execution>` header:

| Execution Policy Type | Global Instance | Description |
| --- | --- | --- |
| `sequenced_policy` | `seq` | No parallel execution is allowed. |
| `parallel_policy` | `par` | Parallel execution is allowed. |
| `parallel_unsequenced_policy` | `par_unseq` | Parallel and vectorized execution is allowed. Execution is also allowed to switch between different threads. |

The `parallel_unsequenced_policy` imposes severe restrictions on what the algorithm’s function callbacks are allowed to do. Under that policy, the calls to the callbacks are unsequenced, so the callbacks may not allocate or deallocate memory, acquire mutexes, and so on. The other policies do not have such restrictions and in fact guarantee that their callback calls are sequenced, although possibly in a non-deterministic order. In any case, you are responsible for preventing data races and deadlocks.

Using these parallel policies is straightforward. Here is a quick example that generates a `vector` of 1 billion double values, then uses the `std::transform()` algorithm to calculate the square root of each value in parallel:

```cpp
using namespace std;

vector<double> data(1'000'000'000);
iota(begin(data), end(data), 1);

transform(execution::par_unseq, begin(data), end(data), begin(data),
    [](const auto& value) { return sqrt(value); });
```

If you run this piece of code on an 8-core machine, the CPU load will peak on all eight cores; that peak is the parallel execution of the call to `std::transform()`.

## Utility Functions

C++17 also includes a couple of handy utility functions that are not really algorithms, but still useful to know.

### clamp()

`std::clamp(value, low, high)` is defined in the `<algorithm>` header. It ensures that a given value is within a given range [`low`, `high`]. The result of calling `clamp()` is:

• a reference to `low` if `value` < `low`
• a reference to `high` if `value` > `high`
• otherwise, a reference to the given `value`

One use-case is to clamp audio samples to a 16-bit range:

```cpp
using namespace std;

const int low = -32'768;
const int high = 32'767;

cout << clamp(12'000, low, high) << '\n';
cout << clamp(-36'000, low, high) << '\n';
cout << clamp(40'000, low, high) << '\n';
```

The output of this code snippet is as follows:

```
12000
-32768
32767
```

### gcd() and lcm()

`std::gcd()` returns the greatest common divisor of two integers, while `std::lcm()` returns the least common multiple of two integers. Both algorithms are defined in the `<numeric>` header.

Using these algorithms is straightforward, for example:

```cpp
cout << gcd(24, 44) << '\n';
cout << lcm(24, 44) << '\n';
```

The output is as follows:

```
4
264
```

## Removed Algorithms

C++17 has removed one algorithm: `std::random_shuffle()`. This algorithm was already deprecated in C++14. You should use `std::shuffle()` instead.