C++20 added two new library features for those of you that like bit twiddling: “bit rotating and counting functions” and “integral power of two operations”. Most of the added function templates are simple numeric utility functions, and a handful map somewhat directly to common instructions on modern CPUs. We’ve implemented them in Visual Studio 2019 version 16.8 Preview 2. This post will tell you about our implementations and what processor-specific optimizations you can expect.
All the functions are fully constexpr enabled, which should be quite handy for things like computing lookup tables. I know I’ve seen these functions implemented in several numeric codebases, so hopefully having standard versions will be useful. The new functions are:
std::countl_zero, std::countr_zero
These count the number of zeros on the left (from most-significant-bit to least-significant-bit), or right (from least-significant-bit to most-significant-bit) respectively. On x86 and x64 platforms these emit the LZCNT
and TZCNT
instructions respectively. By default, the availability of the instructions is checked at runtime. The BSF
and BSR
instructions will be used if the runtime check fails. The runtime check is omitted when compiling with /arch:AVX2
or higher, since all CPUs that support AVX2 also support LZCNT
and TZCNT
. On ARM and ARM64 countl_zero
emits the CLZ
instruction; countr_zero
does not emit any special instructions on ARM
or ARM64
at this time.
Interestingly, LZCNT
and TZCNT
have a somewhat oddball instruction encoding on x86 and x64; they are encoded as REP BSF
and REP BSR
. The rep prefix is ignored on CPUs that don’t support LZCNT
or TZCNT
. This means that running code with LZCNT
or TZCNT
on a CPU that doesn’t support them will still run, but TZCNT
won’t have the correct output for zero and LZCNT
will both have the wrong output for zero and return the index of the first set bit starting from the least significant bit, which is the opposite of what it does on CPUs that actually support the instruction. This isn’t a very useful fallback, and while we tried to use it in <bit>
to simplify some code, it ended up being more trouble than it was worth.
Note that Visual Studio version 16.8 Preview 1 contains a bug related to LZCNT
usage on CPUs that only support bsr. This will be fixed in Visual Studio version 16.8 Preview 3.
std::popcount
std::popcount
counts the number of set bits in its input. On x86 and x64 platforms popcount
emits the POPCNT
instruction, again checking the availability of POPCNT
at runtime. If compiled with /arch:AVX
or higher no check is made as all CPUs that support AVX also support POPCNT
. No special instructions are emitted on ARM or ARM64 at this time.
std::countl_one, std::countr_one
Counts the number of ones on the left or right of their input; these rely on countl_zero
and countr_zero
and thus will use the same intrinsics as those
std::has_single_bit
Functionally equivalent to popcount(x) == 1
. has_single_bit
does not use any special instructions.
std::bit_ceil, std::bit_floor
Finds the nearest power of two above or below the input. If the input is already a power of two it is returned unchanged. If the result would not be representable in the input’s type, then the behavior is undefined (this occurs for example in bit_ceil(static_cast<unsigned char>(0b11111111))
). bit_ceil
is only allowed as a constant expression if this undefined behavior does not occur. bit_floor
returns zero when the input is zero.
std::rotl, std::rotr
Bitwise rotates the input to the left or right. Currently this does not explicitly emit any special instructions.
You can try all of these functions out today by downloading Visual Studio 2019 version 16.8 Preview 2 and compiling your code with /std:c++latest
. If you find any bugs in our library implementation report them by opening an issue at on our GitHub page. Please report any other Visual Studio issues on Developer Community. Don’t hesitate to contact us with any questions or suggestions.
You can also read find our reference documentation or the relevant C++ papers on <bit>
: P0553R4: Bit operations and P0556R3: Integral power-of-2 operations.
0 comments