STL Fixes In VS 2017 RTM
VS 2017 RTM will be released soon. VS 2017 RC is available now and contains all of the changes described here – please try it out and send feedback through the IDE’s Help > Send Feedback > Report A Problem (or Provide A Suggestion).
This is the third and final post for what’s changed in the STL between VS 2015 Update 3 and VS 2017 RTM. In the first post (for VS 2017 Preview 4), we explained how 2015 and 2017 will be binary compatible. In the second post (for VS 2017 Preview 5), we listed what features have been added to the compiler and STL. (Since then, we’ve implemented P0504R0 Revisiting in_place_t/in_place_type_t<T>/in_place_index_t<I> and P0510R0 Rejecting variants Of Nothing, Arrays, References, And Incomplete Types.)
We’ve overhauled vector<T>’s member functions, fixing many runtime correctness and performance bugs.
* Fixed aliasing bugs. For example, the Standard permits v.emplace_back(v[0]), which we were mishandling at runtime, and v.push_back(v[0]), which we were guarding against with deficient code (asking “does this object live within our memory block?” doesn’t work in general). The fix involves performing our actions in a careful order, so we don’t invalidate whatever we’ve been given. Occasionally, to defend against aliasing, we must construct an element on the stack, which we do only when there’s no other choice (e.g. emplace(), with sufficient capacity, not at the end). (There is an active bug here, which is fortunately highly obscure: we do not yet attempt to rigorously use the allocator’s construct() to deal with such objects on the stack.) Note that our implementation follows the Standard, which does not attempt to permit aliasing in every member function. For example, aliasing is not permitted when range-inserting multiple elements, so we make no attempt to handle that.
* Fixed exception handling guarantees. Previously, we unconditionally moved elements during reallocation, starting with the original implementation of move semantics in VS 2010. This was delightfully fast, but regrettably incorrect. Now, we follow the Standard-mandated move_if_noexcept() pattern. For example, when push_back() and emplace_back() are called, and they need to reallocate, they ask the element: “Are you nothrow move constructible? If so, I can move you (it won’t fail, and it’ll hopefully be fast). Otherwise, are you copy constructible? If so, I’ll fall back to copying you (might be slow, but won’t damage the strong exception guarantee). Otherwise, you’re saying you’re movable-only with a potentially-throwing move constructor, so I’ll move you, but you don’t get the strong EH guarantee if you throw.” Now, with a couple of obscure exceptions, all of vector’s member functions achieve the basic or strong EH guarantees as mandated by the Standard. (The first exception involves questionable Standardese, which implies that range insertion with input-only iterators must provide the strong guarantee when element construction from the range throws. That’s basically unimplementable without heroic measures, and no known implementation has ever attempted to do that. Our implementation provides the basic guarantee: we emplace_back() elements repeatedly, then rotate() them into place. If one of the emplace_back()s throws, we may have discarded our original memory block long ago, which is an observable change. The second exception involves “reloading” proxy objects (and sentinel nodes in the other containers) for POCCA/POCMA allocators, where we aren’t hardened against out-of-memory. Fortunately, std::allocator doesn’t trigger reloads.)
* Eliminated unnecessary EH logic. For example, vector’s copy assignment operator had an unnecessary try-catch block. It just has to provide the basic guarantee, which we can achieve through proper action sequencing.
* Improved debug performance slightly. Although this isn’t a top priority for us (in the absence of the optimizer, everything we do is expensive), we try to avoid severely or gratuitously harming debug perf. In this case, we were sometimes unnecessarily using iterators in our internal implementation, when we could have been using pointers.
* Improved iterator invalidation checks. For example, resize() wasn’t marking end iterators as being invalidated.
* Improved performance by avoiding unnecessary rotate() calls. For example, emplace(where, val) was calling emplace_back() followed by rotate(). Now, vector calls rotate() in only one scenario (range insertion with input-only iterators, as previously described).
* Locked down access control. Now, helper member functions are private. (In general, we rely on _Ugly names being reserved for implementers, so public helpers aren’t actually a bug.)
* Improved performance with stateful allocators. For example, move construction with non-equal allocators now attempts to activate our memmove() optimization. (Previously, we used make_move_iterator(), which had the side effect of inhibiting the memmove() optimization.) Note that a further improvement is coming in VS 2017 Update 1, where move assignment will attempt to reuse the buffer in the non-POCMA non-equal case.
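The aliasing scenario from the first bullet can be reproduced in a few lines. This is a minimal sketch, not taken from the STL's own tests; the function name append_own_elements is hypothetical, and shrink_to_fit() is only a non-binding request, so a reallocation on the next append is likely but not guaranteed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends copies of the vector's own elements to itself.
// Each argument refers to storage that a reallocation would release.
inline std::vector<std::string> append_own_elements() {
    std::vector<std::string> v{"alpha", "beta"};
    v.shrink_to_fit();     // non-binding request to drop spare capacity
    v.push_back(v[0]);     // the argument is an element of v
    v.emplace_back(v[1]);  // same hazard through emplace_back
    assert(v.size() == 4); // both appends must have taken effect
    return v;
}
```

A conforming implementation must leave the vector holding "alpha", "beta", "alpha", "beta", whether or not either call reallocated.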
Note that this overhaul inherently involves source breaking changes. Most commonly, the Standard-mandated move_if_noexcept() pattern will instantiate copy constructors in certain scenarios. If they can’t be instantiated, your program will fail to compile. Also, we’re taking advantage of other operations that are required by the Standard. For example, N4618 23.2.3 [sequence.reqmts] says that a.assign(i,j) “Requires: T shall be EmplaceConstructible into X from *i and assignable from *i.” We’re now taking advantage of “assignable from *i” for increased performance.
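To make the move_if_noexcept() decision observable, here is a small sketch that counts copies and moves during a single reallocation. The types Counts, NothrowMove, and ThrowingMove and the helper relocate_once are hypothetical names invented for this illustration. Note that ThrowingMove needs a usable copy constructor for this to compile, which is exactly the kind of instantiation that can now surface as a compiler error.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Counts { int copies = 0; int moves = 0; };

// Move constructor is noexcept, so reallocation is allowed to move.
struct NothrowMove {
    Counts* counts;
    explicit NothrowMove(Counts* p) : counts(p) {}
    NothrowMove(const NothrowMove& o) : counts(o.counts) { ++counts->copies; }
    NothrowMove(NothrowMove&& o) noexcept : counts(o.counts) { ++counts->moves; }
};

// Move constructor may throw, so reallocation falls back to copying.
struct ThrowingMove {
    Counts* counts;
    explicit ThrowingMove(Counts* p) : counts(p) {}
    ThrowingMove(const ThrowingMove& o) : counts(o.counts) { ++counts->copies; }
    ThrowingMove(ThrowingMove&& o) : counts(o.counts) { ++counts->moves; }
};

static_assert(std::is_nothrow_move_constructible<NothrowMove>::value, "expected noexcept move");
static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value, "expected throwing move");

// Builds a one-element vector, forces one reallocation, and reports
// how the existing element was transferred to the new block.
template <class T>
Counts relocate_once() {
    Counts counts;
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back(&counts);     // constructed in place: no copy, no move
    v.reserve(v.capacity() + 1); // exceeds capacity, so the element is relocated
    return counts;
}

// Usage: the relocation strategy follows the noexcept-ness of the move constructor.
inline void demo_relocation() {
    assert(relocate_once<NothrowMove>().moves == 1);
    assert(relocate_once<ThrowingMove>().copies == 1);
}
```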
The compiler has an elaborate system for warnings, involving warning levels and push/disable/pop pragmas. Compiler warnings apply to both user code and STL headers. Other STL implementations disable all compiler warnings in “system headers”, but we follow a different philosophy. Compiler warnings exist to complain about certain questionable actions, like value-modifying sign conversions or returning references to temporaries. These actions are equally concerning whether performed directly by user code, or by STL function templates performing actions on behalf of users. Obviously, the STL shouldn’t emit warnings for its own code, but we believe that it’s undesirable to suppress all warnings in STL headers.
For many years, the STL has attempted to be /W4 /analyze clean (not /Wall, that’s different), verified by extensive test suites. Historically, we pushed the warning level to 3 in STL headers, and further suppressed certain warnings. While this allowed us to compile cleanly, it was overly aggressive and suppressed desirable warnings.
Now, we’ve overhauled the STL to follow a new approach. First, we detect whether you’re compiling with /W3 (or weaker, but you should never ever do that) versus /W4 (or /Wall, but that’s technically unsupported with the STL and you’re on your own). When we sense /W3 (or weaker), the STL pushes its warning level to 3 (i.e. no change from previous behavior). When we sense /W4 (or stronger), the STL now pushes its warning level to 4, meaning that level 4 warnings will now be applied to our code. Additionally, we have audited all of our individual warning suppressions (in both product and test code), removing unnecessary suppressions and making the remaining ones more targeted (sometimes down to individual functions or classes). We’re also suppressing warning C4702 (unreachable code) throughout the entire STL; while this warning can be valuable to users, it is optimization-level-dependent, and we believe that allowing it to trigger in STL headers is more noisy than valuable. We’re using two internal test suites, plus libc++’s open-source test suite, to verify that we’re not emitting warnings for our own code.
Here’s what this means for you. If you’re compiling with /W3 (which we discourage), you should observe no major changes. Because we’ve reworked and tightened up our suppressions, you might observe a few new warnings, but this should be fairly rare. (And when they happen, they should be warning about scary things that you’ve asked the STL to do. If they’re noisy and undesirable, report a bug.) If you’re compiling with /W4 (which we encourage!), you may observe warnings being emitted from STL headers, which is a source breaking change with /WX, but a good one. After all, you asked for level-4 warnings, and the STL is now respecting that. For example, various truncation and sign-conversion warnings will now be emitted from STL algorithms depending on the input types. Additionally, non-Standard extensions being activated by input types will now trigger warnings in STL headers. When this happens, you should fix your code to avoid the warnings (e.g. by changing the types you pass to the STL, correcting the signatures of your function objects, etc.). However, there are escape hatches.
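As an illustration of a truncation that happens inside an STL algorithm on the caller's behalf, consider summing doubles with an int initial value. This is a sketch with hypothetical function names; the assumption that a given compiler version reports it (for example as a possible-loss-of-data conversion warning) is ours and should be verified against your toolset.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Questionable: the int initial value makes the running total an int,
// so every partial sum is truncated inside the algorithm.
inline int sum_truncating(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Fixed by changing the type passed to the STL: the total stays a double.
inline double sum_exact(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
```

For the input {1.5, 2.5}, the first version returns 3 while the second returns 4.0, which is why the truncation is worth a warning rather than a suppression.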
First, the macro _STL_WARNING_LEVEL controls whether the STL pushes its warning level to 3 or 4. It’s automatically determined by inspecting /W3 or /W4 as previously described, but you can override this by defining the macro project-wide. (Only the values 3 and 4 are allowed; anything else will emit a hard error.) So, if you want to compile with /W4 but have the STL push to level 3 like before, you can request that.
Second, the macro _STL_EXTRA_DISABLED_WARNINGS (which will always default to be empty) can be defined project-wide to suppress chosen warnings throughout STL headers. For example, defining it to be 4127 6326 would suppress “conditional expression is constant” and “Potential comparison of a constant with another constant” (we should be clean for those already, this is just an example).
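The two escape hatches might be combined as follows. This is only a sketch: the macros are shown as in-source definitions for brevity, whereas the text above calls for project-wide definitions (for example through the build system), and the helper element_count is a hypothetical placeholder for real code.

```cpp
// Assumption: in a real project these are defined project-wide, not per file.
// They must be visible before the first STL header is included.
#define _STL_WARNING_LEVEL 3                   // only 3 and 4 are allowed
#define _STL_EXTRA_DISABLED_WARNINGS 4127 6326 // space-separated warning numbers

#include <cassert>
#include <cstddef>
#include <vector>

// Ordinary user code; STL headers above are compiled with the settings requested.
inline std::size_t element_count() {
    std::vector<int> v{1, 2, 3};
    return v.size();
}
```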
Correctness fixes and other improvements:
* STL algorithms now occasionally declare their iterators as const. Source breaking change: iterators may need to mark their operator* as const, as required by the Standard.
* basic_string iterator debugging checks emit improved diagnostics.
* basic_string’s iterator-range-accepting functions had additional overloads for (char *, char *). These additional overloads have been removed, as they prevented string.assign("abc", 0) from compiling. (This is not a source breaking change; code that was calling the old overloads will now call the (Iterator, Iterator) overloads instead.)
* basic_string range overloads of append, assign, insert, and replace no longer require the basic_string’s allocator to be default constructible.
* basic_string::c_str(), basic_string::data(), filesystem::path::c_str(), and locale::c_str() are now SAL annotated to indicate that they are null terminated.
* array::operator[]() is now SAL annotated for improved code analysis warnings. (Note: we aren’t attempting to SAL annotate the entire STL. We consider such annotations on a case-by-case basis.)
* condition_variable_any::wait_until now accepts lower-precision time_point types.
* stdext::make_checked_array_iterator’s debugging checks now allow iterator comparisons allowed by C++14’s null forward iterator requirements.
* Improved <random> static_assert messages, citing the C++ Working Paper’s requirements.
* We’ve further improved the STL’s defenses against overloaded operator,() and operator&().
* replace_copy() and replace_copy_if() were incorrectly implemented with a conditional operator, mistakenly requiring the input element type and the new value type to be convertible to some common type. Now they’re correctly implemented with an if-else branch, avoiding such a convertibility requirement. (The input element type and the new value type need to be writable to the output iterator, separately.)
* The STL now respects null fancy pointers and doesn’t attempt to dereference them, even momentarily. (Part of the vector overhaul.)
* Various STL member functions (e.g. allocator::allocate(), vector::resize()) have been marked with _CRT_GUARDOVERFLOW. When the /sdl compiler option is used, this expands to __declspec(guard(overflow)), which detects integer overflows before function calls.
* In <random>, independent_bits_engine is mandated to wrap a base engine (N4618 [rand.req.adapt]/5, /8) for construction and seeding, but they can have different result_types. For example, independent_bits_engine can be asked to produce uint64_t by running 32-bit mt19937. This triggers truncation warnings. The compiler is correct because this is a physical, data-loss truncation; however, it is mandated by the Standard. We’ve added static_cast, which silences the compiler without affecting codegen.
* Fixed a bug in std::variant which caused the compiler to fill all available heap space and exit with an error message when compiling std::get<T>(v) for a variant v such that T is not a unique alternative type. For example, std::get<int>(v) or std::get<char>(v) when v is std::variant<int, int>.
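The replace_copy_if() fix in the list above is easiest to see with an input element type and a replacement type that share no common type. The sketch below uses a hypothetical Cell type that can be built from either; with an if-else implementation each value is written to the output separately, whereas a conditional operator over int and std::string would not compile.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical output element: constructible from int and from std::string,
// two types that are not convertible to any common type.
struct Cell {
    std::string text;
    Cell(int n) : text(std::to_string(n)) {}
    Cell(const std::string& s) : text(s) {}
};

// Copies the input, writing "n/a" in place of every negative value.
inline std::vector<Cell> mask_negatives(const std::vector<int>& in) {
    std::vector<Cell> out;
    std::replace_copy_if(in.begin(), in.end(), std::back_inserter(out),
                         [](int n) { return n < 0; },
                         std::string("n/a"));
    return out;
}
```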
Runtime performance improvements:
* basic_string move construction, move assignment, and swap performance was tripled by making them branchless in the common case that Traits is std::char_traits and the allocator pointer type is not a fancy pointer. We move/swap the representation rather than the individual basic_string data members.
* The basic_string::find(character) family now works by searching for a character instead of a string of size 1.
* basic_string::reserve no longer has duplicate range checks.
* In all basic_string functions that allocate, removed branches for the string shrinking case, as only reserve does that.
* stable_partition no longer performs self-move-assignment. Also, it now skips over elements that are already partitioned on both ends of the input range.
* shuffle and random_shuffle no longer perform self-move-assignment.
* Algorithms that allocate temporary space (stable_partition, inplace_merge, stable_sort) no longer pass around identical copies of the base address and size of the temporary space.
* The filesystem::last_write_time(path, time) family now issues 1 disk operation instead of 2.
* Slightly improved the performance of std::variant’s visit() implementation: after dispatching to the appropriate visit function, we no longer re-verify that all variants are not valueless_by_exception(), because std::visit() already guarantees that property before dispatching. This negligibly improves the performance of std::visit(), but greatly reduces the size of generated code for visitation.
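To show why a shuffle can end up performing self-move-assignment, and one way to avoid it, here is a textbook Fisher-Yates sketch. It is not the STL's actual implementation; the names shuffle_skipping_self_swap and shuffle_keeps_all_elements are hypothetical. Swapping an element with itself move-assigns it to itself, so the sketch skips the swap when both indices coincide.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Fisher-Yates shuffle that never swaps an element with itself.
template <class RanIt, class Urng>
void shuffle_skipping_self_swap(RanIt first, RanIt last, Urng& gen) {
    using Diff = typename std::iterator_traits<RanIt>::difference_type;
    for (Diff i = (last - first) - 1; i > 0; --i) {
        std::uniform_int_distribution<Diff> pick(0, i);
        const Diff j = pick(gen);
        if (j != i) { // a self-swap would self-move-assign the element
            std::iter_swap(first + i, first + j);
        }
    }
}

// Usage: the shuffled range must still contain exactly the original elements.
inline bool shuffle_keeps_all_elements() {
    std::vector<int> original(10);
    std::iota(original.begin(), original.end(), 0);
    std::vector<int> shuffled = original;
    std::mt19937 gen(42);
    shuffle_skipping_self_swap(shuffled.begin(), shuffled.end(), gen);
    return std::is_permutation(shuffled.begin(), shuffled.end(), original.begin());
}
```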
Compiler throughput improvements:
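* Source breaking change: <memory> features that aren’t used by the STL internally (uninitialized_copy, uninitialized_copy_n, uninitialized_fill, raw_storage_iterator, and auto_ptr) now appear only in <memory>. Code that uses them must include <memory> directly instead of relying on other STL headers to provide them.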
* Centralized STL algorithm iterator debugging checks.
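If code using one of the <memory> features listed above stops compiling, the likely fix is to include <memory> directly. A minimal sketch with a hypothetical helper, join_via_raw_storage, that copies strings into raw storage (not exception-safe, for brevity):

```cpp
#include <cassert>
#include <cstddef>
#include <memory> // included directly: provides std::uninitialized_copy
#include <string>
#include <vector>

// Copy-constructs the strings into uninitialized storage, joins them,
// then destroys the copies and releases the storage.
inline std::string join_via_raw_storage(const std::vector<std::string>& src) {
    using Alloc = std::allocator<std::string>;
    using Traits = std::allocator_traits<Alloc>;
    Alloc alloc;
    const std::size_t n = src.size();
    std::string* raw = Traits::allocate(alloc, n);
    std::string* end = std::uninitialized_copy(src.begin(), src.end(), raw);
    std::string joined;
    for (std::string* p = raw; p != end; ++p) {
        joined += *p;
        Traits::destroy(alloc, p); // end the lifetime of each copy
    }
    Traits::deallocate(alloc, raw, n);
    return joined;
}
```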