C++14 STL Features, Fixes, And Breaking Changes In Visual Studio 14 CTP1
I’m Microsoft’s STL maintainer, and once again we’ve got about a year’s worth of work to tell you about. (“We” means P.J. Plauger of Dinkumware for most features, myself for most fixes and Library Issue resolutions, plus fixes contributed by our libraries dev lead Artur Laksberg and our CRT maintainer James McNellis.)
If you missed the announcement, you can download VS14 CTP1 right now (pay attention to where they say “in a virtual machine, or on a computer that is available for reformatting”), and VS14 RTM “will most likely be available sometime in 2015”.
Please note that in this post, I’m covering the changes between 2013 RTM and VS14 CTP1 – that is, the stuff listed here is what’s new in VS14 CTP1. (For example, N3656 “make_unique (Revision 1)” shipped in 2013 RTM, so it isn’t listed here.)
We’ve implemented the following features which were voted into C++14, plus one Technical Specification:
N3642 <chrono>/<string> UDLs
N3644 Null Forward Iterators
N3657 Heterogeneous Associative Lookup
N3671 Dual-Range equal()/is_permutation()/mismatch()
N3779 <complex> UDLs
N3940 Filesystem “V3” Technical Specification
Note that <complex>’s operator""if() overloads for imaginary floats were #if 0’ed due to missing compiler support. (The problem is that “if” is a keyword. C++14 says that when operator""if() is written without spaces, “if” won’t be treated as a keyword, so it’s okay. Yeah, this is a wacky rule.) The compiler was later fixed to support this special rule, so I’ve removed the #if 0 in my next batch of changes – but they haven’t been checked in yet, so they aren’t available in VS14 CTP1.
Also note that our <filesystem> V3 machinery is still being defined in V2’s namespace std::tr2::sys. That’s because we did this work when N3803 (published October 2013) was the latest draft, and it specified a placeholder “to be determined” namespace std::tbd::filesystem. The current draft N3940 (published March 2014) specifies std::experimental::filesystem::v1, and changing namespaces is on our todo list.
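To make the non-filesystem features above a bit more concrete, here is a minimal sketch that touches N3642, N3657, and N3671 from user code. The function names (total_timeout, greeting, contains, same_elements) are made up for this illustration and are not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <set>
#include <string>
#include <vector>

using namespace std::literals; // brings the chrono and string literal operators into scope

// N3642: after an integer literal, "s" means seconds and "ms" means milliseconds.
// Adding the two yields their common type, which is milliseconds.
inline std::chrono::milliseconds total_timeout() {
    return 2s + 500ms;
}

// N3642: after a string literal, "s" produces a std::string.
inline std::string greeting() {
    return "hello"s + ", world";
}

// N3657: std::less<> is a transparent comparator, so find() can compare the
// stored std::string keys against a const char * directly, without first
// building a temporary std::string.
inline bool contains(const std::set<std::string, std::less<>>& names, const char* key) {
    return names.find(key) != names.end();
}

// N3671: the dual-range overload of std::equal() takes both end iterators and
// returns false when the lengths differ, instead of reading past the shorter range.
inline bool same_elements(const std::vector<int>& a, const std::vector<int>& b) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```

With these definitions, same_elements({1, 2, 3}, {1, 2, 3, 4}) is false, whereas the older three-iterator std::equal() would have compared only the first three elements.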
Furthermore, we’ve implemented the following Library Issue resolutions which were voted into C++14:
LWG 2097 packaged_task constructors should be constrained
LWG 2104 unique_lock move-assignment should not be noexcept
LWG 2112 User-defined classes that cannot be derived from
LWG 2144 Missing noexcept specification in type_index
LWG 2145 error_category default constructor
LWG 2162 allocator_traits::max_size missing noexcept
LWG 2174 wstring_convert::converted() should be noexcept
LWG 2176 Special members for wstring_convert and wbuffer_convert
LWG 2187 vector<bool> is missing emplace and emplace_back member functions
LWG 2193 Default constructors for standard library containers are explicit
LWG 2247 Type traits and std::nullptr_t
LWG 2268 Setting a default argument in the declaration of a member function assign of std::basic_string
LWG 2272 quoted should use char_traits::eq for character comparison
LWG 2278 User-defined literals for Standard Library types
LWG 2285 make_reverse_iterator
LWG 2306 match_results::reference should be value_type&, not const value_type&
LWG 2315 weak_ptr should be movable
LWG 2324 Insert iterator constructors should use addressof()
LWG 2329 regex_match()/regex_search() with match_results should forbid temporary strings
LWG 2332 regex_iterator/regex_token_iterator should forbid temporary regexes
LWG 2339 Wording issue in nth_element
LWG 2344 quoted()’s interaction with padding is unclear
LWG 2346 integral_constant’s member functions should be marked noexcept
GB 9 Remove gets()
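A few of these resolutions are directly visible in user code. The following is a small sketch, assuming a C++14 library that implements LWG 2285, LWG 2187, and LWG 2247; the helper names (reversed, make_flags) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// LWG 2285: make_reverse_iterator() deduces the iterator type, so there is no
// need to spell out std::reverse_iterator<std::string::const_iterator>.
inline std::string reversed(const std::string& text) {
    return std::string(std::make_reverse_iterator(text.end()),
                       std::make_reverse_iterator(text.begin()));
}

// LWG 2187: vector<bool> gains emplace_back() and emplace(), matching the
// interface of the primary vector template.
inline std::vector<bool> make_flags() {
    std::vector<bool> flags;
    flags.emplace_back(true);            // flags is now {true}
    flags.emplace(flags.begin(), false); // flags is now {false, true}
    return flags;
}

// LWG 2247: the is_null_pointer type trait recognizes std::nullptr_t.
static_assert(std::is_null_pointer<std::nullptr_t>::value,
              "nullptr_t should be detected by is_null_pointer");
static_assert(!std::is_null_pointer<int*>::value,
              "an ordinary pointer type is not nullptr_t");
```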
The story for noexcept is somewhat complicated. We have internal _NOEXCEPT and _THROW0() macros (not for public consumption) which currently expand to “throw ()” (which in turn is treated by the compiler as a synonym for __declspec(nothrow), differing from C++98-14’s Standard semantics for “throw ()”). These macros should expand to noexcept, but we’ve been prevented from doing so by a series of relatively minor compiler bugs, mostly involving C++14’s rules for implicit noexcept on destructors. (As the STL’s implementation is inherently complex and widely used, it serves as a stringent test for compiler features.) The good news is that these compiler bugs have been fixed, and I’ve been able to switch the STL’s macros over to using real noexcept in my next batch of changes (with all of the STL’s tests passing). Unfortunately, this isn’t available in VS14 CTP1. (Additionally, we’re still ironing out problems with conditional noexcept, which the STL is supposed to use in a few places. Currently, our macros for that expand to nothing.)
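For readers who haven't used these features, here is a hedged sketch of what “implicit noexcept on destructors” and “conditional noexcept” mean in portable user code. It says nothing about how the _NOEXCEPT macros are implemented; the types Plain, ThrowingMove, and Holder are invented for the example.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// A destructor with no exception specification is implicitly noexcept
// (as long as the destructors of its bases and members are too).
struct Plain {
    ~Plain() {}
};
static_assert(std::is_nothrow_destructible<Plain>::value,
              "destructors are noexcept by default");

// A type whose move operations are explicitly allowed to throw.
struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(ThrowingMove&&) noexcept(false) {}
    ThrowingMove& operator=(ThrowingMove&&) noexcept(false) { return *this; }
};

template <class T>
struct Holder {
    T value;

    // Conditional noexcept: swap() promises not to throw only when swapping
    // the held T does not throw. The inner noexcept is the operator, the
    // outer one is the specifier.
    void swap(Holder& other)
        noexcept(noexcept(std::swap(std::declval<T&>(), std::declval<T&>()))) {
        std::swap(value, other.value);
    }
};

// Reports whether Holder<T>::swap() is declared non-throwing.
template <class T>
constexpr bool holder_swap_is_noexcept() {
    return noexcept(std::declval<Holder<T>&>().swap(std::declval<Holder<T>&>()));
}
```

Here holder_swap_is_noexcept<int>() is true and holder_swap_is_noexcept<ThrowingMove>() is false, which is the kind of distinction a library's conditional-noexcept macros need to express.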
As for gets(), which was removed from C11 and C++14 (note: C++14 still incorporates the C99 Standard Library, but has taken this change from C11 as a special exception), our CRT’s <stdio.h> is still providing ::gets(), but our STL’s <cstdio> is no longer providing std::gets().
We’ve also implemented an optimization, contributed by Eric Brumer from the compiler back-end team. The compiler’s autovectorization really loves highly-aligned memory, so we’ve changed std::allocator to automatically return highly-aligned memory for large allocations where it’ll potentially make a difference in exchange for minimal overhead. If you’re curious, the magic numbers we’re currently using are that we’ll activate this special behavior for 4096-byte or larger allocations, and we’ll align them to (at least) 32 bytes (256 bits), although we absolutely reserve the right to modify this in the future. (Currently, we’re doing this for x86 and x64, but not ARM – we haven’t observed performance benefits due to over-alignment on that platform yet.) Note that to avoid mismatch nightmares, this behavior cannot be disabled – it is activated regardless of whether you’ve asked the compiler to autovectorize, or even to emit AVX/etc. instructions at all.
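If you want to observe this on your own machine, one rough way is to inspect the address of a large buffer. The sketch below uses two hypothetical helpers, is_aligned and large_vector_is_32_byte_aligned. The second is expected to report true under the behavior described above for x86/x64, but nothing in the Standard guarantees it, so treat the result as informational only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when the address p is a multiple of alignment (in bytes).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates a buffer well above the 4096-byte threshold mentioned above and
// reports whether std::allocator happened to return 32-byte-aligned memory.
// The answer is implementation-specific and may differ between platforms.
inline bool large_vector_is_32_byte_aligned() {
    std::vector<float> samples(4096); // 4096 floats, typically 16 KB
    return is_aligned(samples.data(), 32);
}
```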
My introductory notes when I wrote about the STL fixes in VC 2013 continue to apply here. Speaking of which, after I wrote that post, I was able to get a couple more fixes checked into 2013 RTM, but I never found the time to go back and update that post. So for completeness, the following fixes shipped in 2013 RTM: std::bind() now calls std::tie() with qualification in order to avoid being confused by boost::tie() (DevDiv#728471/Connect#792163), and std::function’s constructor now avoids crashing when out of memory (DevDiv#748972).
Additionally, we thought we had fixed the bug in iostreams where it was misparsing floating-point, but shortly before 2013 RTM we discovered a regression and reverted the change. We’re working on this again for VS14, but we’re still aware of problems in this area.
Now, let’s look at the fixes that are available in VS14 CTP1. We’ve performed a couple of major overhauls:
* <chrono>’s clocks had several problems. high_resolution_clock wasn’t high resolution (DevDiv#349782/Connect#719443) and steady_clock and the CRT’s clock() weren’t steady (DevDiv#454551/Connect#753115). We’ve fixed this by making high_resolution_clock a typedef for steady_clock (as permitted by the Standard), which is now powered by QueryPerformanceCounter(), which is high resolution and meets the Standard’s requirements for steadiness/monotonicity. As a result, steady_clock::time_point is now a typedef for chrono::time_point<steady_clock> (DevDiv#930226/Connect#858357), although strictly conformant code should not assume this. (N3936 20.12.7.2 [time.clock.steady]/1 says that steady_clock::time_point is chrono::time_point<unspecified, chrono::duration<unspecified, ratio<unspecified, unspecified>>>.) Independently, the CRT’s clock() was reimplemented with QueryPerformanceCounter(). (Note that while this is a significant improvement, it still doesn’t conform to the C Standard’s requirement for clock() to return “processor time”, which can advance slower or faster than one second per physical second depending on how many cores are being used. Our CRT maintainer James McNellis believes that changing clock()’s behavior like that could break existing code – and for the record, I completely agree that this would be too scary to change.) Additionally, we received a bug report about system_clock, asking whether it should return local time (time-zone-dependent) instead of UTC (DevDiv#756378). The Standard is vague about this topic (20.12.7.1 [time.clock.system]/1 “Objects of class system_clock represent wall clock time from the system-wide realtime clock.”, wow that’s so helpful!). Our implementation used GetSystemTimeAsFileTime(), which returns UTC. After thinking about this issue, I concluded that UTC is strongly desirable here (programs should use UTC everywhere, performing time-zone adjustments for user I/O only). I also checked with GCC/libstdc++ and clang/libc++’s maintainers, who confirmed that their implementations also return UTC. So while I declined to change this behavior, I improved system_clock’s implementation while I was in the neighborhood. Now we call GetSystemTimePreciseAsFileTime() when it’s available from the OS (Win8+), which has massively better resolution. Note that the CRT/STL’s OS-sensing behavior is automatic and requires no input from the user-programmer (i.e. it is not controlled by macros).
* <atomic>’s compile-time correctness, runtime correctness, and performance have been improved. We’ve eradicated the last of our x86 inline assembly code, replacing it with intrinsics for improved performance. (These functions, the 8-byte atomics for x86, are still an instruction or two away from being optimal, so we’ve requested new intrinsics from the compiler back-end team.) We fixed a couple of runtime correctness bugs in the compare_exchange family of functions. First, we now always perform the mapping specified by 29.6.5 [atomics.types.operations.req]/21 “When only one memory_order argument is supplied, the value of success is order, and the value of failure is order except that a value of memory_order_acq_rel shall be replaced by the value memory_order_acquire and a value of memory_order_release shall be replaced by the value memory_order_relaxed.” (DevDiv#879907/Connect#817225). Second, we fixed a bug in atomic<T *>’s compare_exchange where we were unconditionally writing to “expected” (DevDiv#887644/Connect#819819), while /21 says that the write must be conditional: “Atomically, compares the contents of the memory pointed to by object or by this for equality with that in expected, and if true, replaces the contents of the memory pointed to by object or by this with that in desired, and if false, updates the contents of the memory in expected with the contents of the memory pointed to by object or by this.” This fix also improved performance. (Note that this was specific to atomic<T *>; atomic<integral> was unaffected.) We also fixed several compiler errors. Each atomic_meow is now a typedef for atomic<meow>, so “atomic_int atom(1729);” now compiles (DevDiv#350397/Connect#720151), and we fixed compiler errors in atomic<const T *> (DevDiv#829873/Connect#809351, DevDiv#879700/Connect#817201) and volatile atomic<T> (DevDiv#846428/Connect#811913). Finally, we improved the performance of atomic construction – 29.6.5 [atomics.types.operations.req]/5 says “Initialization is not an atomic operation” but we were unnecessarily using atomic instructions for initialization.
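Both overhauls can be exercised from ordinary user code. The sketch below is only an illustration under stated assumptions: time_call, CasResult, and cas_pointer are hypothetical names. It times a callable with steady_clock using the nested time_point typedef (rather than spelling out chrono::time_point<steady_clock>, which conformant code shouldn't assume), and it shows the compare_exchange contract for atomic<T *>, where “expected” is overwritten only on failure.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// steady_clock is required to be monotonic.
static_assert(std::chrono::steady_clock::is_steady, "steady_clock must be steady");

// Measures how long work() takes according to steady_clock.
template <class Callable>
std::chrono::nanoseconds time_call(Callable&& work) {
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    work();
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Outcome of one compare-and-swap attempt on an atomic pointer.
struct CasResult {
    bool swapped;        // true when slot held `expected` and now holds `desired`
    int* expected_after; // value of the local `expected` after the call
};

// Only one memory_order is passed, so the failure ordering is derived from it:
// memory_order_acq_rel on success becomes memory_order_acquire on failure.
inline CasResult cas_pointer(std::atomic<int*>& slot, int* expected, int* desired) {
    const bool swapped =
        slot.compare_exchange_strong(expected, desired, std::memory_order_acq_rel);
    // On success `expected` is left alone; on failure it receives the value
    // that was actually found in slot.
    return CasResult{swapped, expected};
}
```

A successful call reports the same pointer that was passed as expected, while a failed call reports whatever the slot currently holds, which is what a retry loop needs for its next attempt.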
Individual fixes in no particular order:
* C++11’s minimal allocator interface is awesome, but it means that STL implementations have to do extra work in order to deal with user-defined allocators that lack portions of C++03’s verbose allocator interface (e.g. nested rebind structs). In 2013 RTM (thanks to variadic templates) we completed the machinery needed to adapt minimal allocators to the verbose interface, but we didn’t consistently use it throughout the STL (DevDiv#781187/Connect#800709). So for VS14 CTP1, we audited the entire STL and fixed all the problems, so now anything that takes an allocator will accept the minimal interface. Notably, std::function, shared_ptr/allocate_shared(), and basic_string were fixed.
* <chrono>’s duration % duration, duration % rep, and duration / rep have been fixed to follow the Standard – previously they would fail to compile in various situations (DevDiv#742944/Connect#794649).
* The STL now supports the /Gv compiler option (/Gd, /Gr, and /Gz were already supported), as well as functions explicitly marked with __vectorcall (DevDiv#793009/Connect#804357). We have a test to verify the former by including all STL headers under /Gv. For the latter, __vectorcall will work wherever __stdcall/etc. works – which isn’t everywhere (that’s tracked by a separate bug, still active).
* The STL now supports the /Zc:strictStrings compiler option (DevDiv#784218). C++03 permitted (but ISO-deprecated) conversions from string literals to modifiable char *. C++11 removed this conversion, and /Zc:strictStrings enforces this prohibition. While /Zc:strictStrings is currently off by default, I strongly encourage using it.
* In 2006, <locale>’s implementation was damaged in an obscure but extremely pernicious way, specific to x64 in debug mode (DevDiv#447546/Connect#750951, DevDiv#755427/Connect#796566). With custom allocation functions (including globally replaced operator new/delete()), custom-allocated facets would be deallocated with free(), and then the world would explode. I eventually figured out the full extent of the problem and thoroughly eradicated it forever.
* Working in conjunction with compiler fixes from Xiang Fan, we’ve changed the STL’s headers to dramatically reduce object file sizes (and static library sizes) by avoiding the emission of unused machinery (DevDiv#888567/Connect#820750). Such unused machinery was typically discarded by the linker, so EXE/DLL sizes should be unchanged (although they may experience minor improvements). For example, when compiling a file (for x86 with /MD /O2) that includes all C and C++ Standard Library headers and does nothing else with them, VS 2013 emitted a 731 KB object file, while VS14 CTP1 emits less than 1 KB.
* C++11 requires STL implementations to tolerate overloaded address-of operators. VS 2013’s containers did, but not all of its algorithms (DevDiv#758134/Connect#797008). Additionally, STL implementations are required to tolerate overloaded comma operators (“because nothing forbids them”), which is problematic for algorithms that take potentially-user-defined iterators and say things like “++iter1, ++iter2” in their for-loops (DevDiv#758138/Connect#797012). We’ve audited all STL algorithms, with all permutations of iterator strengths, for address-of/comma issues. We’ve fixed all of them (by adding a handful of addressof() calls and eleventy zillion (void) casts), and we’ve added a test to ensure that they stay fixed.
* Since 2005, we’ve shipped debug checks that detect and complain about invalid inputs to STL algorithms (like transposed iterators). However, they’ve been slightly too aggressive, complaining about null pointers passed as iterators even when the Standard says that they’re perfectly valid. For example, merging two [null, null) ranges to a null output is a valid no-op. We’ve audited every STL algorithm and fixed their debug checks to accept null pointers validly passed as iterators, while still rejecting invalid scenarios for null pointers. (For example, [non-null, null) is a bogus range.) This resolves long-standing bug reports (DevDiv#253803/Connect#683214, DevDiv#420517/Connect#741478, DevDiv#859062/Connect#813652).
* C++11’s binary search algorithms are required to work with heterogeneous types, where the types of the range’s elements and the given value can differ, and the range’s elements might not even be comparable to each other. We fixed lower_bound() and upper_bound() years ago, but missed equal_range() (DevDiv#813065/Connect#807044). We left a C++03-era debug check in equal_range(), which was bad for two reasons: (1) it tried to verify that the input range was sorted, but C++11 doesn’t require element < element to compile, and (2) this was a linear-time validation in a log-time algorithm which was always a bad idea! We’ve removed the offending debug check, so equal_range() now conforms to C++11. (However, equal_range() still contains another debug check. lower_bound() is given only elem < value and upper_bound() is given only value < elem so they just have to trust that it’s a valid comparison. equal_range() requires both elem < value and value < elem to compile, so we can use our usual debug check to verify that they’re never simultaneously true. This is okay because the asymptotic complexity of the algorithm is unaffected.)
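As a rough illustration of the heterogeneous case, the sketch below searches a range of records by an int key. The records are never compared to each other; the comparator only provides elem < value and value < elem. Employee, DeptLess, and count_in_dept are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Employee {
    std::string name;
    int dept;
};

// Heterogeneous comparator: it compares an Employee against an int in both
// directions, but deliberately offers no Employee-vs-Employee overload.
struct DeptLess {
    bool operator()(const Employee& e, int dept) const { return e.dept < dept; }
    bool operator()(int dept, const Employee& e) const { return dept < e.dept; }
};

// Counts the employees in a department. The input is assumed to be sorted
// (or at least partitioned) by dept, as equal_range() requires.
inline std::size_t count_in_dept(const std::vector<Employee>& sorted_by_dept, int dept) {
    const auto range = std::equal_range(sorted_by_dept.begin(), sorted_by_dept.end(),
                                        dept, DeptLess{});
    return static_cast<std::size_t>(range.second - range.first);
}
```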
* Our unordered associative containers didn’t provide the strong guarantee for single-element insertion and