C++20 STL Features: 1 Year of Development on GitHub
The talk contains complete examples (not snippets!) of several C++20 features: integer comparison functions, constexpr algorithms, uniform container erasure, atomic_ref, and span.
Here are the important links from the end of the talk:
- Repository: github.com/microsoft/STL
- Changelog: github.com/microsoft/STL/wiki/Changelog
- Status Chart: microsoft.github.io/STL/
- C++20: wg21.link/n4861
Finally, at the end of the talk I had time to answer a half-dozen questions, but there were many more. Here are those extra questions and my answers:
Q: Why do you squash pull requests instead of just merging them?
A: This significantly simplifies the branch’s history, since one squashed commit == one PR. You can still view the PR’s history on GitHub. Merges would create highly non-linear history, making it hard to figure out when things changed and why (the MSVC internal git repo is full of non-linear merges, so we unfortunately have extensive experience with that). Most of the information from non-squashed merges would be uninteresting too – basically code review feedback, fixing bugs during development, etc. For highly unusual situations I could imagine wanting to sequence a PR as a series of commits that are then rebased-and-merged to the default branch, which we’d need to temporarily enable via policy, but generally having that history in the PR is sufficient.
Q: Regarding atomic_ref, why not just specify relaxed access when you don’t want to pay the atomic penalty?
A: My understanding is that relaxed is still significantly more expensive than plain operations. For example, on x86/x64 for MSVC, atomic increments are implemented by _InterlockedIncrement which provides full sequential consistency, even if you asked for relaxed; I’ve heard that this costs somewhere around 10-100 cycles, whereas a plain increment is half a cycle or less. Even on ARM/ARM64, where there are _Meow_nf intrinsics (“no fence”) for relaxed, I believe they still imply additional costs compared to plain logic.
Q: Were you already expecting that open-sourcing your STL would improve the STL team’s throughput? Or were you afraid that collaborating with third-party contributors would carry too much overhead?
A: Great question – this was one of the top things we thought/worried about on the road to open-sourcing. I’d say we were prepared to take an overhead/throughput cost in the short term, while hoping for throughput improvements in the long term – and were pleasantly surprised that the short-term costs were less than expected, and that we’re already enjoying throughput gains – e.g. midpoint/lerp were lingering because we didn’t have deep numeric expertise, until statementreply contributed an amazing PR analyzing and fixing the remaining issues. I believe that major throughput gains are still to come – my plan/dream for C++23 and beyond is that proposals will be written with implementations based on our STL, such that a PR is ready to be reviewed and merged as soon as the proposal is accepted by WG21. (Bonus points for simultaneous contribution to libc++.) That will improve Standardization quality/throughput as well as the implementation.
Q: For shipped binaries, is there an integration with Microsoft’s public-facing symbol and source servers, so that the debugger will pull in the correct version of the sources during debugging?
A: The answer here is that there is no change to how the VS product is built and interacts with the symbol server, so everything will continue to work. GitHub is where we do all development, and we ensure that the repo is binary-identical to the MS-internal src/vctools/crt/github tree by replicating PRs over to MSVC. From there, the product is built, the sources are packaged into the VS Installer, and the PDBs are uploaded to the symbol server. In the far future, we may build official binaries through the GitHub CI system and then bundle them into VS through some mechanism – but we’re unsure how to do that right now, and it would involve a lot of work for unclear payoff. We should be able to achieve most of the time savings by simply finishing our build system migration and then getting the MS-internal MSVC MSBuild system (so much MS! 😹) to invoke the CMake/Ninja build system we use for GitHub; we already have such CMake invocations for the LLVM ASAN support libraries.
Q: Did you encounter cases where the design in the Standard isn’t as practical as it should be? Did you report this to the committee?
A: Yes, this happens fairly frequently. There’s a distinction between “this design isn’t great for implementers and/or users” and “this specification is unclear/inconsistent with other practice/internally inconsistent/violates conservation of momentum”. For the former (suboptimal design), we sometimes mention it to the Library Evolution Working Group, especially as new features are being developed, but it’s generally “too late” after a feature has been accepted into the Working Paper. (Not always, since features can be revised before the International Standard is published; one place this happened was span which received an unsigned size_type before C++20 was completed.) The latter (bogus specification) is common, and we report those to the Library Working Group (as LWG issues) which can usually be quickly resolved. In the meantime, we use our best judgement to implement what’s possible and what the Standard “should have said”.
Q: Why does <charconv> not work with wchar_t?
A: That’s a question for Jens Maurer who proposed the feature. My understanding is that charconv was meant as a minimal API, and the idea was that it would be primarily used with JSON and other APIs where char is sufficient. However, converting wchar_t to char and back, even for the limited purposes of float parsing, is highly inconvenient/slow, and to_chars ended up being much faster than anyone in L[E]WG realized was possible at the time (as Ulf Adams invented Ryu and Ryu Printf after the feature was accepted!), so the overhead of wchar_t conversion became even more significant. While charconv is extremely complicated, making it handle wchar_t would be a very simple matter of templatizing the codepaths that interact with the characters; the tables and core algorithm would not need to be replicated.
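Here is a rough sketch of the workaround that users need today: narrow the wide text to char, then call from_chars. It assumes ASCII-only input, the helper name parse_wide_int is hypothetical, and it uses the integer overload of from_chars only because floating-point from_chars is not available in every Standard Library implementation; the same narrowing step would apply to floating-point parsing.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses a decimal int from wide text by first
// narrowing it to char. Rejects anything outside the ASCII range.
std::optional<int> parse_wide_int(std::wstring_view wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (const wchar_t wc : wide) {
        // wchar_t may be signed or unsigned; the cast handles both.
        if (static_cast<unsigned long>(wc) > 0x7F) {
            return std::nullopt;
        }
        narrow.push_back(static_cast<char>(wc));
    }
    assert(narrow.size() == wide.size());

    int value = 0;
    const char* const first = narrow.data();
    const char* const last = first + narrow.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);

    // Require success and that the whole input was consumed.
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

The extra copy and the per-character loop are exactly the overhead described above: they exist only to get from wchar_t to char before the real parsing starts.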
Q: Did the decision to open-source the code come top-down, or did the team have to fight up the chain to convince management that this is a good idea?
A: An interesting question 😸 I think I can say that it was a bottom-up decision – Mahmoud Saleh (my boss, the VC Libraries dev lead) drove the process of getting approval, with support from the rest of the MSVC chain. We did have to convince our ultrabosses that it was a good idea, but it wasn’t a fight – it was a useful exercise of thinking through the rationale, the costs/benefits, and the consequences of working in the open. The top-down change in strategy definitely made this possible – going open-source was unthinkable for the MS of 10 years ago, and now we’re continually looking for places where it makes sense, including for foundational components like the STL and .NET Core (we spoke to that team as part of going open-source to understand the challenges and opportunities we were about to face, and they were extremely helpful).
The opportunities that we’re looking for are where we can advance the interests of the entire C++ community, so when programmers think about the future of C++, they’ll naturally think of Microsoft. For example, all C++ programmers benefit when the major toolchains support the latest features, in a timely manner, at a high level of quality – so Microsoft has invested a ton of developer-years of effort in catching up in conformance, to the point where MSVC is often the first to implement new features. The STL was the most compelling opportunity to open-source for several reasons: it’s a relatively small code base and test suite (large in absolute terms – it’s half of the Standard, after all! – but smaller than the compiler or other massive projects), we were already shipping its source code for viewing so it was “just” a question of changing the license, the library is evolving increasingly quickly, and (perhaps most importantly) the library tends not to be deeply interconnected, so it’s possible to add or change things without understanding and changing everything else. Now that we have an open-source Standard Library like GCC’s libstdc++ and Clang/LLVM’s libc++, we hope that it’ll be easier to propose library features for Standardization, in a form that works well on all platforms.
Q: What is the best way to learn all the latest STL features? Is there an on-line cookbook? Functional style? Is there an expert on your team writing a book?
A: I would say that the best way is to implement them 😹😹 None of the STL maintainers have time to write a book, but we’re working with Tyler Whitney from the Microsoft Docs team as he adds documentation for the various features we’ve implemented over the last few years. cppreference is also a good source of information built up by the community. I generally think that the best way to learn a feature, other than implementing it, is to try using it in toy examples first, to become familiar with the basics in a simple clean environment, followed by using it in a basic way in a real codebase, before getting to advanced uses. Trying to immediately use a new feature in a production codebase can be a headache, since you might not immediately see whether an issue is caused by using the feature itself incorrectly, or by an interaction with the codebase (“I know how to use this feature generally, so what’s wrong here – oh, it’s that it requires copyability, but this type is move-only, okay” or whatever). If you find a better technique, let me know! It is also possible to read the Library Standardese directly – it is very detailed. The downsides are that it’s written in a somewhat strange style, and occasionally information is “hidden” elsewhere (e.g. the container specifications are highly centralized in an unusual way), but it’s generally possible to find function signatures and basic type requirements and value preconditions that way. The Core Language Standardese is much much harder to understand for ordinary humans (versus extraordinary compiler developers) – but of course I would say that, since I’m a library developer who specifically works on the STL because it’s easy compared to compiler development 🤣
Q: Is this part of the VS 2019 16.8.0 Preview 3.0?
A: Yes, all of the features that I described are available in that release today. We consider them to be at production quality, with the usual caveats that Preview releases aren’t “go-live” supported by VS, and that /std:c++latest is technically considered experimental and subject to change. (Note that we can and have broken ABI for /std:c++latest features – ABI lockdown will happen when we complete C++20 and add /std:c++20 in celebration. So anything built with /std:c++latest does need to be continually built with the latest toolset – but that shouldn’t be a problem if you want to live on the leading edge of C++!)
Q: When is vNext going to become a concrete version?
A: Our plans are still tentative and subject to change, but we’re planning to work on vNext after completing C++20, in a clean switchover – that is, VS 2019 (the “v19” release series that began with VS 2015) will receive all C++20 features, then we’ll do vNext, then C++23 features will be added to vNext only – we will continue to service v19 for critical bugs and security fixes, but not new feature work. We hope to finish C++20 in 2020, then work on vNext in H1 2021 – we’re unsure how long we’ll have to work on the vNext overhaul although we expect it to be at least 6 months. (I personally hope for a year, but I also want a pony and a unicorn). At this time, we don’t yet know exactly how this will ship to users (i.e. what release).