Overview
- Recap: Where we were, what we’re doing now.
- Changes adopted in Word: Composing headers for reuse.
- Testing Details: The setup of the performance experiment.
- Opportunity for Improvement: Identifying a potential performance bottleneck.
- Upcoming Improvements to MSVC.
- What’s next for Office?
- Closing thoughts.
Introduction
In our previous two blog posts (part 1, part 2), we discussed how Office was thinking holistically about header units. In this installment we’d like to share the concrete steps taken to integrate header units into the build of Microsoft Word, and their effect on build throughput. Throughout the post we’ll use the term “build throughput” rather than “build performance” to avoid confusion with application runtime performance. This post is primarily a recap of the presentation that Zachary gave at Pure Virtual C++ 2024.
Keep in mind that precompiled headers are an established technology dating back 30 years, whereas header units are a newer feature, only 5 years old. There are still a lot of opportunities for optimization!
In the best case, migrating from a precompiled header (PCH) to header units produced a 21.3% build throughput improvement. In the worst case, the migration yielded a 0.9% build throughput regression.
Changes adopted in Word
There are two precompiled headers that Word code can select between. The first of these, minpch, is extremely lightweight. It contains things like the C++ standard library, windows.h, some common Word internal helpers, and a very small set of low-level Office shared headers. In total it includes about 250 files if you count all transitive includes. Minpch is not widely used; it’s the PCH chosen by only 2% of the files in Word.
On the other extreme is the word_shared precompiled header. It captures nearly every upstream header, with only some minimal holdbacks for test headers. It contains around 2500 transitive includes. This makes word_shared a challenging test for header units, both in terms of the breadth of C++ compiled into a header unit and what effect it will have on build throughput. As previously mentioned, Word used C++ Build Insights to measure each individual file’s contribution as justification for its inclusion in word_shared.
At a high level the conversion we performed was:
- Switch /Yc to /exportHeader
- Switch /Yu to /headerUnit:quote word_shared.h=path/to/word_shared.ifc
- Measure!
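The two flag substitutions can be sketched as a small build-script helper. This is a hypothetical illustration, not Office or MSVC tooling: `ConvertPchFlags` is an invented name, and it assumes the PCH flags appear as /Yc<header> and /Yu<header> and that the caller already knows the header name and where its .ifc file lives. A real conversion involves more than flag substitution (for example, the header-unit build compiles the header itself rather than a .cpp that includes it), which this sketch does not attempt.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: rewrites one compiler command line from PCH usage to
// header-unit usage using the two substitutions described in the post:
//   /Yc<header>  ->  /exportHeader
//   /Yu<header>  ->  /headerUnit:quote <header>=<ifcPath>
// All other arguments are passed through unchanged.
std::vector<std::string> ConvertPchFlags(const std::vector<std::string>& args,
                                         const std::string& header,
                                         const std::string& ifcPath) {
    std::vector<std::string> out;
    out.reserve(args.size() + 1);
    for (const std::string& arg : args) {
        if (arg.rfind("/Yc", 0) == 0) {
            // PCH creation becomes header-unit creation.
            out.push_back("/exportHeader");
        } else if (arg.rfind("/Yu", 0) == 0) {
            // PCH consumption becomes header-unit consumption. On the command
            // line the "<header>=<ifc>" mapping follows as a separate argument.
            out.push_back("/headerUnit:quote");
            out.push_back(header + "=" + ifcPath);
        } else {
            out.push_back(arg);
        }
    }
    return out;
}
```

For example, a consuming compile of `/c /Yuword_shared.h /Z7 foo.cpp` becomes `/c /headerUnit:quote word_shared.h=path/to/word_shared.ifc /Z7 foo.cpp`, with every unrelated flag left alone.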
Testing Details
Process
These were the steps to gather measurements:
- Set the configuration to PCH or Header Units
- Perform a full build of Word
- Disable build caching
- Run all compiler invocations, excluding PCH or HU creation (repeat 7x)
- Record CPU time spent by the compiler, excluding highest and lowest results
- Repeat every night, on 3 unique boxes, for 3 weeks
In step 4, three distinct sets of files were compiled and measured. The first set was every C++ file in Word (roughly 6000). The other two were active areas of investment: the first, henceforth referred to as “Folder A”, consisted of approximately 300 files; the second, termed “Folder B”, consisted of around 200 files.
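The aggregation in steps 4 and 5 can be sketched as follows. This is a minimal sketch under two assumptions: that the five runs left after discarding the highest and lowest are averaged, and that “build throughput improvement” means the percentage reduction in compiler CPU time relative to the PCH configuration. `TrimmedMeanSeconds` and `ImprovementPercent` are hypothetical names.

```cpp
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <vector>

// Aggregates repeated measurements of the same compile set: sort the per-run
// CPU times, drop the single fastest and slowest run, and average the rest.
// With 7 runs this keeps the middle 5.
double TrimmedMeanSeconds(std::vector<double> runs) {  // by value: sorted locally
    if (runs.size() < 3) {
        throw std::invalid_argument("need at least 3 runs to trim both ends");
    }
    std::sort(runs.begin(), runs.end());
    const double kept = std::accumulate(runs.begin() + 1, runs.end() - 1, 0.0);
    return kept / static_cast<double>(runs.size() - 2);
}

// Assumed definition: percentage reduction in compiler CPU time relative to
// the PCH configuration. Positive means header units were faster; negative
// means a regression.
double ImprovementPercent(double pchSeconds, double headerUnitSeconds) {
    return (pchSeconds - headerUnitSeconds) * 100.0 / pchSeconds;
}
```

Trimming both ends keeps a single unusually fast or slow run (for example, one disturbed by background activity on the machine) from skewing the nightly result.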
Test Equipment
Machine | Build the World (BTW) | Physical Workstation (PDW) | Microsoft Dev Box (MDB) |
Threads | 64 | 32 | 16 |
Max Clock | 4.2 GHz | 4.3 GHz | 3.5 GHz |
RAM | 256 GB | 128 GB | 64 GB |
Storage | Boot SSD + Software RAID data drive | Boot SSD + Software RAID data drive | Single Azure Premium SSD (P30) |
The Build the World machines are server-class machines, generally reserved for the limited number of developers who need to make changes that span all of Office at once. The physical workstations are more common and reflect what a developer working on an application team, such as Word, would have for their box. The Microsoft Dev Box is a publicly available product: a virtual machine running in Azure. Other MDB configurations are available from the service; the specifications above are simply the ones chosen for an internal pilot program.
Results
Additional Advantages
Beyond build speed, header units have additional advantages. The IFC files produced by header units are portable. Precompiled headers produced by MSVC must be created and consumed locally. Header units don’t have this limitation.
It would be possible for a separate project upstream of Word to create the minpch and word_shared header units. Most Word developers would not need to build the files locally; the IFC files could simply be downloaded from the cloud. Second, the IFC produced for a header unit follows a fully documented specification and corresponding SDK. Cameron’s presentation on the IFC SDK from Pure Virtual C++ is available on YouTube.
Finally, header units are significantly smaller on disk than their corresponding PCH. Comparing a single PCH file with its corresponding header unit may not be entirely fair, because a header unit can build on the header units of its dependencies instead of containing them. To give a fairer picture, we’ve also included the disk size of each of Word’s header units plus all of its upstream dependencies.
Header | PCH | Header Unit | % Reduction | Full header unit chain | % Reduction |
minpch | 174 MB | 8.78 MB | 95% | 42.8 MB | 77% |
word_shared | 1.16 GB | 111 MB | 90% | 150 MB | 88% |
Uncovering an Opportunity for Improvement
We were surprised to discover that storage system performance was the best predictor of build throughput improvement. We measured the size of the object files on disk and found a sizable increase. The total size of all object files created for Word under header units was 1.22x the total size of the object files produced when consuming a PCH. The median per-file increase was 1.25x, and the largest single object file was 8000x larger under header units.
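The three figures above measure different things, which is how the total growth (1.22x) can sit below the median (1.25x): the total is weighted toward the biggest object files, while the median treats every file equally, so a lower total suggests the larger object files grew proportionally less than the typical one. A minimal sketch of computing all three from per-file sizes follows; `ObjSizes`, `SizeGrowth`, and `SummarizeGrowth` are hypothetical names, and the sample sizes below are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Size of one object file under each configuration (any consistent unit).
struct ObjSizes {
    double pchBytes;
    double headerUnitBytes;
};

struct SizeGrowth {
    double total;    // sum of header-unit sizes / sum of PCH sizes
    double median;   // median of the per-file header-unit/PCH ratios
    double largest;  // largest per-file header-unit/PCH ratio
};

SizeGrowth SummarizeGrowth(const std::vector<ObjSizes>& files) {
    if (files.empty()) {
        throw std::invalid_argument("no object files");
    }
    double pchTotal = 0.0;
    double huTotal = 0.0;
    std::vector<double> ratios;
    ratios.reserve(files.size());
    for (const ObjSizes& f : files) {
        pchTotal += f.pchBytes;
        huTotal += f.headerUnitBytes;
        ratios.push_back(f.headerUnitBytes / f.pchBytes);
    }
    std::sort(ratios.begin(), ratios.end());
    const std::size_t n = ratios.size();
    const double median = (n % 2 == 1)
        ? ratios[n / 2]
        : (ratios[n / 2 - 1] + ratios[n / 2]) / 2.0;
    return {huTotal / pchTotal, median, ratios.back()};
}
```

With the made-up sizes {100, 125}, {200, 250}, and {1000, 1100}, two small files grow by 1.25x and one large file by 1.1x, so the median and largest ratios are 1.25x while the size-weighted total is only about 1.13x.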
This was due to the compiler duplicating debug information into each object file because Office uses the /Z7 compiler flag. With a PCH, by contrast, the toolchain can look up many kinds of debug information directly from the PCH file. The biggest offenders in Word were a pair of large enums with 10,000 and 30,000 entries. The first is an enum of all command IDs for all Office applications; the other identifies performance data scenarios. Even when only one enum value was used, all values of the corresponding enum were recorded in the obj file.
After we removed the include files that brought the two enums into Word’s word_shared header, object file sizes on disk dropped measurably.
PCH vs Header Units | Default | Large Enums Omitted |
Total .obj file size increase | 1.22x | 1.18x |
Median size increase | 1.25x | 1.22x |
Greatest obj size increase | 8000x | 1700x |
As the table shows, removing the large enums reduced object file sizes on disk, but it also made us eager for the upcoming work in the MSVC toolchain that would eliminate this duplicate data.
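The fix above removed the headers defining the large enums from word_shared. A related C++ technique, shown only as a hedged sketch and not necessarily what Word did, is an opaque enum declaration: files that merely pass IDs around see only the declaration, so they have no enumerator list for /Z7 to copy into their object files, while the few files that name specific enumerators include the full definition. `CommandId`, `ToRaw`, and the enumerator values are hypothetical.

```cpp
#include <cstdint>

// Opaque declaration: names the type and fixes its underlying type but lists
// no enumerators. This is the only part a widely shared header (or header
// unit) would need to contain under this approach.
enum class CommandId : std::uint32_t;

// Code that only stores or forwards IDs compiles against the opaque type,
// which is complete because its underlying type is fixed.
std::uint32_t ToRaw(CommandId id) {
    return static_cast<std::uint32_t>(id);
}

// Full definition. In a real codebase this would live in its own header that
// only the files naming specific commands include. (Hypothetical values.)
enum class CommandId : std::uint32_t {
    Open = 1,
    Save = 2,
    // ...the real enum would continue with thousands of entries
};
```

The trade-off is that any file that writes `CommandId::Save` must still include the full definition, so the technique only helps if most consumers never name individual enumerators.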
Upcoming improvements in MSVC
Office recently hit a bug in which the linker could fail to perform an incremental link and fall back to a full link. If you run into the same issue in your projects, please update to compiler version 17.8.10 or 17.10.0 for the fix.
The other big item is updating the toolchain so that debug information can be gathered directly from an IFC file, avoiding huge obj files. Notably, this optimization benefits both named modules and header units.
This is a more invasive change than the fixes Office has needed for header units in the past, as it will require changes to the compiler, linker, and PDB writer; essentially, the entire toolset must cooperate. We don’t have a release date for this work, but please stay tuned for an announcement when it’s ready for testing.
What’s next for Office?
In Office we want to continue making low-level libraries into header units. As a prerequisite we need to remove Office’s global ‘inc’ folder, our collection of headers that don’t share a single logical responsibility. Once all these headers are moved to new homes their component libraries can each have a header unit created for them.
Once that’s done, we can migrate our shared PCH to become a header unit in the same way that was done for Word. The shared PCH is consumed by a huge percentage of our shared code, so we expect header unit creation and consumption to greatly accelerate once we reach this milestone.
In addition, we have ongoing work toward consistent build flags, the same issue we’ve highlighted in each of these blog posts. We’ve done a lot of prep work to mark where there are deviations from the global defaults, but the migration to the common settings remains to be done. Once our flags are consistent, downstream libraries can also become header units.
Final Thoughts
Moving a part of the Office codebase to header units has been an excellent learning experience and a collaborative effort between the compiler team and the Office engineering team. Without a doubt, this experience has shown not only that header units (and therefore C++ modules) can scale to a multi-million-line codebase, but also that the technology is flexible enough to match a preexisting technology with 30+ years of improvements, and to improve on it for even more throughput.
The MSVC implementation of modules is continuing to improve with each release and the more exposure the compiler gets to community feedback/usage the better the team can improve the overall quality and robustness. We recommend you go out and try to integrate C++ modules or header units into your code and tell us about your experiences. We’re eager to learn from and work with the community to elevate the implementation into something we are all happy to use!
Closing
As always, we welcome your feedback. Feel free to send any comments through e-mail at visualcpp@microsoft.com or through Twitter @visualc. Also, feel free to follow Cameron DaCamara on Twitter @starfreakclone.
If you encounter other problems with MSVC in VS 2019/2022 please let us know via the Report a Problem option, either from the installer or the Visual Studio IDE itself. For suggestions or bug reports, let us know through DevComm.