C++ Inliner Improvements: The Zipliner
Visual Studio 2019 versions 16.3 and 16.4 include improvements to the C++ inliner. Among these is the ability to inline some routines after they have been optimized, referred to as the “Zipliner.” Depending on your application, you may see some minor code quality improvements and/or major build-time (compiler throughput) improvements.
Terry Mahaffey has provided an overview of Visual Studio’s inlining decisions. This details some of the inliner’s constraints and areas for improvement, a few of which are particularly relevant here:
- The inliner is recursive and may often re-do work it has already done. Inline decisions are context sensitive and it is not always profitable to replay its decision-making for the same function.
- The inliner is very budget conscious. It has the difficult job of balancing executable size with runtime performance.
- The inliner’s view of the world is always “pre-optimized.” It has very limited knowledge of copy propagation and dead control paths for example.
Unfortunately, many of the coding patterns and idioms common to heavy generic programming bump into those constraints. Consider the following routine in the Eigen library:
which calls innerSize:
    template<typename Derived> class DenseBase
    {
      // ...
      Index innerSize() const
      {
        return IsVectorAtCompileTime ? this->size()
             : int(IsRowMajor) ? this->cols() : this->rows();
      }
      // ...
    };
That instantiation of outerStride does nothing but return one of its members, so it is an excellent candidate for full inline expansion. To realize this win, though, the compiler must fully evaluate and expand outerStride’s 18 total callees for every call site of outerStride in the module. This eats into both the optimizer’s throughput and the inliner’s code-size budget. It also bears mentioning that the calls to ‘rows’ and ‘cols’ are inline-expanded as well, even though they are on a statically dead path.
It would be much better if the optimizer just inlined the two-line member return:
    ?outerStride@?$Matrix@N$0?0$0?0$0A@$0?0$0?0@Eigen@@QEBA_JXZ PROC ; Eigen::Matrix<double,-1,-1,0,-1,-1>::outerStride, COMDAT
        mov rax, QWORD PTR [rcx+8]
        ret 0
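To make the pattern concrete without pulling in Eigen, here is a minimal stand-alone sketch of the same shape. The class ToyMatrix, its template parameters, and checkToyMatrix are hypothetical names invented for illustration; they are not Eigen's and do not reproduce its real call tree. The point is only that outerStride forwards through several small members, while the compile-time flags leave a single live path per instantiation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an Eigen-style dense matrix (not Eigen's code).
template <bool IsVector, bool IsRowMajor>
class ToyMatrix {
public:
    ToyMatrix(std::ptrdiff_t rows, std::ptrdiff_t cols)
        : rows_(rows), cols_(cols) {}

    std::ptrdiff_t rows() const { return rows_; }
    std::ptrdiff_t cols() const { return cols_; }
    std::ptrdiff_t size() const { return rows() * cols(); }

    // Both conditions are compile-time constants, so only one arm is live
    // for any given instantiation; the other two are statically dead.
    std::ptrdiff_t innerSize() const {
        return IsVector ? size()
             : IsRowMajor ? cols() : rows();
    }

    // Thin forwarding wrapper: once optimized, this is a single member load.
    std::ptrdiff_t outerStride() const { return innerSize(); }

private:
    std::ptrdiff_t rows_;
    std::ptrdiff_t cols_;
};

// Small usage check: each instantiation picks a different live arm.
inline void checkToyMatrix() {
    ToyMatrix<false, false> colMajor(3, 5);
    assert(colMajor.outerStride() == 3);   // rows() arm
    ToyMatrix<false, true> rowMajor(3, 5);
    assert(rowMajor.outerStride() == 5);   // cols() arm
}
```

For ToyMatrix<false, false>, the optimized form of outerStride is just "return rows_", yet an inliner that only sees pre-optimized code still has to walk innerSize, size, cols, and rows before it can find that out.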
Inlining Optimized IR
For a subset of routines, the inliner will now expand the already-optimized IR of a routine, bypassing the process of fetching IR and re-expanding callees. This serves two purposes: call sites expand much faster, and the inliner can measure its budget more accurately.
First, the optimizer will summarize that outerStride is a candidate for this faster expansion when it is originally compiled (remember that c2.dll tries to compile routines before their callers). Then, the inliner may replace calls to that outerStride instantiation with the field access.
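A rough way to see why this helps throughput is a toy cost model. The sketch below is not how c2.dll works internally. The call graph is a made-up, simplified one that only borrows the names used above, and "bodies visited" is an assumed stand-in for inliner work. It compares re-walking a callee tree at every call site against optimizing the routine once and reusing the result.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy call graph: function name -> direct callees (illustrative only).
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Bodies visited when one call to `fn` is expanded by walking its whole
// callee tree, as a purely recursive inliner would.
int bodiesVisited(const CallGraph& graph, const std::string& fn) {
    int visited = 1;  // the body of fn itself
    auto it = graph.find(fn);
    if (it != graph.end()) {
        for (const std::string& callee : it->second) {
            visited += bodiesVisited(graph, callee);
        }
    }
    return visited;
}

// Every call site re-walks the tree.
int costWithoutSummary(const CallGraph& graph, const std::string& fn,
                       int callSites) {
    return callSites * bodiesVisited(graph, fn);
}

// The tree is walked once when fn itself is compiled; each call site then
// pastes the already-optimized body, counted here as one visit.
int costWithSummary(const CallGraph& graph, const std::string& fn,
                    int callSites) {
    return bodiesVisited(graph, fn) + callSites;
}

// Hypothetical, much smaller tree than the 18 callees mentioned earlier.
CallGraph exampleGraph() {
    return {
        {"outerStride", {"innerSize"}},
        {"innerSize", {"size", "cols", "rows"}},
        {"size", {"rows", "cols"}},
    };
}

// Small usage check for ten call sites of outerStride.
inline void checkCostModel() {
    const CallGraph graph = exampleGraph();
    assert(costWithoutSummary(graph, "outerStride", 10) >
           costWithSummary(graph, "outerStride", 10));
}
```

In this model the gap grows with both the depth of the callee tree and the number of call sites, which matches the intuition behind compiling callees before their callers.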
The candidates for this faster inline expansion are leaf functions with no locals, which refer to at most two different arguments, globals, or constants. In practice this targets most simple getters and setters.
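As an illustration of that rule, the sketch below shows a few small functions that would plausibly qualify and two that would plausibly not. All names here are hypothetical, and the classification in the comments is an assumption based only on the description above. The real decision is made on the compiler's optimized IR, which this post does not document in detail.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;

    // Plausible candidates: leaf functions with no locals that touch at
    // most two arguments (counting the implicit `this`).
    int getX() const { return x; }
    void setX(int value) { x = value; }
};

// Plausible candidate: refers to a single argument.
inline int twice(int v) { return v + v; }

// Plausibly not a candidate: refers to three different arguments.
inline int clampTo(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Plausibly not a candidate: calls through a pointer, so it is not a leaf.
inline int applyTo(int (*fn)(int), int v) { return fn(v); }

// Small usage check for the getter and setter.
inline void checkPoint() {
    Point p{1, 2};
    p.setX(7);
    assert(p.getX() == 7);
}
```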
There are many examples like outerStride in the Eigen library where a large call tree expands into just one or two instructions. Modules that make heavy use of Eigen may see a significant throughput improvement; we measured the optimizer taking 25-50% less time for such repros.
The new Zipliner will also enable the inliner to measure its budget more accurately. Eigen developers have long been aware that MSVC does not inline to their specifications (see EIGEN_STRONG_INLINE). Zipliner should help to alleviate some of this concern, as a ziplined routine is now considered a virtually “free” inline.
Give the feature a try
The Zipliner is enabled by default starting in Visual Studio 2019 version 16.3, with further improvements in 16.4. Please download Visual Studio 2019 and give the new improvements a try. We can be reached via the comments below or via email (email@example.com). If you encounter problems with Visual Studio or MSVC, or have a suggestion for us, please let us know through Help > Send Feedback > Report A Problem / Provide a Suggestion in the product, or via Developer Community. You can also find us on Twitter (@VisualC).