{"id":25350,"date":"2020-01-07T15:36:04","date_gmt":"2020-01-07T15:36:04","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=25350"},"modified":"2020-01-07T15:36:04","modified_gmt":"2020-01-07T15:36:04","slug":"c-inliner-improvements-the-zipliner","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/c-inliner-improvements-the-zipliner\/","title":{"rendered":"C++ Inliner Improvements: The Zipliner"},"content":{"rendered":"<p><span data-contrast=\"auto\">Visual Studio 2019<\/span> <a href=\"https:\/\/visualstudio.microsoft.com\/vs\/\"><span data-contrast=\"none\">versions 16.3 and 16.4<\/span><\/a> <span data-contrast=\"auto\">include improvements to the C++ <\/span><span data-contrast=\"auto\">inliner<\/span><span data-contrast=\"auto\">. Among these is the ability to inline some routines after they have been optimized, referred to as the \u201c<\/span><span data-contrast=\"auto\">Zipliner.\u201d<\/span><span data-contrast=\"auto\"> Depending on your application, you may see some minor code quality improvements and\/or major build-time (compiler throughput) improvements.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h2>C2 Inliner<\/h2>\n<p><span data-contrast=\"auto\">Terry Mahaffey has provided an overview of <\/span><a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/inlining-decisions-in-visual-studio\/\"><span data-contrast=\"none\">Visual Studio\u2019s inlining decisions<\/span><\/a><span data-contrast=\"auto\">.<\/span> <span data-contrast=\"auto\">This details some of the <\/span><span data-contrast=\"auto\">inliner\u2019s<\/span><span data-contrast=\"auto\"> constraints and areas for improvement, a few of which are particularly relevant here:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<ol>\n<li aria-setsize=\"-1\" data-aria-level=\"1\" 
data-aria-posinset=\"1\" data-listid=\"2\" data-font=\"Calibri,Calibri_MSFontService,Sans-Serif\" data-leveltext=\"%1.\"><span data-contrast=\"auto\">The <\/span><span data-contrast=\"auto\">inliner<\/span><span data-contrast=\"auto\"> is recursive and may often re-do work it has already done. Inline decisions are context sensitive and it is not always profitable to replay its decision-making for the same function.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<li aria-setsize=\"-1\" data-aria-level=\"1\" data-aria-posinset=\"2\" data-listid=\"2\" data-font=\"Calibri,Calibri_MSFontService,Sans-Serif\" data-leveltext=\"%1.\"><span data-contrast=\"auto\">The<\/span> <span data-contrast=\"auto\">inliner<\/span><span data-contrast=\"auto\"> is very <\/span><span data-contrast=\"auto\">budget conscious<\/span><span data-contrast=\"auto\">. It has the difficult job of balancing executable size with runtime performance.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<li aria-setsize=\"-1\" data-aria-level=\"1\" data-aria-posinset=\"1\" data-listid=\"2\" data-font=\"Calibri,Calibri_MSFontService,Sans-Serif\" data-leveltext=\"%1.\"><span class=\"TextRun SCXW152825030 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW152825030 BCX1\">The <\/span><\/span><span class=\"TextRun SCXW152825030 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW152825030 BCX1\">inliner\u2019s<\/span><\/span><span class=\"TextRun SCXW152825030 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW152825030 BCX1\"> view of the world is always \u201cpre-optimize<\/span><\/span><span class=\"TextRun SCXW152825030 BCX1\" lang=\"EN-US\" 
xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun CommentStart SCXW152825030 BCX1\">d.\u201d <\/span><\/span><span class=\"TextRun SCXW152825030 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW152825030 BCX1\">It has very limited knowledge of copy propagation and dead control paths, for example.<\/span><\/span><\/li>\n<\/ol>\n<h2>Modern C++<\/h2>\n<p><span data-contrast=\"auto\">Unfortunately, many of the coding patterns and idioms common to heavy generic programming bump into those constraints. Consider the following routine in the <\/span><a href=\"https:\/\/eigen.tuxfamily.org\/\"><span data-contrast=\"none\">Eigen library<\/span><\/a><span data-contrast=\"auto\">:<\/span><\/p>\n<pre class=\"lang:default decode:true\">Eigen::Matrix&lt;float,-1,1,0,-1,1&gt;::outerStride(void)<\/pre>\n<p>which calls innerSize:<\/p>\n<pre class=\"lang:default decode:true\">template&lt;typename Derived&gt; class DenseBase \r\n... \r\nIndex innerSize() const \r\n{ \r\n    return IsVectorAtCompileTime ? this-&gt;size() \r\n         : int(IsRowMajor) ? this-&gt;cols() : this-&gt;rows(); \r\n}<\/pre>\n<p><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\">That instantiation of <\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 CommentStart SCXW222747456 BCX1\">outerStride<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\"> does nothing but return one of its members. Therefore, it is an excellent candidate for full inline expansion. 
To realize this win, though, the compiler must fully evaluate and expand <\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW222747456 BCX1\">outerStride<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW222747456 BCX1\">\u2019s<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\"> 18 total <\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW222747456 BCX1\">callees<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\">, for every <\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW222747456 BCX1\">callsite<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\"> of<\/span><\/span> <span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW222747456 BCX1\">outerStride<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\"> in the module. 
This eats into both the optimizer throughput and the <\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SpellingErrorV2 SCXW222747456 BCX1\">inliner\u2019s<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\"> code-size budget. It also bears mentioning that calls to \u2018rows\u2019 and \u2018cols\u2019 are <\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun ContextualSpellingAndGrammarErrorV2 SCXW222747456 BCX1\">inline-expanded<\/span><\/span><span class=\"TextRun SCXW222747456 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW222747456 BCX1\"> as well, even though those are on a statically dead path.<\/span><\/span><\/p>\n<p><span class=\"TextRun SCXW191368790 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW191368790 BCX1\">It would be much better<\/span><\/span><span class=\"TextRun SCXW191368790 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW191368790 BCX1\"> if the optimizer just <\/span><span class=\"NormalTextRun SpellingErrorV2 SCXW191368790 BCX1\">inlined<\/span><span class=\"NormalTextRun SCXW191368790 BCX1\"> the two-line member return<\/span><\/span><span class=\"TextRun SCXW191368790 BCX1\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW191368790 BCX1\">:<\/span><\/span><\/p>\n<pre class=\"lang:default decode:true\">?outerStride@?$Matrix@N$0?0$0?0$0A@$0?0$0?0@Eigen@@QEBA_JXZ PROC ; Eigen::Matrix&lt;double,-1,-1,0,-1,-1&gt;::outerStride, COMDAT \r\n    mov rax, QWORD PTR [rcx+8] \r\n    ret 0<\/pre>\n<h2>Inlining Optimized IR<\/h2>\n<p><span data-contrast=\"auto\">For 
a subset of <\/span><span data-contrast=\"auto\">routines<\/span><span data-contrast=\"auto\">, the inliner will now expand the <\/span><span data-contrast=\"auto\">already-optimized <\/span><span data-contrast=\"auto\">IR of a routine, bypassing the process of <\/span><span data-contrast=\"auto\">fetching IR and re-expanding<\/span><span data-contrast=\"auto\"> callees.<\/span><span data-contrast=\"auto\"> This serves the dual purpose of expanding <\/span><span data-contrast=\"auto\">callsites<\/span><span data-contrast=\"auto\"> much faster and letting the inliner measure its budget more accurately.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">First, t<\/span><span data-contrast=\"auto\">he<\/span><span data-contrast=\"auto\"> optimizer <\/span><span data-contrast=\"auto\">will summarize<\/span><span data-contrast=\"auto\"> that <\/span>outerStride<span data-contrast=\"auto\"> is a candidate for this faster expansion when it is <\/span><span data-contrast=\"auto\">originally<\/span><span data-contrast=\"auto\"> compiled<\/span><span data-contrast=\"auto\"> (remember <\/span><span data-contrast=\"auto\">that c2.dll tries to<\/span><span data-contrast=\"auto\"> compile routines before their callers)<\/span><span data-contrast=\"auto\">. 
<\/span><span data-contrast=\"auto\">Then, t<\/span><span data-contrast=\"auto\">he <\/span><span data-contrast=\"auto\">inliner <\/span><span data-contrast=\"auto\">may<\/span><span data-contrast=\"auto\"> replace calls to that <\/span>outerStride<span data-contrast=\"auto\"> instantiation with <\/span><span data-contrast=\"auto\">the field access.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">T<\/span><span data-contrast=\"auto\">he candidates for this faster inline expansion are leaf functions <\/span><span data-contrast=\"auto\">with no locals<\/span><span data-contrast=\"auto\">, which refer to at most two different arguments, <\/span><span data-contrast=\"auto\">globa<\/span><span data-contrast=\"auto\">l<\/span><span data-contrast=\"auto\">s<\/span><span data-contrast=\"auto\">, or constants. In practice <\/span><span data-contrast=\"auto\">this targets<\/span><span data-contrast=\"auto\"> most simple getters and setters.<\/span><\/p>\n<h2>Benefits<\/h2>\n<p><span data-contrast=\"auto\">There are many examples like <\/span><span data-contrast=\"auto\">outerStride<\/span><span data-contrast=\"auto\"> in the Eigen library where a large call tree expands into just one or two instructions. Modules that make heavy use of Eigen may see a significant throughput improvement; we measured the optimizer taking up to 25-50% less time for such repros.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The new Zipliner will also enable the inliner to measure its budget more accurately. <\/span><span data-contrast=\"auto\">Eigen developers have long been aware that MSVC does not inline to their specifications (see EIGEN_STRONG_INLINE). 
Zipliner should help to alleviate some of this concern, as a ziplined routine is now considered a virtually \u201cfree\u201d inline.<\/span><\/p>\n<h2>Give the feature a try<\/h2>\n<p><span data-contrast=\"auto\">This is enabled by default in Visual Studio 2019<\/span><span data-contrast=\"auto\"> 16.3<\/span><span data-contrast=\"auto\">, along with some improvements in 16.4. Please <\/span><a href=\"https:\/\/visualstudio.microsoft.com\/vs\/\"><span data-contrast=\"none\">download Visual Studio 2019<\/span><\/a><span data-contrast=\"auto\"> and give the new improvements a try. We can be reached via the comments below or via email (visualcpp@microsoft.com). If you encounter problems with Visual Studio or MSVC, or have a suggestion for us, please let us know through Help &gt; Send Feedback &gt; Report A Problem \/ Provide a Suggestion in the product, or via <\/span><a href=\"https:\/\/developercommunity.visualstudio.com\/\"><span data-contrast=\"none\">Developer Community<\/span><\/a><span data-contrast=\"auto\">. You can also find us on Twitter (<\/span><a href=\"https:\/\/twitter.com\/visualc\"><span data-contrast=\"none\">@<\/span><span data-contrast=\"none\">VisualC<\/span><\/a><span data-contrast=\"auto\">).<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Visual Studio 2019 versions 16.3 and 16.4 include improvements to the C++ inliner. 
Among these is the ability to inline some routines after they have been optimized, referred to as the \u201cZipliner.\u201d Depending on your application, you may see some minor code quality improvements and\/or major build-time (compiler throughput) improvements.\u00a0 C2 Inliner Terry Mahaffey has [&hellip;]<\/p>\n","protected":false},"author":15688,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[270],"tags":[],"class_list":["post-25350","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-announcement"],"acf":[],"blog_post_summary":"<p>Visual Studio 2019 versions 16.3 and 16.4 include improvements to the C++ inliner. Among these is the ability to inline some routines after they have been optimized, referred to as the \u201cZipliner.\u201d Depending on your application, you may see some minor code quality improvements and\/or major build-time (compiler throughput) improvements.\u00a0 C2 Inliner Terry Mahaffey has 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/25350","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/15688"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=25350"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/25350\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=25350"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=25350"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=25350"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}