{"id":27328,"date":"2021-01-14T15:00:40","date_gmt":"2021-01-14T15:00:40","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=27328"},"modified":"2021-01-14T09:52:41","modified_gmt":"2021-01-14T09:52:41","slug":"build-throughput-series-template-metaprogramming-fundamentals","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/build-throughput-series-template-metaprogramming-fundamentals\/","title":{"rendered":"Build Throughput Series: Template Metaprogramming Fundamentals"},"content":{"rendered":"<p>Template metaprogramming is popular and appears in many code bases. However, it often contributes to long compile times. When investigating build throughput in large code bases, we commonly find more than one million template specializations and template instantiations, and they often present opportunities for significant improvement.<\/p>\n<p>In this blog post, I will walk through the differences between template specialization and template instantiation and how the MSVC compiler processes them. I will cover how to find bottlenecks caused by too many template specializations and instantiations in a different blog post (or you can read <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/profiling-template-metaprograms-with-cpp-build-insights\/\">this blog post<\/a> as a starting point).<\/p>\n<p>Before we start, let us clarify some terms widely used in template metaprogramming.<\/p>\n<ul>\n<li>Primary template\n<ul>\n<li>Partial specialization<\/li>\n<\/ul>\n<\/li>\n<li>Template specialization\n<ul>\n<li>Explicit specialization<\/li>\n<\/ul>\n<\/li>\n<li>Template instantiation\n<ul>\n<li>Implicit template instantiation<\/li>\n<li>Explicit template instantiation<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>These terms are best explained by an example:<\/p>\n<pre class=\"lang:c++ decode:true\">\/\/ Primary template.\r\ntemplate&lt;typename T&gt; struct Trait {};\r\n\/\/ 
Partial specialization.\r\ntemplate&lt;typename T&gt; struct Trait&lt;T*&gt; {};\r\n\/\/ Explicit specialization.\r\ntemplate&lt;&gt; struct Trait&lt;int&gt; {};\r\n \r\n\/\/ Implicit template instantiation of template specialization 'Trait&lt;void&gt;' from the primary template.\r\nTrait&lt;void&gt; trait1;\r\n\/\/ Implicit template instantiation of template specialization 'Trait&lt;void*&gt;' from the partial specialization.\r\nTrait&lt;void*&gt; trait2;\r\n\/\/ No template instantiation for explicit specialization.\r\nTrait&lt;int&gt; trait3;\r\n\/\/ Explicit template instantiation of template specialization 'Trait&lt;char&gt;' from the primary template.\r\ntemplate struct Trait&lt;char&gt;;\r\n\/\/ Explicit template instantiation of template specialization 'Trait&lt;char*&gt;' from the partial specialization.\r\ntemplate struct Trait&lt;char*&gt;;<\/pre>\n<p>Template specialization and template instantiation are often used interchangeably. However, the distinction is important when evaluating build throughput.<\/p>\n<p>Let us look at an example:<\/p>\n<pre class=\"lang:c++ decode:true\">template&lt;typename T&gt; struct Vector\r\n{\r\n    void sort() { \/**\/ }\r\n    void clear() { \/**\/ }\r\n};\r\n\r\nVector&lt;int&gt; get_vector();\r\n\r\ntemplate&lt;typename V&gt; void sort_vector(V&amp; v) { v.sort(); }\r\n\r\nvoid test(Vector&lt;long&gt;&amp; v)\r\n{\r\n    ::sort_vector(v); \/\/ I will explain why we use '::' here later.\r\n}<\/pre>\n<p>In the example above, the MSVC compiler does the following:<\/p>\n<pre>Start processing user code\r\n    Process class template 'Vector'\r\n    Process function 'get_vector'\r\n        Specialize 'Vector&lt;int&gt;'\r\n    Process function template 'sort_vector'\r\n    Process function 'test'\r\n        Specialize 'Vector&lt;long&gt;'\r\n        Specialize 'sort_vector&lt;Vector&lt;long&gt;&gt;'\r\n        Instantiate 'sort_vector&lt;Vector&lt;long&gt;&gt;' (delayed)\r\n            Add 
'sort_vector&lt;Vector&lt;long&gt;&gt;' to the pending list\r\nEnd processing user code\r\nStart processing the pending list for delayed instantiation\r\n    Iteration 1\r\n        Instantiate 'sort_vector&lt;Vector&lt;long&gt;&gt;'\r\n        Instantiate 'Vector&lt;long&gt;'\r\n        Instantiate 'Vector&lt;long&gt;::sort' (delayed)\r\n            Add 'Vector&lt;long&gt;::sort' to the pending list\r\n    Iteration 2\r\n        Instantiate 'Vector&lt;long&gt;::sort'\r\nEnd processing the pending list\r\n<\/pre>\n<p>You can see that template specialization occurs earlier in processing than template instantiation and is often cheaper.<\/p>\n<p>When you specialize a function template (like <code>sort_vector&lt;Vector&lt;long&gt;&gt;<\/code> in the example), the compiler processes only its declaration, not its definition. The compiler creates an internal representation for the specialization and adds it to a map. If the same specialization is requested again later, the compiler finds the internal representation in the map and reuses it to avoid duplicated work (known as <em>memoization<\/em>). The definition is processed when the specialization is instantiated.<\/p>\n<p>Similarly, when you specialize a class template, its definition is not processed. Instantiation of a class template specialization is a bit more complicated. By default, a member of the class template specialization is not instantiated when the specialization itself is instantiated (like <code>Vector&lt;long&gt;::clear<\/code>). A member is instantiated only when it is used (like <code>Vector&lt;long&gt;::sort<\/code>), and MSVC delays that instantiation if possible.<\/p>\n<p>You may wonder what happens if I use the unqualified name <code>sort_vector<\/code> in <code>test<\/code>. 
Doing so changes the processing order.<\/p>\n<ul>\n<li>When the qualified name <code>::sort_vector<\/code> is used, it suppresses <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/adl\">argument-dependent lookup (ADL)<\/a>.<\/li>\n<li>When the unqualified name <code>sort_vector<\/code> is used instead, ADL computes the associated set of <code>v<\/code>, which forces the instantiation of <code>Vector&lt;long&gt;<\/code>. So, the instantiation is no longer delayed to the phase that processes the pending list.<\/li>\n<\/ul>\n<p>With this information in mind, let us check some common patterns and see which ones require template instantiation.<\/p>\n<pre class=\"lang:c++ decode:true\">template&lt;int N&gt; struct Array { static_assert(N &gt; 0, \"\"); };\r\n\r\nstruct Data\r\n{\r\n    Array&lt;1&gt; arr; \/\/ Array&lt;1&gt; is instantiated.\r\n};\r\n\r\nArray&lt;2&gt; transform(Array&lt;3&gt; *); \/\/ Neither Array&lt;2&gt; nor Array&lt;3&gt; is instantiated.\r\n\r\nvoid test()\r\n{\r\n    transform(nullptr); \/\/ Array&lt;2&gt; is instantiated, Array&lt;3&gt; is not instantiated.\r\n}<\/pre>\n<p>The <code>Array&lt;1&gt;<\/code> case: When a specialization is used as the type of a member, the compiler needs to instantiate it to learn properties such as its size. This is one of the most common reasons why a template specialization is instantiated in a header, and it is often hard to avoid.<\/p>\n<p>The <code>Array&lt;2&gt;<\/code>\u00a0case: Using a template specialization as the function return type does not require it to be instantiated (if there is no function definition). The same is true if it is used as the type of a function parameter. However, providing the function definition or calling the function will force the instantiation of the return type.<\/p>\n<p>The <code>Array&lt;3&gt;<\/code>\u00a0case: Passing <code>nullptr<\/code> as the function argument does not require the instantiation because <code>nullptr<\/code> is always convertible to any pointer type. 
The same is true if you cast <code>nullptr<\/code> to <code>Array&lt;3&gt; *<\/code>. However, if the function argument is a pointer to a class, the compiler must instantiate <code>Array&lt;3&gt;<\/code> to see whether the conversion is valid.<\/p>\n<p>In the next blog post, we will use some examples from real-world code bases and find ways to reduce the number of template specializations and template instantiations.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Template metaprogramming is popular and seen in many code bases. However, it often contributes to long compile times. When investigating build throughput improvement opportunities in large codebases, our finding is that more than one million template specializations and template instantiations is quite common and often provides optimization opportunities for significant improvement. In this blog post, [&hellip;]<\/p>\n","protected":false},"author":6968,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-27328","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus"],"acf":[],"blog_post_summary":"<p>Template metaprogramming is popular and seen in many code bases. However, it often contributes to long compile times. When investigating build throughput improvement opportunities in large codebases, our finding is that more than one million template specializations and template instantiations is quite common and often provides optimization opportunities for significant improvement. 
In this blog post, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/27328","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/6968"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=27328"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/27328\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=27328"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=27328"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=27328"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}