{"id":26104,"date":"2020-05-28T14:00:50","date_gmt":"2020-05-28T14:00:50","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=26104"},"modified":"2020-05-28T13:49:07","modified_gmt":"2020-05-28T13:49:07","slug":"profiling-template-metaprograms-with-cpp-build-insights","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/profiling-template-metaprograms-with-cpp-build-insights\/","title":{"rendered":"Profiling template metaprograms with C++ Build Insights"},"content":{"rendered":"<p>The use of templates in C++ programs can sometimes lead to longer builds. C++ Build Insights puts tools at your disposal for analyzing template usage patterns and their associated impact on build time. In this article, we show you how to use the <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/introducing-c-build-insights\/\">vcperf analysis tool<\/a> and the <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build-insights\/reference\/sdk\/overview?view=vs-2019\">C++ Build Insights SDK<\/a> to understand and fix problematic template patterns. We demonstrate the practical use of these tools through a case study in which we reduced build time for the Sprout open-source metaprogramming library by 25%. We hope these methods will help you approach your template endeavors with confidence!<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>How to obtain and use vcperf<\/strong><\/span><\/p>\n<p>The examples in this article make use of <em>vcperf<\/em>, a tool that allows you to capture a trace of your build and to view it in the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/test\/wpt\/windows-performance-analyzer\">Windows Performance Analyzer (WPA)<\/a>. The latest version is available in Visual Studio 2019.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>1. Follow these steps to obtain and configure <em>vcperf <\/em>and WPA:<\/strong><\/span><\/p>\n<ol>\n<li>Download and install the latest <a href=\"https:\/\/visualstudio.microsoft.com\/downloads\/\">Visual Studio 2019<\/a>.<\/li>\n<li>Obtain WPA by downloading and installing the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/get-started\/adk-install\">latest Windows ADK<\/a>.<\/li>\n<li>Copy the <em>perf_msvcbuildinsights.dll<\/em> file from your Visual Studio 2019\u2019s MSVC installation directory to your newly installed WPA directory. This file is the C++ Build Insights WPA add-in, which must be available to WPA for correctly displaying the C++ Build Insights events.\n<ol style=\"list-style-type: lower-alpha;\">\n<li>MSVC\u2019s installation directory is typically: <code>C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\<span class=\"pln\">{<\/span><span class=\"typ\">Edition<\/span><span class=\"pun\">}<\/span>\\VC\\Tools\\MSVC\\{Version}\\bin\\Hostx64\\x64<\/code>.<\/li>\n<li>WPA\u2019s installation directory is typically: <code>C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit<\/code>.<\/li>\n<\/ol>\n<\/li>\n<li>Open the <em>perfcore.ini<\/em> file in your WPA installation directory and add an entry for the <em>perf_msvcbuildinsights.dll<\/em> file. This tells WPA to load the C++ Build Insights add-in on startup.<\/li>\n<\/ol>\n<p>You can also obtain the latest <em>vcperf<\/em> and WPA add-in by cloning and building the <a href=\"https:\/\/github.com\/microsoft\/vcperf\">vcperf GitHub repository<\/a>. Feel free to use your built copy in conjunction with Visual Studio 2019!<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>2. Follow these steps to collect a trace of your build:<\/strong><\/span><\/p>\n<ol>\n<li>Open an elevated <strong><em>x64 Native Tools Command Prompt for VS 2019<\/em><\/strong>.<\/li>\n<li>Obtain a trace of your build:\n<ol>\n<li>Run the following command: <code>vcperf \/start \/level3 MySessionName<\/code>. The <code>\/level3<\/code> option enables the collection of template events.<\/li>\n<li>Build your C++ project from anywhere, even from within Visual Studio (<em>vcperf<\/em> collects events system-wide).<\/li>\n<li>Run the following command: <code>vcperf \/stop \/templates MySessionName outputFile.etl<\/code>. This command will stop the trace, analyze all events, including template events, and save everything in the <em>outputFile.etl<\/em> trace file.<\/li>\n<\/ol>\n<\/li>\n<li>Open the trace you just collected in WPA.<\/li>\n<\/ol>\n<p><span style=\"font-size: 18pt;\"><strong>Viewing template information in WPA<\/strong><\/span><\/p>\n<p>The most time-consuming activity related to templates for the compiler is to instantiate them. C++ Build Insights provides a WPA view called <strong><em>Template Instantiations<\/em><\/strong> that allows you to see the instantiation time of the top templates in your program. After opening your trace in WPA, you can open this view by dragging it from the <strong><em>Graph Explorer<\/em><\/strong> pane to the <strong><em>Analysis<\/em><\/strong> window, as shown below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26127 size-full\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/DragDropTemplateInstantiations.gif\" alt=\"Image that demonstrates how to open the Template Instantiations view in WPA.\" width=\"839\" height=\"651\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>A note on the presence of template events in a vcperf trace and WPA<\/strong><\/span><\/p>\n<p>If you do not see a <strong><em>Template Instantiations<\/em><\/strong> view in the <strong><em>Graph Explorer<\/em><\/strong> pane, make sure that you have completed the WPA configuration steps from the <strong><em>How to obtain and use vcperf<\/em><\/strong> section correctly, and that you have passed the right arguments to <em>vcperf<\/em> when starting and stopping your trace.<\/p>\n<p>For scalability reasons, <em>vcperf<\/em> only writes in the output trace the templates that take the most time to instantiate in your build. If template instantiations were not a significant factor in your build times, <em>vcperf<\/em> will omit template information, and the <strong><em>Template Instantiations<\/em><\/strong> view will not appear.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Case study: speeding up the Sprout template metaprogramming library<\/strong><\/span><\/p>\n<p>In this case study, we use a real open source project from GitHub to show how you can use <em>vcperf<\/em> and WPA to diagnose and treat long build times in template metaprogramming code. Specifically, we demonstrate how we&#8217;ve used these tools to reduce the build time for the Sprout library by around 25%.<\/p>\n<p>Use these steps if you would like to follow along:<\/p>\n<ol>\n<li>Clone the <a href=\"https:\/\/github.com\/bolero-MURAKAMI\/Sprout\">Sprout C++ Libraries repository<\/a> from GitHub.<\/li>\n<li>Checkout the following commit: <code>6b5addba9face0a<\/code>.<\/li>\n<li>Obtain a trace for a full rebuild of the Sprout tests:\n<ol>\n<li>Open an elevated<strong> <em>x64 Native Tools Command Prompt for VS 2019.<\/em><\/strong><\/li>\n<li>Run the following command: <code>vcperf \/start \/level3 Sprout<\/code><\/li>\n<li>Build the Sprout tests by running the following command:<code> cl \/std:c++latest \/D_HAS_DEPRECATED_IS_LITERAL_TYPE=1 \/D_SILENCE_CXX17_IS_LITERAL_TYPE_DEPRECATION_WARNING \/EHsc \/I. \/constexpr:steps100000000 .\\testspr\\sprout.cpp<\/code>.<\/li>\n<li>Run the following command: <code>vcperf \/stop \/templates Sprout sprout.etl<\/code>. This will save a trace of the build in <em>sprout.etl<\/em>. The trace will contain template information.<\/li>\n<\/ol>\n<\/li>\n<li>Open the trace in WPA.<\/li>\n<\/ol>\n<p>We open the <strong><em>Build Explorer<\/em><\/strong> and <strong><em>Template Instantiations<\/em><\/strong> views one on top of the other. The <strong><em>Build Explorer<\/em><\/strong> view indicates that the build lasted around 13.5 seconds. This can be seen by looking at the time axis at the bottom of the view (labeled A). The <strong><em>Template Instantiations<\/em><\/strong> view shows a locus of condensed template instantiation activity somewhere between time 8 and 10.5 (labeled B).<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26106\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-13.gif\" alt=\"Image of WPA with the Build Explorer and Template Instantiation view open. The Build Explorer view shows a total duration of around 13.5 seconds. The Template Instantiations view shows an area that contains many time-consuming template instantiations.\" width=\"839\" height=\"453\" \/><\/p>\n<p>By default, all template specializations are grouped by the primary template\u2019s name. For example, the <code>std::vector&lt;int&gt;<\/code> and <code>std::vector&lt;double&gt;<\/code> specializations would both be grouped under the <code>std::vector<\/code> primary template name. In our case, we would like to know if there is one specialization in particular that causes an issue, so we reorganize the view\u2019s columns such that the entries are grouped by <strong><em>Specialization Name<\/em><\/strong> instead. This operation is shown below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26128 size-full\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/Reconfigure.gif\" alt=\"Image that shows how to configure the Template Instantiations view to group entries by full specialization name instead of primary template name.\" width=\"839\" height=\"609\" \/><\/p>\n<p>We notice that there is one large instantiation of the <code>sprout::tpp::all_of<\/code> template that is responsible for around 2.15 seconds. There are also 511 instantiations of <code>sprout::tpp::detail::all_of_impl<\/code>. We surmise that <code>sprout::tpp::all_of<\/code> is a root template that calls into a highly recursive <code>sprout::tpp::detail::all_of_impl<\/code> template. This discovery is shown below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26108\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-15.gif\" alt=\"Image of the Template Instantiations view in WPA, which shows that the longest template instantiation is for the sprout::tpp::all_of struct. The duration of this template instantiation alone is 2.16 seconds.\" width=\"839\" height=\"459\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Investigating the codebase<\/strong><\/span><\/p>\n<p>We find that the culprit <code>sprout::tpp::all_of<\/code> template instantiation is triggered by a call to <code>operator()<\/code> of the following type in <em>sprout\\random\\shuffle_order.hpp<\/em>:<\/p>\n<pre class=\"prettyprint\">typedef sprout::random::shuffle_order_engine&lt;\r\n    sprout::random::minstd_rand0, 256&gt; knuth_b;<\/pre>\n<p>This type internally contains a compile-time array of 256 elements, which ends up being passed to the <code>default_remake_container<\/code> function in the <em>sprout\\container\\container_construct_traits.hpp<\/em> header. This function has the following three template definitions. Function bodies have been replaced with comments for simplicity.<\/p>\n<pre class=\"prettyprint\">template&lt;typename Container, typename Cont, typename... Args&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    sprout::is_fixed_container&lt;Container&gt;::value,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container(Cont&amp;&amp;, \r\n    typename sprout::container_traits&lt;Container&gt;::difference_type, \r\n    Args&amp;&amp;... args) \r\n{\r\n    \/\/ Default case\r\n}\r\n\r\ntemplate&lt;typename Container, typename Cont, typename... Args&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    !sprout::is_fixed_container&lt;Container&gt;::value\r\n        &amp;&amp; !(sizeof...(Args) == 2 &amp;&amp; sprout::tpp::all_of&lt;\r\n                sprout::is_input_iterator&lt;\r\n                    typename std::remove_reference&lt;Args&gt;::type&gt;...&gt;::value)\r\n        ,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container(Cont&amp;&amp;, \r\n    typename sprout::container_traits&lt;Container&gt;::difference_type, \r\n    Args&amp;&amp;... args) \r\n{\r\n    \/\/ Default case\r\n}\r\n\r\ntemplate&lt;typename Container, typename Cont, typename InputIterator&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    !sprout::is_fixed_container&lt;Container&gt;::value,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container(Cont&amp;&amp; cont, \r\n    typename sprout::container_traits&lt;Container&gt;::difference_type, \r\n    InputIterator first, InputIterator last) \r\n{\r\n    \/\/ Special case when we have a non-fixed container with 2\r\n    \/\/ input iterator args.\r\n}<\/pre>\n<p>These definitions are all enabled or disabled based on certain conditions using the <code>std::enable_if<\/code> standard type traits class. Can you spot the <code>sprout::tpp::all_of<\/code> call in the second definition\u2019s <code>std::enable_if<\/code> condition? We replicate it below:<\/p>\n<pre class=\"prettyprint\">!(sizeof...(Args) == 2 &amp;&amp; \r\n    sprout::tpp::all_of&lt;sprout::is_input_iterator&lt;\r\n        typename std::remove_reference&lt;Args&gt;::type&gt;...&gt;::value)<\/pre>\n<p>Looking at the condition as a whole, we can see that <code>sprout::tpp::all_of<\/code> does not need to be evaluated if <code>default_remake_container<\/code> is called with a number of arguments different from 2. In our case we have 256 arguments and we know that the condition will be false regardless of what <code>sprout::tpp::all_of<\/code> returns. In the eyes of the compiler, this does not matter. It will still evaluate <code>sprout::tpp::all_of<\/code> on our 256 arguments when trying to resolve the call to <code>default_remake_container<\/code>, resulting in a large, time-consuming recursive template instantiation.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Finding a new solution<\/strong><\/span><\/p>\n<p>We fix this situation by adding a level of indirection between <code>default_remake_container<\/code> and the <code>sprout::tpp::all_of<\/code> call. We first branch on the number of arguments:<\/p>\n<pre class=\"prettyprint\">template&lt;typename Container, typename Cont, typename... Args&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    sizeof...(Args) == 2,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container(Cont&amp;&amp; cont, \r\n    typename sprout::container_traits&lt;Container&gt;::difference_type, \r\n    Args&amp;&amp;... args) \r\n{\r\n    return default_remake_container_two_args&lt;Container&gt;(\r\n        SPROUT_FORWARD(Cont, cont), SPROUT_FORWARD(Args, args)...);\r\n}\r\n\r\ntemplate&lt;typename Container, typename Cont, typename... Args&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    sizeof...(Args) != 2,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container(Cont&amp;&amp; cont, \r\n    typename sprout::container_traits&lt;Container&gt;::difference_type, \r\n    Args&amp;&amp;... args) \r\n{\r\n    \/\/ Default case\r\n}<\/pre>\n<p>Only when an argument count of 2 is confirmed do we evaluate <code>sprout::tpp::all_of<\/code> through a new function called <code>default_remake_container_two_args<\/code>:<\/p>\n<pre class=\"prettyprint\">template&lt;typename Container, typename Cont, typename... Args&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    sprout::is_fixed_container&lt;Container&gt;::value || \r\n        !sprout::tpp::all_of&lt;sprout::is_input_iterator&lt;\r\n            typename std::remove_reference&lt;Args&gt;::type&gt;...&gt;::value\r\n        ,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container_two_args(Cont&amp;&amp;, Args&amp;&amp;... args) {\r\n    \/\/ Default case\r\n}\r\n\r\ntemplate&lt;typename Container, typename Cont, typename InputIterator&gt;\r\ninline SPROUT_CONSTEXPR typename std::enable_if&lt;\r\n    !sprout::is_fixed_container&lt;Container&gt;::value,\r\n    typename sprout::container_construct_traits&lt;Container&gt;::copied_type\r\n&gt;::type\r\ndefault_remake_container_two_args(Cont&amp;&amp; cont, \r\n    InputIterator first, InputIterator last) \r\n{\r\n    \/\/ Special case when we have a non-fixed container with 2 \r\n    \/\/ input iterator args.\r\n}<\/pre>\n<p>You can see the full details of this code change in <a href=\"https:\/\/github.com\/kevcadieux\/Sprout\/commit\/5c2cfeefa113580291bbfabb9dcb96b00b5339c4\">our fork on GitHub<\/a>.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Evaluating the final result<\/strong><\/span><\/p>\n<p>After fixing the code, we collect a second trace of the compilation using the same steps that we covered at the beginning of the case study. We notice that the build time has decreased by around 25% to a total of approximately 9.7 seconds. The Template Instantiations view is also gone, meaning that template instantiations are no longer an issue in this build. Victory!<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26116\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-16.gif\" alt=\"Image of the Build Explorer view in WPA, which shows that the Sprout build has been reduced to around 9.7 seconds.\" width=\"749\" height=\"196\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Identifying problematic template instantiations using the C++ Build Insights SDK<\/strong><\/span><\/p>\n<p>Recursive, time-consuming template instantiations are not an uncommon problem in codebases that use template metaprogramming, so we want to identify these problems faster in the future without having to go through the hassle of launching WPA and inspecting a trace manually. Luckily, most analysis tasks performed manually with <em>vcperf<\/em> and WPA can also be performed programmatically using the C++ Build Insights SDK. To illustrate this point, we\u2019ve prepared the <strong><em>RecursiveTemplateInspector<\/em><\/strong> SDK sample. It prints out the most time-consuming template instantiation hierarchies in a build, along with statistics about them such as recursion tree depth, total number of instantiations and the root specialization name that started it all.<\/p>\n<p>Let\u2019s repeat the Sprout case study from the previous section, but this time by using the <strong><em>RecursiveTemplateInspector <\/em><\/strong>to see what it finds<em>. <\/em>Use these steps if you want to follow along:<\/p>\n<ol>\n<li>Clone the <a href=\"https:\/\/github.com\/microsoft\/cpp-build-insights-samples\">C++ Build Insights SDK samples GitHub repository<\/a> on your machine.<\/li>\n<li>Build the <strong><em>Samples.sln <\/em><\/strong>solution, targeting the desired architecture (x86 or x64), and using the desired configuration (debug or release). The sample\u2019s executable will be placed in the out\/{architecture}\/{configuration}\/RecursiveTemplateInspector folder, starting from the root of the repository.<\/li>\n<li>Follow the steps from the <strong><em>Case study: speeding up a template metaprogramming library<\/em><\/strong> section to collect a trace of the Sprout compilation. Use the <code>vcperf \/stopnoanalyze Sprout sprout-raw.etl<\/code> command instead of the <code>\/stop<\/code> command when stopping your trace.<\/li>\n<li>Pass the <em>sprout-raw.etl<\/em> trace as the first argument to the <strong><em>RecursiveTemplateInspector<\/em><\/strong> executable.<\/li>\n<\/ol>\n<p>As shown below, <strong><em>RecursiveTemplateInspector<\/em><\/strong> correctly identifies the <code>sprout::tpp::all_of<\/code> template instantiation that has been causing us problems, which triggers other instantiations recursively for a total of 4043 instantiations.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26117\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-17.gif\" alt=\"Image that shows the output of the RecursiveTemplateInspector sample for the problematic Sprout codebase in a command-line prompt. The top template instantiation hierarchy identified by the sample is for the sprout::tpp::all_of struct. The instantiation lasts for 2168 milliseconds, has a maximum recursion tree depth of 20, and is comprised of 4043 instantiations. \" width=\"622\" height=\"412\" \/><\/p>\n<p>Rerunning <strong><em>RecursiveTemplateInspector <\/em><\/strong>on our fixed codebase shows that all problematic template instantiations are gone. The ones that remain have a low duration and can be ignored.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26118\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-10.png\" alt=\"Image that shows the output of the RecursiveTemplateInspector sample for the fixed Sprout codebase in a command-line prompt. The sprout::tpp::all_of template instantiation has disappeared from the list. The new top template instantiations have a low duration and can be ignored.\" width=\"622\" height=\"412\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-10.png 622w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/05\/word-image-10-300x199.png 300w\" sizes=\"(max-width: 622px) 100vw, 622px\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Understanding the sample code<\/strong><\/span><\/p>\n<p>We first filter all stop activity and simple events by asking the C++ Build Insights SDK to forward what we need to the <code>OnTemplateRecursionTreeBranch<\/code>, and <code>OnSymbolName<\/code> functions. The name of the functions has no effect on how the C++ Build Insights SDK will filter the events; only their parameters matter. Continue reading to understand what is actually being filtered.<\/p>\n<pre class=\"prettyprint\">AnalysisControl OnStopActivity(const EventStack&amp; eventStack)\r\n    override\r\n{\r\n    MatchEventStackInMemberFunction(eventStack, this,\r\n        &amp;RecursiveTemplateInspector::OnTemplateRecursionTreeBranch);\r\n\r\n    return AnalysisControl::CONTINUE;\r\n}\r\n\r\nAnalysisControl OnSimpleEvent(const EventStack&amp; eventStack)\r\n    override\r\n{\r\n    MatchEventStackInMemberFunction(eventStack, this,\r\n        &amp;RecursiveTemplateInspector::OnSymbolName);\r\n\r\n    return AnalysisControl::CONTINUE;\r\n}<\/pre>\n<p>We use the <code>OnTemplateRecursionTreeBranch<\/code> function to capture each branch of the template instantiation recursion tree one by one. Because C++ Build Insights events are represented as stacks, capturing recursive relationships between events is something that can be done with little effort. The <code>TemplateInstantiationGroup<\/code> capture class automatically unwinds the event stack, and presents all template instantiations that occurred in the parsing thread as a vector-like C++ container ordered from root to leaf. Because we tied the <code>OnTemplateRecursionTreeBranch<\/code> function to the stop activity event, we will always receive the branches in a given recursion tree at the point where the parsing thread is coming back from the deepest level. We take advantage of this fact to compute the maximum depth of the recursion tree as we process all its branches. Once the root instantiation itself reaches its stop event, we wrap up the instantiation tree by storing the total instantiation time for the tree, along with the translation unit in which it occurred.<\/p>\n<pre class=\"prettyprint\">void OnTemplateRecursionTreeBranch(FrontEndPass fe, \r\n    TemplateInstantiationGroup recursionTreeBranch)\r\n{\r\n    const TemplateInstantiation&amp; root = recursionTreeBranch[0];\r\n    const TemplateInstantiation&amp; current = recursionTreeBranch.Back();\r\n\r\n    auto&amp; info = rootSpecializations_[root.SpecializationSymbolKey()];\r\n\r\n    auto&amp; visitedSet = info.VisitedInstantiations;\r\n\r\n    if (visitedSet.find(current.EventInstanceId()) == visitedSet.end())\r\n    {\r\n        \/\/ We have a new unvisited branch. Update the max depth of the\r\n        \/\/ recursion tree.\r\n\r\n        info.MaxDepth = std::max(info.MaxDepth, recursionTreeBranch.Size());\r\n\r\n        for (size_t idx = recursionTreeBranch.Size(); idx-- &gt; 0;)\r\n        {\r\n            const TemplateInstantiation&amp; ti = recursionTreeBranch[idx];\r\n\r\n            auto p = visitedSet.insert(ti.EventInstanceId());\r\n\r\n            bool wasVisited = !p.second;\r\n\r\n            if (wasVisited)\r\n            {\r\n                \/\/ Stop once we reach a visited template instantiation,\r\n                \/\/ because its parents will also have been visited.\r\n                break;\r\n            }\r\n\r\n            ++info.InstantiationCount;\r\n        }\r\n    }\r\n\r\n    if (recursionTreeBranch.Size() != 1) {\r\n        return;\r\n    }\r\n\r\n    \/\/ The end of a hierarchy's instantiation corresponds to the stop\r\n    \/\/ event of the root specialization's instantiation. When we reach\r\n    \/\/ that point, we update the total instantiation time of the hierarchy.\r\n\r\n    info.TotalInstantiationTime = root.Duration();\r\n\r\n    info.File = fe.InputSourcePath() ? fe.InputSourcePath() :\r\n        fe.OutputObjectPath();\r\n\r\n    visitedSet.clear();\r\n}<\/pre>\n<p>Template instantiation events don\u2019t contain the name of the symbol that was instantiated. Deriving the name is a costly operation, and doing so while measuring an instantiation would skew the time measurement. Instead, a numeric key is issued, which we can later use to match with the appropriate name by listening to the <code>SymboName<\/code> event. The <code>OnSymbolName<\/code> function does just that, and stores the name of every root template instantiation.<\/p>\n<pre class=\"prettyprint\">void OnSymbolName(SymbolName symbolName)\r\n{\r\n    auto it = rootSpecializations_.find(symbolName.Key());\r\n\r\n    if (it == rootSpecializations_.end()) {\r\n        return;\r\n    }\r\n\r\n    it-&gt;second.RootSpecializationName = symbolName.Name();\r\n}<\/pre>\n<p>At the end of our analysis, we traverse all root template instantiations, sort them by longest duration, and dump the top ones.<\/p>\n<pre class=\"prettyprint\">AnalysisControl OnEndAnalysis() override\r\n{\r\n    using namespace std::chrono;\r\n\r\n    auto topSpecializations = GetTopInstantiations();\r\n    \r\n    if (specializationCountToDump_ == 1) {\r\n        std::cout &lt;&lt; \"Top template instantiation hierarchy:\";\r\n    }\r\n    else {\r\n        std::cout &lt;&lt; \"Top \" &lt;&lt; specializationCountToDump_ &lt;&lt; \r\n            \" template instantiation \" &lt;&lt; \"hierarchies\";\r\n    }\r\n        \r\n    std::cout &lt;&lt; std::endl &lt;&lt; std::endl;\r\n\r\n    for (auto&amp; info : topSpecializations)\r\n    {\r\n        std::wcout &lt;&lt; \"File:           \" &lt;&lt; \r\n            info.File &lt;&lt; std::endl;\r\n        std::cout  &lt;&lt; \"Duration:       \" &lt;&lt; \r\n            duration_cast&lt;milliseconds&gt;(\r\n                info.TotalInstantiationTime).count() &lt;&lt; \r\n            \" ms\" &lt;&lt; std::endl;\r\n        std::cout  &lt;&lt; \"Max Depth:      \" &lt;&lt; \r\n            info.MaxDepth &lt;&lt; std::endl;\r\n        std::cout  &lt;&lt; \"Instantiations: \" &lt;&lt; \r\n            info.InstantiationCount &lt;&lt; std::endl;\r\n        std::cout  &lt;&lt; \"Root Name:      \" &lt;&lt; \r\n            info.RootSpecializationName &lt;&lt; std::endl &lt;&lt; std::endl;\r\n    }\r\n\r\n    return AnalysisControl::CONTINUE;\r\n}<\/pre>\n<p><span style=\"font-size: 18pt;\"><strong>Tell us what you think!<\/strong><\/span><\/p>\n<p>We hope the information in this article has helped you understand how you can use the tools provided by C++ Build Insights to profile your usage of templates.<\/p>\n<p>Give <em>vcperf<\/em> a try today by downloading the latest version of <a href=\"https:\/\/visualstudio.microsoft.com\/downloads\/\">Visual Studio 2019<\/a>, or by cloning the tool directly from the <a href=\"https:\/\/github.com\/microsoft\/vcperf\">vcperf Github repository<\/a>. Try out the <strong><em>RecursiveTemplateInspector<\/em><\/strong> sample from this article by cloning the <a href=\"https:\/\/github.com\/microsoft\/cpp-build-insights-samples\">C++ Build Insights samples repository<\/a> from GitHub, or refer to the official <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build-insights\/reference\/sdk\/overview?view=vs-2019\">C++ Build Insights SDK documentation<\/a> to build your own analysis tools.<\/p>\n<p>Have you found problematic template instantiations using <em>vcperf<\/em> or the C++ Build Insights SDK? Let us know in the comments below, on Twitter <a href=\"https:\/\/twitter.com\/visualc\">(@VisualC<\/a>), or via email at <a href=\"mailto:visualcpp@microsoft.com\">visualcpp@microsoft.com<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<p><em>This article contains code snippets from <a href=\"https:\/\/github.com\/bolero-MURAKAMI\/Sprout\">Sprout<\/a>, Copyright (c) 2011-2019 Bolero MURAKAMI, distributed under the <a href=\"https:\/\/github.com\/bolero-MURAKAMI\/Sprout\/blob\/master\/LICENSE_1_0.txt\">Boost Software License, Version 1.0<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The use of templates in C++ programs can sometimes lead to longer builds. C++ Build Insights puts tools at your disposal for analyzing template usage patterns and their associated impact on build time. In this article, we show you how to use the vcperf analysis tool and the C++ Build Insights SDK to understand and [&hellip;]<\/p>\n","protected":false},"author":6966,"featured_media":26128,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,239],"tags":[],"class_list":["post-26104","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus","category-diagnostics"],"acf":[],"blog_post_summary":"<p>The use of templates in C++ programs can sometimes lead to longer builds. C++ Build Insights puts tools at your disposal for analyzing template usage patterns and their associated impact on build time. In this article, we show you how to use the vcperf analysis tool and the C++ Build Insights SDK to understand and [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/26104","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/6966"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=26104"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/26104\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/26128"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=26104"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=26104"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=26104"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}