{"id":26218,"date":"2020-07-21T15:02:45","date_gmt":"2020-07-21T15:02:45","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=26218"},"modified":"2020-07-21T15:04:19","modified_gmt":"2020-07-21T15:04:19","slug":"improving-code-generation-time-with-cpp-build-insights","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/improving-code-generation-time-with-cpp-build-insights\/","title":{"rendered":"Improving code generation time with C++ Build Insights"},"content":{"rendered":"<p>Code generation time is a significant contributor to total build time for optimized builds. During this step of the compilation process, source code is passed through several algorithms that transform your program into an optimized version that can be executed more efficiently. In general, the optimization of a C++ function happens quickly and poses no problem. In exceptional cases, however, some functions can become large and complex enough to put pressure on the optimizer and noticeably slow down your builds. In this article, we show how you can use C++ Build Insights to determine if slow code generation is a problem for you. We demonstrate two ways to diagnose these issues: first manually with the <strong><em>vcperf<\/em><\/strong> analysis tool, and then programmatically with the <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build-insights\/reference\/sdk\/overview?view=vs-2019\">C++ Build Insights SDK<\/a>. Throughout the tutorial, we show these techniques being used to improve the build time of Chakra, the open-source JavaScript engine, by 7%.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>How to obtain and use vcperf<\/strong><\/span><\/p>\n<p>The examples in this article make use of <em>vcperf<\/em>, a tool that allows you to capture a trace of your build and to view it in the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/test\/wpt\/windows-performance-analyzer\">Windows Performance Analyzer (WPA)<\/a>. The latest version is available in Visual Studio 2019.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>1. Follow these steps to obtain and configure <em>vcperf <\/em>and WPA:<\/strong><\/span><\/p>\n<ol>\n<li>Download and install the latest <a href=\"https:\/\/visualstudio.microsoft.com\/downloads\/\">Visual Studio 2019<\/a>.<\/li>\n<li>Obtain WPA by downloading and installing the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/get-started\/adk-install\">latest Windows ADK<\/a>.<\/li>\n<li>Copy the <code>perf_msvcbuildinsights.dll<\/code> file from your Visual Studio 2019\u2019s MSVC installation directory to your newly installed WPA directory. This file is the C++ Build Insights WPA add-in, which must be available to WPA for correctly displaying the C++ Build Insights events.\n<ol>\n<li>MSVC\u2019s installation directory is typically: <code>C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\{Edition}\\VC\\Tools\\MSVC\\{Version}\\bin\\Hostx64\\x64<\/code>.<\/li>\n<li>WPA\u2019s installation directory is typically: <code>C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit<\/code>.<\/li>\n<\/ol>\n<\/li>\n<li>Open the <code>perfcore.ini<\/code> file in your WPA installation directory, and add an entry for the <code>perf_msvcbuildinsights.dll<\/code> file. This tells WPA to load the C++ Build Insights plugin on startup.<\/li>\n<\/ol>\n<p>You can also obtain the latest <em>vcperf<\/em> and WPA add-in by cloning and building the <a href=\"https:\/\/github.com\/microsoft\/vcperf\">vcperf GitHub repository<\/a>. Feel free to use your built copy in conjunction with Visual Studio 2019!<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>2. Follow these steps to collect a trace of your build:<\/strong><\/span><\/p>\n<ol>\n<li>Open an elevated <strong><em>x64 Native Tools Command Prompt for VS 2019<\/em><\/strong>.<\/li>\n<li>Obtain a trace of your build:\n<ol>\n<li>Run the following command: <code>vcperf \/start MySessionName<\/code>.<\/li>\n<li>Build your C++ project from anywhere, even from within Visual Studio (<em>vcperf<\/em> collects events system-wide).<\/li>\n<li>Run the following command: <code>vcperf \/stop MySessionName outputFile.etl<\/code>. This command will stop the trace, analyze all events, and save everything in the <em>outputFile.etl<\/em> trace file.<\/li>\n<\/ol>\n<\/li>\n<li>Open the trace you just collected in WPA.<\/li>\n<\/ol>\n<p><span style=\"font-size: 18pt;\"><strong>Using the Functions view in WPA<\/strong><\/span><\/p>\n<p>C++ Build Insights has a dedicated view to help diagnose slow code generation time: the <em><strong>Functions<\/strong><\/em> view. After opening your trace in WPA, you can access the view by dragging it from the <em><strong>Graph Explorer<\/strong><\/em> pane to the <em><strong>Analysis<\/strong><\/em> window, as shown below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26239 size-full\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/DragDrop.gif\" alt=\"An image that shows how to open the Functions view in WPA, by clicking on the Functions thumbnail in the Graph Explorer pane and dragging it to the Analysis pane.\" width=\"837\" height=\"649\" \/><\/p>\n<p>The Functions view offers 3 presets that you can select from when navigating your build trace:<\/p>\n<ol>\n<li>Timelines<\/li>\n<li>Activity Statistics<\/li>\n<li>Force Inlinees<\/li>\n<\/ol>\n<p>Click on the drop-down menu at the top of the view to select the one you need. This step is illustrated below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26220\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-1.gif\" alt=\"An image that shows the preset selection drop-down box for the Functions view.\" width=\"597\" height=\"221\" \/><\/p>\n<p>In the next 3 sections, we cover each of these presets in turn.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Preset #1: Timelines<\/strong><\/span><\/p>\n<p>When the <em>Timelines<\/em> preset is active, focus your attention on the graph section at the top of the view. It gives an overview of where function code generation bottlenecks occur in your parallel build. Each timeline represents a thread. The timeline numbers match the ones in the <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/finding-build-bottlenecks-with-cpp-build-insights\/\">Build Explorer view<\/a>. In this graph, a colored bar represents a function being optimized. The longer the bar, the more time was consumed optimizing this function. Hover over each colored bar to see the name of the function being optimized. The position of the bar on the x axis indicates the time at which the function optimization started. Place a <em>Functions<\/em> view underneath a <em>Build Explorer<\/em> view to understand how the code generation of a function affects the overall build, and if it\u2019s a bottleneck. The <em>Timelines<\/em> preset is shown in the image below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26221\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-2.gif\" alt=\"An image that shows an overview of the Timelines preset for the Functions view.\" width=\"775\" height=\"432\" \/><\/p>\n<p><strong>N.B. <\/strong>Accurate parallelism for code generation is only available starting with Visual Studio 2019 version 16.4. In earlier versions, all code generation threads for a given compiler or linker invocation are placed on one timeline. In version 16.4 and above, each code generation thread within a compiler or linker invocation is placed on its own timeline.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Preset #2: Activity Statistics<\/strong><\/span><\/p>\n<p>The <em>Activity Statistics<\/em> preset shows code generation statistics for the functions in your build. When using this preset, focus your attention on the table at the bottom of the view. By default, functions are sorted by their code generation duration in descending order. Use this preset if you want to quickly identify the functions that take the most time to optimize in your entire build. If you are interested in only a section of your build, click and drag the mouse over the desired time span in the graph section at the top of the view. The values in the table below the graph will automatically adjust to the selected time span. The table displays statistics such as: code generation time, file or DLL in which the function is found, and the compiler or linker invocation that was being executed during the generation of a function. Use the <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/finding-build-bottlenecks-with-cpp-build-insights\/\">Build Explorer view<\/a> to get more information on an invocation, if desired. Similar to the <em>Timelines<\/em> preset, the colored bars in the graph section at the top of the view indicate the time and duration when the code generation for a given function occurred, except that the information is grouped by function name instead of by thread. The <em>Activity Statistics<\/em> preset is shown below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26222\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-3.gif\" alt=\"An image that shows an overview of the Activity Statistics preset for the Functions view.\" width=\"775\" height=\"486\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Preset #3: Force Inlinees<\/strong><\/span><\/p>\n<p>The inlining of large function call graphs into a single root function can result in very large functions that take a long time to optimize. An overabundant use of the <code>__forceinline<\/code> keyword is a common cause of this problem, so C++ Build Insights includes a dedicated preset to identify force-inlining issues quickly. When using the <em>Force Inlinees<\/em> preset, focus your attention on the table at the bottom of the view. Expand a function and its <strong><em>ForceInlinee<\/em><\/strong> node to see all the other functions that have been force-inlined in it. Functions that don\u2019t have any force-inlinees are filtered out. The <strong><em>String Value<\/em><\/strong> field contains the name of the force-inlinee, and the <strong><em>Numeric Value<\/em><\/strong> field indicates how much this force-inlinee caused the root-level function to grow in size. The <em>Numeric Value<\/em> is roughly equivalent to the number of intermediate instructions in the function being inlined, so higher is worse. The force-inlinees are sorted by size (i.e. <em>Numeric Value<\/em>) in descending order, allowing you to quickly see the worst offenders. Use this information to try removing some <code>__forceinline<\/code> keywords on large functions. These functions don\u2019t suffer as much from call overhead and are less likely to cause a performance degradation if inlining is omitted. Be careful when using the <em>Numeric Value<\/em> field, as the same function can be force-inlined multiple times and <em>Numeric Value<\/em> is a sum aggregation by default. Expand the <em>String Value<\/em> column for a force-inlinee to see the individual sizes of all inlinees of the same name. The <em>Force Inlinees<\/em> preset is shown below.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26223\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-4.gif\" alt=\"An image that shows an overview of the Force Inlinees preset for the Functions view.\" width=\"775\" height=\"487\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>A note on the function names displayed in the Functions view<\/strong><\/span><\/p>\n<p>The <em>Functions<\/em> view shows mangled C++ function names. You can use the <strong><em>undname<\/em><\/strong> utility to demangle the names. To do so, right click on the cell containing the name in WPA, click on <strong><em>Copy Cell<\/em><\/strong>, and pass that value to <em>undname<\/em>, as shown below. The <em>undname<\/em> utility is available in an <em>x64 Native Tools Command Prompt for VS 2019<\/em>.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26224\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-5.gif\" alt=\"A screenshot of an x64 Native Tools Command Prompt for VS 2019 after the undname utility has been executed for a function named LgoEvaluateZero. The output in the command prompt shows that after decoration, it is much easier to understand important pieces of information about LgoEvaluateZero, such as its return value and parameters.\" width=\"792\" height=\"294\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Putting it all together: using the Functions view to speed up Chakra builds<\/strong><\/span><\/p>\n<p>In this case study, we use the Chakra open-source JavaScript engine from GitHub to demonstrate how <em>vcperf<\/em> can be used to achieve a 7% build time improvement.<\/p>\n<p>Use these steps if you would like to follow along:<\/p>\n<ol>\n<li>Clone the <a href=\"https:\/\/github.com\/microsoft\/ChakraCore\">ChakraCore GitHub repository<\/a>.<\/li>\n<li>Change the directory to the root of the freshly cloned repository and run the following command: <code>git checkout c72b4b7<\/code>. This is the commit that was used for the case study below.<\/li>\n<li>Open the <code>Build\\Chakra.Core.sln<\/code> solution file, starting from the root of the repository.<\/li>\n<li>Obtain a trace for a full rebuild of the solution:\n<ol>\n<li>Open an elevated command prompt with <em>vcperf<\/em> on the PATH.<\/li>\n<li>Run the following command: <code>vcperf \/start Chakra<\/code><\/li>\n<li>Rebuild the <em>x64 Test<\/em> configuration of the <code>Build\\Chakra.Core.sln<\/code> solution file in Visual Studio 2019.<\/li>\n<li>Run the following command: <code>vcperf \/stop Chakra chakra.etl<\/code>. This will save a trace of the build in <em>chakra.etl<\/em>.<\/li>\n<\/ol>\n<\/li>\n<li>Open the trace in WPA.<\/li>\n<\/ol>\n<p>We bring in the <em>Build Explorer<\/em> and <em>Functions<\/em> views and place them one on top of the other. The <em>Functions<\/em> view shows two long code generation activities, labeled <strong><em>A<\/em><\/strong> and <strong><em>B<\/em><\/strong> below. These activities align with an invocation bottleneck shown in the <em>Build Explorer<\/em> view above. We surmise that reducing code generation time for <em>A<\/em> and <em>B<\/em> should help overall build time, given that they are on the critical path. Let&#8217;s investigate further.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26225\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-6.gif\" alt=\"An image that shown a Build Explorer view and a Functions view placed one on top of the other. In the Functions view, a long, dark blue colored bar (labeled B) is followed by another light blue bar (labeled A). These two bars form a critical path in the timeline, which indicates that it may be possible to improve build time by speeding up the optimization of functions A and B.\" width=\"775\" height=\"448\" \/><\/p>\n<p>We switch over to the <em>Activity Statistics<\/em> preset in the <em>Functions<\/em> view, and find out that the two functions corresponding to <em>A<\/em> and <em>B<\/em> are named <code>infos_<\/code> and <code>GetToken<\/code>, respectively.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26226\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-7.gif\" alt=\"An image that shows the Activity Statistics preset for the Functions view mentioned previously. This preset shows that functions A and B are named infos_ and GetToken, respectively.\" width=\"775\" height=\"422\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Fixing infos_: evaluating initializers at compile time<\/strong><\/span><\/p>\n<p>Looking at the code for Chakra, we find that <code>infos_<\/code> is a large global array of <code>Opcode::Info<\/code> objects. It is defined in <code>lib\\wabt\\src\\opcode.cc<\/code>, as shown below. The <code>Opcode::Info<\/code> elements are initialized with the 450+ entries found in <code>lib\\wabt\\src\\opcode.def<\/code>.<\/p>\n<pre class=\"prettyprint\">\/\/ static\r\nOpcode::Info Opcode::infos_[] = {\r\n#define WABT_OPCODE(rtype, type1, type2, type3, mem_size, prefix, code, Name, \\\r\n                    text)                                                     \\\r\n  {text,        Type::rtype, Type::type1,                                     \\\r\n   Type::type2, Type::type3, mem_size,                                        \\\r\n   prefix,      code,        PrefixCode(prefix, code)},\r\n#include \"src\/opcode.def\"\r\n#undef WABT_OPCODE\r\n\r\n    {\"&lt;invalid&gt;\", Type::Void, Type::Void, Type::Void, Type::Void, 0, 0, 0, 0},\r\n};<\/pre>\n<p>How is this array related to the <code>infos_<\/code> function that we\u2019re seeing in our Chakra trace? And why is this function slow to generate?<\/p>\n<p>Global variables sometimes cannot be initialized at compile time because their initialization involves the execution of some code (e.g. a constructor). In this case, the compiler generates a function known as a dynamic initializer which will be called during program startup to properly initialize the variable. You can easily recognize dynamic initializers in the <em>Functions<\/em> view because their mangled name always starts with <code>??__E<\/code>.<\/p>\n<p>The Chakra trace that we captured earlier tells us that a dynamic initializer function was generated for <code>infos_<\/code>. The reason why this function takes a long time to generate is because the initialization code for the 450+-element <code>infos_<\/code> array is very large and is causing the compiler\u2019s optimization phase to take more time to complete.<\/p>\n<p>In the case of <code>infos_<\/code>, all the information required to initialize its elements is known at compile time. It is possible to prevent the generation of a dynamic initializer by enforcing compile-time initialization as follows:<\/p>\n<ol>\n<li>(optionally) making the <code>infos_<\/code> array <code>constexpr<\/code>; and<\/li>\n<li>making the <code>PrefixCode<\/code> function <code>constexpr<\/code>.<\/li>\n<\/ol>\n<p>The reason why step 1 is a good idea is that compilation will fail if <code>infos_<\/code> ever changes by mistake in a way that prevents compile-time initialization. Without it, the compiler will silently revert back to generating a dynamic initializer function. Step 2 is required because <code>PrefixCode<\/code> is called during the initialization of each <code>Opcode::Info<\/code> element, and compile-time initialization of <code>infos_<\/code> cannot happen if any part of its initialization is not <code>constexpr<\/code>.<\/p>\n<p>See the <a href=\"https:\/\/github.com\/kevcadieux\/ChakraCore\/commit\/9e6cb6261b83100e02ed294904d593d980dfbf6c\">code for this fix on GitHub<\/a>.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Fixing GetToken: using the reduced optimizer for large functions<\/strong><\/span><\/p>\n<p>The C code for <code>GetToken<\/code> is generated (that\u2019s right, the C code itself is generated) by <a href=\"https:\/\/re2c.org\/\">re2c<\/a>, the open-source software lexer generator. The resulting function is very large and suffers from long optimization time due to its size. Because the C code is generated, it might not be trivial to modify it in a way that would fix our build time issue. When situations like this arise, you can use the <code>ReducedOptimizeHugeFunctions<\/code> switch. This switch prevents the optimizer from using expensive optimizations on large functions, resulting in improved optimization time. You can set the threshold for when the reduced optimizer kicks in by using the <code>ReducedOptimizeThreshold:#<\/code> switch. <code>#<\/code> is the number of instructions that the function must have before triggering the reduced optimizer. The default value is 20,000.<\/p>\n<ul>\n<li>When building with link-time code generation (LTCG) pass these switches to the linker using <code>\/d2:\u201d-ReducedOptimizeHugeFunctions\u201d<\/code> and <code>\/d2:\u201d-ReducedOptimizeThreshold:#\u201d<\/code>.<\/li>\n<li>When building without LTCG, pass these switches to the compiler using <code>\/d2ReducedOptimizeHugeFunctions<\/code> and <code>\/d2ReducedOptimizeThreshold:#<\/code>.<\/li>\n<\/ul>\n<p>For our Chakra example, we use Visual Studio 2019 to modify the ChakraCore project\u2019s properties and add the <code>\/d2:\u201d-ReducedOptimizeHugeFunctions\u201d<\/code> switch, as shown below. Because the configuration we are building makes use of LTCG, the switch is added to the linker.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26228\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-8.gif\" alt=\"An image that shown the properties window for the ChakraCore project in Visual Studio. The selected page is Configuration Properties -&gt; Linker -&gt; Command Line. The \/d2:&quot;-ReducedOptimizeHugeFunctions&quot; option has been added inside the Additional Options text box.\" width=\"786\" height=\"544\" \/><\/p>\n<p><strong>N.B. <\/strong>Using the <code>ReducedOptimizeHugeFunctions<\/code> switch may reduce the performance of generated code for large functions. If using this switch for performance-critical code, consider profiling your code before and after the change to make sure the difference is acceptable.<\/p>\n<p>See the <a href=\"https:\/\/github.com\/kevcadieux\/ChakraCore\/commit\/59b7d0fe556d343096e0a2c05cb3f33f7e052755\">code for this fix on GitHub<\/a>.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Alternative solutions: splitting code and removing __forceinline<\/strong><\/span><\/p>\n<p>It\u2019s possible that you can\u2019t use the methods described above for your project. This can be because:<\/p>\n<ol>\n<li>you have large global arrays that can\u2019t be initialized at compile-time; or<\/li>\n<li>the performance degradation incurred with the use of the <code>ReducedOptimizeHugeFunctions<\/code> switch is unacceptable.<\/li>\n<\/ol>\n<p>Slow code generation time for a function is almost always due to the function being large. Any approach that reduces the size of the function will help. Consider the following alternative solutions:<\/p>\n<ol>\n<li>Manually split a very large function into 2 or more subfunctions that are called individually. This splitting technique can also be used for large global arrays.<\/li>\n<li>Use the <em>Force Inlinees<\/em> preset of the <em>Functions<\/em> view to see if an overuse of the <code>__forceinline<\/code> keyword may be to blame. If so, try removing <code>__forceinline<\/code> from the largest force-inlined functions.<\/li>\n<\/ol>\n<p><strong>N.B.<\/strong> Alternative solution 2 may cause a performance degradation if <code>__forceinline<\/code> is removed from small functions that are frequently force-inlined. Using this solution on large force-inlined functions is preferred.<\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Evaluating our Chakra solution<\/strong><\/span><\/p>\n<p>We capture another trace after fixing the dynamic initializer for <code>infos__<\/code> and using the <code>ReducedOptimizeHugeFunctions<\/code> switch for <code>GetToken<\/code>. A comparison of the <em>Build Explorer<\/em> view before and after the change reveals that total build time went from 124 seconds to 115 seconds, a 7% reduction.<\/p>\n<p><strong><em>Build Explorer view before applying solution:<\/em><\/strong><\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26229\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-9.gif\" alt=\"An image that shows the Build Explorer view for an initial Chakra build. The Build Explorer view is set to the Timelines preset. The entire build timeline has been selected, and the mouse pointer is hovering over the selection. A tooltip is shown next to the mouse pointer, telling us that the duration of the build was 124 seconds.\" width=\"775\" height=\"217\" \/><\/p>\n<p><strong><em>Build Explorer view after applying solution:<\/em><\/strong><\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26230\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-10.gif\" alt=\"An image that shows the Build Explorer view for the improved Chakra build. The Build Explorer view is set to the Timelines preset. The entire build timeline has been selected, and the mouse pointer is hovering over the selection. A tooltip is shown next to the mouse pointer, telling us that the duration of the build was 115 seconds.\" width=\"775\" height=\"217\" \/><\/p>\n<p>The <em>Activity Statistics<\/em> preset in our <em>Functions<\/em> view for the fixed trace shows that <code>infos__<\/code> is no longer present and that <code>GetToken<\/code>\u2019s code generation time has gone down to around 5 seconds.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26231\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-11.gif\" alt=\"An image that shows the Functions view for the optimized Chakra build. The Functions view is set to the Activity Statistics preset. The image is focused on the table section of the Activity Statistics preset, which shows that the infos_ function is no longer an issue, and that GetToken's optimization time was reduced to 5 seconds.\" width=\"775\" height=\"222\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Identifying slow code generation using the C++ Build Insights SDK<\/strong><\/span><\/p>\n<p>Sometimes, it may be useful to detect functions with long code generation time automatically, without having to inspect a trace in WPA. For example, you may want to flag problematic functions during continuous integration (CI) or locally as a post-build step. The <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build-insights\/reference\/sdk\/overview?view=vs-2019\">C++ Build Insights SDK<\/a> enables these scenarios. To illustrate this point, we\u2019ve prepared the <strong><em>FunctionBottlenecks<\/em> <\/strong>SDK sample. When passed a trace, it prints out a list of functions that have a duration that is at least 1 second and longer than 5% of the duration of their containing <em>cl.exe<\/em> of <em>link.exe<\/em> invocation. The list of functions is sorted by duration in descending order. Since generous use of <code>__forceinline<\/code> is a common cause of functions that are slow to optimize, an asterisk is placed next to each entry where force inlining may be a problem.<\/p>\n<p>Let\u2019s repeat the Chakra case study from the previous section, but this time by using the <em>FunctionBottlenecks<\/em> sample to see what it finds<em>. <\/em>Use these steps if you want to follow along:<\/p>\n<ol>\n<li>Clone the <a href=\"https:\/\/github.com\/microsoft\/cpp-build-insights-samples\">C++ Build Insights SDK samples GitHub repository<\/a> on your machine.<\/li>\n<li>Build the `<em>Samples.sln` <\/em>solution, targeting the desired architecture (x86 or x64), and using the desired configuration (debug or release). The sample\u2019s executable will be placed in the <code>out\/{architecture}\/{configuration}\/FunctionBottlenecks<\/code> folder, starting from the root of the repository.<\/li>\n<li>Follow the steps from the <em><strong>Putting it all together: using the Functions view to speed up Chakra builds<\/strong><\/em> section to collect a trace of the Chakra solution. Use the <code>\/stopnoanalyze<\/code> command instead of the <code>\/stop<\/code> command when stopping your trace. The <code>\/stopnoanalyze<\/code> command is used for obtaining a trace that is compatible with the SDK.<\/li>\n<li>Pass the collected trace as the first argument to the <em>FunctionBottlenecks<\/em> executable.<\/li>\n<\/ol>\n<p>As shown below, when passed the trace for the unmodified project, <em>FunctionBottlenecks<\/em> correctly identifies the <code>GetToken<\/code> function and the dynamic analyzer for the <code>infos_<\/code> array.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26232\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-12.gif\" alt=\"An image that shows an elevated x64 Native Tools Command Prompt for VS 2019. FunctionBottlenecks.exe is executed on a file called chakra-base.etl, which is the trace file for our initial Charka build. The output of the executable shows a list of 4 problematic functions. The top one is GetToken, with an optimization time of 38 seconds. The second one is EmitExpr, with an optimization time of 7 seconds. The third function is infos_, with an optimization time of 4 seconds. The fourth and last function is LanguageTypeConversionTest, with an optimization time of 1 second.\" width=\"816\" height=\"209\" \/><\/p>\n<p>After applying the solutions described above in this article, the <em>FunctionBottlenecks<\/em> sample confirms that the situation is improved: <code>GetToken<\/code> has gone down from 38 to 5 seconds, and <code>infos_<\/code> is no longer an issue.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-26233\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/07\/word-image-13.gif\" alt=\"An image that shows an elevated x64 Native Tools Command Prompt for VS 2019. FunctionBottlenecks.exe is executed on a file called chakra-diff.etl, which is the trace file for our Chakra build that has been improved using the techniques described in this article. The output of the executable shows a list of 4 problematic functions. The top one is EmitExpr, with an optimization time of 7 seconds. The second one is GetToken, with an optimization time of 5 seconds. The third function is LanguageTypeConversionTest, with an optimization time of 1 seconds. The fourth and last function is ArrayBufferTest, with an optimization time of 1 second.\" width=\"816\" height=\"209\" \/><\/p>\n<p><span style=\"font-size: 18pt;\"><strong>Going over the sample code<\/strong><\/span><\/p>\n<p>The <em>FunctionBottlenecks<\/em> analyzer processes a trace in 2 passes. The first pass is accomplished by two member functions. The first is <code>OnStopInvocation<\/code> and is used to cache the duration of all invocations.<\/p>\n<pre class=\"prettyprint\">void OnStopInvocation(Invocation invocation)\r\n{\r\n    using namespace std::chrono;\r\n\r\n    \/\/ Ignore very short invocations\r\n    if (invocation.Duration() &lt; std::chrono::seconds(1)) {\r\n        return;\r\n    }\r\n\r\n    cachedInvocationDurations_[invocation.EventInstanceId()] =\r\n        duration_cast&lt;milliseconds&gt;(invocation.Duration());\r\n}<\/pre>\n<p>The second is <code>ProcessForceInlinee<\/code> and is used to cache the aggregated size of all functions that were force-inlined into a given top-level function.<\/p>\n<pre class=\"prettyprint\">void ProcessForceInlinee(Function func, ForceInlinee inlinee)\r\n{\r\n    forceInlineSizeCache_[func.EventInstanceId()] += \r\n        inlinee.Size();\r\n}<\/pre>\n<p>The second pass is where we decide whether a function is a bottleneck based on the information gathered in pass 1. This is done in <code>OnStopFunction<\/code>.<\/p>\n<pre class=\"prettyprint\">void OnStopFunction(Invocation invocation, Function func)\r\n{\r\n    using namespace std::chrono;\r\n\r\n    auto itInvocation = cachedInvocationDurations_.find(\r\n        invocation.EventInstanceId());\r\n\r\n    if (itInvocation == cachedInvocationDurations_.end()) {\r\n        return;\r\n    }\r\n\r\n    auto itForceInlineSize = forceInlineSizeCache_.find(\r\n        func.EventInstanceId());\r\n\r\n    unsigned forceInlineSize =\r\n        itForceInlineSize == forceInlineSizeCache_.end() ?\r\n            0 : itForceInlineSize-&gt;second;\r\n\r\n    milliseconds functionMilliseconds = \r\n        duration_cast&lt;milliseconds&gt;(func.Duration());\r\n\r\n    double functionTime = static_cast&lt;double&gt;(\r\n        functionMilliseconds.count());\r\n\r\n    double invocationTime = static_cast&lt;double&gt;(\r\n        itInvocation-&gt;second.count());\r\n\r\n    double percent = functionTime \/ invocationTime;\r\n\r\n    if (percent &gt; 0.05 &amp;&amp; func.Duration() &gt;= seconds(1))\r\n    {\r\n        identifiedFunctions_[func.EventInstanceId()]= \r\n            { func.Name(), functionMilliseconds, percent, \r\n              forceInlineSize };\r\n    }\r\n}<\/pre>\n<p>As shown above, the bottleneck functions are added to the <code>identifiedFunctions_<\/code> container. This container is a <code>std::unordered_map<\/code> that holds values of type <code>IdentifiedFunction<\/code>.<\/p>\n<pre class=\"prettyprint\">struct IdentifiedFunction\r\n{\r\n    std::string Name;\r\n    std::chrono::milliseconds Duration;\r\n    double Percent;\r\n    unsigned ForceInlineeSize;\r\n\r\n    bool operator&lt;(const IdentifiedFunction&amp; other) const {\r\n        return Duration &gt; other.Duration;\r\n    }\r\n};<\/pre>\n<p>We use the <code>OnEndAnalysis<\/code> callback from the <code>IAnalyzer<\/code> interface to sort the identified functions by duration in descending order and print the list to standard output.<\/p>\n<pre class=\"prettyprint\">AnalysisControl OnEndAnalysis() override\r\n{\r\n    std::vector&lt;IdentifiedFunction&gt; sortedFunctions;\r\n\r\n    for (auto&amp; p : identifiedFunctions_) {\r\n        sortedFunctions.push_back(p.second);\r\n    }\r\n\r\n    std::sort(sortedFunctions.begin(), sortedFunctions.end());\r\n\r\n    for (auto&amp; func : sortedFunctions)\r\n    {\r\n        bool forceInlineHeavy = func.ForceInlineeSize &gt;= 10000;\r\n\r\n        std::string forceInlineIndicator = forceInlineHeavy ?\r\n            \", *\" : \"\";\r\n\r\n        int percent = static_cast&lt;int&gt;(func.Percent * 100);\r\n\r\n        std::string percentString = \"(\" + \r\n            std::to_string(percent) + \"%\" + \r\n            forceInlineIndicator + \")\";\r\n\r\n        std::cout &lt;&lt; std::setw(9) &lt;&lt; std::right &lt;&lt; \r\n            func.Duration.count();\r\n        std::cout &lt;&lt; \" ms \";\r\n        std::cout &lt;&lt; std::setw(9) &lt;&lt; std::left &lt;&lt; \r\n            percentString;\r\n        std::cout &lt;&lt; \" \" &lt;&lt; func.Name &lt;&lt; std::endl;\r\n    }\r\n\r\n    return AnalysisControl::CONTINUE;\r\n}<\/pre>\n<p><span style=\"font-size: 18pt;\"><strong>Tell us what you think!<\/strong><\/span><\/p>\n<p>We hope the information in this article has helped you understand how you can use the <em>Functions<\/em> view with <em>vcperf<\/em> and WPA to diagnose slow code generation in your builds. We also hope that the provided SDK sample will serve as a good foundation on which to build your own analyzers.<\/p>\n<p>Give <em>vcperf<\/em> a try today by downloading the latest version of <a href=\"https:\/\/visualstudio.microsoft.com\/downloads\/\">Visual Studio 2019<\/a>, or by cloning the tool directly from the <a href=\"https:\/\/github.com\/microsoft\/vcperf\">vcperf GitHub repository<\/a>. Try out the <em>FunctionBottlenecks<\/em> sample from this article by cloning the <a href=\"https:\/\/github.com\/microsoft\/cpp-build-insights-samples\">C++ Build Insights samples repository<\/a> from GitHub, or refer to the official <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build-insights\/reference\/sdk\/overview?view=vs-2019\">C++ Build Insights SDK documentation<\/a> to build your own analysis tools.<\/p>\n<p>Have you found code generation issues in your builds using <em>vcperf<\/em> or the C++ Build Insights SDK? Let us know in the comments below, on Twitter <a href=\"https:\/\/twitter.com\/visualc\">(@VisualC<\/a>), or via email at <a href=\"mailto:visualcpp@microsoft.com\">visualcpp@microsoft.com<\/a>.<\/p>\n<p><em>This article contains code snippets from <a href=\"https:\/\/github.com\/WebAssembly\/wabt\">WABT: The WebAssembly Binary Toolkit<\/a>, Copyright (c) 2015-2020 WebAssembly Community Group participants, distributed under the\u00a0<a href=\"http:\/\/www.apache.org\/licenses\/LICENSE-2.0\">Apache License, Version 2.0<\/a>.<\/em><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Code generation time is a significant contributor to total build time for optimized builds. During this step of the compilation process, source code is passed through several algorithms that transform your program into an optimized version that can be executed more efficiently. In general, the optimization of a C++ function happens quickly and poses no [&hellip;]<\/p>\n","protected":false},"author":6966,"featured_media":25198,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,239],"tags":[],"class_list":["post-26218","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus","category-diagnostics"],"acf":[],"blog_post_summary":"<p>Code generation time is a significant contributor to total build time for optimized builds. During this step of the compilation process, source code is passed through several algorithms that transform your program into an optimized version that can be executed more efficiently. In general, the optimization of a C++ function happens quickly and poses no [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/26218","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/6966"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=26218"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/26218\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/25198"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=26218"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=26218"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=26218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}