{"id":23977,"date":"2019-03-19T07:00:25","date_gmt":"2019-03-19T07:00:25","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=23977"},"modified":"2024-09-10T07:57:26","modified_gmt":"2024-09-10T07:57:26","slug":"game-performance-and-compilation-time-improvements-in-visual-studio-2019","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/game-performance-and-compilation-time-improvements-in-visual-studio-2019\/","title":{"rendered":"Game performance and compilation time improvements in Visual Studio 2019"},"content":{"rendered":"<p>The C++ compiler in Visual Studio 2019 includes several new optimizations and improvements geared towards increasing the performance of games and making game developers more productive by reducing the compilation time of large projects. Although the focus of this blog post is on the game industry, these improvements apply to most C++ applications and C++ developers.<\/p>\n<h5>Compilation time improvements<\/h5>\n<p>One of the focus points of the C++ toolset team in the VS 2019 release is improving linking time, which in turn allows faster iteration builds and quicker debugging. 
Two significant changes help speed up the generation of debug information (PDB files) at link time:<\/p>\n<ul>\n<li>Type pruning in the backend removes type information that is not referenced by any variables, which reduces the amount of work the linker must do during type merging.<\/li>\n<li>Type merging itself is faster, because a fast hash function is used to identify identical types.<\/li>\n<\/ul>\n<p>The table below shows the speedup measured when linking a large, popular AAA game:<\/p>\n<table style=\"width: 652px;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 118.85px;\">\n<p style=\"text-align: center;\"><strong>Debug build\n<\/strong><strong>configuration<\/strong><\/p>\n<\/td>\n<td style=\"width: 171.37px; text-align: center;\"><strong>Linking time (sec)\n<\/strong><strong>VS 2017 (15.9)<\/strong><\/td>\n<td style=\"width: 171.37px; text-align: center;\"><strong>Linking time (sec)\n<\/strong><strong>VS 2019 (16.0)<\/strong><\/td>\n<td style=\"width: 189.41px; text-align: center;\"><strong>Linking time speedup<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 118.85px; text-align: center;\">\/DEBUG:full<\/td>\n<td style=\"width: 171.37px;\">\n<p style=\"text-align: center;\">392.1<\/p>\n<\/td>\n<td style=\"width: 171.37px;\">\n<p style=\"text-align: center;\">163.3<\/p>\n<\/td>\n<td style=\"width: 189.41px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>2.40x<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 118.85px; text-align: center;\">\/DEBUG:fastlink<\/td>\n<td style=\"width: 171.37px; text-align: center;\">72.3<\/td>\n<td style=\"width: 171.37px; text-align: center;\">31.2<\/td>\n<td style=\"width: 189.41px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>2.32x<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>More details and additional benchmarks can be found in <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/linker-throughput-improvement-in-visual-studio-2019\/\" 
target=\"_blank\" rel=\"noopener\">this blog post<\/a>.<\/p>\n<h5>Vector (SIMD) expression optimizations<\/h5>\n<p>One of the most significant improvements in the code optimizer is the handling of vector (SIMD) intrinsics, both from source code and as a result of automatic vectorization. In VS 2017 and earlier, most vector operations went through the main optimizer without any special handling, much like function calls, even though they are represented as intrinsics (special functions known to the compiler). Starting with VS 2019, most expressions involving vector intrinsics are optimized just like regular integer\/float code using the <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/new-code-optimizer\/\">SSA optimizer<\/a>.<\/p>\n<p>Both float (e.g., <em>_mm_add_ps<\/em>) and integer (e.g., <em>_mm_add_epi32<\/em>) versions of the intrinsics are supported, targeting the SSE\/SSE2 and AVX\/AVX2 instruction sets. Some of the optimizations performed, among many others:<\/p>\n<ul>\n<li>constant folding<\/li>\n<li>arithmetic simplifications, including reassociation<\/li>\n<li>handling of cmp, min\/max, abs and extract operations<\/li>\n<li>converting vector operations to scalar operations when profitable<\/li>\n<li>patterns for shuffle and pack operations<\/li>\n<\/ul>\n<p>Other optimizations, such as common sub-expression elimination, can now take advantage of a better understanding of vector load\/store operations, which are handled like regular loads\/stores. Several ways of initializing a vector register are recognized, and the values are used during expression simplification (e.g., <em>_mm_set_ps, _mm_set_ps1, _mm_setr_ps, _mm_setzero_ps<\/em> for float values).<\/p>\n<p>Another important addition is the generation of fused multiply-add (FMA) for vector intrinsics when the \/arch:AVX2 compiler flag is used \u2013 previously, FMA was generated only for scalar float code. 
This allows the CPU to compute the expression <em>a*b + c<\/em> in fewer cycles, which can be a significant speedup in math-heavy code, as one of the examples below shows.<\/p>\n<p>The following code illustrates both the generation of FMA with \/arch:AVX2 and the expression optimizations enabled by \/fp:fast:<\/p>\n<p><span style=\"color: #000080;\"><code><span style=\"color: #000080;\">#include &lt;immintrin.h&gt;<\/span><\/code><\/span><\/p>\n<p><span style=\"color: #000080;\"><code><span style=\"color: #000080;\">__m128 test(float a, float b) { <\/span><\/code><\/span>\n<span style=\"color: #0000ff;\"><code>\u00a0 \u00a0 <span style=\"color: #000080;\">__m128 va = _mm_set1_ps(a); <\/span><\/code><\/span>\n<span style=\"color: #0000ff;\"><code>\u00a0 \u00a0 <span style=\"color: #000080;\">__m128 vb = _mm_set1_ps(b); <\/span><\/code><\/span>\n<span style=\"color: #0000ff;\"><code>\u00a0 \u00a0 <span style=\"color: #000080;\">__m128 vd = _mm_set1_ps(-b);<\/span><\/code><\/span><\/p>\n<p><span style=\"color: #0000ff;\"><code>\u00a0 \u00a0 <span style=\"color: #000080;\">\/\/ Computes (va * vb) + (va * -vb) <\/span><\/code><\/span>\n<span style=\"color: #0000ff;\"><code>\u00a0 \u00a0 <span style=\"color: #000080;\">return _mm_add_ps(_mm_mul_ps(va, vb), _mm_mul_ps(va, vd));<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><span style=\"color: #0000ff;\"><span style=\"color: #000080;\">}<\/span><\/span><\/code><\/span><\/p>\n<table style=\"width: 953px;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 386.48px;\">\n<p style=\"text-align: left;\">No simplifications are done; FMA is not generated.<\/p>\n<\/td>\n<td style=\"width: 565.52px;\">VS 2017 \/arch:AVX2 \/fp:fast<strong>\n<\/strong> <span style=\"color: #000080;\"><code><span style=\"color: #000080;\">vmovaps xmm3, xmm0<\/span>\n<\/code><code><span style=\"color: #000080;\">vbroadcastss xmm3, xmm0<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><span style=\"color: 
#000080;\">vxorps xmm0, xmm1, DWORD PTR __xmm@80000000800000008000000080000000<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><span style=\"color: #000080;\">vbroadcastss xmm0, xmm0<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><strong><span style=\"color: #000080;\">vmulps<\/span><\/strong><span style=\"color: #000080;\"> xmm2, xmm0, xmm3<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><span style=\"color: #000080;\">vbroadcastss xmm1, xmm1<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><strong><span style=\"color: #000080;\">vmulps<\/span><\/strong><span style=\"color: #000080;\"> xmm0, xmm1, xmm3<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><strong><span style=\"color: #000080;\">vaddps<\/span><\/strong><span style=\"color: #000080;\"> xmm0, xmm2, xmm0<\/span><\/code><\/span>\n<span style=\"color: #000080;\"><code><span style=\"color: #000080;\">ret 0<\/span><\/code><\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 386.48px;\">No simplifications are done, since they are not legal under \/fp:precise (the default); FMA is generated.<\/td>\n<td style=\"width: 565.52px;\">VS 2019 \/arch:AVX2<strong>\n<\/strong> <code><span style=\"color: #000080;\">vmovaps xmm2, xmm0<\/span><\/code>\n<code><span style=\"color: #000080;\">vbroadcastss xmm2, xmm0<\/span><\/code>\n<code><span style=\"color: #000080;\">vmovaps xmm0, xmm1<\/span><\/code>\n<code><span style=\"color: #000080;\">vbroadcastss xmm0, xmm1<\/span><\/code>\n<code><span style=\"color: #000080;\">vxorps xmm1, xmm1, DWORD PTR __xmm@80000000800000008000000080000000<\/span><\/code>\n<code><span style=\"color: #000080;\">vbroadcastss xmm1, xmm1<\/span><\/code>\n<code><strong><span style=\"color: #000080;\">vmulps<\/span><\/strong><span style=\"color: #000080;\"> xmm0, xmm0, xmm2<\/span><\/code>\n<code><span style=\"color: #800000;\"><strong>vfmadd231ps<\/strong><\/span><span style=\"color: #000080;\"> xmm0, xmm1, xmm2<\/span><\/code>\n<code><span style=\"color: 
#000080;\">ret 0<\/span><\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 386.48px;\">Entire expression is simplified to \u201creturn 0\u201d, since \/fp:fast allows applying the usual arithmetic rules.<\/td>\n<td style=\"width: 565.52px;\">VS 2019 \/arch:AVX2 \/fp:fast\n<code><span style=\"color: #800000;\"><strong>vxorps<\/strong><\/span><span style=\"color: #000080;\"> xmm0, xmm0, xmm0<\/span><\/code>\n<code><span style=\"color: #000080;\">ret 0<\/span><\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>More examples can be found in this older <a href=\"http:\/\/www.liranuna.com\/sse-intrinsics-optimizations-in-popular-compilers\/\">blog post<\/a>, which discusses the SIMD code generation of several compilers \u2013 VS 2019 now handles all of those cases as expected, and a lot more!<\/p>\n<h5>Benchmarking the vector optimizations<\/h5>\n<p>To measure the benefit of the vector optimizations, Xbox ATG (Advanced Technology Group) provided a benchmark based on code from Unreal Engine 4 for commonly used mathematical operations, such as SIMD expressions, vector\/matrix transformations and sin\/cos\/sqrt functions. The tests combine cases where the values are constants with cases where the values are unknown at compile time. This covers the common scenario where values are not known at compile time, as well as the situation that usually arises after inlining, when some values turn out to be constants.<\/p>\n<p>The first table below shows the speedup for the tests grouped into four categories; the execution time (in milliseconds) is the sum of all tests in each category. 
The second table shows the improvements for a few individual tests that use unknown (random) values; the versions that use constants are now folded as expected.<\/p>\n<table style=\"width: 441px;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 119.88px;\">\n<p style=\"text-align: center;\"><strong>Category<\/strong><\/p>\n<\/td>\n<td style=\"width: 122.57px; text-align: center;\"><strong>VS 2017 (ms)<\/strong><\/td>\n<td style=\"width: 104.67px; text-align: center;\"><strong>VS 2019 (ms)<\/strong><\/td>\n<td style=\"width: 92.88px; text-align: center;\">\n<p style=\"text-align: center;\"><strong>Speedup<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 119.88px; text-align: center;\">Math<\/td>\n<td style=\"width: 122.57px; text-align: center;\">482<\/td>\n<td style=\"width: 104.67px; text-align: center;\">366<\/td>\n<td style=\"width: 92.88px; text-align: center;\"><span style=\"color: #008000;\"><strong>27.36%<\/strong><\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 119.88px; text-align: center;\">Vector<\/td>\n<td style=\"width: 122.57px; text-align: center;\">337<\/td>\n<td style=\"width: 104.67px; text-align: center;\">238<\/td>\n<td style=\"width: 92.88px; text-align: center;\"><span style=\"color: #008000;\"><strong>34.43%<\/strong><\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 119.88px; text-align: center;\">Matrix<\/td>\n<td style=\"width: 122.57px; text-align: center;\">3168<\/td>\n<td style=\"width: 104.67px; text-align: center;\">3158<\/td>\n<td style=\"width: 92.88px; text-align: center;\"><span style=\"color: #008000;\"><strong>0.32%<\/strong><\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 119.88px; text-align: center;\">Trigonometry<\/td>\n<td style=\"width: 122.57px; text-align: center;\">3268<\/td>\n<td style=\"width: 104.67px; text-align: center;\">1882<\/td>\n<td style=\"width: 92.88px; text-align: center;\"><span style=\"color: 
#008000;\"><strong>53.83%<\/strong><\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<table style=\"width: 443px; height: 264px;\" border=\"1\">\n<tbody>\n<tr style=\"height: 44px;\">\n<td style=\"width: 126.29px; height: 44px;\">\n<p style=\"text-align: center;\"><strong>Test<\/strong><\/p>\n<\/td>\n<td style=\"width: 115.26px; height: 44px; text-align: center;\"><strong>VS 2017 (ms)<\/strong><\/td>\n<td style=\"width: 110.25px; height: 44px; text-align: center;\"><strong>VS 2019 (ms)<\/strong><\/td>\n<td style=\"width: 90.2px; height: 44px;\">\n<p style=\"text-align: center;\"><strong>Speedup<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 44px;\">\n<td style=\"width: 126.29px; height: 44px;\">\n<p style=\"text-align: center;\">VectorDot3<\/p>\n<\/td>\n<td style=\"width: 115.26px; height: 44px; text-align: center;\">\n<p style=\"text-align: center;\">42<\/p>\n<\/td>\n<td style=\"width: 110.25px; height: 44px; text-align: center;\">\n<p style=\"text-align: center;\">39<\/p>\n<\/td>\n<td style=\"width: 90.2px; height: 44px; text-align: center;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>7.4%<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 44px;\">\n<td style=\"width: 126.29px; height: 44px;\">\n<p style=\"text-align: center;\">MatrixMultiply<\/p>\n<\/td>\n<td style=\"width: 115.26px; height: 44px;\">\n<p style=\"text-align: center;\">204<\/p>\n<\/td>\n<td style=\"width: 110.25px; height: 44px;\">\n<p style=\"text-align: center;\">194<\/p>\n<\/td>\n<td style=\"width: 90.2px; height: 44px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>5%<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 44px;\">\n<td style=\"width: 126.29px; height: 44px;\">\n<p style=\"text-align: center;\">VectorCRTSin<\/p>\n<\/td>\n<td style=\"width: 115.26px; height: 44px;\">\n<p style=\"text-align: center;\">421<\/p>\n<\/td>\n<td style=\"width: 110.25px; height: 44px;\">\n<p style=\"text-align: 
center;\">402<\/p>\n<\/td>\n<td style=\"width: 90.2px; height: 44px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>4.6%<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 44px;\">\n<td style=\"width: 126.29px; height: 44px;\">\n<p style=\"text-align: center;\">NormalizeSqrt<\/p>\n<\/td>\n<td style=\"width: 115.26px; height: 44px;\">\n<p style=\"text-align: center;\">82<\/p>\n<\/td>\n<td style=\"width: 110.25px; height: 44px;\">\n<p style=\"text-align: center;\">77<\/p>\n<\/td>\n<td style=\"width: 90.2px; height: 44px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>7.4%<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 44px;\">\n<td style=\"width: 126.29px; height: 44px;\">\n<p style=\"text-align: center;\">NormalizeInvSqrt<\/p>\n<\/td>\n<td style=\"width: 115.26px; height: 44px;\">\n<p style=\"text-align: center;\">106<\/p>\n<\/td>\n<td style=\"width: 110.25px; height: 44px;\">\n<p style=\"text-align: center;\">97<\/p>\n<\/td>\n<td style=\"width: 90.2px; height: 44px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\"><strong>8.8%<\/strong><\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h5>Improvements in Unreal Engine 4 &#8211; Infiltrator Demo<\/h5>\n<p>To ensure that our efforts benefit actual games and not just micro-benchmarks, we used the <a href=\"https:\/\/www.unrealengine.com\/marketplace\/en-US\/infiltrator-demo\">Infiltrator Demo<\/a> as a representative of an AAA game based on Unreal Engine 4.21. Because the demo is mostly a cinematic sequence rendered in real time, with complex graphics, animations and physics, its execution profile is similar to that of an actual game; at the same time, it is a great target for getting the stable, reproducible results needed to investigate performance and measure the impact of compiler improvements.<\/p>\n<p>The main metric for a game\u2019s performance is frame time. 
Frame time can be viewed as the inverse of FPS (frames per second): it is the time it takes to prepare one frame for display, so lower values are better. The two main threads in Unreal Engine are the game thread and the rendering thread \u2013 this work focuses mostly on game thread performance.<\/p>\n<p>Four builds were tested, all based on the default Unreal Engine settings, which use <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/support-for-unity-jumbo-files-in-visual-studio-2017-15-8-experimental\/\">unity (jumbo) builds<\/a> and enable \/fp:fast \/favor:AMD64. All builds use the AVX2 instruction set except one, which keeps the default AVX:<\/p>\n<ul>\n<li>VS 2017 (15.9) with \/arch:AVX2<\/li>\n<li>VS 2019 (16.0) with \/arch:AVX2<\/li>\n<li>VS 2019 (16.0) with \/arch:AVX2 and \/LTCG, to showcase the benefit of using <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build\/reference\/gl-whole-program-optimization?view=vs-2017\">link time code generation<\/a><\/li>\n<li>VS 2019 (16.0) with \/arch:AVX, to showcase the benefit of using AVX2 over AVX<\/li>\n<\/ul>\n<p><strong>Testing details:<\/strong><\/p>\n<ul>\n<li>To capture frame times, a custom <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/desktop\/etw\/event-tracing-portal\">ETW<\/a> provider was integrated into the game to report the values to <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/test\/wpt\/\">Xperf<\/a> running in the background. Each build of the game had one warm-up run, followed by 10 runs of the entire game with ETW tracing enabled. For each 0.5-second interval, the final frame time is computed as the average over these 10 runs. The process is automated by a script that starts the game once and restarts the level from the beginning after each iteration. 
Of the 210-second (3:30) demo, the first 170 seconds are captured.<\/li>\n<li>Test PC configuration:\n<ul>\n<li>AMD Ryzen 7 2700X CPU (8 cores\/16 threads) fixed at 3.4 GHz to eliminate measurement noise from dynamic frequency scaling<\/li>\n<li>AMD Radeon RX 470 GPU<\/li>\n<li>32 GB DDR4-2400 RAM<\/li>\n<li>Windows 10 1809<\/li>\n<\/ul>\n<\/li>\n<li>The game runs at a resolution of 640&#215;480 to reduce the impact of GPU rendering<\/li>\n<\/ul>\n<p><strong>Results:<\/strong><\/p>\n<p>The chart below shows the measured frame times up to second 170 for the four tested builds of the game. Frame time ranges from 4 ms to 15 ms, with the highest values in the more graphics-intensive part around seconds <a href=\"https:\/\/youtu.be\/dO2rM-l-vdQ?t=165\">155-165<\/a>. To make the difference between builds more obvious, the \u201cfastest\u201d and \u201cslowest\u201d sections are zoomed in. As mentioned before, a lower frame time value is better.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-23979\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-1.png\" alt=\"Graph showing the frame time over the duration of the game\" width=\"1675\" height=\"572\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-1.png 1675w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-1-300x102.png 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-1-768x262.png 768w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-1-1024x350.png 1024w\" sizes=\"(max-width: 1675px) 100vw, 1675px\" \/><\/p>\n<p>The following table summarizes the results, both as an average over the entire game and for the \u201cslow\u201d section, where the largest improvement can be seen:<\/p>\n<table style=\"width: 564px;\" border=\"1\">\n<tbody>\n<tr>\n<td 
style=\"width: 100.35px;\">\n<p style=\"text-align: center;\"><strong>Improvement<\/strong><\/p>\n<\/td>\n<td style=\"width: 155.55px; text-align: center;\"><b>VS 2019 AVX2 <\/b>\n<strong>vs. VS 2017 AVX2<\/strong><\/td>\n<td style=\"width: 167.6px; text-align: center;\"><b>VS 2019 LTCG AVX2 <\/b>\n<strong>vs. VS 2019 AVX2<\/strong><\/td>\n<td style=\"width: 139.5px; text-align: center;\"><b>VS 2019 AVX <\/b>\n<strong>vs. VS 2019 AVX2<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 100.35px;\">\n<p style=\"text-align: center;\">Average<\/p>\n<\/td>\n<td style=\"width: 155.55px;\">\n<p style=\"text-align: center;\"><strong><span style=\"color: #008000;\">0.7%<\/span><\/strong><\/p>\n<\/td>\n<td style=\"width: 167.6px;\">\n<p style=\"text-align: center;\"><span style=\"color: #008000;\">0.9%<\/span><\/p>\n<\/td>\n<td style=\"width: 139.5px;\">\n<p style=\"text-align: center;\"><span style=\"color: #800000;\">-1.8%<\/span><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 100.35px; text-align: center;\">Largest<\/td>\n<td style=\"width: 155.55px; text-align: center;\"><strong><span style=\"color: #008000;\">2.8%<\/span><\/strong><\/td>\n<td style=\"width: 167.6px; text-align: center;\"><span style=\"color: #008000;\">3.2%<\/span><\/td>\n<td style=\"width: 139.5px;\">\n<p style=\"text-align: center;\"><span style=\"color: #800000;\">-8.5%<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<ul>\n<li>VS 2019 improves frame time by up to 2.8% over VS 2017<\/li>\n<li>An LTCG build improves frame time by up to 3.2% compared to the default unity build<\/li>\n<li>Using AVX2 instead of AVX gives a significant frame time improvement of up to 8.5%, in large part because the compiler automatically generates FMA instructions for scalar and, new in 16.0, vector operations.<\/li>\n<\/ul>\n<p>The performance in different parts of the game is easier to see when the speedup of one build relative to another is computed as a percentage. 
The following charts show the results when comparing the frame times for the 16.0\/15.9 and AVX\/AVX2 builds &#8211; the X axis is the time in the game, and the Y axis is the frame time improvement as a percentage: <img decoding=\"async\" class=\"alignnone wp-image-23986\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-8.png\" alt=\"Image showing the improvement between 16.0 and 15.9\" width=\"1653\" height=\"611\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-8.png 1653w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-8-300x111.png 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-8-768x284.png 768w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-8-1024x379.png 1024w\" sizes=\"(max-width: 1653px) 100vw, 1653px\" \/><\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-23988\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-10.png\" alt=\"Image showing the improvement between 16.0 AVX2 and 16.0 AVX\" width=\"1651\" height=\"607\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-10.png 1651w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-10-300x110.png 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-10-768x282.png 768w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2019\/03\/word-image-10-1024x376.png 1024w\" sizes=\"(max-width: 1651px) 100vw, 1651px\" \/><\/p>\n<h5>More optimizations<\/h5>\n<p>Besides the vector instruction optimizations, VS 2019 has several new optimizations that help both games and C++ programs in general:<\/p>\n<ul>\n<li>Unnecessary struct\/class copies are now 
removed in several more cases, including copies to output parameters and copies in functions returning an object. This optimization is especially effective in C++ programs that pass objects by value.<\/li>\n<li>A more powerful analysis extracts information about variables from control flow (if\/else\/switch statements); it is used to remove branches that can be proven always true or always false and to improve variable range estimation.<\/li>\n<li>Unrolled, constant-length memsets now use 16-byte store instructions (or 32-byte stores with \/arch:AVX).<\/li>\n<li>Several new scalar FMA patterns are identified with \/arch:AVX2. These include the following common expressions: (x + 1.0) * y; (x - 1.0) * y; (1.0 - x) * y; (-1.0 - x) * y.<\/li>\n<li>A more comprehensive list of backend improvements can be found in this <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/msvc-backend-updates-in-visual-studio-2019-preview-2\/\">blog post<\/a>.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>We\u2019d love for you to\u00a0<a href=\"https:\/\/visualstudio.microsoft.com\/vs\/preview\/\">download Visual Studio 2019<\/a>\u00a0and give it a try. As always, we welcome your feedback. We can be reached via the comments below or via email (<a href=\"mailto:visualcpp@microsoft.com\">visualcpp@microsoft.com<\/a>). If you encounter problems with Visual Studio or MSVC, or have a suggestion for us, please let us know through\u00a0<strong>Help &gt; Send Feedback &gt; Report A Problem \/ Provide a Suggestion<\/strong>\u00a0in the product, or via\u00a0<a href=\"http:\/\/developercommunity.visualstudio.com\/\">Developer Community<\/a>. 
You can also find us on Twitter (<a href=\"https:\/\/twitter.com\/visualc\">@VisualC<\/a>) and Facebook (msftvisualcpp).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The C++ compiler in Visual Studio 2019 includes several new optimizations and improvements geared towards increasing the performance of games and making game developers more productive by reducing the compilation time of large projects. Although the focus of this blog post is on the game industry, these improvements apply to most C++ applications and C++ [&hellip;]<\/p>\n","protected":false},"author":318,"featured_media":23984,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3946,218],"tags":[],"class_list":["post-23977","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-backend","category-performance"],"acf":[],"blog_post_summary":"<p>The C++ compiler in Visual Studio 2019 includes several new optimizations and improvements geared towards increasing the performance of games and making game developers more productive by reducing the compilation time of large projects. 
Although the focus of this blog post is on the game industry, these improvements apply to most C++ applications and C++ [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/23977","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/318"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=23977"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/23977\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/23984"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=23977"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=23977"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=23977"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}