{"id":4203,"date":"2009-11-02T10:56:00","date_gmt":"2009-11-02T10:56:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2009\/11\/02\/visual-c-code-generation-in-visual-studio-2010\/"},"modified":"2019-02-18T18:45:45","modified_gmt":"2019-02-18T18:45:45","slug":"visual-c-code-generation-in-visual-studio-2010","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/visual-c-code-generation-in-visual-studio-2010\/","title":{"rendered":"Visual C++ Code Generation in Visual Studio 2010"},"content":{"rendered":"<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Hello, I&rsquo;m Ten Tzen, a Compiler Architect on the Visual C++ Compiler Code Generation team. Today, I&rsquo;m going to introduce some noteworthy improvements in Visual Studio 2010. <\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b>Faster LTCG Compilation<\/b>:&nbsp; LTCG (Link Time Code Generation) <span>allows the compiler to perform better optimizations with information on all modules in the program (for more details s<\/span>ee <\/font><\/font><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/0zza0de8(VS.100).aspx\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">here<\/font><\/a><font size=\"3\"><font face=\"Calibri\">).&nbsp; To merge information from all modules, LTCG compilation generally takes longer than non-LTCG compilation, particularly for large applications.&nbsp; In VS2010, we improved the information merging process and sped up LTCG compilation significantly. An LTCG build of Microsoft SQL Server (an application with .text size greater than 50MB) is sped up by ~30%. 
<\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b>Faster Pogo Instrumentation run<\/b>:&nbsp; Profile Guided Optimization (PGO) is an approach to optimization where the compiler uses profile information to make better optimization decisions for the program. &nbsp;See <\/font><\/font><a href=\"http:\/\/blogs.msdn.com\/vcblog\/archive\/2008\/11\/12\/pogo.aspx\"><font face=\"Calibri\" size=\"3\">here<\/font><\/a><font face=\"Calibri\" size=\"3\"> or <\/font><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa289170(VS.71).aspx\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">here<\/font><\/a><font size=\"3\"><font face=\"Calibri\"> for an introduction of PGO.&nbsp; One major drawback of PGO is that the instrumented run is usually several times slower than a regular optimized run.&nbsp; In VS2010, we s<span>upport a no-lock version of the instrumented binaries.&nbsp; With that the scenario (PGI) runs are about 1.7X faster.&nbsp;<\/span><\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b><span>Code size reduction for X64 target: <\/span><\/b><span>Code size is a crucial factor to performance especially for applications that are performance-sensitive to the behavior of instruction cache or working set.&nbsp; In VS2010, several effective optimizations are introduced or improved for X64 architecture. Some of the improvements are listed below:<\/p>\n<p><\/span><\/font><\/font><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">More aggressively use RBP as the frame pointer to access local variables. 
The RBP-relative addressing mode is one byte shorter than the RSP-relative one. <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font face=\"Calibri\" size=\"3\">Enable tail merge optimizations in the presence of C++ EH or Windows SEH (see <\/font><\/span><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/1deeycx5(VS.100).aspx\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">here<\/font><\/a><span><font face=\"Calibri\" size=\"3\"> and <\/font><\/span><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms680657(VS.85).aspx\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">here<\/font><\/a><span><font size=\"3\"><font face=\"Calibri\"> for EH and SEH respectively).<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Combine successive constant stores into one store.&nbsp; <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Recognize more cases where we can emit a 32-bit instruction for 64-bit immediate constants. <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Recognize more cases where we can use a 32-bit move instead of a 64-bit move. 
<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Optimize the code sequence of C++ EH destructor funclets.<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoNormal\"><span><font size=\"3\"><font face=\"Calibri\">Altogether, we have observed code size reduction in the range of 3% to 10% with various Microsoft products such as the Windows kernel components, SQL, Excel, etc.<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b><span>Improvements for &ldquo;Speed&rdquo;:&nbsp; <\/span><\/b>As usual, there are also many code quality tuning and improvements done across different code generation areas for &ldquo;speed&rsquo;.&nbsp; In this release, we have focuse\nd more on the X64 target.&nbsp; The following are s<span>ome of the important changes that have contributed to these improvements:<\/p>\n<p><\/span><\/font><\/font><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Identify and use CMOV instruction when beneficial in more situations<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">More effectively combine induction variable to reduce register pressure<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font 
size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Improve detection of region constants for strength reduction in a loop<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Improve scalar replacement optimization in a loop<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Improvement of avoiding store forwarding stall <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Use XMM registers for memcpy intrinsic<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Improve Inliner heuristics to identify and make more beneficial inlining decisions <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Overall, we see an 8% i<span>mprovement as measured by integer benchmarks and a few % points on the floating point suites for X64. 
<\/span>&nbsp;<\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\"><b><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/b><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b><span>Better SIMD code generation for X86 and X64 targets<\/span><\/b><span>:&nbsp; The quality of SSE\/SSE2 SIMD code is crucial to game, audio, video and graphic developers.&nbsp; Unlike inline asm which inhibits compiler optimization of surrounding code, intrinsics were designed to allow more effective optimization and still give developers access to low-level control of the machine.&nbsp; In VS2010, we have added several simple but effective optimizations that focus on SIMD intrinsic quality and performance.&nbsp; Some of the improvements are listed below:<\/p>\n<p><\/span><\/font><\/font><\/p>\n<p class=\"MsoListParagraph\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Break false dependency:&nbsp; The scalar convert instructions (CVTSI2SD, CVTSI2SS, CVTSS2SD, or CVTSD2SS) do not modify the upper bits of the destination register. This causes a false dependency which could significantly affect performance. 
To break the false dependence of memory to register conversions, VS2010 compiler inserts MOVD\/MOVSS\/MOVSD to zero-out the upper bits and use the corresponding packed conversion.&nbsp; For instance, <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoNormal\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><span><font face=\"Calibri\">cvtsi2ss xmm0, mem-operand&nbsp;&nbsp; <\/font><\/span><span>&agrave;<\/span><\/font><font size=\"3\"><span><font face=\"Calibri\"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; movd xmm0, mem-operand<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cvtdq2ps xmm0, xmm0<\/p>\n<p>For register to register conversions, XORPS is inserted to break the false dependency.<\/p>\n<p>cvtsd2ss xmm1, xmm0&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/font><\/span><span>&agrave;<\/span><\/font><span><font size=\"3\"><font face=\"Calibri\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; xorps xmm1, xmm1<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&amp;nbsp\n;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cvtsd2ss xmm1, xmm0 <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p 
class=\"MsoNormal\"><span><font size=\"3\"><font face=\"Calibri\">Even though this optimization may increase code size we have observed a significant positive performance improvement on several real world code and benchmark programs.&nbsp; <\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoNormal\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><font size=\"3\"><span><font face=\"Calibri\">Perform vectorization for constant vector initializations: In VS2008, a simple initialization statement, such as<\/font><\/span><b><span> <\/span><\/b><span>__m128 x = { 1, 2, 3, 4 },<\/span><span> <\/span><span><font face=\"Calibri\">would require ~10 instructions. With<\/font><\/span><span> <\/span><span><font face=\"Calibri\">VS2010, it&rsquo;s optimized down to a couple of instructions.&nbsp; This can apply to dimensional initialization as well.&nbsp; The instructions generated for initialization statements like <\/font><\/span><span>__m128 x[] = {{1,2,3,4}, {5,6}} or __m128 t2[][2]= {{{1,2},{3,4,5}}, {{6},{7,8,9}}}; <\/span><span><font face=\"Calibri\">&nbsp;are greatly reduced with VS2010.&nbsp; <\/p>\n<p><\/font><\/span><\/font><\/p>\n<p class=\"MsoNormal\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><font size=\"3\"><span><font face=\"Calibri\">Optimize __mm_set_**(), __mm_setr_**() and __mm_set1_**() intrinsic family.&nbsp; In VS2008, a series of unpack instructions are used to do the combining of scalar values. 
When all arguments are constants, this can be achieved with a single vector instruction.&nbsp; For example, the single statement, <\/font><\/span><span>return _mm_set_epi16(0, 1, 2, 3, -4, -5, 6, 7)<\/span><span><font face=\"Calibri\">, required ~20 instructions in previous releases, while only one instruction is required in VS2010.&nbsp; <\/p>\n<p><\/font><\/span><\/font><\/p>\n<p class=\"MsoNormal\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Better register allocation for XMM registers, which removes many redundant loads, stores and moves.<span><\/p>\n<p><\/span><\/font><\/font><\/p>\n<p class=\"MsoListParagraph\"><span><span><font size=\"3\">&middot;<\/font><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><\/span><\/span><span><font size=\"3\"><font face=\"Calibri\">Enable Compare &amp; JCC CSE (Common Sub-expression Elimination) for SSE compares.&nbsp; For example, the code sequence below at left will be optimized to the code sequence at right:<\/p>\n<p><\/font><\/font><\/span><\/p>\n<p class=\"MsoListParagraph\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoListParagraph\"><font size=\"3\"><span><font face=\"Calibri\">ECX, CC1 = PCMPISTRI &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ECX, CC1 = PCMPISTRI<br \/>JCC(EQ) CC1 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; JCC(EQ) CC1<br \/>ECX, CC2 = PCMPISTRI 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/font><\/span><span>&agrave;<\/span><\/font><font size=\"3\"><font face=\"Calibri\"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; JCC(ULT) CC2 <br \/>JCC(ULT) CC2 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; JCC(P) CC3 <br \/>ECX, CC3 = PCMPISTRI<br \/>JCC(P) CC3<\/span><span><\/p>\n<p><\/span><\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b>Support for AVX in Intel and AMD processors:&nbsp;&nbsp; <\/b>Intel AVX (Intel Advanced Vector Extensions) is a 256 bit instruction set extension to SSE and is designed for applications that are floating point intensive (See <\/font><\/font><a href=\"http:\/\/software.intel.com\/en-us\/avx\/\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">here<\/font><\/a><font face=\"Calibri\" size=\"3\"> and <\/font><a href=\"http:\/\/forums.amd.com\/devblog\/blogpost.cfm?threadid=112934&amp;catid=208\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">here<\/font><\/a><font size=\"3\"><font face=\"Calibri\"> for detailed information from Intel and AMD respectively).&nbsp; In VS2010 release, all AVX features and instructions are fully supported via intrinsic and \/arch:AVX.&nbsp; Many optimizations have been added to improve the code quality of AVX code generation which will be described with more details in an upcoming blog post.&nbsp;In addition to AVX support in the compiler, the Microsoft Macro Assembler (MASM) in VS2010 also supports the Intel AVX instruction set for x86 and 
x64.<\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\"><a class=\"\" name=\"_GoBack\"><\/a><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNorm\nal\"><b><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/b><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b>More precise Floating Point computation with \/fp:fast: <\/b>To achieve maximum speed, the compiler is allowed to optimize floating point computation aggressively under <\/font><\/font><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/e7s85ffb(VS.100).aspx\"><font face=\"Calibri\" color=\"#0000ff\" size=\"3\">\/fp:fast option<\/font><\/a><font size=\"3\"><font face=\"Calibri\">.&nbsp; The consequence is that the floating point computation errors can accumulate and a result could be so inaccurate that it could severely affect the outcome of programs.&nbsp; For example, we observed that more than half of the programs in the floating points benchmark suite fail with \/fp:fast in VS2008 on the X64 targets.&nbsp; In order to make \/fp:fast more useful, we &ldquo;down-tuned&rdquo; a couple of optimizations in VS2010. This change could slightly affect the performance of some programs that were previously built with \/fp:fast but will improve their accuracy.&nbsp; And if your programs were failing with \/fp:fast in earlier releases, you may see better results with VS2010.<\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\"><b>Conclusion<\/b>: The Visual C++ team cares about the performance of applications built with our compiler and we continue to work with customers and CPU vendors to improve code generation. 
If you see issues or opportunities for improvements, please let us know through <\/font><\/font><a href=\"http:\/\/connect.microsoft.com\/\"><font face=\"Calibri\" size=\"3\">Connect<\/font><\/a><font size=\"3\"><font face=\"Calibri\"> or through our blog.<\/p>\n<p><\/font><\/font><\/p>\n<p class=\"MsoNormal\"><b><span><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/span><\/b><\/p>\n<p class=\"MsoNormal\"><b><\/p>\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<p><\/b><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello, I&rsquo;m Ten Tzen, a Compiler Architect on the Visual C++ Compiler Code Generation team. Today, I&rsquo;m going to introduce some noteworthy improvements in Visual Studio 2010. &nbsp; Faster LTCG Compilation:&nbsp; LTCG (Link Time Code Generation) allows the compiler to perform better optimizations with information on all modules in the program (for more details see [&hellip;]<\/p>\n","protected":false},"author":289,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[17,18,19,20,21],"class_list":["post-4203","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus","tag-code-generation","tag-floating-point","tag-link-time-code-generation","tag-pgo","tag-simd"],"acf":[],"blog_post_summary":"<p>Hello, I&rsquo;m Ten Tzen, a Compiler Architect on the Visual C++ Compiler Code Generation team. Today, I&rsquo;m going to introduce some noteworthy improvements in Visual Studio 2010. 
&nbsp; Faster LTCG Compilation:&nbsp; LTCG (Link Time Code Generation) allows the compiler to perform better optimizations with information on all modules in the program (for more details see [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/4203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/289"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=4203"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/4203\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=4203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=4203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=4203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}