{"id":104397,"date":"2020-10-26T07:00:00","date_gmt":"2020-10-26T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=104397"},"modified":"2020-10-26T10:20:17","modified_gmt":"2020-10-26T17:20:17","slug":"20201026-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20201026-00\/?p=104397","title":{"rendered":"I told the Microsoft Visual C++ compiler not to generate AVX instructions, but it did it anyway!"},"content":{"rendered":"<p>A customer passed the <tt>\/arch:SSE2<\/tt> flag to the Microsoft Visual C++ compiler, which means &#8220;Enable use of instructions available with SSE2-enabled CPUs.&#8221; In particular, the customer did <i>not<\/i> pass the <tt>\/arch:SSE4<\/tt> flag,\u00b9 so they did not enable the use of SSE4 instructions.<\/p>\n<p>And then they did this:<\/p>\n<pre>#include &lt;smmintrin.h&gt;\r\n\r\nvoid something()\r\n{\r\n    __m128i v = _mm_load_si128(&amp;mem);\r\n    ... more SSE2 stuff ...\r\n    v = _mm_insert_epi32(v, alpha, 3);\r\n    ... more SSE2 stuff ...\r\n}\r\n<\/pre>\n<p>The <code>_mm_insert_epi32()<\/code> intrinsic maps to the <code>PINSRD<\/code> instruction, which is an SSE4 instruction (SSE4.1, to be precise), not SSE2.<\/p>\n<p>To the customer&#8217;s surprise, this code not only compiled, it even ran! The customer wanted to know what was happening. Did the compiler convert the <code>_mm_insert_epi32()<\/code> into an equivalent series of SSE2 instructions?<\/p>\n<p>No, the compiler didn&#8217;t do that. You explicitly requested an SSE4 instruction, so the compiler honored your request. The <tt>\/arch:SSE2<\/tt> flag tells the compiler not to use any instructions beyond SSE2 in its own code generation, say during autovectorization or optimized <code>memcpy<\/code>. 
But if you invoke an instruction explicitly through an intrinsic, then you get what you wrote.<\/p>\n<p>I guess the option could be more accurately (and verbosely) named &#8220;Enable <i>automatic<\/i> use of instructions available with SSE2-enabled CPUs.&#8221; Because what this controls is whether the compiler will use those instructions of its own volition.<\/p>\n<p>The customer happened to test their program on a CPU that supported SSE4, so the instruction worked. If they had run it on a CPU that supported SSE2 but not SSE4, it would have crashed with an illegal instruction exception.<\/p>\n<p>The reason SSE4 intrinsics are still allowed even in SSE2 mode is that you might have identified some performance-sensitive operations and written two versions of the code, one that uses SSE2 intrinsics, and another that uses SSE4 intrinsics, choosing between the two at runtime based on a processor capability check.<\/p>\n<p>The compiler won&#8217;t generate any SSE4 instructions on its own, so the code outside your explicit SSE4 paths is safe on SSE2 systems. When you detect an SSE4 system, you can explicitly call the SSE4 code paths.<\/p>\n<p>\u00b9 <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20201026-00\/?p=104397#comment-137328\">As commenter Danielix Klimax noted<\/a>, there is no actual <tt>\/arch:SSE4<\/tt> option. Please interpret the remark in the spirit it was intended. 
(&#8220;The customer did not pass any flags that would enable SSE4 instructions.&#8221;)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Well, you explicitly generate them.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-104397","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Well, you explicitly generate them.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/104397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=104397"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/104397\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=104397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=104397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=104397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}