October 26th, 2020

I told the Microsoft Visual C++ compiler not to generate AVX instructions, but it did it anyway!

A customer passed the /arch:SSE2 flag to the Microsoft Visual C++ compiler, which means “Enable use of instructions available with SSE2-enabled CPUs.” In particular, the customer did not pass the /arch:SSE4 flag,¹ so they did not enable the use of SSE4 instructions.

And then they did this:

#include <smmintrin.h> // SSE4.1 intrinsics; also pulls in the SSE2 header

void something()
{
    __m128i v = _mm_load_si128(&mem);
    ... more SSE2 stuff ...
    v = _mm_insert_epi32(v, alpha, 3);
    ... more SSE2 stuff ...
}

The _mm_insert_epi32() intrinsic maps to the PINSRD instruction, which was introduced in SSE4.1, not SSE2.

To the customer’s surprise, this code not only compiled, it even ran! The customer wanted to know what was happening. Did the compiler convert the _mm_insert_epi32() into an equivalent series of SSE2 instructions?

No, the compiler didn’t do that. You explicitly requested an SSE4 instruction, so the compiler honored your request. The /arch:SSE2 flag tells the compiler not to use any instructions beyond SSE2 in its own code generation, say during autovectorization or when expanding an optimized memcpy. But if you invoke an intrinsic explicitly, then you get what you wrote.
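If you did want an SSE2-only equivalent, you would have to write it yourself. Here is a minimal sketch, assuming the goal is to replace 32-bit lane 3 of the vector. The function name insert_lane3_sse2 is made up for illustration. It relies on _mm_insert_epi16 (PINSRW), which is available in SSE2, to write the two 16-bit halves one at a time.

```cpp
#include <emmintrin.h> // SSE2 intrinsics only

// Hypothetical helper: an SSE2-only stand-in for _mm_insert_epi32(v, alpha, 3).
// 32-bit lane 3 occupies 16-bit lanes 6 (low half) and 7 (high half).
static __m128i insert_lane3_sse2(__m128i v, int alpha)
{
    // Work on the unsigned bit pattern so the shift is well-defined.
    unsigned bits = static_cast<unsigned>(alpha);
    v = _mm_insert_epi16(v, static_cast<int>(bits & 0xFFFF), 6);
    v = _mm_insert_epi16(v, static_cast<int>(bits >> 16), 7);
    return v;
}
```

This costs two instructions instead of one, which is the sort of trade-off the compiler will not make on your behalf when you name a specific intrinsic.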

I guess the option could be more accurately (and verbosely) named “Enable automatic use of instructions available with SSE2-enabled CPUs.” Because what this controls is whether the compiler will use those instructions of its own volition.

The customer happened to test their program on a CPU that supported SSE4, so the instruction worked. If they had run it on a CPU that supported SSE2 but not SSE4, it would have crashed with an illegal instruction exception.

The reason SSE4 intrinsics are still allowed even in SSE2 mode is that you might have identified some performance-sensitive operations and written two versions of the code, one that uses SSE2 intrinsics, and another that uses SSE4 intrinsics, choosing between the two at runtime based on a processor capability check.

The compiler won’t generate any SSE4 instructions on its own, so your code is safe on SSE2 systems. When you detect an SSE4 system, you can explicitly call the SSE4 code paths.
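Here is a rough sketch of that two-version pattern. All of the names (cpu_has_sse41, set_lane3, and so on) are invented for illustration, and the SSE2 path simply spills the vector to memory, which is one possible fallback and not necessarily the fastest. The Microsoft compiler accepts the SSE4.1 intrinsic as-is. The TARGET_SSE41 macro is an assumption added so that the same sketch also builds with gcc and clang, which want a per-function opt-in.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <smmintrin.h> // SSE4.1 intrinsics
#if defined(_MSC_VER)
#include <intrin.h>    // __cpuid
#define TARGET_SSE41   // MSVC allows SSE4.1 intrinsics under any /arch setting
#else
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#endif

// Hypothetical capability check: CPUID leaf 1, ECX bit 19 reports SSE4.1.
static bool cpu_has_sse41()
{
#if defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] & (1 << 19)) != 0;
#else
    return __builtin_cpu_supports("sse4.1") != 0;
#endif
}

// SSE2 version: store the vector, patch lane 3 in memory, reload.
static __m128i set_lane3_sse2(__m128i v, int alpha)
{
    alignas(16) int lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);
    lanes[3] = alpha;
    return _mm_load_si128(reinterpret_cast<const __m128i*>(lanes));
}

// SSE4.1 version: a single PINSRD. Call this only after the check passes.
TARGET_SSE41 static __m128i set_lane3_sse41(__m128i v, int alpha)
{
    return _mm_insert_epi32(v, alpha, 3);
}

// Dispatcher: do the capability check once, then pick a code path.
__m128i set_lane3(__m128i v, int alpha)
{
    static const bool has_sse41 = cpu_has_sse41();
    return has_sse41 ? set_lane3_sse41(v, alpha) : set_lane3_sse2(v, alpha);
}
```

In real code you would dispatch at a coarser granularity than a single instruction, say once per loop or per buffer, so that the cost of the check does not eat the gains.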

¹ As commenter Danielix Klimax noted, there is no actual /arch:SSE4 option. Please interpret the remark in the spirit it was intended. (“The customer did not pass any flags that would enable SSE4 instructions.”)

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

7 comments

  • Michael Entin

    Is there some macro to define before including the header so that it declares only the intrinsics available for a specific instruction set?
    The customer's desire to have this checked at compile time is very understandable, and it should be simple to define such a macro and sprinkle in preprocessor conditionals so that only the appropriate intrinsics are enabled.

  • ‪ ‪

    GCC and Clang behave as customers expect.
    Well, I prefer the behavior of MSVC.

    • Vk TestA

      Why is it preferable? I think in this particular case “explicit is better than implicit”. And it’s not hard to mark the functions where you really need something not supported in the mode selected by the command-line switch with [[gnu::target("sse4.1")]]

      • Danielix Klimax

        Hm, how does it handle non-SSE instructions (3DNow, XOP/FMA4, (V)AES, SHA, and other special cases)?

  • Danielix Klimax

    A thing: There is no /arch:SSE4. The compiler only supports IA32, SSE, SSE2, AVX, AVX2 and AVX512. (The first three are valid only in x86 compilation.) Meaning that the customer wouldn’t be able to use SSE3, SSSE3 and SSE4.x at all if the arch flag worked the way they thought…

  • Yukkuri Reimu

    It surprises me that people need this explained

    • Vk TestA

      People need this explained because it's surprising. Consider sane compiler, e.g. clang: https://godbolt.org/z/6K57Tj

      [[gnu::target("default")]] void something(int alpha)
      {
          __m128i v = _mm_load_si128(&mem);
          // ... SSE2 version ...
      }

      [[gnu::target("sse4.1")]] void something(int alpha)
      {
          __m128i v = _mm_load_si128(&mem);
          // using SSE4.1 is OK here.
          mem = _mm_insert_epi32(v, alpha, 3);
      }

      void test_something() {
      // The right function...