September 4th, 2024

MSVC Backend Updates in Visual Studio 2022 version 17.11

Sarthak Tamboli
Software Engineer

Visual Studio 2022 17.11 brings new optimizations, intrinsics, features, and improvements to the MSVC backend. Check out the highlights below: 

  • Performance improvements and additional functionality for all architectures: 
    • The SLP vectorizer can now recognize when vectors need to be permuted and when elements of a vector are defined by different operations. 
    • Updated corruption handling in DIA. 
    • Fixed race conditions in the linker while emitting debug information. 
    • Added a new dead code elimination phase to reduce binary bloat. 
    • Added a strength reduction phase to the lexical loop optimizer that replaces expensive instructions with equivalent cheaper instructions. 
    • Improved /guard:cf compilation speed for many try-catch regions. 
    • Sped up processing of clang-cl OBJ files. 
  • Performance improvements and additional functionality for ARM64: 
    • Fixed pointer parameter alignment during conversion of by-value parameters to by-reference. 
    • Support popcnt instructions, __popcnt, __popcnt16, and __popcnt64, thanks to our friends at ARM. 
    • The vectorizer now recognizes and uses ARM64 intrinsics, when possible. 
    • Generate better instructions for VCREATE using FMOV, thanks to our friends at ARM. 
    • Add RBIT intrinsics, _bitrev and _bitrev64. 
    • Support expanding memcmp when comparison length is constant, thanks to our friends at ARM. 
    • Add support for Load-Acquire RCpc instructions v2 (FEAT_LRCPC2) with /feature:rcpc2. 
    • Add support for CMP, XZR, and Xm in the disassembler, thanks to our friends at ARM. 
    • Generate better code for bitfield selection and assignment, thanks to our friends at ARM. 
  • Improvements to ARM64EC: 
    • Considers hybrid-patchable functions during long-branch optimizations (/OPT:LBR). 
    • Now emits virtual functions with __declspec(hybrid_patchable). 
    • Fixed imports with fast-forward sequences. 
    • Thunk generation in C++ files, using _Arm64XGenerateThunk(), does not require a prototype but still requires #include <intrin.h>. 
    • Fixed pointer parameter alignment during conversion of by-value parameter to by-reference. 
    • Sped up delay loading a DLL by enhancing page protection.
  • Performance improvements and additional functionality for x86 and x64: 
    • Optimized vector reduction for E-Core CPUs, thanks to our friends at Intel. 
    • Fix float conversions on x86 for standard compliance, thanks to our friends at Intel. 
    • More loops get vectorized, thanks to our friends at AMD. 
    • New functionality on x64: 
      • Optimize FMA generation for blended code and remove the /favor:ATOM restriction, thanks to our friends at Intel. 
      • Add support for USER_MSR intrinsics, urdmsr and uwrmsr, thanks to our friends at Intel. 
      • Add instructions in assembler and disassembler, thanks to our friends at Intel: 
        • Flexible Return and Event Delivery (FRED), which includes ERETS and ERETU 
        • LKGS, which is required for FRED 
      • Enable vector loop unroller to be more aggressive, as required, thanks to our friends at AMD. 
      • Add support for saturating add and subtract intrinsics:
        • _sat_add_i8 
        • _sat_add_i16 
        • _sat_add_i32 
        • _sat_add_i64 
        • _sat_add_u8 
        • _sat_add_u16 
        • _sat_add_u32 
        • _sat_add_u64 
        • _sat_sub_i8 
        • _sat_sub_i16 
        • _sat_sub_i32 
        • _sat_sub_i64 
        • _sat_sub_u8 
        • _sat_sub_u16 
        • _sat_sub_u32 
        • _sat_sub_u64 
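
To make "saturating" concrete, here is a minimal portable sketch of the arithmetic these intrinsics are named for: a result that would overflow is clamped to the type's minimum or maximum instead of wrapping. The helper names sat_add_ref and sat_sub_ref are hypothetical and are not part of MSVC. The sketch does not call the new intrinsics, because their exact signatures and header are not given in this post.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical reference helpers (not MSVC APIs). They model saturating
// arithmetic: clamp to the representable range instead of wrapping.
template <typename T>
constexpr T sat_add_ref(T a, T b) {
    using L = std::numeric_limits<T>;
    if (b > 0 && a > L::max() - b) return L::max();  // sum would exceed max
    if (b < 0 && a < L::min() - b) return L::min();  // sum would fall below min
    return static_cast<T>(a + b);
}

template <typename T>
constexpr T sat_sub_ref(T a, T b) {
    using L = std::numeric_limits<T>;
    if (b > 0 && a < L::min() + b) return L::min();  // difference would fall below min
    if (b < 0 && a > L::max() + b) return L::max();  // difference would exceed max
    return static_cast<T>(a - b);
}

// Compile-time checks of the clamping behavior.
static_assert(sat_add_ref<std::int8_t>(100, 100) == 127, "clamps at INT8_MAX");
static_assert(sat_add_ref<std::uint8_t>(200, 100) == 255, "clamps at UINT8_MAX");
static_assert(sat_sub_ref<std::uint8_t>(10, 20) == 0, "clamps at zero");
```

The range checks happen before the addition or subtraction, so the helpers never rely on signed overflow, which is undefined behavior in C++.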

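For readers unfamiliar with the strength reduction phase mentioned under the all-architectures improvements, the sketch below shows the classic form of the transformation by hand: a multiplication by the loop index is replaced with a running addition. It illustrates the general technique only. The function names are hypothetical, and this post does not say which loops or instructions the MSVC phase actually rewrites.

```cpp
// Before: one multiplication per iteration.
constexpr long long weighted_sum_mul(int n, int stride) {
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * stride;
    }
    return total;
}

// After (strength-reduced by hand): the multiply becomes a running add.
constexpr long long weighted_sum_add(int n, int stride) {
    long long total = 0;
    long long term = 0;  // invariant at loop top: term == i * stride
    for (int i = 0; i < n; ++i) {
        total += term;
        term += stride;
    }
    return total;
}

// Both forms must agree; checked at compile time.
static_assert(weighted_sum_mul(10, 7) == weighted_sum_add(10, 7),
              "strength-reduced loop must match the original");
```

Both functions compute the same value. The second simply avoids the per-iteration multiply, which is the kind of rewrite an optimizer can apply automatically when it can prove the two are equivalent.
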
Do you want to experience the new improvements in the C++ backend? Please download the latest Visual Studio 2022 and give it a try! Any feedback is welcome. We can be reached via the comments below, Developer Community, X (@VisualC), or email at visualcpp@microsoft.com. 

Stay tuned for more information on updates to the latest Visual Studio. 

Category
Backend, C++

1 comment

  • 8618700198618 10 hours ago

    I noticed that “The SLP vectorizer can now recognize when vectors need to be permuted and when elements of a vector are defined by different operations.” It’s interesting to see these enhancements, but I’ve encountered an issue in version 11.10 where it optimizes a portion of my code, leading to a performance drop by a factor of five. I detected that this is due to the SLP optimization. Could you please advise if there is a way to verify that the optimized result is correct?

    Thank you very much for your help.