JCC Erratum Mitigation in MSVC


Gautham

The content of this blog was provided by Gautham Beeraka from Intel Corporation.

Intel recently announced the Jump Conditional Code (JCC) erratum, which can occur in some of its processors. The MSVC team has been working with Intel to provide a software fix in the compiler that mitigates the performance impact of the microcode update that prevents the erratum.

Introduction

There are three things one should know about the JCC erratum:

  1. What the erratum is, and whether and how it affects you.
  2. The microcode update that prevents the erratum, whether you have it, and its side effects.
  3. MSVC compiler support for mitigating the side effects of the microcode update.

Each of these topics is explained below.

JCC Erratum

The processors listed in Intel’s white paper on the JCC erratum have an erratum that can occur under certain conditions involving jump instructions that cross a cache-line boundary. This erratum can result in unpredictable behavior for software running on these processors. If your software runs on these processors, you are affected by this erratum.

Microcode Update

Applying a microcode update (MCU) can prevent the JCC erratum. The MCU works by preventing jump instructions that cross or end on a 32-byte boundary, as shown in the figure below, from being cached in the decoded uop cache. The MCU affects conditional jumps, macro-fused conditional jumps, direct unconditional jumps, indirect jumps, direct/indirect calls, and returns.

Examples of instructions that straddle a 32-byte boundary
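The condition the MCU checks can be written as a one-line predicate. This is our own illustration, not Intel's or Microsoft's code, and it assumes offsets are measured from a 32-byte-aligned address and that a macro-fused pair (such as CMP+JL) is treated as a single unit spanning both encodings. An instruction is affected exactly when its first byte and the byte just past its last byte fall in different 32-byte chunks, which covers both crossing a boundary and ending on one.

```cpp
#include <cassert>
#include <cstdint>

// The unit the MCU cares about: a 32-byte-aligned chunk of code.
constexpr std::uint32_t kChunk = 32;

// True when an instruction (or macro-fused pair treated as one unit) that
// starts at `offset` and spans `length` bytes crosses a 32-byte boundary or
// ends exactly on one. In both cases `offset` and `offset + length` land in
// different chunks. Assumes offsets are relative to a 32-byte-aligned
// address and that length >= 1.
constexpr bool crossesOrEndsOn32ByteBoundary(std::uint32_t offset,
                                             std::uint32_t length) {
    return offset / kChunk != (offset + length) / kChunk;
}

// A 2-byte jump at 0x1e has its last byte at 0x1f, so it ends on the boundary.
static_assert(crossesOrEndsOn32ByteBoundary(0x1e, 2), "ends on boundary");
// A 5-byte fused pair at 0x1d spans 0x1d..0x21, so it crosses the boundary.
static_assert(crossesOrEndsOn32ByteBoundary(0x1d, 5), "crosses boundary");
// A 2-byte jump at 0x1c (last byte 0x1d) stays inside its chunk.
static_assert(!crossesOrEndsOn32ByteBoundary(0x1c, 2), "unaffected");
```

The 0x1d/5-byte case corresponds to the fused CMP+JL in the unmitigated listing later in this post.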

The MCU will be distributed through Windows Update. We will update this blog once we have more information on the Windows Update. Note that the MCU is not specific to Windows; it applies to other operating systems as well.

Applying the MCU can regress the performance of software running on patched machines. In our measurements, the impact was between 0% and 3%, and higher on a few outlier microbenchmarks.

Software Mitigation in MSVC compiler

To mitigate the performance impact, developers can rebuild their code with the software fix enabled by the /QIntel-jcc-erratum switch in the MSVC compiler. We observed that the performance regressions become negligible after rebuilding with this fix. The switch does increase code size, by about 3% in our measurements.

How to enable the software mitigation?

Starting with Visual Studio 2019 version 16.5 Preview 2, developers can apply the software mitigation for the performance impact of the MCU. To enable the JCC erratum mitigation for your code, set “Enable Intel JCC Erratum Mitigation” to “Yes” in the “Code Generation” section of the project Property Pages:

Screenshot of the Enable Intel JCC Erratum Mitigation in the property pages

A few undocumented compiler flags are also available to restrict the scope of the software mitigation, as shown below. These flags can be useful for experimentation, but we are not committing to support them in future releases.

  1. /d2QIntel-jcc-erratum-partial – This applies the mitigation only inside loops in a function.
  2. /d2QIntel-jcc-erratum:<file.txt> – This applies the mitigation only to functions specified within file.txt.
  3. /d2QIntel-jcc-erratum-partial:<file.txt> – This applies the mitigation only to loops in the functions specified within file.txt.

The function names given in <file.txt> are the decorated (mangled) names that the compiler uses, not the source-level names.
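One way to collect decorated names for the functions file is MSVC's predefined __FUNCDNAME__ macro, which expands to the decorated name of the enclosing function. The sketch below is a hypothetical helper, not part of the compiler: the RECORD_HOT_FUNCTION macro, hotFunctionNames, and writeFunctionsFile are our own names, and the "one decorated name per line" file layout is an assumption. Outside MSVC it falls back to __func__, which gives the undecorated name and exists only so the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

#ifdef _MSC_VER
#define DECORATED_NAME __FUNCDNAME__  // MSVC: decorated name of the enclosing function
#else
#define DECORATED_NAME __func__       // fallback: plain name, illustration only
#endif

// Hypothetical registry of hot functions observed during a profiling run.
inline std::vector<std::string>& hotFunctionNames() {
    static std::vector<std::string> names;
    return names;
}

// Records the enclosing function's name once, the first time it executes.
#define RECORD_HOT_FUNCTION()                                       \
    do {                                                            \
        static const bool recorded =                                \
            (hotFunctionNames().push_back(DECORATED_NAME), true);   \
        (void)recorded;                                             \
    } while (0)

// The sample loop from this post, instrumented as a "hot" function.
int test1(int* arr, int length, int c) {
    RECORD_HOT_FUNCTION();
    int sum = 0;
    for (int i = 0; i < length; i++)
        sum += arr[i] + c;
    return sum;
}

// Writes the collected names, one per line (assumed file format).
inline bool writeFunctionsFile(const std::string& path) {
    std::ofstream out(path);
    for (const std::string& name : hotFunctionNames())
        out << name << '\n';
    return static_cast<bool>(out);
}
```

In an MSVC build, the resulting file could then be passed to one of the /d2QIntel-jcc-erratum*:<file.txt> switches; check the names against your own build before relying on them.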

To enable these flags, add them to the “Additional Options” under the “Command Line” section of the project Property Pages:

Screenshot of adding /d2QIntel-jcc-erratum-partial to the additional compiler flags

All these switches work only in release builds and are incompatible with the /clr switches. If multiple /d2QIntel-jcc-erratum* switches are given, full processing (all branches) takes precedence over partial processing (loop branches only). If any of the switches specifies a functions file, processing is limited to just those functions.
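The two precedence rules above can be modeled as a small function. This is only our reading of those sentences, not the compiler's actual option handling; the MitigationScope struct and resolveScope function are hypothetical, and the model considers only the /d2QIntel-jcc-erratum* spellings listed earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what a set of switches asks for.
struct MitigationScope {
    bool enabled = false;                    // some /d2QIntel-jcc-erratum* switch seen
    bool loopsOnly = false;                  // partial (loop branches only) processing
    std::vector<std::string> functionFiles;  // non-empty: only listed functions
};

// Models the stated rules: full processing wins over partial, and any
// functions file restricts processing to the functions it lists.
MitigationScope resolveScope(const std::vector<std::string>& switches) {
    const std::string full = "/d2QIntel-jcc-erratum";
    const std::string partial = "/d2QIntel-jcc-erratum-partial";
    MitigationScope scope;
    bool sawFull = false;
    for (const std::string& s : switches) {
        // Check the longer spelling first, since it shares a prefix with `full`.
        bool isPartial = s.rfind(partial, 0) == 0;
        if (!isPartial && s.rfind(full, 0) != 0) continue;  // unrelated switch
        std::string rest = s.substr(isPartial ? partial.size() : full.size());
        if (!rest.empty() && rest[0] != ':') continue;      // unknown spelling
        scope.enabled = true;
        if (!isPartial) sawFull = true;
        if (!rest.empty()) scope.functionFiles.push_back(rest.substr(1));
    }
    scope.loopsOnly = scope.enabled && !sawFull;
    return scope;
}
```

Under this model, combining /d2QIntel-jcc-erratum-partial with /d2QIntel-jcc-erratum:hot.txt yields full processing restricted to the functions in hot.txt.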

What does the software mitigation do?

The software mitigation in the compiler detects all affected jumps in the code (jumps that cross or end on a 32-byte boundary) and aligns them so that they start on that boundary. It does this by adding benign segment override prefixes to the instructions that precede the jump. The padded instructions grow, but each stays within the 15-byte maximum x86 instruction length. Where prefixes cannot be added, NOPs are used instead. The example below shows the code the compiler generates with the mitigation off and on.
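The padding step can be sketched as follows. This is a simplified model under our own assumptions, not the compiler's algorithm: instructions are described only by their lengths, the block's starting offset is relative to a 32-byte-aligned address, one jump is handled in isolation (a real implementation must also account for other jumps shifting), and padding is assigned as prefix bytes to earlier instructions front to back, capped at 15 bytes per instruction, with any remainder reported as NOP bytes. PaddingPlan and planPadding are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kChunk = 32;          // boundary the MCU cares about
constexpr std::uint32_t kMaxInsnLength = 15;  // architectural x86 limit

struct PaddingPlan {
    std::vector<std::uint32_t> prefixes;  // prefix bytes added per instruction
    std::uint32_t nopBytes = 0;           // padding that could not go into prefixes
};

// lengths: instruction lengths in the block (each assumed <= 15);
// jumpIndex: index of the jump, or of the first half of a macro-fused pair;
// jumpLength: its length including any fused partner;
// blockStart: offset of the block's first instruction.
PaddingPlan planPadding(const std::vector<std::uint32_t>& lengths,
                        std::size_t jumpIndex, std::uint32_t jumpLength,
                        std::uint32_t blockStart) {
    PaddingPlan plan;
    plan.prefixes.assign(lengths.size(), 0);
    std::uint32_t jumpOffset = blockStart;
    for (std::size_t i = 0; i < jumpIndex; ++i) jumpOffset += lengths[i];

    // Nothing to do unless the jump crosses or ends on a 32-byte boundary.
    if (jumpOffset / kChunk == (jumpOffset + jumpLength) / kChunk) return plan;

    // Bytes needed to push the jump to the start of the next chunk.
    std::uint32_t needed = kChunk - jumpOffset % kChunk;
    for (std::size_t i = 0; i < jumpIndex && needed > 0; ++i) {
        std::uint32_t take = std::min(kMaxInsnLength - lengths[i], needed);
        plan.prefixes[i] = take;
        needed -= take;
    }
    plan.nopBytes = needed;
    return plan;
}
```

For the loop in the sample that follows (lengths {4, 3, 3, 3, 3, 2}, fused CMP+JL at index 4, block starting at 0x10), this model puts three prefix bytes on the first instruction, which matches the three 0x3E prefixes on the MOV in the mitigated listing.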

Sample C++ code:

int test1(int* arr, int length, int c) {
    int sum = 0;
    for (int i = 0; i < length; i++)
        sum += arr[i] + c;
    return sum;
}

Code without /QIntel-jcc-erratum (/O2 /FAsc):

$LL8@test1:
    00010  44 8b 0c 91           mov  r9d, DWORD PTR [rcx+rdx*4]
    00014  48 ff c2              inc  rdx
    00017  45 03 c8              add  r9d, r8d
    0001a  41 03 c1              add  eax, r9d
    0001d  49 3b d2              cmp  rdx, r10
    00020  7c ee                 jl   SHORT $LL8@test1

Code with /QIntel-jcc-erratum (/O2 /FAsc /QIntel-jcc-erratum):

$LL8@test1:
    00010  3e 3e 3e 44 8b 0c 91  mov  r9d, DWORD PTR [rcx+rdx*4]
    00017  48 ff c2              inc  rdx
    0001a  45 03 c8              add  r9d, r8d
    0001d  41 03 c1              add  eax, r9d
    00020  49 3b d2              cmp  rdx, r10
    00023  7c eb                 jl   SHORT $LL8@test1


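In the example above, the CMP and JL instructions are macro-fused. Without the mitigation, the fused pair starts at offset 0x1d and ends at 0x21, so it crosses the 32-byte boundary at 0x20. The mitigation pads the first instruction in the block, the MOV, with three 0x3E (DS segment override) prefixes, which moves the CMP instruction to begin exactly on the 32-byte boundary at 0x20.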
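The listings also show why the JL encoding changes from 7c ee to 7c eb. A short jump stores a signed 8-bit displacement relative to the offset of the next instruction, and padding the loop body by three bytes moves the JL three bytes further from the loop head. The small helper below (the name rel8Target is ours) works through that arithmetic and confirms that both encodings target offset 0x10, the $LL8@test1 label.

```cpp
#include <cassert>
#include <cstdint>

// Target of a short (rel8) jump: the offset of the next instruction plus the
// sign-extended 8-bit displacement stored in the instruction.
constexpr std::int32_t rel8Target(std::int32_t jumpOffset,
                                  std::int32_t jumpLength,
                                  std::uint8_t displacementByte) {
    return jumpOffset + jumpLength + static_cast<std::int8_t>(displacementByte);
}

// Without the mitigation: JL at 0x20, 2 bytes, displacement 0xee (-18).
static_assert(rel8Target(0x20, 2, 0xee) == 0x10, "loop head at 0x10");
// With the mitigation: JL at 0x23, 2 bytes, displacement 0xeb (-21).
static_assert(rel8Target(0x23, 2, 0xeb) == 0x10, "loop head still at 0x10");
```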

What is the performance story?

We evaluated the performance impact of both the MCU and the fix in the MSVC compiler. The numbers below were measured on the following test PC configuration.

Processor – Intel® Core™ i9-9900K @ 3.60GHz

Operating System – Private build of Windows with the MCU applicable to this processor.

Benchmark suite – SPEC CPU® 2017

Based on our measurements, we see regressions ranging from 0% to 3% after applying the MCU, and regressions of up to 10% on some outlier microbenchmarks.

Applying the software mitigation through the /QIntel-jcc-erratum switch in the MSVC compiler makes these regressions negligible. This switch applies the mitigation globally to all modules built with it and increases code size; we measured an average code size increase of 3%.

We also found that applying the mitigation only in loops, through the /d2QIntel-jcc-erratum-partial switch, makes the performance regressions negligible with a smaller code size increase: an average of 1.5%. You can reduce the code size impact further, while getting most of the performance back, by applying the mitigation only to hot functions through the /d2QIntel-jcc-erratum:<file.txt> and /d2QIntel-jcc-erratum-partial:<file.txt> switches.

We also found that the /QIntel-jcc-erratum switch has a negligible performance impact on processors that are not affected by the erratum. However, because codebases vary greatly, we advise developers to evaluate the impact of /QIntel-jcc-erratum in the context of their own applications and workloads.

Closing Notes

If your software can run on machines with processors affected by the JCC erratum and on versions of Windows that include the MCU, we encourage you to profile your code and check for performance regressions. You can use the Windows Performance Toolkit or Intel® VTune™ Profiler to profile your code. To determine whether the MCU is affecting performance, follow the steps in Intel’s white paper. If you are affected, recompile with /QIntel-jcc-erratum or one of the other switches listed above to mitigate the effects.

Your feedback is key to delivering the best experience. If you have any questions, please feel free to ask us below. You can also send us your comments through e-mail. If you encounter problems with the experience or have suggestions for improvement, please Report A Problem or reach out via Developer Community. You can also find us on Twitter @VisualC.


2 comments


    • Me Gusta

      There is an exhaustive list of affected families in the PDF file linked at the start.
      For a general idea, the affected processors are those based on Skylake or one of its refreshes. So if the processor is based on a microarchitecture with "lake" in its name, it is most likely affected.