AddressSanitizer continue_on_error

Jim Radigan

Visual Studio 17.6 comes with new functionality in the Address Sanitizer runtime which provides a new “checked build” for C and C++. This new runtime mode diagnoses and reports hidden memory safety errors, with zero false positives, as your app runs.

Introduction

C++ memory safety errors are a top concern for the industry. In Visual Studio 17.6, we deliver a new experimental Address Sanitizer feature: continue_on_error (COE). We’ll remove the experimental label in 17.8. You compile as before, by simply adding the compiler flag -fsanitize=address. With 17.6 you can enable the COE functionality by setting environment variables from the command line.

To stream unique memory safety errors to stdout(1) or stderr(2):

  • set ASAN_OPTIONS=continue_on_error=1
  • set ASAN_OPTIONS=continue_on_error=2

To stream to a log file of your choice:

  • set COE_LOG_FILE=your.file.log

When you opt into the new continue on error (COE) feature, your application automatically diagnoses and reports unique memory safety errors as it runs. At program exit, the runtime produces a final summary that follows the unique detailed reports normally produced by the Address Sanitizer.

The compiler instruments your binaries to work with the address sanitizer runtime to diagnose hidden memory safety errors. You can add the -fsanitize=address -Zi compiler flags and set the ASAN_OPTIONS or COE_LOG_FILE environment variable with values shown previously. You can then build and run your existing tests to exercise your code to find hidden memory-safety errors.

This new COE functionality provides a “checked build” for C and C++ that finds hidden memory safety errors with zero false positives.

Hidden memory safety errors

The source code in Figure 1, which follows, creates a buffer overflow due to an off-by-one error in the loop exit test. The code should check for ii < sz, but instead checks for ii <= sz. When the example runs, it is secure by coincidence because of the over-allocation and alignment done by most C++ runtime implementations. When sz % 16 == 0, the final write to local corrupts data. In all other cases, the out-of-bounds access only reads or writes the “malloc slop”, which exists because the C Runtime (CRT) pads allocations out to a 0 mod 16 boundary.

Errors will only be observable if the following page is unmapped, or upon some subsequent use of corrupted data. All other cases are silent in the Figure 1 example that follows.

#include <stdlib.h> 
char* func(char* buf, size_t sz) { 
    char* local = (char*)malloc(sz); 
    for (auto ii = 0; ii <= sz; ii++) { // bad loop exit test 
        local[ii] = ~buf[ii]; // Two memory safety errors 
    } 
    return local; 
} 

char original[10] = { 0,1,2,3,4,5,6,7,8,9 }; 

void main() {   
    char* inverted_buf= func(original,10); 
}

Figure 1

In this example, where the parameter sz is 10 and the original buffer is 10 bytes, there are two memory safety errors: one is an out-of-bounds load from buf and the other is an out-of-bounds store to local. With continue_on_error, you will see both errors in the summary, and the program will run to completion. Here’s the summary:

Terminal showing two memory safety issues being found

Figure 2

Note that continue_on_error reports two distinct errors that occur on the same source line. The first error reads memory at a global address in the .data section, and the other writes to memory allocated from the heap.
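To make the “secure by coincidence” arithmetic concrete, here is a minimal sketch of the padding calculation. It assumes the allocator rounds every request up to a multiple of 16 bytes, as described above; the names kGranularity, padded_size, and off_by_one_hits_slop are hypothetical and are not part of the CRT or the Address Sanitizer.

```cpp
#include <cstddef>

// Assumption for illustration: every request is padded up to a multiple of
// 16 bytes, as described above. Real CRT heaps may behave differently.
constexpr std::size_t kGranularity = 16;

// Size of the block actually reserved for a request of `requested` bytes.
constexpr std::size_t padded_size(std::size_t requested) {
    return (requested + kGranularity - 1) / kGranularity * kGranularity;
}

// True when writing local[requested] (one past the end) still lands inside
// the padding, so the bug stays silent without the Address Sanitizer.
constexpr bool off_by_one_hits_slop(std::size_t requested) {
    return requested < padded_size(requested);
}
```

Under that assumption, a request of 10 bytes is padded to 16, so the extra write in Figure 1 lands in the slop, while a request of 16 bytes has no slop and the extra write goes past the block.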

Description

The default Address Sanitizer runtime terminates your application after reporting the first error it encounters, and it does not allow the “bad” machine instruction to execute. COE is a customer-requested change that differs significantly from this “one-n-done” behavior: the new runtime diagnoses and reports each error, then continues executing subsequent instructions.

The new COE functionality allows an application to continue running while reporting unique memory safety errors to a log file or to the command line. When enabled, COE tries to automatically return control back to the application after reporting each memory safety error, except for an access violation (AV) or failed memory allocation. With COE, you can compile and deploy an existing application into limited production to find memory safety issues while running for days (albeit slower).

By adding compiler flags and setting an environment variable, you can immediately improve correctness and security. Your existing tests will still pass but will also uncover hidden memory safety errors. The compiler option (-fsanitize=address) and the runtime environment variable can be used to introduce a new “shipping gate” that runs COE with all your existing tests. The developer gets a simple, well-defined pass/fail for shipping any C or C++ app on Windows.

Internally we have found that using this technology significantly reduces memory safety errors. If all your existing tests pass, but this new feature reports a memory safety error or a leak, don’t ship your new code or integrate it into a parent branch.

Example

#include <cstdio>
#include <cstring>   // std::memset
#include <string>

struct Base {
    //virtual ~Base() = default;
};

struct Derived : public Base {
    std::wstring Value = L"Leaked if Base destructor is not virtual!";
};

constexpr size_t PointDims = 3;
double pointsInGlobalData[PointDims] = { 1.0, 2.0, 3.0 };

int main() {

  pointsInGlobalData[3] = 3.0;

  for (int i = 0; i < 2; i++) {
    double pointOnStack[PointDims] = { 1.0, 2.0, 3.0 };
    pointOnStack[-1] = 3.0;
    pointOnStack[PointDims] = 0.0;

    double* pointOnHeap = new double[PointDims + 100000];
    pointOnHeap[-1] = 4.0;
    delete[] pointOnHeap;

    double* pointDouble = new double[PointDims] { 1.0, 2.0, 3.0 };
    pointDouble[PointDims] = 4.0; // overflow
    delete[] pointDouble;         // we continue
    Base* base = new Derived();
    delete base; // missing virtual destructor

    constexpr size_t buff_size = 128;
    char* buffer = new char[buff_size];
    std::memset(buffer, '\0', buff_size);
    std::memset(&buffer[buff_size - 28], '=', 30);
  }
  wprintf_s(L"Loop completed! \r\n");
}

Figure 3

With continue_on_error, the program in Figure 3 produces the summary in Figure 4. The summary is printed after all the unique detailed error reports have been streamed; these are the same reports that the Address Sanitizer produces in its existing default mode. That default mode is “one-n-done”: it prints one detailed error report and then exits your process. With continue_on_error, execution continues after each memory safety error. This summary illustrates continuing after many memory safety errors:

Console showing 7 unique memory safety issues being found

Figure 4

Beneath the first red box at the top of Figure 4, there are three files sorted by error. This is followed by the file, function, and line displayed beneath the second box. The third box calls out seven unique errors, where “unique” is defined in terms of a hash function which uses call stacks and error descriptions. Use of the term unique is discussed in the next section.

The detailed error reports (omitted in the screen capture in Figure 4) are printed before this summary and contain shadow bytes with all the details for each error in the summary.

Unique

The uniqueness of an error (used to limit streaming duplicates) is determined by an internal hash function that uses the type of error and the call stack(s) at the time of the error. A detailed individual error report includes stack trace(s). Here are the two detailed errors in the “secure by coincidence” example in Figure 1. The call stacks and type of error are used to create an internal C++ error object, which is then hashed to a unique integer. The global-buffer-overflow in Figure 5 only has one call stack and is a different type of error object from the heap-buffer-overflow in Figure 6, which has two call stacks.

Figure 5

Figure 6

The runtime hashes each occurrence of an error in order to prevent duplicate reports. Consider a memory-safety error that’s executed 1000 times in a loop. The detailed error will be reported once, but its “hit count” of 1000 will be reported in the summary.
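The de-duplication idea can be sketched as follows. This is only an illustration of hashing an error type together with its call stack and counting hits; ErrorRecord, hash_error, and ErrorLog are hypothetical names, and the hash combination shown is an assumption, not the runtime's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the runtime's internal error object.
struct ErrorRecord {
    std::string kind;                    // e.g. "heap-buffer-overflow"
    std::vector<std::uintptr_t> frames;  // return addresses from the call stack(s)
};

// Combine the error kind and every stack frame into one hash value.
inline std::size_t hash_error(const ErrorRecord& e) {
    std::size_t h = std::hash<std::string>{}(e.kind);
    for (std::uintptr_t f : e.frames) {
        h ^= std::hash<std::uintptr_t>{}(f) + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
}

class ErrorLog {
public:
    // Always bumps the hit count. Returns true only the first time an error
    // is seen, which is when a detailed report would be printed.
    bool record(const ErrorRecord& e) {
        return ++hits_[hash_error(e)] == 1;
    }

    std::size_t unique_errors() const { return hits_.size(); }

    std::size_t hit_count(const ErrorRecord& e) const {
        auto it = hits_.find(hash_error(e));
        return it == hits_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<std::size_t, std::size_t> hits_;  // hash -> hit count
};
```

Recording the same ErrorRecord repeatedly from a loop yields a single detailed report (the first call to record returns true, the rest return false), while hit_count keeps growing for the summary.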

Not continuing.

Two examples where the continue on error feature cannot continue are:

  • Malloc is given an undefined argument, such as a negative number.
  • There’s an access violation while trying to read or write to memory that hasn’t been allocated, or to which it doesn’t have access.
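As a small illustration of the first case: malloc takes a size_t, so a negative count does not stay negative. The helper below is a hypothetical name used only to show the standard signed-to-unsigned conversion.

```cpp
#include <cstddef>
#include <cstdint>

// Converting a negative signed value to size_t wraps modulo 2^N, so the
// allocator sees an enormous request rather than a negative one.
constexpr std::size_t as_allocation_size(long long signed_count) {
    return static_cast<std::size_t>(signed_count);
}
```

For example, as_allocation_size(-1) equals SIZE_MAX, a request no allocator can satisfy.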

Consider the following program which has an access violation because it tries to read from location 0x13:

#include <stdio.h>

void main()
{
    unsigned int* local_ptr = (unsigned int*) 0x13;
    printf("use of undefined address %p [%x]\n", local_ptr, *local_ptr);
}

Figure 7

On Windows 11, when the example in Figure 7 is compiled with -fsanitize=address -Zi, you’ll see the error message shown in Figure 8.

CONTINUE CANCELLED - Deadly Signal. Shutting down.

Figure 8

We choose to “gracefully cancel” the attempt to continue from access violations that are not caught with a user’s structured exception handling.

Matching undefined behavior

We haven’t been able to do a complete audit that would allow us to “match” undefined behaviors for C and C++. The example in Figure 9 makes this tangible. At the commented line in the following code example, the code generation from the compiler and the runtime implementation are both different for _alloca when compiling with -fsanitize=address -Zi:

#include <cstdio>
#include <cstring>
#include <malloc.h>
#include <excpt.h>
#include <windows.h>

#define RET_FINISH 0
#define RET_STACK_EXCEPTION 1
#define RET_OTHER_EXCEPTION 2

int foo_redundant(unsigned long arg_var) {

  char *a;
  int ret = -1;

  __try
  {
    if ((arg_var+3) > arg_var) {
      // Call to alloca using parameter from main
      a = (char *) _alloca(arg_var); 
      memset(a, 0, 10);
    }
    ret = RET_FINISH;
  }
  __except(1)
  {
    ret = RET_OTHER_EXCEPTION;
    int i = GetExceptionCode();
    if (i == EXCEPTION_STACK_OVERFLOW) {
      ret = RET_STACK_EXCEPTION;
    }
  }  
  return ret;
}

void main(){ 
  int cnt = 0;
  if (foo_redundant(0xfffffff0) == RET_STACK_EXCEPTION)
    cnt++; //increment count of exceptions handled.

  if (cnt == 1)
    printf("pass\n");
  else
    printf("fail\n");
}

Figure 9

The previous example, in Figure 9, prints pass without -fsanitize=address. That’s because cnt == 1 after the stack overflow exception is handled. It prints fail when compiled with that flag and run with the Address Sanitizer runtime. In main() we pass a large number to foo_redundant, which passes it on to _alloca().

With the Address Sanitizer in continue_on_error (COE) mode, this program runs to completion, but prints fail. The code generation from the compiler must match the ABI for the Address Sanitizer runtime. For the Address Sanitizer runtime, the compiler grows the allocation size and aligns it 0 mod 32 (a cache line). That math will cause an integer overflow (i.e., wrap around) creating a reasonable, small positive number as the parameter to _alloca.

There will be no stack overflow exception for the __except handler to process.
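The wrap-around described above can be sketched with plain 32-bit unsigned arithmetic. The padding amount and the helper name instrumented_size are assumptions chosen only to illustrate the effect; they are not the runtime's actual ABI values.

```cpp
#include <cstdint>

// Hypothetical padding and alignment values, chosen only to illustrate the
// wrap-around; the real runtime ABI is not specified here.
constexpr std::uint32_t kExtraBytes = 32;
constexpr std::uint32_t kAlignment = 32;

// Grow the request, then round it up to a multiple of kAlignment, using
// 32-bit unsigned arithmetic (unsigned long is 32 bits wide on Windows).
constexpr std::uint32_t instrumented_size(std::uint32_t requested) {
    std::uint32_t grown = requested + kExtraBytes;  // wraps modulo 2^32
    return (grown + (kAlignment - 1)) & ~(kAlignment - 1);
}
```

With these assumed values, instrumented_size(0xfffffff0) evaluates to 32: the addition wraps past 2^32, so the allocation that would have overflowed the stack becomes a small one that succeeds.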

We have not had time to document or clearly define the subtle differences when a program has undefined behavior. This was a reason for releasing continue_on_error as experimental at first.

NOTE: This concern has rarely become visible in our testing infrastructure.

Top Concern – don’t ship without it.

There were six categories of C++ memory safety errors in the 2021 Common Weakness Enumeration (CWE) Top 25 Most Dangerous Software Weaknesses. In 2022, the award for best Remote Code Execution Bug went to BugHunter010 at Cyber KunLun Lab for a 20-year-old heap-buffer-overflow. This engineer discovered a heap-buffer-overflow vulnerability in the RPC protocol with a CVSS score of 9.8; the bug had existed in Windows for more than 20 years. New C++ memory safety bugs are introduced daily because traditional testing can’t expose these types of bugs without compiler and runtime support.

This new feature is designed to enable developers to implement a simple new gate for shipping C++ on Windows. Using this technology will significantly reduce memory safety errors. If your tests pass but continue_on_error reports any hidden memory safety errors, you should not ship or integrate new code into the development branch.

We intend continue_on_error to be used as a pass/fail gate for all CI/CD pipelines using C and C++.

Call To Action

We invite you to install the 17.6 version of Visual Studio or later, try out the continue on error feature, and give us feedback.

 

3 comments


  • Oliver Schneider 0

    Could someone from the C++ Team please post about the state of affairs for an OpenMP > 2.0 redistributable?!

    This topic hasn’t been officially discussed since VS2019 and it appears no redistributable DLL exists as of yet.

    Is there a workaround (e.g. using LLVM or Intel OpenMP) that would allow interoperability later on with Microsoft’s own flavor (offering not necessarily a seamless migration, but a “low bar” migration).

    Will there be compatibility with “/clr” (C++/CLI) or is it not even planned, for example? I understand from previous statements that this was one of the sore points.

    • Natalia Glagoleva (Microsoft employee) 0

      The MSVC OpenMP 2.0 runtime is redistributable as vcomp140[d].dll and is used with the compiler flag /openmp (see /openmp (Enable OpenMP Support) | Microsoft Learn).
      OpenMP 3.1 support (almost all features) is available, but the runtime is not redistributable. The runtime is libomp140[d].dll, based on the LLVM runtime, and is used with the compiler flag /openmp:llvm. At this time we do not plan to support OpenMP 3.1 and beyond with /clr, and we’re not able to prioritize making the libomp140 runtime redistributable, but please make requests through https://developercommunity.visualstudio.com/home to help guide our future priorities.
      Yes, you can use the LLVM runtime instead of the libomp140 supplied with the compiler (the LLVM sources have a change to rename libomp to libomp140 for a better debugging experience). However, it’s not a supported scenario, so if you do this, please verify that a bug also exists with the stock libomp140 before opening it. If you want to try this, please reach us via Developer Community; we’d like to know more about your needs.

      Natalia Glagoleva,
      Visual C/C++ Compiler Team

  • Olaf van der Spek 0

    > If your tests pass but continue_on_error reports any hidden memory safety errors, you should not ship or integrate new code into the development branch.

    Why can’t the tests be run with regular abort-on-error asan?
