Task Exception Handling in .NET 4.5

For the .NET Framework 4.5 Developer Preview, a lot of work has been done to improve the Task Parallel Library (TPL), in terms of functionality, in terms of performance, and in terms of integration with the rest of the .NET Framework. With all of this work, we’ve strived for a very high compatibility bar, which means your applications that use TPL in .NET 4 should “just work” when they upgrade to run against .NET 4.5 (if they don’t, please let us know, as it’s something we’ll want to fix). Even with such a high compatibility bar, however, there are a few interesting behaviors to be aware of when it comes to Tasks and exception handling.

Unobserved Exceptions

Those of you familiar with Tasks in .NET 4 will know that the TPL has the notion of “unobserved” exceptions. This is a compromise between two competing design goals in TPL: to support marshaling unhandled exceptions from the asynchronous operation to the code that consumes its completion/output, and to follow standard .NET exception escalation policies for exceptions not handled by the application’s code. Ever since .NET 2.0, exceptions that go unhandled on newly created threads, in ThreadPool work items, and the like all result in the default exception escalation behavior, which is for the process to crash. This is typically desirable, as exceptions indicate something has gone wrong, and crashing helps developers to immediately identify that the application has entered an unreliable state. Ideally, tasks would follow this same behavior. However, tasks are used to represent asynchronous operations with which code later joins, and if those asynchronous operations incur exceptions, those exceptions should be marshaled over to where the joining code is running and consuming the results of the asynchronous operation. That inherently means that TPL needs to backstop these exceptions and hold on to them until such time that they can be thrown again when the consuming code accesses the task. Since that prevents the default escalation policy, .NET 4 applied the notion of “unobserved” exceptions to complement the notion of “unhandled” exceptions. An “unobserved” exception is one that’s stored into the task but then never looked at in any way by the consuming code. There are many ways of observing the exception, including Wait()’ing on the Task, accessing a Task<TResult>’s Result, looking at the Task’s Exception property, and so on. If code never observes a Task’s exception, then when the Task goes away, the TaskScheduler.UnobservedTaskException gets raised, giving the application one more opportunity to “observe” the exception. And if the exception still remains unobserved, the exception escalation policy is then enabled by the exception going unhandled on the finalizer thread.

In .NET 4.5, Tasks have significantly more prominence than they did in .NET 4, as they’re baked in to the C# and Visual Basic languages as part of the new async features supported by the languages. This in effect moves Tasks out of the domain of experienced developers into the realm of everyone. As a result, it also leads to a new set of tradeoffs about how strict to be around exception handling. An example of how this could affect developers is highlighted in the below code snippet:

Task op1 = FooAsync();
Task op2 = BarAsync();
await op1;
await op2;

In this code, the developer is launching two asynchronous operations to run in parallel, and is then asynchronously waiting for each using the new await language feature (for multiple reasons, it would be better if this code were written with a single “await Task.WhenAll(op1, op2)” statement, rather than individually awaiting each of op1 and op2, but that ideal doesn’t significantly decrease the likelihood that the above code will still be written). If while testing this method neither op1 or op2 faults, there’s no problem as there are no exceptions. If just op1 faults but op2 completes successfully, there’s no problem: op1’s exception will propagate out of the await and everything will proceed as expected. And if just op2 faults but op1 completes successfully, again, no problem. However, consider what will happen if both op1 and op2 fault. Awaiting op1 will propagate op1’s exception, and therefore op2 will never be awaited. As a result, op2’s exception will not be observed, and the process would eventually crash.

To make it easier for developers to write asynchronous code based on Tasks, .NET 4.5 changes the default exception behavior for unobserved exceptions. While unobserved exceptions will still cause the UnobservedTaskException event to be raised (not doing so would be a breaking change), the process will not crash by default. Rather, the exception will end up getting eaten after the event is raised, regardless of whether an event handler observes the exception. This behavior can be configured, though. A new CLR configuration flag may be used to revert back to the crashing behavior of .NET 4, e.g.

<configuration>
    <runtime>
        <ThrowUnobservedTaskExceptions enabled=”true”/>
    </runtime>
</configuration>

Note that this change doesn’t mean developers should be careless about ignoring unhandled exceptions… it just means the runtime is a bit more forgiving than it used to be. My recommendation is that any developer building library components should run their tests with this flag enabled, and should make sure that no exceptions are going unobserved in the components they build. That way, the application developer consuming these components can make the best decision for their app as to whether to set this flag or not: it would be unfortunate if the application developer wanted to be strict about enforcing all exceptions to be observed, but couldn’t because the library they consumed wasn written by developers that weren’t careful enough.

“Task.Result” vs “await task”

When you use Task.Wait() or Task.Result on a task that faults, the exception that caused the Task to fault is propagated, but it’s not thrown directly… rather, it’s wrapped in an AggregateException object, which is then thrown. There were two primary motivations for wrapping all exceptions like this. First, as of .NET 4, there was no good way in managed code to throw an exception that had previously been thrown without overwriting important information stored in that exception. Namely, throwing an exception with “throw e;” overwrites the exceptions’ stack trace and its “Watson bucket” information (the data collected and uploaded to help application developers after deployment find the root cause of the most common crashes in their applications) with details about the throw site, leading to very poor debuggability when dealing with exceptions marshaled across threads. For this reason, it’s been a .NET design guideline that in cases like this, where an exception’s propagation needs to be interrupted, it should be wrapped in another exception object. You can see that with reflection, for example, where exceptions are wrapped in a TargetInvocationException. For better or worse, this is a way to preserve the relevant information as part of an inner exception. And thus for TPL, where marshaling exceptions across threads is the name of the game, we wrap exceptions before propagating them.

We could have picked various wrapper exception types, such as TargetInvocationException, but we chose AggregateException because of the second motivation: tasks may fault with more than one exception, and thus need to store multiple. Whether because of child tasks that fault, or because of combinators like Task.WhenAlll, a single task may represent multiple operations, and more than one of those may fault. In such a case, and with the goal of not losing exception information (which can be important for post-mortem debugging), we want to be able to represent multiple exceptions, and thus for the wrapper type we chose AggregateException.

That explains why we chose the design we did for .NET 4. Now, let’s assume for a moment that we didn’t have to deal with the first issue above, that of overwriting an exception’s information. In that case, we have a choice to make, since whether the task actually stores multiple exceptions in an aggregate is separate from whether multiple exceptions are propagated in aggregate when you wait on a task. We have three primary options here: only propagate the first exception even if there are multiple, always propagate all exceptions in an aggregate even if there’s only one, and propagate a single exception if there’s one or an aggregate if there’s multiple. We very quickly ruled out the last option, as in many scenarios it leads to needing duplicate catch blocks: for cases when multiple exceptions occur, you need a handler for an aggregate exception (and that handler needs to be able to special case each of the inner exceptions), and for cases when there’s only one exception, you need a separate catch block for each of the specialized exceptions… as a result, you end up having the same logic duplicated in two places. In our experience, it’s also just more difficult to reason about. That leaves the options of always propagating the first (by some definition of “first”) or always propagating an aggregate. When designing Task.Wait in .NET 4, we chose the latter. That decision was influenced by the need to not overwrite details, but also by the primary use case for tasks at the time, that of fork/join parallelism, where the potential for multiple exceptions is quite common.

While similar to Task.Wait at a high level (i.e. forward progress isn’t made until the task completes), “await task” represents a very different primary set of scenarios. Rather than being used for fork/join parallelism, the most common usage of “await task” is in taking a sequential, synchronous piece of code and turning it into a sequential, asynchronous piece of code. In places in your code where you perform a synchronous operation, you replace it with an asynchronous operation represented by a task and “await” it. As such, while you can certainly use await for fork/join operations (e.g. utilizing Task.WhenAll), it’s not the 80% case. Further, .NET 4.5 sees the introduction of System.Runtime.ExceptionServices.ExceptionDispatchInfo, which solves the problem of allowing you to marshal exceptions across threads without losing exception details like stack trace and Watson buckets. Given an exception object, you pass it to ExceptionDispatchInfo.Create, which returns an ExceptionDispatchInfo object that contains a reference to the Exception object and a copy of the its details. When it’s time to throw the exception, the ExceptionDispatchInfo’s Throw method is used to restore the contents of the exception and throw it without losing the original information (the current call stack information is appended to what’s already stored in the Exception).

Given that, and again having the choice of always throwing the first or always throwing an aggregate, for “await” we opt to always throw the first. This doesn’t mean, though, that you don’t have access to the same details. In all cases, the Task’s Exception property still returns an AggregateException that contains all of the exceptions, so you can catch whichever is thrown and go back to consult Task.Exception when needed. Yes, this leads to a discrepancy between exception behavior when switching between “task.Wait()” and “await task”, but we’ve viewed that as the significant lesser of two evils.

As I mentioned previously, we have a very high compatibility bar, and thus we’ve avoided breaking changes. As such, Task.Wait retains its original behavior of always wrapping. However, you may find yourself in some advanced situations where you want behavior similar to the synchronous blocking employed by Task.Wait, but where you want the original exception propagated unwrapped rather than it being encased in an AggregateException. To achieve that, you can target the Task’s awaiter directly. When you write “await task;”, the compiler translates that into usage of the Task.GetAwaiter() method, which returns an instance that has a GetResult() method. When used on a faulted Task, GetResult() will propagate the original exception (this is how “await task;” gets its behavior). You can thus use “task.GetAwaiter().GetResult()” if you want to directly invoke this propagation logic.