May 22nd, 2009

.NET 4 Cancellation Framework

A very interesting addition to .NET 4 is a set of new types that specifically assist with building cancellation-aware applications and libraries. The new types enable rich scenarios for convenient and safe cancellation, and help simplify situations that used to be difficult, error-prone, and non-composable.

The details of the new types are described below, but let's begin with some of the motivating principles that the new types were designed to support:

  1. When any unit of work is commenced, it should have a consistent means to support early termination in response to a cancellation request.
  2. If some unit of work controls various other moving pieces, it should be able to conveniently chain cancellation requests to them.
  3. Blocking calls should be able to support cancellation.
  4. Calls to complex operations, such as MoveNext() on a PLINQ enumerator, should have simple yet comprehensive cancellation support.
  5. When infrastructure makes calls back to potentially long-running user code, it should be possible for the user code to observe and respond to cancellation requests in a cooperative fashion.
  6. Cancellation should be an obvious part of an API with clean and consistent semantics.
  7. Cancellation should not be forceful but instead cooperative.

In many prevailing systems, cancellation has been a secondary feature that rarely gets treated in sufficient detail to enable all of the above principles in a comprehensive fashion. The new types introduced to .NET 4 raise cancellation to be a primary concept for .NET APIs and one that can be cleanly and easily incorporated into any system.

Previous approaches

Cancellation is a feature that finds its way into many APIs and applications either by design or by necessity. An interesting classification of common approaches is Herb Sutter’s article “Interrupt Politely”, which outlines four distinct techniques. Those four techniques can be distilled down to the two major techniques: The first is to use asynchronous, forced cancellation such as thread-abort, process-kill, or AppDomain-unload. Any asynchronous technique has the common trait that the target of cancellation is caught completely unaware and cannot easily ensure data safety. For many reasons, but particularly due to the risk of partial updates to shared data, the use of asynchronous techniques is strongly discouraged.

The second general technique is cooperative, whereby a request for cancellation is communicated to the target in some manner and the target of cancellation observes and enacts the request itself. This is inherently safer, but it does require that all targets of cancellation fully participate in the process. The most commonly implemented style for this approach involves every potential target object tracking some state that represents whether cancellation has been requested, although as we shall see, this has some inherent problems and can be improved.

.NET libraries in general follow the cooperative approach and various examples can be found; for example, the BackgroundWorker class commonly used with Windows Forms has a property called CancellationPending, which is set to true by calling the method BackgroundWorker.CancelAsync(). Similarly, the classes based on System.Net.WebRequest provide an Abort() method to cancel the current operation and prevent further operations from commencing. A BackgroundWorker has a clear notion of the work that should be canceled, but a WebRequest may have either an active GetResponse() or GetRequestStream() call, and so the target of the cancellation request is not as clear. More generally, if a single object can be used to run distinct operations or if it can run many operations simultaneously, then it becomes very awkward to track cancellation requests on the object itself. And if the object allows the cancellation request to be reset either before or after enactment of a cancellation request, then there are inherent race conditions that can lead to subtle problems.

So although cooperative cancellation is far superior to asynchronous cancellation, the common approach has the following issues:

  1. Data is often imbued with a notion of cancellation, and this is a problematic abstraction. A better model is that specific occurrences of an operation may be cancelable, but data is just data.
  2. Cancellation achieved by direct calls to Cancel() and Abort() methods on specific targets is not groupable or chainable.
  3. It is often necessary to perform some work in addition to setting a Boolean flag when enacting cancellation, but this support is not standardized or ubiquitous.
  4. If an operation on an object makes a callback to user code, it is often awkward to forward the cancellation request made on the parent object to the user code.
  5. If a class tracks cancellation state but allows multiple concurrent operations, or if it allows operations to commence after an earlier operation was canceled, then race conditions become an inherent issue that frequently leads to subtle bugs.
  6. When cancellation is requested, identified and enacted, the resulting behaviors are not consistent. Some methods return as if nothing happened, some throw OperationCanceledException, some throw custom exceptions, and some return a custom error code.

It is to address these issues that .NET 4 introduces a new approach to cooperative cancellation. And as we will examine in a future blog post, this approach is perfectly suited to the needs of Parallel Extensions and has been adopted across all our types.

New cancellation types

The new types provide a cooperative cancellation framework that is rich and general purpose. Some specific design principles of the new approach are:

  1. Only specific framework types should represent a request for cancellation.
  2. Long-running or blocking methods should accept an explicit parameter that will inform the method of a cancellation request. This is in contrast to mechanisms whereby operations automatically attach themselves to an ambient cancellation-request object, such as one that has been stashed in thread-local storage. Mechanisms that involve automatic discovery and attachment generally become awkward when the exact specification of the rules is fleshed out.
  3. The response made when cancellation is enacted should be to throw a consistent exception type.

Two new types form the basis of the framework: A CancellationToken is a struct that represents a ‘potential request for cancellation’. This struct is passed into method calls as a parameter and the method can poll on it or register a callback to be fired when cancellation is requested. A CancellationTokenSource is a class that provides the mechanism for initiating a cancellation request and it has a Token property for obtaining an associated token. It would have been natural to combine these two types into one, but this design allows the two key operations (initiating a cancellation request vs. observing and responding to cancellation) to be cleanly separated. In particular, methods that take only a CancellationToken can observe a cancellation request but cannot initiate one.

When a call is made to CancellationTokenSource.Cancel(), the IsCancellationRequested property of any associated CancellationToken will become true and methods can poll on this to observe the cancellation request. If polling is not possible or not convenient, a callback can be attached via CancellationToken.Register(). A CancellationToken can be copied and passed around and all the copies will simply point back to the original CancellationTokenSource which holds all the actual state regarding cancellation requests, callback lists, etc.
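
As a minimal sketch of the behavior just described, the following shows a copied token and a registered callback both observing a single call to Cancel(). The class name TokenCopyDemo and its Run() method are ours, purely for illustration, and the sketch assumes the Register(Action) overload, which does not capture a SynchronizationContext.

```csharp
using System;
using System.Threading;

public static class TokenCopyDemo
{
    // Returns: { any token canceled before Cancel(), original canceled after,
    //            copy canceled after, callback ran }.
    public static bool[] Run()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken original = cts.Token;
            CancellationToken copy = original; // struct copy; still refers to cts

            bool callbackRan = false;
            copy.Register(() => callbackRan = true); // alternative to polling

            bool before = original.IsCancellationRequested
                       || copy.IsCancellationRequested;

            cts.Cancel(); // the callback above runs during this call

            return new[]
            {
                before,
                original.IsCancellationRequested,
                copy.IsCancellationRequested,
                callbackRan
            };
        }
    }
}
```

Calling TokenCopyDemo.Run() should yield { false, true, true, true }. Note that neither token exposes any way to initiate cancellation; only the code holding cts can do that.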

When methods see that CancellationToken.IsCancellationRequested == true, they should respond to this by throwing an OperationCanceledException(cancellationToken). Throwing this exception communicates that the method finished without fully completing its stated purpose and states which CancellationToken it observed. Note that the existing OperationCanceledException class has been augmented with constructors and a property for CancellationToken.

We can now look at some specific examples: the first example shows how to create a CancellationTokenSource, pass its associated CancellationToken to a method, and how to poll on that token and correctly throw an OperationCanceledException. Note that cancellation doesn’t need to be checked after every single statement, but it might be checked once per few thousand instructions or so. A general guideline is that cancellation should be checked as frequently as possible subject only to avoiding significant performance impact. Because polling on ct.IsCancellationRequested is very cheap (a single volatile read and a handful of IL instructions), cancellation can be tested frequently without a significant performance impact.

EventHandler externalEvent;
void Example1()
{
   CancellationTokenSource cts = new CancellationTokenSource();
   externalEvent +=
      (sender, obj) => { cts.Cancel(); }; //wire up an external requester
   try
   {
      int val = LongRunningFunc(cts.Token);
   }
   catch (OperationCanceledException)
   {
      //cleanup after cancellation if required…
   }
}

private static int LongRunningFunc(CancellationToken token)
{
   int total = 0;
   for(int i=0; i<1000; i++){
      for (int j = 0; j < 1000; j++)
      {
         total++;
      }
      if(token.IsCancellationRequested){ // observe cancellation
         throw new OperationCanceledException(token); // acknowledge cancellation
      }
   }
   return total;
}
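
The released version of .NET 4 also provides CancellationToken.ThrowIfCancellationRequested(), which combines the check and the throw. Assuming that method is available in the build you are targeting, the polling loop above could be sketched as follows (the class name PollingSketch is illustrative only):

```csharp
using System;
using System.Threading;

public static class PollingSketch
{
    public static int LongRunningFunc(CancellationToken token)
    {
        int total = 0;
        for (int i = 0; i < 1000; i++)
        {
            for (int j = 0; j < 1000; j++)
            {
                total++;
            }
            // Observe and acknowledge cancellation in one call: throws an
            // OperationCanceledException carrying this token if it is canceled.
            token.ThrowIfCancellationRequested();
        }
        return total;
    }
}
```

The behavior is intended to match the explicit if/throw pair in the example above; which form to use is a matter of taste.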

The second example demonstrates using the callback facility on CancellationToken for when polling is not an option.

void BlockingOperation(CancellationToken token)
{
   ManualResetEvent mre = new ManualResetEvent(false);
   //register a callback that will set the MRE
   CancellationTokenRegistration registration =
      token.Register(() => mre.Set());
   using (registration)
   {
      mre.WaitOne();
      if (token.IsCancellationRequested) //did cancellation wake us?
          throw new OperationCanceledException(token);
   } //dispose the registration, which performs the deregistration.
}

A common pattern in these situations is to use the callback to force whatever condition the blocking call is waiting on, and then immediately check to see if cancellation was the reason the blocking call was woken. Although there is a race condition here, in that normal waking may occur just before a cancellation request is made, this race is benign as it is always safe to respond to the cancellation request regardless of why the wait finished.

The third example shows how to listen for cancellation via a regular WaitHandle. Under the covers, CancellationToken.WaitHandle is a lazily-allocated ManualResetEvent that becomes set when cancellation is requested.

void Wait(WaitHandle wh, CancellationToken token)
{
   WaitHandle.WaitAny(new [] {wh, token.WaitHandle});
   if (token.IsCancellationRequested) //did cancellation wake us?
     throw new OperationCanceledException(token);
}

And finally, the fourth example shows running multiple asynchronous operations which share a common CancellationToken.

void Example4()
{
   CancellationTokenSource cts = new CancellationTokenSource();
   StartAsyncFunc1(cts.Token);
   StartAsyncFunc2(cts.Token);
   StartAsyncFunc3(cts.Token);
   //…
   cts.Cancel(); // all listeners see the same cancellation request.
}

In particular it is interesting to note that each of the asynchronous functions might pass on the token to other methods and that everyone who is observing the token (or copies of the token) will see the cancellation request when cts.Cancel() is called.

Details

A few details are important for an understanding of the new approach.

A CancellationTokenSource may transition from non-canceled to canceled only once and cannot be reset. This prevents various race conditions that arise deep inside method calls if reset were permitted. At an application level, however, reset functionality is often required and this is achieved by creating a new CancellationTokenSource to replace a used one.
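
One way to picture application-level reset is a small wrapper that swaps in a fresh source. This is only a sketch: the RestartableOperation class and its members are hypothetical names of ours, and the code makes no attempt to be thread-safe.

```csharp
using System.Threading;

public sealed class RestartableOperation
{
    private CancellationTokenSource _cts = new CancellationTokenSource();

    // Token to hand to the next operation that is started.
    public CancellationToken CurrentToken
    {
        get { return _cts.Token; }
    }

    public void Cancel()
    {
        _cts.Cancel();
    }

    // "Reset" by replacing the used source. Tokens already handed out keep
    // pointing at the old source and therefore stay canceled. The old source
    // is left for the garbage collector here; dispose it explicitly once no
    // running operation still uses its token.
    public void Reset()
    {
        _cts = new CancellationTokenSource();
    }
}
```

Operations started before Reset() continue to see their cancellation request, while operations started afterwards receive a token that has not been canceled.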

A CancellationToken is a lightweight struct that includes only a single reference back to a CancellationTokenSource. As such, it is the same ‘weight’ as a normal object reference, but because it is a separate type, it can have its own API that provides only read operations.

When a callback is registered to a CancellationToken, the current thread’s ExecutionContext is captured so that the callback will be run with the exact same security context. Capturing the current thread’s SynchronizationContext is optional and can be requested via an overload of ct.Register() if required. Callbacks are normally stored and then run when cancellation is requested, but if a callback is registered after cancellation has been requested, the callback will run immediately on the current thread, or via Send() on the current SynchronizationContext if applicable.
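
The late-registration rule can be checked with a few lines. The sketch below uses the Register(Action) overload, so no SynchronizationContext is involved; the class and method names are ours and purely illustrative.

```csharp
using System;
using System.Threading;

public static class LateRegistrationDemo
{
    // Returns true if the callback had already run by the time Register() returned.
    public static bool CallbackRanDuringRegister()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // cancellation is requested first

            bool ran = false;
            // The token is already canceled, so the callback is invoked
            // immediately on this thread instead of being stored.
            cts.Token.Register(() => ran = true);
            return ran;
        }
    }
}
```

We would expect CallbackRanDuringRegister() to return true.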

When a callback is registered to a CancellationToken, the returned object is a CancellationTokenRegistration. This is a light struct type that is IDisposable, and disposing this registration object causes the callback to be deregistered. A guarantee is made that after the Dispose() method has returned, the registered callback is neither running nor will subsequently commence. A consequence of this is that CancellationTokenRegistration.Dispose() must block if the callback is currently executing. Hence, all registered callbacks should be fast and not block for any significant duration.
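
A short sketch of deregistration, again with illustrative names of our own: once the registration has been disposed, a later Cancel() should no longer invoke the callback.

```csharp
using System;
using System.Threading;

public static class DeregistrationDemo
{
    // Returns true if the callback ran even though its registration was disposed.
    public static bool CallbackRanAfterDispose()
    {
        using (var cts = new CancellationTokenSource())
        {
            bool ran = false;
            CancellationTokenRegistration registration =
                cts.Token.Register(() => ran = true);

            registration.Dispose(); // deregister before cancellation is requested
            cts.Cancel();           // nothing is left to call back

            return ran;
        }
    }
}
```

We would expect CallbackRanAfterDispose() to return false.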

Advanced patterns

To finish this introduction to the new cancellation framework we can look briefly at two more advanced patterns of usage.

One situation that arises is that some methods wish to observe two separate tokens. The best approach here is to create a linked CancellationTokenSource that will be signaled if either of the source tokens has become signaled.

The method CancellationTokenSource.CreateLinkedTokenSource() is specifically designed to assist with this and registers the callbacks as required. For example:

void LinkingExample(CancellationToken ct1, CancellationToken ct2)
{
   // the using block disposes linkedCTS, which cleans up the linking. required.
   using (CancellationTokenSource linkedCTS =
      CancellationTokenSource.CreateLinkedTokenSource(ct1, ct2))
   {
      try
      {
         SlowFunc(linkedCTS.Token);
      }
      catch (OperationCanceledException)
      {
         if (ct1.IsCancellationRequested)
         {
            // …
         }
         else if (ct2.IsCancellationRequested)
         {
            // …
         }
      }
   }
}

By using cancellation linking, cancellation-aware methods need only take one CancellationToken parameter. Be aware, however, that because linking attaches callbacks onto the source tokens, it is very important to clean up properly; otherwise linkedCTS and the callback delegates cannot be garbage collected until the linked sources are also available for collection. This is achieved by explicitly disposing linkedCTS, or via the using() pattern.
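
To see linking in isolation, the following sketch creates two independent sources, links them, and cancels one of the two. The class name LinkedTokenDemo and the cancelFirst parameter are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Threading;

public static class LinkedTokenDemo
{
    // Cancels one of two source tokens and reports whether the linked token saw it.
    public static bool LinkedSeesCancellation(bool cancelFirst)
    {
        using (var cts1 = new CancellationTokenSource())
        using (var cts2 = new CancellationTokenSource())
        using (var linked =
            CancellationTokenSource.CreateLinkedTokenSource(cts1.Token, cts2.Token))
        {
            if (cancelFirst)
                cts1.Cancel();
            else
                cts2.Cancel();

            // The linked source is signaled when either source token is canceled.
            return linked.Token.IsCancellationRequested;
        }
    }
}
```

Both LinkedSeesCancellation(true) and LinkedSeesCancellation(false) should return true, and the using statements take care of the cleanup discussed above.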

Another scenario that arises is to allow library code and user code to cooperate when responding to cancellation. Consider a library method that accepts a CancellationToken, and then calls back to user code. If the user code responds to a cancellation request, it wants to communicate this cleanly to the library code. This is achieved by the user code throwing an OperationCanceledException that mentions the same CancellationToken that the library code was given. Typically the user code will know about the same CancellationToken by getting it through some side channel, e.g. via closure-capture, rather than the library method explicitly passing it on. For example:

private void RunQuery()
{
   int[] data = {1,2,3};
   CancellationTokenSource cts = new CancellationTokenSource();
   var query = data.AsParallel()
                .WithCancellation(cts.Token) // token given to library code
                .Select( (x) => SlowFunc(x, cts.Token) ); // token passed to user code
}

private int SlowFunc(int x, CancellationToken token)
{
   int result = 0; // initialized so that the final return compiles
   while(…)
   {
      if (token.IsCancellationRequested)
         throw new OperationCanceledException(token);
      …
   }
   return result;
}

Notice that the library function (in this case a PLINQ query) and the user code are both observing the same token and that the user-code throws an OperationCanceledException that mentions this common token. This allows the library code to see that the user code has performed cooperative cancellation and has not simply thrown some arbitrary exception. The library may respond to this by behaving exactly as though it had seen the cancellation request itself, rather than thinking the user code failed unexpectedly. It also has the advantage that if the user code throws an OperationCanceledException for some unexpected reason (and mentions a different token or no token at all), it will not be mistakenly interpreted as cooperative cancellation. Because the token is tracked in the OperationCanceledException itself, there is no confusion about the reason for the exception being thrown.
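
The library side of this contract might look roughly like the sketch below. It is not how PLINQ is implemented; MiniLibrary, Outcome, and Run are hypothetical names used only to illustrate the token comparison.

```csharp
using System;
using System.Threading;

public static class MiniLibrary
{
    public enum Outcome { Completed, Canceled }

    // Hypothetical library entry point: runs user code and interprets an
    // OperationCanceledException according to the token it mentions.
    public static Outcome Run(Action userCallback, CancellationToken token)
    {
        try
        {
            userCallback();
            return Outcome.Completed;
        }
        catch (OperationCanceledException oce)
        {
            // Cooperative cancellation: the exception mentions the token the
            // library was given, and that token really has been canceled.
            if (oce.CancellationToken == token && token.IsCancellationRequested)
                return Outcome.Canceled;

            // A different token, or no token at all: an ordinary failure
            // in user code, so let it propagate.
            throw;
        }
    }
}
```

User code that throws new OperationCanceledException(token) with the shared, canceled token is reported as Outcome.Canceled, whereas an OperationCanceledException created without a token is rethrown to the caller.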

Conclusion

The new cancellation types introduced to .NET 4 provide a new framework for building systems that have rich and consistent cancellation behaviors. This will specifically assist with GUIs, applications, and libraries that manage long-running or blocking operations. Over time we expect to see more libraries migrate to this approach and reduce the variation and issues inherent in the current crop of cancellation solutions. Applications and third-party libraries can use these new types to interoperate cleanly with Parallel Extensions and for their own purposes.
