September 5th, 2018

Combining iterator blocks and async methods in C#

Sergey Tepliakov
Senior Software Engineer

One of the best traits of a well-designed system is composability. Large systems are complex and hierarchical and one of the best ways to fight accidental complexity is to compose a system from smaller components. You write and test each component independently then you glue them together to achieve a higher-level behavior.

Programming languages usually designed to have “composable” features as well. You should be able to use multiple features together and the entire thing should just work. In C# you can compose different features together, and everything works. Unless it isn’t.

Recently I stumbled across a piece of C# code that shows the complexity of the language and demonstrates how easy these days to make mistakes by combining different language features. C# language is complex, and it is important to understand the features deep enough to use them correctly.

Let’s review the following code:

public static IEnumerable<Task<int>> ParseFile(string path)
{
    if (string.IsNullOrEmpty(path)) throw new ArgumentNullException(nameof(path));

    // OpenWithRetryPolicyAsync uses RetryPolicy to try to open the file more than once.
    // The method throws FileNotFoundException if the file does not exists.
    using (var file = OpenWithRetryPolicyAsync(path).Result)
    {
        using (var reader = new StreamReader(file))
        {
            // Let's assume that the first line contains a number of entries.

            // Using int.Parse for simplicities sake
            var numberOfEntries = int.Parse(reader.ReadLine());

            for (int entry = 0; entry < numberOfEntries; entry++)
            {
                yield return ReadAndParseAsync(reader);
            }
        }
    }
}

private static async Task<int> ReadAndParseAsync(StreamReader reader)
{
    string line = await reader.ReadLineAsync();
    return int.Parse(line);
}

public static IEnumerable<Task<int>> ParseAndLogIfNeeded(string path)
{
    try
    {
        return ParseFile(path);
    }
    catch (Exception e)
    {
        Console.WriteLine($"Failed to parse the file: {e.Message}");
        return Enumerable.Empty<Task<int>>();
    }
}

The code of OpenWithRetryPolicyAsync method is not relevant for our discussion so I moved the code to the end of the blog post.

  1. Exception handling and iterator blocks

Iterator blocks are quite different from regular methods. If a method returns IEnumerable<T> or IEnumerator<T> and has yield return in it, the compiler completely changes the semantics of the method. The method becomes lazy and it’ll be executed not when the method is called, but rather when the sequence is consumed.

This behavior may catch you off-guard especially when the result of the iterator block is “transformed” using other lazily evaluated functions, like LINQ operators:

var s1 = ParseFile(null); // no exceptions

var s2 = s1.Select(t => t.GetAwaiter().GetResult()).Where(r => r == 42); // no exceptions

if (s2.Any()) // throws an exception, potentially in completely different subsystem!
{ }

This means that ParseFile method never throws when the method is called, and the catch block ParseAndLogIfNeeded is effectively unreachable and will never handle any errors.

  1. Observing ex.Message is not enough

Exceptions are like ogres, they have layers. In many cases, exceptions are nested with absolutely unactionable information in the top-level Message property.

Let’s suppose we resolved laziness issue by calling .ToList(): try { return ParseFile(path).ToList();}.

If a given path does not exist, then OpenWithRetryPolicyAsync throws FileNotFoundException. But because the method is asynchronous, the actual error will be wrapped in AggregateException. And in this case, the log file (the console in this case) will contain a very useful error message: “One or more error occurred”.

And AggregateException is not the only example: TypeLoadExcpetion and TargetInvocationException never contain relevant information in Messageproperty. In other cases, you need to know a stack trace or an inner exception in order to understand the root cause of an issue.

In the perfect world, you would never even catch System.Exception, but in reality, this suggestion isn’t practical. So if you ended up catching generic exception type without re-throwing it, at least trace the full exception instance. Believe me, you’ll save yourself hours of investigation time in the future.

  1. Prefer calling GetAwaiter().GetResult over .Result

Blocking asynchronous operations is another anti-pattern that should never be used. But similarly to another anti-pattern mentioned above, you have to use it in practice from time to time to avoid viral asynchrony of the code.

There are 3 ways to synchronously wait for the result of a task-based operation: .Result, .Wait and GetAwaiter().GetResult(). The first two operations will throw AggregateException if the task fails and GetAwaiter().GetResult() will unwrap and throw the first exception from task’s AggregateException.

In some rare cases, a task is backed by more than one operation and AggregateException is what you need. But in the majority tasks are backed by an asynchronous operation that could have only one error, and propagating it directly simplifies exception handling a lot, especially when a caller does not expect to get an AggregateException at all.

  1. Using block and the lifetime of asynchronous operations

All the issues we discussed before are important but more or less obvious for experienced developers. The last one is a bit subtler.

Let’s take a closer look at the lifetime of the file variable in ParseFile. The compiler generates a state machine and calls a generated “finally block” to close the file when all the items of the sequence are produced. But the problem is that the method “yields” tasks (not values) and if the last task is not synchronous, then the task may read a file when the file is already closed!

Here what may happen if the caller of ParseFile calls ToList() on the result to eagerly obtain all the elements from the sequence and if the tasks are not completed synchronously:

Step 1: Opening the file 
Step 2: Yielding Task0 
Step 3: Yielding Task1 
Step 4: Closing the file 
Step 5-6: Tasks 0 and 1 are reading the file and failing with `ObjectDisposeException`

The issue may be in your code for years and manifest itself in some weird ways. For instance, the code may work fine on Windows for small files when ReadLineAsync()returns synchronously because the data is prefetched, but fail in other cases if ReadLineAsync will do an actual IO. (This is actually exactly how we discovered the issue: once we tried to run the code on MacOS we started getting the errors consistently.)

How to solve the main issue?

To be honest, I’m not a big fan of IEnumerable<Task>. The main issue from my perspective, that the semantics of a method is not clear. Just by looking at a method signature it is hard to tell if the sequence is “hot” or “cold”. If the sequence is “cold” and lazy, then the tasks in the sequence are initiated one by one. If the sequence is “hot” and is backed by a collection, then all the tasks are running already.

“The right” approach depends on your needs. If clients will use all the items of the result all the time, then just use Task<List<T>>. As the author of a function, you should think about an ease of use and clarity of your functions. Types can be a useful source of information and you should try to express your intent as clearly as possible.

Asynchronous sequences could be useful in some cases. For instance, you may use IEnumerable<Task> if the sequence contains many elements, the time for getting the next element is high, and clients may need only the first few elements. But in this case, you should explain your intent in comments or, even better, use something like IAsyncEnumerable<T>.

The idea of async streams is not new and you may find types similar to IAsyncEnumerable<T> in many projects. The idiom is so widely used that the C# language authors consider having a language feature to consume async streams using await foreach syntax. (For more information, see Async Streams proposal).

The source code of OpenWithRetryPolicyAsync

private class RetryIfFileNotFoundPolicy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex) => ex is FileNotFoundException;
}

public static async Task<FileStream> OpenWithRetryPolicyAsync(string path)
{
    // Trying opening the file more then once.
    return await new RetryPolicy<RetryIfFileNotFoundPolicy>(retryCount: 5)
        .ExecuteAsync(() => Task.FromResult(new FileStream(path, FileMode.Open)));
}

 

Author

Sergey Tepliakov
Senior Software Engineer

Lead software developer and software architect. Specialties: C#, C++, OOD, Design Patterns, Unit Testing, Multitheading, TPL

0 comments

Discussion are closed.

Feedback