{"id":1285,"date":"2018-09-05T13:22:26","date_gmt":"2018-09-05T05:22:26","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/seteplia\/?p=1285"},"modified":"2019-06-11T21:30:00","modified_gmt":"2019-06-12T04:30:00","slug":"combining-iterator-blocks-and-async-methods-in-c","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/combining-iterator-blocks-and-async-methods-in-c\/","title":{"rendered":"Combining iterator blocks and async methods in C#"},"content":{"rendered":"<p>One of the best traits of a well-designed system is composability. Large systems are complex and hierarchical and one of the best ways to fight accidental complexity is to compose a system from smaller components. You write and test each component independently then you glue them together to achieve a higher-level behavior.<\/p>\n<p>Programming languages usually designed to have &#8220;composable&#8221; features as well. You should be able to use multiple features together and the entire thing should just work. In C# you can compose different features together, and everything works. Unless it isn&#8217;t.<\/p>\n<p>Recently I stumbled across a piece of C# code that shows the complexity of the language and demonstrates how easy these days to make mistakes by combining different language features. C# language is complex, and it is important to understand the features deep enough to use them correctly.<\/p>\n<p>Let\u2019s review the following code:<\/p>\n<pre class=\"lang:default decode:true \">public static IEnumerable&lt;Task&lt;int&gt;&gt; ParseFile(string path)\r\n{\r\n    if (string.IsNullOrEmpty(path)) throw new ArgumentNullException(nameof(path));\r\n\r\n    \/\/ OpenWithRetryPolicyAsync uses RetryPolicy to try to open the file more than once.\r\n    \/\/ The method throws FileNotFoundException if the file does not exists.\r\n    using (var file = OpenWithRetryPolicyAsync(path).Result)\r\n    {\r\n        using (var reader = new StreamReader(file))\r\n        {\r\n            \/\/ Let's assume that the first line contains a number of entries.\r\n\r\n            \/\/ Using int.Parse for simplicities sake\r\n            var numberOfEntries = int.Parse(reader.ReadLine());\r\n\r\n            for (int entry = 0; entry &lt; numberOfEntries; entry++)\r\n            {\r\n                yield return ReadAndParseAsync(reader);\r\n            }\r\n        }\r\n    }\r\n}\r\n\r\nprivate static async Task&lt;int&gt; ReadAndParseAsync(StreamReader reader)\r\n{\r\n    string line = await reader.ReadLineAsync();\r\n    return int.Parse(line);\r\n}\r\n\r\npublic static IEnumerable&lt;Task&lt;int&gt;&gt; ParseAndLogIfNeeded(string path)\r\n{\r\n    try\r\n    {\r\n        return ParseFile(path);\r\n    }\r\n    catch (Exception e)\r\n    {\r\n        Console.WriteLine($\"Failed to parse the file: {e.Message}\");\r\n        return Enumerable.Empty&lt;Task&lt;int&gt;&gt;();\r\n    }\r\n}<\/pre>\n<p>The code of <code>OpenWithRetryPolicyAsync<\/code> method is not relevant for our discussion so I moved the code to the end of the blog post.<\/p>\n<ol>\n<li>Exception handling and iterator blocks<\/li>\n<\/ol>\n<p>Iterator blocks are quite different from regular methods. If a method returns <code>IEnumerable&lt;T&gt;<\/code> or <code>IEnumerator&lt;T&gt;<\/code> and has <code>yield return<\/code> in it, the compiler completely changes the semantics of the method. The method becomes lazy and it&#8217;ll be executed not when the method is called, but rather when the sequence is consumed.<\/p>\n<p>This behavior may catch you off-guard especially when the result of the iterator block is &#8220;transformed&#8221; using other lazily evaluated functions, like LINQ operators:<\/p>\n<pre class=\"lang:default decode:true \">var s1 = ParseFile(null); \/\/ no exceptions\r\n\r\nvar s2 = s1.Select(t =&gt; t.GetAwaiter().GetResult()).Where(r =&gt; r == 42); \/\/ no exceptions\r\n\r\nif (s2.Any()) \/\/ throws an exception, potentially in completely different subsystem!\r\n{ }<\/pre>\n<p>This means that <code>ParseFile<\/code> method never throws when the method is called, and the catch block <code>ParseAndLogIfNeeded<\/code> is effectively unreachable and will never handle any errors.<\/p>\n<ol start=\"2\">\n<li>Observing <code>ex.Message<\/code> is not enough<\/li>\n<\/ol>\n<p><strong>Exceptions are like ogres, they have layers.<\/strong> In many cases, exceptions are nested with absolutely unactionable information in the top-level <code>Message<\/code> property.<\/p>\n<p>Let\u2019s suppose we resolved laziness issue by calling <code>.ToList()<\/code>: <code>try { return ParseFile(path).ToList();}<\/code>.<\/p>\n<p>If a given <code>path<\/code> does not exist, then <code>OpenWithRetryPolicyAsync<\/code> throws <code>FileNotFoundException<\/code>. But because the method is asynchronous, the actual error will be wrapped in <code>AggregateException<\/code>. And in this case, the log file (the console in this case) will contain a very useful error message: &#8220;One or more error occurred&#8221;.<\/p>\n<p>And <code>AggregateException<\/code> is not the only example: <code>TypeLoadExcpetion<\/code> and <code>TargetInvocationException<\/code> never contain relevant information in <code>Message<\/code>property. In other cases, you need to know a stack trace or an inner exception in order to understand the root cause of an issue.<\/p>\n<p>In the perfect world, you would never even catch <code>System.Exception<\/code>, but in reality, this suggestion isn&#8217;t practical. So if you ended up catching generic exception type without re-throwing it, at least trace the full exception instance. Believe me, you&#8217;ll save yourself hours of investigation time in the future.<\/p>\n<ol start=\"3\">\n<li>Prefer calling <code>GetAwaiter().GetResult<\/code> over <code>.Result<\/code><\/li>\n<\/ol>\n<p>Blocking asynchronous operations is another anti-pattern that should never be used. But similarly to another anti-pattern mentioned above, you have to use it in practice from time to time to avoid viral asynchrony of the code.<\/p>\n<p>There are 3 ways to synchronously wait for the result of a task-based operation: <code>.Result<\/code>, <code>.Wait<\/code> and <code>GetAwaiter().GetResult()<\/code>. The first two operations will throw <code>AggregateException<\/code> if the task fails and <code>GetAwaiter().GetResult()<\/code> will unwrap and throw the first exception from task&#8217;s <code>AggregateException<\/code>.<\/p>\n<p>In some rare cases, a task is backed by more than one operation and <code>AggregateException<\/code> is what you need. But in the majority tasks are backed by an asynchronous operation that could have only one error, and propagating it directly simplifies exception handling a lot, especially when a caller does not expect to get an <code>AggregateException<\/code> at all.<\/p>\n<ol start=\"4\">\n<li>Using block and the lifetime of asynchronous operations<\/li>\n<\/ol>\n<p>All the issues we discussed before are important but more or less obvious for experienced developers. The last one is a bit subtler.<\/p>\n<p>Let\u2019s take a closer look at the lifetime of the <code>file<\/code> variable in <code>ParseFile<\/code>. The compiler generates a state machine and calls a generated &#8220;finally block&#8221; to close the file when all the items of the sequence are produced. But the problem is that the method &#8220;yields&#8221; tasks (not values) and if the last task is not synchronous, then the task may read a file when the file is already closed!<\/p>\n<p>Here what may happen if the caller of <code>ParseFile<\/code> calls <code>ToList()<\/code> on the result to eagerly obtain all the elements from the sequence and if the tasks are not completed synchronously:<\/p>\n<pre class=\"lang:default decode:true \">Step 1: Opening the file \r\nStep 2: Yielding Task0 \r\nStep 3: Yielding Task1 \r\nStep 4: Closing the file \r\nStep 5-6: Tasks 0 and 1 are reading the file and failing with `ObjectDisposeException`<\/pre>\n<p>The issue may be in your code for years and manifest itself in some weird ways. For instance, the code may work fine on Windows for small files when <code>ReadLineAsync()<\/code>returns synchronously because the data is prefetched, but fail in other cases if <code>ReadLineAsync<\/code> will do an actual IO. (This is actually exactly how we discovered the issue: once we tried to run the code on MacOS we started getting the errors consistently.)<\/p>\n<h4>How to solve the main issue?<\/h4>\n<p>To be honest, I\u2019m not a big fan of <code>IEnumerable&lt;Task&gt;<\/code>. The main issue from my perspective, that the semantics of a method is not clear. Just by looking at a method signature it is hard to tell if the sequence is &#8220;hot&#8221; or &#8220;cold&#8221;. If the sequence is &#8220;cold&#8221; and lazy, then the tasks in the sequence are initiated one by one. If the sequence is &#8220;hot&#8221; and is backed by a collection, then all the tasks are running already.<\/p>\n<p>&#8220;The right&#8221; approach depends on your needs. If clients will use all the items of the result all the time, then just use <code>Task&lt;List&lt;T&gt;&gt;<\/code>. As the author of a function, you should think about an ease of use and clarity of your functions. Types can be a useful source of information and you should try to express your intent as clearly as possible.<\/p>\n<p>Asynchronous sequences could be useful in some cases. For instance, you may use <code>IEnumerable&lt;Task&gt;<\/code> if the sequence contains many elements, the time for getting the next element is high, and clients may need only the first few elements. But in this case, you should explain your intent in comments or, even better, use something like <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/microsoft.servicefabric.data.iasyncenumerable-1?view=azure-dotnet\"><code>IAsyncEnumerable&lt;T&gt;<\/code><\/a>.<\/p>\n<p>The idea of async streams is not new and you may find types similar to <code>IAsyncEnumerable&lt;T&gt;<\/code> in many projects. The idiom is so widely used that the C# language authors consider having a language feature to consume async streams using <code>await foreach<\/code> syntax. (For more information, see <a href=\"https:\/\/github.com\/dotnet\/csharplang\/blob\/master\/proposals\/async-streams.md\">Async Streams<\/a> proposal).<\/p>\n<h4>The source code of <code>OpenWithRetryPolicyAsync<\/code><\/h4>\n<pre class=\"lang:default decode:true \">private class RetryIfFileNotFoundPolicy : ITransientErrorDetectionStrategy\r\n{\r\n    public bool IsTransient(Exception ex) =&gt; ex is FileNotFoundException;\r\n}\r\n\r\npublic static async Task&lt;FileStream&gt; OpenWithRetryPolicyAsync(string path)\r\n{\r\n    \/\/ Trying opening the file more then once.\r\n    return await new RetryPolicy&lt;RetryIfFileNotFoundPolicy&gt;(retryCount: 5)\r\n        .ExecuteAsync(() =&gt; Task.FromResult(new FileStream(path, FileMode.Open)));\r\n}<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the best traits of a well-designed system is composability. Large systems are complex and hierarchical and one of the best ways to fight accidental complexity is to compose a system from smaller components. You write and test each component independently then you glue them together to achieve a higher-level behavior. Programming languages usually [&hellip;]<\/p>\n","protected":false},"author":4004,"featured_media":37840,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3912,6703],"tags":[6695],"class_list":["post-1285","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-async","category-code-reviews","tag-seteplia"],"acf":[],"blog_post_summary":"<p>One of the best traits of a well-designed system is composability. Large systems are complex and hierarchical and one of the best ways to fight accidental complexity is to compose a system from smaller components. You write and test each component independently then you glue them together to achieve a higher-level behavior. Programming languages usually [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/1285","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/4004"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=1285"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/1285\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/37840"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=1285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=1285"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=1285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}