{"id":55980,"date":"2009-08-04T17:55:00","date_gmt":"2009-08-04T17:55:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/pfxteam\/2009\/08\/04\/parallel-extensions-and-io\/"},"modified":"2009-08-04T17:55:00","modified_gmt":"2009-08-04T17:55:00","slug":"parallel-extensions-and-io","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/parallel-extensions-and-io\/","title":{"rendered":"Parallel Extensions and I\/O"},"content":{"rendered":"<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">In this post, we\u2019ll investigate some ways that Parallel Extensions can be used to introduce parallelism and asynchrony to I\/O scenarios.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Here\u2019s a simple scenario. <span>&nbsp;<\/span>I want to retrieve data from a number of web resources.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>static string[] Resources = new string[]<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>&quot;http:\/\/www.microsoft.com&quot;, &quot;http:\/\/www.msdn.com&quot;,<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>&quot;http:\/\/www.msn.com&quot;, &quot;http:\/\/www.bing.com&quot;<\/span><\/p>\n<p class=\"MsoNormal\"><span>};<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Using the WebClient class, I might end up with the following.<span><\/span><\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>var data = new List&lt;byte[]&gt;();<\/span><\/p>\n<p class=\"MsoNormal\"><span>var wc = new WebClient();<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>foreach (string resource in Resources)<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; 
<\/span>data.Add(wc.DownloadData(resource));<\/span><\/p>\n<p class=\"MsoNormal\"><span>}<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>\/\/ Use the data.<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">However, these days, downloading data from the web usually utilizes only a small fraction of my available bandwidth.<span>&nbsp; <\/span>So there are potential performance gains here, and with TPL\u2019s parallel ForEach loop, they are easily had.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>var data = new ConcurrentBag&lt;byte[]&gt;();<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>Parallel.ForEach(Resources, resource =&gt;<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>data.Add((new WebClient()).DownloadData(resource));<\/span><\/p>\n<p class=\"MsoNormal\"><span>});<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>\/\/ Use the data.<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Note that WebClient instances do not support multiple pending asynchronous operations (and the class is not thread-safe), so I need a separate instance for each operation.<span>&nbsp; <\/span>Also, since the normal BCL collections (List&lt;T&gt;, etc.) 
are not thread-safe, I need something like ConcurrentBag&lt;T&gt; to store the results.<span>&nbsp; <\/span>Of course, storing all the data in a collection assumes the scenario requires that all retrieval operations complete before processing.<span>&nbsp; <\/span>If this were not the case, I could start processing each chunk of data inside the loop as soon as it arrives, exploiting more parallelism.<span>&nbsp; <\/span>However, for the purposes of this investigation, I wanted to determine the possible performance gains in the absence of CPU-intensive work.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">As it turns out, the above often yields linear speedup over the sequential version, with some variation due to the inconsistent nature of web site response times.<span>&nbsp; <\/span>And it was pretty straightforward.<span>&nbsp; <\/span>However, things would have been even easier had I started out with a \u201cLINQ\u201d frame of mind.<span>&nbsp; <\/span>First, I can convert my original sequential code to a LINQ query.<span>&nbsp; <\/span>Then, I can turn it into PLINQ using the AsParallel method and use WithDegreeOfParallelism to control the number of concurrent retrievals.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>var data =<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>from resource in Resources<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span>.AsParallel()<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><span>.WithDegreeOfParallelism(numConcurrentRetrievals)<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>select (new WebClient()).DownloadData(resource);<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>\/\/ Sometime later&#8230;<\/span><\/p>\n<p class=\"MsoNormal\"><span>foreach (byte[] result in data) { }<\/span><\/p>\n<p 
class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">(As an aside, it\u2019s worth noting that WithDegreeOfParallelism causes PLINQ to use <i>exactly<\/i> numConcurrentRetrievals Tasks.<span>&nbsp; <\/span>This differs from the MaxDegreeOfParallelism option that I could have used with my previous Parallel.ForEach code, because that option sets the <i>maximum<\/i>; the actual number of threads still depends on the ThreadPool\u2019s thread-adjusting logic.)<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">This code offers enhanced readability and makes storing the data easier.<span>&nbsp; <\/span>In addition, I can continue on the main thread, as PLINQ queries do not execute until the data they represent is accessed \u2013 that is, when MoveNext is called on the relevant enumerator.<span>&nbsp; <\/span>However, in this particular case, I don\u2019t want to delay my query\u2019s execution until I need the data; I actually want to execute my query <i>while<\/i> continuing on the main thread.<span>&nbsp; <\/span>To do so, I can wrap my query in a Task and force its immediate execution using ToArray.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>var t = Task.Factory.StartNew(() =&gt;<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>return<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span>(from resource in Resources<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span>.AsParallel()<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><span>.WithDegreeOfParallelism(numConcurrentRetrievals)<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span>select (new 
WebClient()).DownloadData(resource)).ToArray();<\/span><\/p>\n<p class=\"MsoNormal\"><span>});<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>\/\/ Sometime later&#8230;<\/span><\/p>\n<p class=\"MsoNormal\"><span>foreach (byte[] result in t.Result) { }<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>\/\/ OR, use a continuation<\/span><\/p>\n<p class=\"MsoNormal\"><span>t.ContinueWith(dataTask =&gt;<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>foreach (byte[] result in dataTask.Result) { }<\/span><\/p>\n<p class=\"MsoNormal\"><span>});<\/span><\/p>\n<p class=\"MsoNormal\"><span><\/span><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Now, I\u2019ve got asynchrony, and I still get similar speedup.<span>&nbsp; <\/span>However, there\u2019s still something about this code that is not ideal.<span>&nbsp; <\/span>The work (sending off download requests and blocking) requires almost no CPU, but it is being done by ThreadPool threads since I\u2019m using the default scheduler.<span>&nbsp; <\/span>Ideally, threads should only be used for CPU-bound work (when there\u2019s actually work to do).<span>&nbsp; <\/span>Of course, this probably won\u2019t matter much for most typical client applications, but in scenarios where resources are tight, it could be a serious issue.<span>&nbsp; <\/span>Therefore, it\u2019s worth investigating how we might reduce the number of blocked threads, perhaps by not using threads at all where possible.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><font size=\"3\" face=\"Calibri\">To achieve this, I\u2019ll be using ideas from a previous post: <\/font><a href=\"https:\/\/blogs.msdn.com\/pfxteam\/archive\/2009\/06\/19\/9791857.aspx\"><font size=\"3\" face=\"Calibri\">Tasks and the Event-based Asynchronous Pattern<\/font><\/a><font size=\"3\" 
face=\"Calibri\">.<span>&nbsp; <\/span>That article explained how to create a Task&lt;TResult&gt; from any type that implements the EAP, and it presented an extension method for WebClient (available along with many others in the <\/font><a href=\"https:\/\/code.msdn.microsoft.com\/ParExtSamples\"><font size=\"3\" face=\"Calibri\">ParallelExtensionsExtras<\/font><\/a><font size=\"3\"><font face=\"Calibri\">):<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>public static Task&lt;byte[]&gt; DownloadDataTask(<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>this WebClient webClient, Uri address);<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">The key point is that this method produces a Task&lt;TResult&gt; by integrating WebClient\u2019s EAP implementation with a TaskCompletionSource&lt;TResult&gt;, and I can use it to rewrite my scenario.<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><span>var tasks = new Queue&lt;Task&lt;byte[]&gt;&gt;();<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>foreach (string resource in Resources)<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>WebClient wc = new WebClient();<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>tasks.Enqueue(wc.DownloadDataTask(new Uri(resource)));<\/span><\/p>\n<p class=\"MsoNormal\"><span>}<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><span>\/\/ Sometime later&#8230;<\/span><\/p>\n<p class=\"MsoNormal\"><span>while (tasks.Count &gt; 0)<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp; <\/span><span>&nbsp;<\/span>byte[] result = tasks.Dequeue().Result;<\/span><\/p>\n<p class=\"MsoNormal\"><span>}<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p 
class=\"MsoNormal\"><span>\/\/ OR, use a continuation<\/span><\/p>\n<p class=\"MsoNormal\"><span>Task.Factory.ContinueWhenAll(tasks.ToArray(), dataTasks =&gt;<\/span><\/p>\n<p class=\"MsoNormal\"><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span><span>&nbsp;&nbsp;&nbsp; <\/span>foreach (var dataTask in dataTasks) <\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;&nbsp;&nbsp; <\/span><span>{<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/span><span>byte[] result = dataTask.Result;<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;&nbsp;&nbsp; <\/span><span>}<\/span><\/p>\n<p class=\"MsoNormal\"><span>});<\/span><\/p>\n<p class=\"MsoNormal\"><span>&nbsp;<\/span><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">With this, I\u2019ve got a solution that uses parallelism for speedup, is asynchronous, and does not burn more threads than necessary!<\/font><\/font><\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">To recap, in this post, we considered a typical I\/O scenario.<span>&nbsp; <\/span>First, we saw how easy it was to arrive at solutions that are better than the sequential one.<span>&nbsp; <\/span>Then, we delved deeper to discover a more complex solution (integrating EAP with Tasks) that offers even more benefits.<\/font><\/font><\/p>\n<p><a href=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Components.PostAttachments\/00\/09\/85\/74\/77\/PFX-and-IO.zip\">PFX-and-IO.zip<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we\u2019ll investigate some ways that Parallel Extensions can be used to introduce parallelism and asynchrony to I\/O scenarios. Here\u2019s a simple scenario. &nbsp;I want to retrieve data from a number of web resources. 
static string[] Resources = new string[] { &nbsp;&nbsp;&nbsp; &#8220;http:\/\/www.microsoft.com&#8221;, &#8220;http:\/\/www.msdn.com&#8221;, &nbsp;&nbsp;&nbsp; &#8220;http:\/\/www.msn.com&#8221;, &#8220;http:\/\/www.bing.com&#8221; }; &nbsp; Using the WebClient class, [&hellip;]<\/p>\n","protected":false},"author":485,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[7908],"tags":[7907,7910,7912],"class_list":["post-55980","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-pfxteam","tag-net-4","tag-plinq","tag-task-parallel-library"],"acf":[],"blog_post_summary":"<p>In this post, we\u2019ll investigate some ways that Parallel Extensions can be used to introduce parallelism and asynchrony to I\/O scenarios. Here\u2019s a simple scenario. &nbsp;I want to retrieve data from a number of web resources. static string[] Resources = new string[] { &nbsp;&nbsp;&nbsp; &#8220;http:\/\/www.microsoft.com&#8221;, &#8220;http:\/\/www.msdn.com&#8221;, &nbsp;&nbsp;&nbsp; &#8220;http:\/\/www.msn.com&#8221;, &#8220;http:\/\/www.bing.com&#8221; }; &nbsp; Using the WebClient class, 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/55980","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/485"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=55980"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/55980\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=55980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=55980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=55980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}