{"id":18285,"date":"2018-07-09T09:02:10","date_gmt":"2018-07-09T16:02:10","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/dotnet\/?p=18285"},"modified":"2021-09-29T16:24:44","modified_gmt":"2021-09-29T23:24:44","slug":"system-io-pipelines-high-performance-io-in-net","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/system-io-pipelines-high-performance-io-in-net\/","title":{"rendered":"System.IO.Pipelines: High performance IO in .NET"},"content":{"rendered":"<p><a href=\"https:\/\/www.nuget.org\/packages\/System.IO.Pipelines\/\" rel=\"nofollow\">System.IO.Pipelines<\/a> is a new library that is designed to make it easier to do high-performance IO in .NET. It&#8217;s a library targeting .NET Standard that works on all .NET implementations.<\/p>\n<p>Pipelines was born from the work the .NET Core team did to make Kestrel one of the <a href=\"https:\/\/www.techempower.com\/benchmarks\/#section=data-r16&amp;hw=ph&amp;test=plaintext\" rel=\"nofollow\">fastest web servers in the industry<\/a>. What started as an implementation detail inside of Kestrel progressed into a reusable API that shipped in .NET Core 2.1 as a first-class BCL API (System.IO.Pipelines) available to all .NET developers.<\/p>\n<h2><a id=\"user-content-what-problem-does-it-solve\" class=\"anchor\" href=\"#what-problem-does-it-solve\"><\/a>What problem does it solve?<\/h2>\n<p>Correctly parsing data from a stream or socket is dominated by boilerplate code and has many corner cases, leading to complex code that is difficult to maintain.\nAchieving high performance and correctness while also dealing with this complexity is difficult. Pipelines aims to solve this complexity.<\/p>\n<h2><a id=\"user-content-what-extra-complexity-exists-today\" class=\"anchor\" href=\"#what-extra-complexity-exists-today\"><\/a>What extra complexity exists today?<\/h2>\n<p>Let&#8217;s start with a simple problem. 
We want to write a TCP server that receives line-delimited messages (delimited by <code>\\n<\/code>) from a client.<\/p>\n<h3><a id=\"user-content-tcp-server-with-networkstream\" class=\"anchor\" href=\"#tcp-server-with-networkstream\"><\/a>TCP Server with NetworkStream<\/h3>\n<p><em>DISCLAIMER: As with all performance-sensitive work, each of the scenarios should be measured within the context of your application. The overhead of the various techniques mentioned may not be necessary depending on the scale your networking applications need to handle.<\/em><\/p>\n<p>The typical code you would write in .NET before pipelines looks something like this:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/terrajobst\/ee86ab15d1d7a1d5869d1c1f2443f3b3.js\"><\/script><\/p>\n<p>This code might work when testing locally, but it has several errors:<\/p>\n<ul>\n<li>The entire message (up to the end of line) may not have been received in a single call to <code>ReadAsync<\/code>.<\/li>\n<li>It&#8217;s ignoring the result of <code>stream.ReadAsync()<\/code>, which returns how much data was actually filled into the buffer.<\/li>\n<li>It doesn&#8217;t handle the case where multiple lines come back in a single <code>ReadAsync<\/code> call.<\/li>\n<\/ul>\n<p>These are some of the common pitfalls when reading streaming data. To account for them, we need to make a few changes:<\/p>\n<ul>\n<li>We need to buffer the incoming data until we have found a new line.<\/li>\n<li>We need to parse <em>all<\/em> of the lines returned in the buffer.<\/li>\n<\/ul>\n<p><script src=\"https:\/\/gist.github.com\/terrajobst\/8e077db206883ca156dfdb7643969c76.js\"><\/script><\/p>\n<p>Once again, this might work in local testing, but it&#8217;s possible that the line is bigger than 1KiB (1024 bytes). We need to resize the input buffer until we have found a new line.<\/p>\n<p>Also, we&#8217;re allocating buffers on the heap as longer lines are processed. 
We can improve this by using the <code>ArrayPool&lt;byte&gt;<\/code> to avoid repeated buffer allocations as we parse longer lines from the client.<\/p>\n<p><script src=\"https:\/\/gist.github.com\/terrajobst\/568dad7aa8e831cf4fcb48ca370ca251.js\"><\/script><\/p>\n<p>This code works, but now we&#8217;re resizing the buffer, which results in more buffer copies. It also uses more memory, as the logic doesn&#8217;t shrink the buffer after lines are processed. To avoid this, we can store a list of buffers instead of resizing each time we cross the 1KiB buffer size.<\/p>\n<p>Also, we don&#8217;t grow the 1KiB buffer until it&#8217;s completely full. This means we can end up passing smaller and smaller buffers to <code>ReadAsync<\/code>, which results in more calls into the operating system.<\/p>\n<p>To mitigate this, we&#8217;ll allocate a new buffer when there&#8217;s less than 512 bytes remaining in the existing buffer:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/terrajobst\/aed8731297b8e8268ae6a37ebfc33146.js\"><\/script><\/p>\n<p>This code just got <em>much<\/em> more complicated. We&#8217;re keeping track of the filled-up buffers as we&#8217;re looking for the delimiter. To do this, we&#8217;re using a <code>List&lt;BufferSegment&gt;<\/code> here to represent the buffered data while looking for the new line delimiter. As a result, <code>ProcessLine<\/code> and <code>IndexOf<\/code> now accept a <code>List&lt;BufferSegment&gt;<\/code> instead of a <code>byte[]<\/code>, <code>offset<\/code> and <code>count<\/code>. Our parsing logic now needs to handle one or more buffer segments.<\/p>\n<p>Our server now handles partial messages, and it uses pooled memory to reduce overall memory consumption, but there are still a couple more changes we need to make:<\/p>\n<ol>\n<li>The <code>byte[]<\/code> arrays we&#8217;re renting from the <code>ArrayPool&lt;byte&gt;<\/code> are just regular managed arrays. 
This means whenever we do a <code>ReadAsync<\/code> or <code>WriteAsync<\/code>, those buffers get pinned for the lifetime of the asynchronous operation (in order to interop with the native IO APIs on the operating system). This has performance implications for the garbage collector: pinned memory cannot be moved, which can lead to heap fragmentation. Depending on how long the async operations are pending, the pool implementation may need to change.<\/li>\n<li>The throughput can be optimized by decoupling the reading and processing logic. This creates a batching effect that lets the parsing logic consume larger chunks of buffers, instead of reading more data only after parsing a single line. This introduces some additional complexity:\n<ul>\n<li>We need two loops that run independently of each other: one that reads from the <code>Socket<\/code> and one that parses the buffers.<\/li>\n<li>We need a way to signal the parsing logic when data becomes available.<\/li>\n<li>We need to decide what happens if the loop reading from the <code>Socket<\/code> is &#8220;too fast&#8221;. We need a way to throttle the reading loop if the parsing logic can&#8217;t keep up. This is commonly referred to as &#8220;flow control&#8221; or &#8220;back pressure&#8221;.<\/li>\n<li>We need to make sure things are thread-safe. We&#8217;re now sharing a set of buffers between the reading loop and the parsing loop, and those run independently on different threads.<\/li>\n<li>The memory management logic is now spread across two different pieces of code: the code that rents from the buffer pool is the loop reading from the socket, and the code that returns buffers to the pool is the parsing logic.<\/li>\n<li>We need to be extremely careful with how we return buffers after the parsing logic is done with them. 
If we&#8217;re not careful, it&#8217;s possible that we return a buffer that&#8217;s still being written to by the <code>Socket<\/code> reading logic.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>The complexity has gone through the roof (and we haven&#8217;t even covered all of the cases). High-performance networking usually means writing very complex code in order to eke out more performance from the system.<\/p>\n<p><em>The goal of <code>System.IO.Pipelines<\/code> is to make writing this type of code easier.<\/em><\/p>\n<h3><a id=\"user-content-tcp-server-with-systemiopipelines\" class=\"anchor\" href=\"#tcp-server-with-systemiopipelines\"><\/a>TCP server with System.IO.Pipelines<\/h3>\n<p>Let&#8217;s take a look at what this example looks like with <code>System.IO.Pipelines<\/code>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/terrajobst\/7e04b424ab279e711eece8f6b1c233d8.js\"><\/script><\/p>\n<p>The pipelines version of our line reader has two loops:<\/p>\n<ul>\n<li><code>FillPipeAsync<\/code> reads from the <code>Socket<\/code> and writes into the <code>PipeWriter<\/code>.<\/li>\n<li><code>ReadPipeAsync<\/code> reads from the <code>PipeReader<\/code> and parses incoming lines.<\/li>\n<\/ul>\n<p>Unlike the original examples, there are no explicit buffers allocated anywhere. This is one of pipelines&#8217; core features. All buffer management is delegated to the <code>PipeReader<\/code>\/<code>PipeWriter<\/code> implementations.<\/p>\n<p><strong>This makes it easier for consuming code to focus solely on the business logic instead of complex buffer management.<\/strong><\/p>\n<p>In the first loop, we call <code>PipeWriter.GetMemory(int)<\/code> to get some memory from the underlying writer; then we call <code>PipeWriter.Advance(int)<\/code> to tell the <code>PipeWriter<\/code> how much data we actually wrote to the buffer. 
We then call <code>PipeWriter.FlushAsync()<\/code> to make the data available to the <code>PipeReader<\/code>.<\/p>\n<p>In the second loop, we&#8217;re consuming the buffers written by the <code>PipeWriter<\/code>, which ultimately come from the <code>Socket<\/code>. When the call to <code>PipeReader.ReadAsync()<\/code> returns, we get a <code>ReadResult<\/code>, which contains two important pieces of information: the data that was read, in the form of a <code>ReadOnlySequence&lt;byte&gt;<\/code>, and a bool <code>IsCompleted<\/code> that lets the reader know if the writer is done writing (EOF). After finding the end of line (EOL) delimiter and parsing the line, we slice the buffer to skip what we&#8217;ve already processed, and then we call <code>PipeReader.AdvanceTo<\/code> to tell the <code>PipeReader<\/code> how much data we have consumed.<\/p>\n<p>At the end of each loop, we complete the side of the pipe that the loop owns: <code>FillPipeAsync<\/code> completes the writer and <code>ReadPipeAsync<\/code> completes the reader. Once both are complete, the underlying <code>Pipe<\/code> can release all of the memory it allocated.<\/p>\n<h2><a id=\"user-content-systemiopipelines\" class=\"anchor\" href=\"#systemiopipelines\"><\/a>System.IO.Pipelines<\/h2>\n<h3><a id=\"user-content-partial-reads\" class=\"anchor\" href=\"#partial-reads\"><\/a>Partial Reads<\/h3>\n<p>Besides handling the memory management, the other core pipelines feature is the ability to peek at data in the <code>Pipe<\/code> without actually consuming it.<\/p>\n<p><code>PipeReader<\/code> has two core APIs: <code>ReadAsync<\/code> and <code>AdvanceTo<\/code>. 
<code>ReadAsync<\/code> gets the data in the <code>Pipe<\/code>; <code>AdvanceTo<\/code> tells the <code>PipeReader<\/code> that these buffers are no longer required by the reader, so they can be discarded (for example, returned to the underlying buffer pool).<\/p>\n<p>Here&#8217;s an example of an HTTP parser that keeps partial data buffered in the <code>Pipe<\/code> until a valid start line is received.<\/p>\n<p><a href=\"https:\/\/user-images.githubusercontent.com\/95136\/42349904-1a6e3484-8063-11e8-8ac2-7f8e636b4a23.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" style=\"max-width: 100%;\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2018\/07\/42349904-1a6e3484-8063-11e8-8ac2-7f8e636b4a23.png\" alt=\"image\" \/><\/a><\/p>\n<h3><a id=\"user-content-readonlysequencet\" class=\"anchor\" href=\"#readonlysequencet\"><\/a>ReadOnlySequence&lt;T&gt;<\/h3>\n<p>The <code>Pipe<\/code> implementation stores a linked list of buffers that get passed between the <code>PipeWriter<\/code> and <code>PipeReader<\/code>. <code>PipeReader.ReadAsync<\/code> exposes a <code>ReadOnlySequence&lt;T&gt;<\/code>, a new BCL type that represents a view over one or more segments of <code>ReadOnlyMemory&lt;T&gt;<\/code>, similar to how <code>Span&lt;T&gt;<\/code> and <code>Memory&lt;T&gt;<\/code> provide a view over arrays and strings.<\/p>\n<p><a href=\"https:\/\/user-images.githubusercontent.com\/95136\/42292592-74a4028e-7f88-11e8-85f7-a6b2f925769d.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" style=\"max-width: 100%;\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2018\/07\/42292592-74a4028e-7f88-11e8-85f7-a6b2f925769d.png\" alt=\"image\" \/><\/a><\/p>\n<p>The <code>Pipe<\/code> internally maintains pointers to where the reader and writer are in the overall set of allocated data and updates them as data is written or read. 
The <code>SequencePosition<\/code> represents a single point in the linked list of buffers and can be used to efficiently slice the <code>ReadOnlySequence&lt;T&gt;<\/code>.<\/p>\n<p>Since the <code>ReadOnlySequence&lt;T&gt;<\/code> can support one or more segments, it&#8217;s typical for high-performance processing logic to split into a fast path for a single segment and a slow path for multiple segments.<\/p>\n<p>For example, here&#8217;s a routine that converts an ASCII <code>ReadOnlySequence&lt;byte&gt;<\/code> into a <code>string<\/code>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/terrajobst\/6e1bea5bec4591edd7c5fe5416ce7f56.js\"><\/script><\/p>\n<h3><a id=\"user-content-back-pressure-and-flow-control\" class=\"anchor\" href=\"#back-pressure-and-flow-control\"><\/a>Back pressure and flow control<\/h3>\n<p>In a perfect world, reading &amp; parsing work as a team: the reading thread consumes the data from the network and puts it in buffers, while the parsing thread is responsible for constructing the appropriate data structures. Normally, parsing will take more time than just copying blocks of data from the network. As a result, the reading thread can easily overwhelm the parsing thread. When that happens, the reading thread has to either slow down or allocate more memory to store the data for the parsing thread. Optimal performance requires balancing frequent pauses against allocating more memory.<\/p>\n<p>To solve this problem, the pipe has two settings to control the flow of data: the <code>PauseWriterThreshold<\/code> and the <code>ResumeWriterThreshold<\/code>. The <code>PauseWriterThreshold<\/code> determines how much data should be buffered before calls to <code>PipeWriter.FlushAsync<\/code> pause. 
The <code>ResumeWriterThreshold<\/code> controls how much the reader has to consume before writing can resume.<\/p>\n<p><a href=\"https:\/\/user-images.githubusercontent.com\/95136\/42291183-0114a0f2-7f7f-11e8-983f-5332b7585a09.png\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" style=\"max-width: 100%;\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2018\/07\/42291183-0114a0f2-7f7f-11e8-983f-5332b7585a09.png\" alt=\"image\" \/><\/a><\/p>\n<p><code>PipeWriter.FlushAsync<\/code> &#8220;blocks&#8221; when the amount of data in the <code>Pipe<\/code> crosses <code>PauseWriterThreshold<\/code> and &#8220;unblocks&#8221; when it becomes lower than <code>ResumeWriterThreshold<\/code>. Using two separate values prevents thrashing (rapidly pausing and resuming) around a single limit.<\/p>\n<h3><a id=\"user-content-scheduling-io\" class=\"anchor\" href=\"#scheduling-io\"><\/a>Scheduling IO<\/h3>\n<p>Usually when using async\/await, continuations are called either on thread pool threads or on the current <code>SynchronizationContext<\/code>.<\/p>\n<p>When doing IO, it&#8217;s very important to have fine-grained control over where that IO is performed so that one can take advantage of CPU caches more effectively, which is critical for high-performance applications like web servers. Pipelines exposes a <code>PipeScheduler<\/code> that determines where asynchronous callbacks run. 
This gives the caller fine-grained control over exactly what threads are used for IO.<\/p>\n<p>An example of this in practice is the Kestrel Libuv transport, where IO callbacks run on dedicated event loop threads.<\/p>\n<h3><a id=\"user-content-other-benefits-of-the-pipereader-pattern\" class=\"anchor\" href=\"#other-benefits-of-the-pipereader-pattern\"><\/a>Other benefits of the PipeReader pattern<\/h3>\n<ul>\n<li>Some underlying systems support a &#8220;bufferless wait&#8221;; that is, a buffer never needs to be allocated until there&#8217;s actually data available in the underlying system. For example, on Linux with epoll, it&#8217;s possible to wait until data is ready before actually supplying a buffer to do the read. This means that a large number of threads waiting for data doesn&#8217;t immediately require reserving a huge amount of memory.<\/li>\n<li>The default <code>Pipe<\/code> makes it easy to write unit tests against networking code: because the parsing logic is separated from the networking code, unit tests can run the parsing logic against in-memory buffers rather than consuming directly from the network. It also makes it easy to test hard-to-test patterns where partial data is sent. 
ASP.NET Core uses this to test various aspects of Kestrel&#8217;s HTTP parser.<\/li>\n<li>Systems that allow exposing the underlying OS buffers (like the Registered IO APIs on Windows) to user code are a natural fit for pipelines, since buffers are always provided by the <code>PipeReader<\/code> implementation.<\/li>\n<\/ul>\n<h3><a id=\"user-content-other-related-types\" class=\"anchor\" href=\"#other-related-types\"><\/a>Other related types<\/h3>\n<p>As part of making System.IO.Pipelines, we also added a number of new primitive BCL types:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.buffers.memorypool-1?view=netcore-2.1\" rel=\"nofollow\">MemoryPool&lt;T&gt;<\/a>, <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.buffers.imemoryowner-1?view=netcore-2.1\" rel=\"nofollow\">IMemoryOwner&lt;T&gt;<\/a>, <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.buffers.memorymanager-1?view=netcore-2.1\" rel=\"nofollow\">MemoryManager&lt;T&gt;<\/a> &#8211; .NET Core 1.0 added <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.buffers.arraypool-1?view=netcore-2.1\" rel=\"nofollow\">ArrayPool&lt;T&gt;<\/a>, and in .NET Core 2.1 we now have a more general abstraction for a pool that works over any <code>Memory&lt;T&gt;<\/code>. This provides an extensibility point that lets you plug in more advanced allocation strategies as well as control how buffers are managed (e.g., providing pre-pinned buffers instead of purely managed arrays).<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.buffers.ibufferwriter-1?view=netcore-2.1\" rel=\"nofollow\">IBufferWriter&lt;T&gt;<\/a> &#8211; Represents a sink for writing synchronous buffered data. 
(<code>PipeWriter<\/code> implements this.)<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.threading.tasks.sources.ivaluetasksource-1?view=netcore-2.1\" rel=\"nofollow\">IValueTaskSource&lt;T&gt;<\/a> &#8211; <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.threading.tasks.valuetask-1?view=netcore-2.1\" rel=\"nofollow\">ValueTask&lt;T&gt;<\/a> has existed since .NET Core 1.1 but has gained some superpowers in .NET Core 2.1 to allow allocation-free awaitable async operations. See <a href=\"https:\/\/github.com\/dotnet\/corefx\/issues\/27445\">https:\/\/github.com\/dotnet\/corefx\/issues\/27445<\/a> for more details.<\/li>\n<\/ul>\n<h2><a id=\"user-content-how-do-i-use-pipelines\" class=\"anchor\" href=\"#how-do-i-use-pipelines\"><\/a>How do I use Pipelines?<\/h2>\n<p>The APIs exist in the <a href=\"https:\/\/www.nuget.org\/packages\/System.IO.Pipelines\/\" rel=\"nofollow\">System.IO.Pipelines<\/a> NuGet package.<\/p>\n<p>Here&#8217;s an example of a .NET Core 2.1 server application that uses pipelines to handle line-based messages (our example above): <a href=\"https:\/\/github.com\/davidfowl\/TcpEcho\">https:\/\/github.com\/davidfowl\/TcpEcho<\/a>. It should run with <code>dotnet run<\/code> (or from Visual Studio). It listens on port 8087 and writes received messages to the console. You can use a client like netcat or PuTTY to connect to port 8087 and send line-based messages to see it working.<\/p>\n<p>Today Pipelines powers Kestrel and SignalR, and we hope to see it at the center of many networking libraries and components from the .NET community.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>System.IO.Pipelines is a new library that is designed to make it easier to do high performance IO in .NET. It&#8217;s a library targeting .NET Standard that works on all .NET implementations. 
Pipelines was born from the work the .NET Core team did to make Kestrel one of the fastest web servers in the industry. What [&hellip;]<\/p>\n","protected":false},"author":1489,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685],"tags":[],"class_list":["post-18285","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet"],"acf":[],"blog_post_summary":"<p>System.IO.Pipelines is a new library that is designed to make it easier to do high performance IO in .NET. It&#8217;s a library targeting .NET Standard that works on all .NET implementations. Pipelines was born from the work the .NET Core team did to make Kestrel one of the fastest web servers in the industry. What [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/18285","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/1489"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=18285"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/18285\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=18285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=18285"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotn
et\/wp-json\/wp\/v2\/tags?post=18285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}