gRPC is a modern open source remote procedure call framework. There are many exciting features in gRPC: real-time streaming, end-to-end code generation, and great cross-platform support to name a few. The most exciting to me, and consistently mentioned by developers who are interested in gRPC, is performance.
Last year Microsoft contributed a new implementation of gRPC for .NET to the CNCF. Built on top of Kestrel and HttpClient, gRPC for .NET makes gRPC a first-class member of the .NET ecosystem.
In our first gRPC for .NET release, we focused on gRPC’s core features, compatibility, and stability. In .NET 5, we made gRPC really fast.
gRPC and .NET 5 are fast
In a community-run benchmark of different gRPC server implementations, .NET gets the highest requests per second after Rust, and is just ahead of C++ and Go.
This result builds on top of the work done in .NET 5. Our benchmarks show .NET 5 server performance is 60% faster than .NET Core 3.1. .NET 5 client performance is 230% faster than .NET Core 3.1.
Stephen Toub discusses dotnet/runtime changes in his Performance Improvements in .NET 5 blog post. Check it out to read about improvements in HttpClient and HTTP/2.
In the rest of this blog post I’ll talk about the improvements we made to make gRPC fast in ASP.NET Core.
HTTP/2 allocations in Kestrel
gRPC uses HTTP/2 as its underlying protocol. A fast HTTP/2 implementation is the most important factor when it comes to performance. Our gRPC server builds on top of Kestrel, a HTTP server written in C# that is designed with performance in mind. Kestrel is a top contender in the TechEmpower benchmarks, and gRPC benefits from a lot of the performance improvements in Kestrel automatically. However, there are many HTTP/2 specific optimizations that were made in .NET 5.
Reducing allocations is a good place to start. Fewer allocations per HTTP/2 request means less time doing garbage collection (GC). And CPU time “wasted” in GC is CPU time not spent serving HTTP/2 requests.
The performance profiler above is measuring allocations over 100,000 gRPC requests. The live object graph’s sawtooth shaped pattern indicates memory building up, then being garbage collected. About 3.9KB is being allocated per request. Let’s try to get that number down!
dotnet/aspnetcore#18601 adds pooling of streams in a HTTP/2 connection. This one change almost cuts allocations per request in half. It enables reuse of internal types like Http2Stream, and publicly accessible types like HttpContext and HttpRequest, across multiple requests.
Once streams are pooled a range of optimizations become available:
- dotnet/aspnetcore#19356 reuses input and output Pipe instances. Pipe is the single biggest contributor to allocations.
- dotnet/aspnetcore#19431 reuses known header string values. Related to header reuse, dotnet/aspnetcore#19457 adds HTTP/2 pseudo headers as known headers. String allocations use the third most bytes.
- dotnet/aspnetcore#19695 and dotnet/aspnetcore#19629 reuse some smaller per-request objects.
- While pooling is great when a server is under load, we want to free up memory that is no longer used. dotnet/aspnetcore#24767 removes streams from the pool if they haven’t been used by a HTTP request in the last 5 seconds.
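The pooling and idle-expiry ideas in the list above can be sketched roughly as follows. This is not Kestrel's implementation: IdleExpiringPool and its members are hypothetical names, and the clock is passed in as a plain number so the behaviour is easy to follow. The sketch assumes items are returned with non-decreasing timestamps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; Kestrel's real stream pool is internal and differs.
public sealed class IdleExpiringPool<T>
{
    // Oldest returned item first, most recently returned item last.
    private readonly List<(T Item, long ReturnedAt)> _items = new List<(T Item, long ReturnedAt)>();
    private readonly long _idleTimeout;

    public IdleExpiringPool(long idleTimeout) => _idleTimeout = idleTimeout;

    public int Count => _items.Count;

    // Reuse the most recently returned item, or create a new one if the pool is empty.
    public T Rent(Func<T> create)
    {
        if (_items.Count == 0)
        {
            return create();
        }

        int last = _items.Count - 1;
        T item = _items[last].Item;
        _items.RemoveAt(last);
        return item;
    }

    // Put an item back and remember when it was returned.
    public void Return(T item, long now) => _items.Add((item, now));

    // Drop items that have been idle for longer than the timeout so memory can be reclaimed.
    public void RemoveExpired(long now)
    {
        int expired = 0;
        while (expired < _items.Count && now - _items[expired].ReturnedAt > _idleTimeout)
        {
            expired++;
        }
        _items.RemoveRange(0, expired);
    }
}
```

Renting from the end of the list keeps recently used items hot, while the items at the front are the ones that age out when the server goes quiet.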
There are many smaller allocation savings. dotnet/aspnetcore#19783 removes allocations in Kestrel’s HTTP/2 flow control. A resettable ManualResetValueTaskSourceCore<T> type replaces allocating a new object each time flow control is triggered. dotnet/aspnetcore#19273 replaces an array allocation with stackalloc when validating the HTTP request path. dotnet/aspnetcore#19277 and dotnet/aspnetcore#19325 eliminate some unintended allocations related to logging. dotnet/aspnetcore#22557 avoids allocating a Task<T> if a task is already complete. And finally dotnet/aspnetcore#19732 saves a string allocation by special casing content-length of 0. Because every allocation matters.
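To show what a resettable awaitable looks like, here is a minimal sketch built on ManualResetValueTaskSourceCore<T>. It is not the Kestrel flow control code; ReusableSignal and its method names are made up for illustration. The point is that one object can back many awaits, as long as each ValueTask is consumed before the source is reset.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical illustration only; not the type Kestrel uses for flow control.
public sealed class ReusableSignal : IValueTaskSource<int>
{
    // Mutable struct: it must not be declared readonly.
    private ManualResetValueTaskSourceCore<int> _core;

    // Hands out a ValueTask tied to the current version of the source. No allocation.
    public ValueTask<int> WaitAsync() => new ValueTask<int>(this, _core.Version);

    // Completes the pending ValueTask with a result.
    public void Complete(int value) => _core.SetResult(value);

    // Prepares the same object for the next wait. Call only after the result was consumed.
    public void Reset() => _core.Reset();

    int IValueTaskSource<int>.GetResult(short token) => _core.GetResult(token);

    ValueTaskSourceStatus IValueTaskSource<int>.GetStatus(short token) => _core.GetStatus(token);

    void IValueTaskSource<int>.OnCompleted(
        Action<object> continuation, object state, short token, ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}
```

A caller would await WaitAsync(), and the owner would call Complete(...) followed later by Reset(), so the second and every later wait reuses the same instance instead of allocating a new task-like object.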
Per-request memory in .NET 5 is now just 330 B, a decrease of 92%. The sawtooth pattern has also disappeared. Reduced allocations means garbage collection didn’t run at all while the server processed 100,000 gRPC calls.
Reading HTTP headers in Kestrel
A hot path in HTTP/2 is reading and writing HTTP headers. A HTTP/2 connection supports concurrent requests over a TCP socket, a feature called multiplexing. Multiplexing allows HTTP/2 to make efficient use of connections, but only the headers for one request on a connection can be processed at a time, because HTTP/2’s HPack header compression is stateful and depends on order. Processing HTTP/2 headers is therefore a bottleneck, so it has to be as fast as possible.
dotnet/aspnetcore#23083 optimizes the performance of HPackDecoder. The decoder is a state machine that reads incoming HTTP/2 HEADERS frames. The approach is sound, because a state machine allows Kestrel to decode frames as they arrive, but the decoder was checking state after parsing each byte. Another problem was that literal values, the header names and values, were copied multiple times. Optimizations in this PR include:
- Tighten parsing loops. For example, if we’ve just parsed a header name then the value must come afterwards. There is no need to check the state machine to figure out the next state.
- Skip literal parsing altogether. Literals in HPack have a length prefix. If we know the next 100 bytes are a literal then there is no need to inspect each byte. Mark the literal’s location and resume parsing at its end.
- Avoid copying literal bytes. Previously literal bytes were always copied to an intermediary array before being passed to Kestrel. Most of the time this isn’t necessary and instead we can just slice the original buffer and pass a ReadOnlySpan<byte> to Kestrel.
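The length-prefix and slicing ideas in the list above can be sketched as follows. This is a simplified illustration rather than the HPackDecoder code: HPackLiteralSketch is a hypothetical name, the whole literal is assumed to be present in the buffer, Huffman-encoded literals are rejected, and there is no overflow checking. The integer format follows RFC 7541, section 5.1.

```csharp
using System;

// Hypothetical illustration only; the real HPackDecoder handles partial frames and Huffman coding.
public static class HPackLiteralSketch
{
    // Reads an HPACK integer that uses the low `prefixBits` bits of the first byte.
    public static int ReadInteger(ReadOnlySpan<byte> data, int prefixBits, out int bytesRead)
    {
        int mask = (1 << prefixBits) - 1;
        int value = data[0] & mask;
        bytesRead = 1;
        if (value < mask)
        {
            return value; // The value fit inside the prefix.
        }

        // Continuation bytes carry 7 bits each, least significant group first.
        int shift = 0;
        byte b;
        do
        {
            b = data[bytesRead++];
            value += (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);

        return value;
    }

    // Returns a slice over the literal's bytes instead of copying them,
    // and reports how many bytes the caller can skip in one step.
    public static ReadOnlySpan<byte> ReadLiteral(ReadOnlySpan<byte> data, out int consumed)
    {
        if ((data[0] & 0x80) != 0)
        {
            throw new NotSupportedException("Huffman-encoded literals are out of scope for this sketch.");
        }

        int length = ReadInteger(data, 7, out int prefixLength);
        consumed = prefixLength + length;
        return data.Slice(prefixLength, length);
    }
}
```

Because the length is known up front, the cost of reading a literal is one prefix decode plus a slice, regardless of how long the literal is.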
Together these changes significantly decrease the time it takes to parse headers. Header size is almost no longer a factor. The decoder marks the start and end position of a value and then slices that range.
// Inputs are created once so that only Decode() is measured.
private HPackDecoder _decoder = CreateDecoder();
private byte[] _smallHeader = new byte[] { /* HPack bytes */ };
private byte[] _largeHeader = new byte[] { /* HPack bytes */ };
private IHttpHeadersHandler _noOpHandler = new NoOpHeadersHandler();
[Benchmark]
public void SmallDecode() =>
    _decoder.Decode(_smallHeader, endHeaders: true, handler: _noOpHandler);
[Benchmark]
public void LargeDecode() =>
    _decoder.Decode(_largeHeader, endHeaders: true, handler: _noOpHandler);
Method | Runtime | Mean | Ratio | Allocated |
---|---|---|---|---|
SmallDecode | .NET Core 3.1 | 111.20 ns | 1.00 | 0 B |
SmallDecode | .NET 5.0 | 71.90 ns | 0.65 | 0 B |
LargeDecode | .NET Core 3.1 | 49,083.00 ns | 1.00 | 0 B |
LargeDecode | .NET 5.0 | 98.68 ns | 0.002 | 0 B |
Once headers have been decoded, Kestrel needs to validate and process them. For example, special HTTP/2 headers like :path and :method need to be set onto HttpRequest.Path and HttpRequest.Method, and other headers need to be converted to strings and added to the HttpRequest.Headers collection.
Kestrel has the concept of known request headers. Known headers are a selection of commonly occurring request headers that have been optimized for fast setting and getting. dotnet/aspnetcore#24730 adds an even faster path for setting HPack static table headers to the known headers. The HPack static table gives 61 common header names and values a number ID that can be sent instead of the full name. A header with a static table ID can use the optimized path to bypass some validation and quickly be set in the collection based on its ID. dotnet/aspnetcore#24945 adds extra optimization for static table IDs with a name and value.
Adding HPack response compression
Prior to .NET 5, Kestrel supported reading HPack compressed headers in requests, but it didn’t compress response headers. The obvious advantage of response header compression is less network usage, but there are performance benefits as well. It’s faster to write a byte or two for a compressed header than it is to encode and write the header’s full name and value as bytes.
dotnet/aspnetcore#19521 adds initial HPack static compression. Static compression is pretty simple: if the header is in the HPack static table then write the ID to identify the header instead of the longer name.
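As a rough illustration of static compression, the sketch below writes an HPack "indexed header field" (a single byte with the high bit set and the table ID in the low 7 bits) when a name and value pair is found in the static table. StaticTableSketch is a hypothetical name, only three of the 61 static table entries from RFC 7541 Appendix A are included, and Kestrel's actual encoder is more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; a small subset of the HPACK static table.
public static class StaticTableSketch
{
    private static readonly Dictionary<(string Name, string Value), int> s_ids =
        new Dictionary<(string Name, string Value), int>
        {
            [(":method", "GET")] = 2,
            [(":method", "POST")] = 3,
            [(":status", "200")] = 8,
        };

    // Writes one byte if the header is in the table; otherwise the caller
    // falls back to writing the header in full.
    public static bool TryWriteIndexed(string name, string value, Span<byte> destination, out int written)
    {
        if (s_ids.TryGetValue((name, value), out int id))
        {
            destination[0] = (byte)(0x80 | id); // high bit marks an indexed header field
            written = 1;
            return true;
        }

        written = 0;
        return false;
    }
}
```

With this encoding a ":status: 200" response header costs one byte on the wire instead of the ten bytes of its name and value.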
Dynamic HPack header compression is more complicated, but also provides bigger gains. Response header names and values are tracked in a dynamic table and are each assigned an ID. As a response’s headers are written, the server checks to see if the header name and value are in the table. If there is a match then the ID is written. If there isn’t then the full header is written, and it is added to the table for the next response. There is a maximum size of the dynamic table, so adding a header to it may evict other headers with a first in, first out order.
dotnet/aspnetcore#20058 adds dynamic HPack header compression. To quickly search for headers the dynamic table groups header entries using a basic hash table. To track order and evict the oldest headers, entries maintain a linked list. To avoid allocations, removed entries are pooled and reused.
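The lookup and first in, first out eviction described above can be sketched with standard collections. This is not Kestrel's data structure (which uses its own hash table, linked list and entry pooling); DynamicTableSketch is a hypothetical name. Entry size follows RFC 7541 section 4.1 (name length + value length + 32), and the sketch assumes ASCII header text so that string length equals byte length.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; uses Dictionary and Queue instead of Kestrel's pooled entries.
public sealed class DynamicTableSketch
{
    private const int StaticTableLength = 61;
    private const int EntryOverhead = 32;

    // Maps a header to the sequence number of its most recent insertion.
    private readonly Dictionary<(string Name, string Value), long> _newest =
        new Dictionary<(string Name, string Value), long>();
    // Oldest entry at the front, so eviction is first in, first out.
    private readonly Queue<(string Name, string Value, long Sequence, int Size)> _fifo =
        new Queue<(string Name, string Value, long Sequence, int Size)>();
    private readonly int _maxSize;
    private int _size;
    private long _insertCount;

    public DynamicTableSketch(int maxSize) => _maxSize = maxSize;

    public int Count => _fifo.Count;

    // The newest entry has index 62; older entries have higher indexes.
    public bool TryGetIndex(string name, string value, out int index)
    {
        if (_newest.TryGetValue((name, value), out long sequence))
        {
            index = StaticTableLength + 1 + (int)(_insertCount - sequence);
            return true;
        }
        index = 0;
        return false;
    }

    public void Add(string name, string value)
    {
        int entrySize = name.Length + value.Length + EntryOverhead;
        while (_fifo.Count > 0 && _size + entrySize > _maxSize)
        {
            EvictOldest();
        }
        if (entrySize > _maxSize)
        {
            return; // Too large to store; the table has been emptied.
        }

        _insertCount++;
        _fifo.Enqueue((name, value, _insertCount, entrySize));
        _newest[(name, value)] = _insertCount;
        _size += entrySize;
    }

    private void EvictOldest()
    {
        var oldest = _fifo.Dequeue();
        _size -= oldest.Size;
        // Only forget the header if no newer duplicate is still in the table.
        if (_newest.TryGetValue((oldest.Name, oldest.Value), out long sequence) && sequence == oldest.Sequence)
        {
            _newest.Remove((oldest.Name, oldest.Value));
        }
    }
}
```

An encoder built on this would call TryGetIndex first, write the index on a hit, and on a miss write the full header and call Add so the next response can reference it.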
Using Wireshark, we can see the impact of header compression on response size for this example gRPC call. .NET Core 3.x writes 77 B, while .NET 5 is only 12 B.
Protobuf message serialization
gRPC for .NET uses the Google.Protobuf package as the default serializer for messages. Protobuf is an efficient binary serialization format. Google.Protobuf is designed for performance, using code generation instead of reflection to serialize .NET objects. There are some modern .NET APIs and features that can be added to it to reduce allocations and improve efficiency.
The biggest improvement to Google.Protobuf is support for modern .NET IO types: Span<T>, ReadOnlySequence<T> and IBufferWriter<T>. These types allow gRPC messages to be serialized directly using buffers exposed by Kestrel. This saves Google.Protobuf allocating an intermediary array when serializing and deserializing Protobuf content.
Support for Protobuf buffer serialization was a multi-year effort between Microsoft and Google engineers. Changes were spread across multiple repositories.
protocolbuffers/protobuf#7351 and protocolbuffers/protobuf#7576 add support for buffer serialization to Google.Protobuf. This is by far the biggest and most complicated change. Three attempts were made to add this feature before the right balance between performance, backwards compatibility and code reuse was found. Protobuf reading and writing uses many performance oriented features and APIs added to C# and .NET Core:
- Span<T> and C# ref struct types enable fast and safe access to memory. Span<T> represents a contiguous region of arbitrary memory. Using span lets us serialize to managed .NET arrays, stack allocated arrays, or unmanaged memory, without using pointers. Span<T> and .NET protect us against buffer overflow.
- stackalloc is used to create stack-based arrays. stackalloc is a useful tool to avoid allocations when a small buffer is required.
- Low-level methods such as MemoryMarshal.GetReference(), Unsafe.ReadUnaligned() and Unsafe.WriteUnaligned() convert directly between primitive types and bytes.
- BinaryPrimitives has helper methods for efficiently converting between .NET primitive types and bytes. For example, BinaryPrimitives.ReadUInt64LittleEndian reads little endian bytes and returns an unsigned 64 bit number. Methods provided by BinaryPrimitives are heavily optimized and use vectorization.
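A small sketch of how these pieces fit together: the same span-based method can write to a heap array or to a stackalloc buffer, with no pointers and no extra allocation. SpanSketch and its method names are made up for this example and are not taken from Google.Protobuf.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical illustration only; not code from Google.Protobuf.
public static class SpanSketch
{
    // Writes a 64-bit value as 8 little endian bytes into any memory a span can describe.
    public static int WriteFixed64(Span<byte> destination, ulong value)
    {
        BinaryPrimitives.WriteUInt64LittleEndian(destination, value);
        return sizeof(ulong);
    }

    // Reads 8 little endian bytes back into a 64-bit value.
    public static ulong ReadFixed64(ReadOnlySpan<byte> source) =>
        BinaryPrimitives.ReadUInt64LittleEndian(source);

    // Uses a stack-based scratch buffer, so nothing is allocated on the heap.
    public static ulong RoundTripOnStack(ulong value)
    {
        Span<byte> scratch = stackalloc byte[sizeof(ulong)];
        WriteFixed64(scratch, value);
        return ReadFixed64(scratch);
    }
}
```

If the destination span is too short, BinaryPrimitives throws instead of writing past the end of the buffer, which is the overflow protection the list above refers to.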
A great thing about modern C# and .NET is it is possible to write fast, efficient, low-level libraries without sacrificing memory safety. When it comes to performance, .NET lets you have your cake and eat it too!
private TestMessage _testMessage = CreateMessage();
private ReadOnlySequence<byte> _testData = CreateData();
private IBufferWriter<byte> _bufferWriter = CreateWriter();
// Serializes to a newly allocated byte array.
[Benchmark]
public byte[] ToByteArray() =>
    _testMessage.ToByteArray();
// Serializes into buffers supplied by the IBufferWriter<byte>.
[Benchmark]
public void ToBufferWriter() =>
    _testMessage.WriteTo(_bufferWriter);
// Parses from a byte array.
[Benchmark]
public IMessage FromByteArray() =>
    TestMessage.Parser.ParseFrom(CreateBytes());
// Parses from a ReadOnlySequence<byte>.
[Benchmark]
public IMessage FromSequence() =>
    TestMessage.Parser.ParseFrom(_testData);
Method | Runtime | Mean | Ratio | Allocated |
---|---|---|---|---|
ToByteArray | .NET 5.0 | 1,133.82 ns | 1.00 | 184 B |
ToBufferWriter | .NET 5.0 | 589.05 ns | 0.51 | 64 B |
FromByteArray | .NET 5.0 | 409.88 ns | 1.00 | 1960 B |
FromSequence | .NET 5.0 | 381.03 ns | 0.92 | 1776 B |
Adding support for buffer serialization to Google.Protobuf is just the first step. More work is required for gRPC for .NET to take advantage of the new capability:
- grpc/grpc#18865 and grpc/grpc#19792 add ReadOnlySequence<byte> and IBufferWriter<byte> APIs to the gRPC serialization abstraction layer in Grpc.Core.Api.
- grpc/grpc#23485 updates gRPC code generation to glue the changes in Google.Protobuf to Grpc.Core.Api.
- grpc/grpc-dotnet#376 and grpc/grpc-dotnet#629 update gRPC for .NET to use the new serialization abstractions in Grpc.Core.Api. This code is the integration between Kestrel and gRPC. Because Kestrel’s IO is built on top of System.IO.Pipelines, we can use its buffers during serialization.
The end result is gRPC for .NET serializes Protobuf messages directly to Kestrel’s request and response buffers. Intermediary array allocations and byte copies have been eliminated from gRPC message serialization.
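To make "serializing directly to a buffer" concrete, here is a minimal sketch of writing a Protobuf-style base-128 varint through IBufferWriter<byte>. It is not code from Google.Protobuf or gRPC for .NET; BufferWriterSketch is a hypothetical name, and ArrayBufferWriter<byte> stands in for the pipe-backed writer that Kestrel would supply.

```csharp
using System;
using System.Buffers;

// Hypothetical illustration only; not code from Google.Protobuf.
public static class BufferWriterSketch
{
    // Writes a base-128 varint into memory owned by the writer, with no intermediary array.
    public static void WriteVarint(IBufferWriter<byte> writer, ulong value)
    {
        // A 64-bit varint needs at most 10 bytes.
        Span<byte> span = writer.GetSpan(10);
        int written = 0;
        while (value >= 0x80)
        {
            // Low 7 bits of the value, with the continuation bit set.
            span[written++] = (byte)((value & 0x7F) | 0x80);
            value >>= 7;
        }
        span[written++] = (byte)value;

        // Tell the writer how many bytes were actually used.
        writer.Advance(written);
    }
}
```

The serializer asks the writer for space with GetSpan, fills it in place, and commits with Advance, so the bytes land in the destination buffer the first time they are written.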
Wrapping Up
Performance is a feature of .NET and gRPC, and as cloud apps scale it is more important than ever. I think all developers can agree it is fun to make fast apps, but performance has real world impact. Lower latency and higher throughput means fewer servers. It is an opportunity to save money, reduce power use and build greener apps.
As is obvious from this tour, a lot of changes have gone into gRPC, Protobuf and .NET aimed at improving performance. Our benchmarks show a 60% improvement in gRPC server RPS and a 230% improvement in gRPC client RPS.
.NET 5 RC2 is available now, and the official .NET 5 release is in November. To try out the performance improvements and to get started using gRPC with .NET, the best place to start is the Create a gRPC client and server in ASP.NET Core tutorial.
We look forward to hearing about apps built with gRPC and .NET, and to your future contributions in the dotnet and grpc repos!
What about data annotations for form validation with resx translations?
Very easy to use in the Blazor EditForm.
I have a hard time finding a solution with gRPC, or do I have to map the gRPC object back to another class with annotations.
Dude, why have you removed my comment about poor performance of regex-redux (Slower than python)
What’s the roadmap to HTTP/3?
HTTP/3 is experimental in .NET 5. A complete implementation is planned for .NET 6.
Hi James;
Now that .Net 5 is out, I'm interested in using gRPC in my next .Net 5 project.
I'd like to ask you two important questions:
1) I need to use SignalR for a live chat/messaging in my app. How does gRPC come into picture of a SignalR app? Do they complement each other or one vs. another? How can I use/combine these two in one app?
2) For my client I'll be using Vue.js, How can...
1) Both gRPC and SignalR support streaming. The key difference between them is SignalR supports broadcasting messages out to every client connected to a hub. gRPC is point-to-point.
2) Many examples, including using gRPC from the browser, are here: https://github.com/grpc/grpc-dotnet/tree/master/examples
Awesome news, thanks, but until it’s not supported by Azure . . . 🙁
Deploy .NET via kubernetes with mcr.microsoft.com/dotnet/aspnet:5.0
Hello, is gRPC performance better than SignalR in .NET 5?
I work on an MMO server and currently use SignalR.
I’m thinking about gRPC as an alternative to JSON in Blazor application. Unfortunately c# decimal data type is not supported in gRPC. This is probably one of the most important data types used in any business / financial application. Can you add support for decimal in gRPC? I’m not the only one asking: https://github.com/protocolbuffers/protobuf/issues/4406
No built-in decimal type is a shortcoming of Protobuf. I’ve been involved with an effort to add a decimal known-type to Protobuf but it is still work in progress – https://github.com/protocolbuffers/protobuf/pull/7847
Until something is added, I recommend you check out our documentation on creating Protobuf messages. There is a section on handling decimal values – https://docs.microsoft.com/aspnet/core/grpc/protobuf#decimals
Thank you James for the links.
decimal is nothing more than 128 bits. You can already represent that in protobuf with either a byte array or 4 int fields, which is exactly how System.Decimal stores its value. If you’re only working with .NET this should be good enough. You can add extension methods to the generated class to make it more seamless.
I don’t think it will ever be natively supported by protobuf, so better count on yourself 🙂
Thank you Jonathan. Probably I will use the solution described in Microsoft’s documentation (see James’ reply below). In case I will ever need to use non-.NET environment it will probably be easier to convert from/to integer values than from binary decimal representation.
Great work and great article, thank you very much.
Is there now a time horizon from when Azure AppService fully supports gRPC?
greetings from Germany,
mMilk
Progress is being made – IIS and Http.sys are now supported – but there is no date to give you on AppService.
The best place to stay updated is the GitHub issue – https://github.com/dotnet/aspnetcore/issues/9020
Thanks for the reply, i will watch and wait 🙂