{"id":40856,"date":"2022-07-13T09:50:00","date_gmt":"2022-07-13T16:50:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=40856"},"modified":"2024-12-13T15:11:46","modified_gmt":"2024-12-13T23:11:46","slug":"announcing-rate-limiting-for-dotnet","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/announcing-rate-limiting-for-dotnet\/","title":{"rendered":"Announcing Rate Limiting for .NET"},"content":{"rendered":"<p>We&#8217;re excited to announce built-in Rate Limiting support as part of .NET 7. Rate limiting provides a way to protect a resource in order to avoid overwhelming your app and keep traffic at a safe level.<\/p>\n<h2>What is rate limiting?<\/h2>\n<p>Rate limiting is the concept of limiting how much a resource can be accessed. For example, suppose you know that a database your application accesses can handle 1000 requests per minute safely, but you are not confident that it can handle much more than that. You can put a rate limiter in your application that allows 1000 requests every minute and rejects any more requests before they can access the database. This rate limits your database and lets your application handle a safe number of requests without risking serious failures from your database.<\/p>\n<p>There are multiple different rate limiting algorithms to control the flow of requests. We&#8217;ll go over 4 of them that will be provided in .NET 7.<\/p>\n<h3>Concurrency limit<\/h3>\n<p>A concurrency limiter limits how many concurrent requests can access a resource. If your limit is 10, then 10 requests can access a resource at once and the 11th request will not be allowed. Once a request completes, the number of allowed requests increases to 1; when a second request completes, the number increases to 2, and so on. 
This is done by disposing a <a href=\"#ratelimiter-apis\">RateLimitLease<\/a> which we&#8217;ll talk about later.<\/p>\n<h3>Token bucket limit<\/h3>\n<p>Token bucket is an algorithm that derives its name from describing how it works. Imagine there is a bucket filled to the brim with tokens. When a request comes in, it takes a token and keeps it forever. After some consistent period of time, someone adds a pre-determined number of tokens back to the bucket, never adding more than the bucket can hold. If the bucket is empty, when a request comes in, the request is denied access to the resource.<\/p>\n<p>To give a more concrete example, let&#8217;s say the bucket can hold 10 tokens and every minute 2 tokens are added to the bucket. When a request comes in it takes a token so we&#8217;re left with 9, 3 more requests come in and each take a token leaving us with 6 tokens, after a minute has passed we get 2 new tokens which puts us at 8. 8 requests come in and take the remaining tokens leaving us with 0. If another request comes in it is not allowed to access the resource until we gain more tokens, which happens every minute. After 5 minutes of no requests the bucket will have all 10 tokens again and won&#8217;t add any more in the subsequent minutes unless requests take more tokens.<\/p>\n<h3>Fixed window limit<\/h3>\n<p>The fixed window algorithm uses the concept of a window which will be used in the next algorithm as well. The window is an amount of time that our limit is applied before we move on to the next window. In the fixed window case moving to the next window means resetting the limit back to its starting point. Let&#8217;s imagine there is a movie theater with a single room that can seat 100 people, and the movie playing is 2 hours long. When the movie starts we let people start lining up for the next showing which will be in 2 hours, up to 100 people are allowed to line up before we start telling them to come back some other time. 
Once the 2 hour movie is finished the line of 0 to 100 people can move into the movie theater and we restart the line. This is the same as moving the window in the fixed window algorithm.<\/p>\n<h3>Sliding window limit<\/h3>\n<p>The sliding window algorithm is similar to the fixed window algorithm but with the addition of segments. A segment is part of a window: if we take the previous 2 hour window and split it into 4 segments, we now have four 30 minute segments. There is also a current segment index which will always point to the newest segment in a window. Requests during a 30 minute period go into the current segment and every 30 minutes the window slides by one segment. If there were any requests during the segment the window slides past, these are now refreshed and our limit increases by that amount. If there weren&#8217;t any requests our limit stays the same.<\/p>\n<p>For example, let&#8217;s use the sliding window algorithm with three 10 minute segments and a 100 request limit. Our initial state is 3 segments all with 0 counts and our current segment index is pointing to the 3rd segment.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/07\/sliding_part1.png\" alt=\"Sliding window, empty segments and current segment pointer at segment 3, window covering segments 1-3\" \/><\/p>\n<p>During the first 10 minutes we receive 50 requests all of which are tracked in the 3rd segment (our current segment index). Once the 10 minutes have passed we slide the window by 1 segment, also moving our current segment index to the 4th segment. Any used requests in the 1st segment are now added back to our limit. 
Since there were none, our limit is at 50 (as 50 are already used in the 3rd segment).<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/07\/sliding_part2.png\" alt=\"Sliding window, 50 requests in segment 3, current segment pointer at segment 4, window moved to cover segments 2-4\" \/><\/p>\n<p>During the next 10 minutes we receive 20 more requests, so we have 50 in the 3rd segment and 20 in the 4th segment now. Again, we slide the window after 10 minutes pass, so our current segment index is pointing to 5 and we add any requests from segment 2 to our limit. Segment 2 had none, so our limit is at 30.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/07\/sliding_part3.png\" alt=\"Sliding window, 50 and 20 requests in segment 3 and 4, current segment pointer at segment 5, window covering segments 3-5\" \/><\/p>\n<p>10 minutes later we slide the window again. This time when the window slides the current segment index is at 6 and segment 3 (the one with 50 requests) is now outside of the window. So we get the 50 requests back and add them to our limit, which will now be 80, as there are still 20 in use by segment 4.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/07\/sliding_part4.png\" alt=\"Sliding window, 50 requests crossed out in segment 3, current segment pointer at segment 6, window covering segments 4-6\" \/><\/p>\n<h2>RateLimiter APIs<\/h2>\n<p>Introducing the NuGet package <a href=\"https:\/\/www.nuget.org\/packages\/System.Threading.RateLimiting\">System.Threading.RateLimiting<\/a>, new in .NET 7!<\/p>\n<p>This package provides the primitives for writing rate limiters as well as providing a few commonly used algorithms built-in. 
The main type is the abstract base class <code>RateLimiter<\/code>.<\/p>\n<pre><code class=\"language-csharp\">public abstract class RateLimiter : IAsyncDisposable, IDisposable\r\n{\r\n    public abstract int GetAvailablePermits();\r\n    public abstract TimeSpan? IdleDuration { get; }\r\n\r\n    public RateLimitLease Acquire(int permitCount = 1);\r\n    public ValueTask&lt;RateLimitLease&gt; WaitAsync(int permitCount = 1, CancellationToken cancellationToken = default);\r\n\r\n    public void Dispose();\r\n    public ValueTask DisposeAsync();\r\n}<\/code><\/pre>\n<p><code>RateLimiter<\/code> contains <code>Acquire<\/code> and <code>WaitAsync<\/code> as the core methods for trying to gain permits for a resource that is being protected. Depending on the application, the protected resource may need to acquire more than 1 permit, so <code>Acquire<\/code> and <code>WaitAsync<\/code> both accept an optional <code>permitCount<\/code> parameter. <code>Acquire<\/code> is a synchronous method that checks whether enough permits are available and returns a <code>RateLimitLease<\/code> which contains information about whether you successfully acquired the permits or not. <code>WaitAsync<\/code> is similar to <code>Acquire<\/code> except that it can support queuing permit requests which can be de-queued at some point in the future when the permits become available, which is why it&#8217;s asynchronous and accepts an optional <code>CancellationToken<\/code> to allow canceling the queued request.<\/p>\n<p><code>RateLimitLease<\/code> has an <code>IsAcquired<\/code> property which is used to see if the permits were acquired. Additionally, the <code>RateLimitLease<\/code> may contain metadata such as a suggested retry-after period if the lease failed (shown in a later example). Finally, the <code>RateLimitLease<\/code> is disposable and should be disposed when the code is done using the protected resource. 
The disposal will let the <code>RateLimiter<\/code> know to update its limits based on how many permits were acquired. Below is an example of using a <code>RateLimiter<\/code> to try to acquire a resource with 1 permit.<\/p>\n<pre><code class=\"language-csharp\">RateLimiter limiter = GetLimiter();\r\nusing RateLimitLease lease = limiter.Acquire(permitCount: 1);\r\nif (lease.IsAcquired)\r\n{\r\n    \/\/ Do action that is protected by limiter\r\n}\r\nelse\r\n{\r\n    \/\/ Error handling or add retry logic\r\n}<\/code><\/pre>\n<p>In the example above we attempt to acquire 1 permit using the synchronous <code>Acquire<\/code> method. We also use <code>using<\/code> to make sure we dispose the lease once we are done with the resource. The lease is then checked to see if the permit we requested was acquired, if it was we can then use the protected resource, otherwise we may want to have some logging or error handling to inform the user or app that the resource wasn&#8217;t used due to hitting a rate limit.<\/p>\n<p>The other method for trying to acquire permits is <code>WaitAsync<\/code>. This method allows queuing permits and waiting for the permits to become available if they aren&#8217;t. Let&#8217;s show another example to explain the queuing concept.<\/p>\n<pre><code class=\"language-csharp\">RateLimiter limiter = new ConcurrencyLimiter(\r\n    new ConcurrencyLimiterOptions(permitLimit: 2, queueProcessingOrder: QueueProcessingOrder.OldestFirst, queueLimit: 2));\r\n\r\n\/\/ thread 1:\r\nusing RateLimitLease lease = limiter.Acquire(permitCount: 2);\r\nif (lease.IsAcquired) { }\r\n\r\n\/\/ thread 2:\r\nusing RateLimitLease lease = await limiter.WaitAsync(permitCount: 2);\r\nif (lease.IsAcquired) { }<\/code><\/pre>\n<p>Here we show our first example of using one of the built-in rate limiting implementations, <code>ConcurrencyLimiter<\/code>. We create the limiter with a maximum permit limit of 2 and a queue limit of 2. 
This means that a maximum of 2 permits can be acquired at any time and we allow queuing <code>WaitAsync<\/code> calls with up to 2 total permit requests.<\/p>\n<p>The <code>queueProcessingOrder<\/code> parameter determines the order that items in the queue are processed, it can be the value of <code>QueueProcessingOrder.OldestFirst<\/code> (<a href=\"https:\/\/wikipedia.org\/wiki\/FIFO_(computing_and_electronics)\">FIFO<\/a>) or <code>QueueProcessingOrder.NewestFirst<\/code> (<a href=\"https:\/\/wikipedia.org\/wiki\/Stack_(abstract_data_type)\">LIFO<\/a>). One interesting behavior to note is that using <code>QueueProcessingOrder.NewestFirst<\/code> when the queue is full will complete the oldest queued <code>WaitAsync<\/code> calls with a failed <code>RateLimitLease<\/code> until there is space in the queue for the newest queue item.<\/p>\n<p>In this example there are 2 threads trying to acquire permits. If thread 1 runs first it will acquire the 2 permits successfully and the <code>WaitAsync<\/code> in thread 2 will be queued waiting for the <code>RateLimitLease<\/code> in thread 1 to be disposed. 
Additionally, if another thread tries to acquire permits using either <code>Acquire<\/code> or <code>WaitAsync<\/code> it will immediately receive a <code>RateLimitLease<\/code> with an <code>IsAcquired<\/code> property equal to false, because the <code>permitLimit<\/code> and <code>queueLimit<\/code> are already used up.<\/p>\n<p>If thread 2 runs first it will immediately get a <code>RateLimitLease<\/code> with <code>IsAcquired<\/code> equal to true, and when thread 1 runs next (assuming the lease in thread 2 hasn&#8217;t been disposed yet) it will synchronously get a <code>RateLimitLease<\/code> with an <code>IsAcquired<\/code> property equal to false, because <code>Acquire<\/code> does not queue and the <code>permitLimit<\/code> is used up by the <code>WaitAsync<\/code> call.<\/p>\n<p>So far we&#8217;ve seen the <code>ConcurrencyLimiter<\/code>, there are 3 other limiters we provide in-box. <code>TokenBucketRateLimiter<\/code>, <code>FixedWindowRateLimiter<\/code>, and <code>SlidingWindowRateLimiter<\/code> all of which implement the abstract class <code>ReplenishingRateLimiter<\/code> which itself implements <code>RateLimiter<\/code>. <code>ReplenishingRateLimiter<\/code> introduces the <code>TryReplenish<\/code> method as well as a couple properties for observing common settings on the limiter. 
<code>TryReplenish<\/code> will be explained after showing some examples of these rate limiters.<\/p>\n<pre><code class=\"language-csharp\">RateLimiter limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions(tokenLimit: 5, queueProcessingOrder: QueueProcessingOrder.OldestFirst,\r\n    queueLimit: 1, replenishmentPeriod: TimeSpan.FromSeconds(5), tokensPerPeriod: 1, autoReplenishment: true));\r\n\r\nusing RateLimitLease lease = await limiter.WaitAsync(5);\r\n\r\n\/\/ will complete after ~5 seconds\r\nusing RateLimitLease lease2 = await limiter.WaitAsync();<\/code><\/pre>\n<p>Here we show the <code>TokenBucketRateLimiter<\/code>, it has a few more options than the <code>ConcurrencyLimiter<\/code>. The <code>replenishmentPeriod<\/code> is how often new tokens (same concept as permits, just a better name in the context of token bucket) are added back to the limit. In this example <code>tokensPerPeriod<\/code> is 1 and the <code>replenishmentPeriod<\/code> is 5 seconds, so every 5 seconds 1 token is added back to the <code>tokenLimit<\/code> up to the max of 5. And lastly, <code>autoReplenishment<\/code> is set to true which means the limiter will create a <code>Timer<\/code> internally to handle the replenishment of tokens every 5 seconds.<\/p>\n<p>If <code>autoReplenishment<\/code> is set to false then it is up to the developer to call <code>TryReplenish<\/code> on the limiter. 
This is useful when managing multiple <code>ReplenishingRateLimiter<\/code> instances and wanting to lower the overhead by creating a single <code>Timer<\/code> instance and managing the replenish calls yourself, instead of having each limiter create a <code>Timer<\/code>.<\/p>\n<pre><code class=\"language-csharp\">ReplenishingRateLimiter[] limiters = GetLimiters();\r\nTimer rateLimitTimer = new Timer(static state =&gt;\r\n{\r\n    var replenishingLimiters = (ReplenishingRateLimiter[])state;\r\n    foreach (var limiter in replenishingLimiters)\r\n    {\r\n        limiter.TryReplenish();\r\n    }\r\n}, limiters, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));<\/code><\/pre>\n<p><code>FixedWindowRateLimiter<\/code> has a <code>window<\/code> option which defines how long it takes for the window to update.<\/p>\n<pre><code class=\"language-csharp\">new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions(permitLimit: 2,\r\n    queueProcessingOrder: QueueProcessingOrder.OldestFirst, queueLimit: 1, window: TimeSpan.FromSeconds(10), autoReplenishment: true));<\/code><\/pre>\n<p>And <code>SlidingWindowRateLimiter<\/code> has a <code>segmentsPerWindow<\/code> option in addition to <code>window<\/code> which specifies how many segments there are and how often the window will slide.<\/p>\n<pre><code class=\"language-csharp\">new SlidingWindowRateLimiter(new SlidingWindowRateLimiterOptions(permitLimit: 2,\r\n    queueProcessingOrder: QueueProcessingOrder.OldestFirst, queueLimit: 1, window: TimeSpan.FromSeconds(10), segmentsPerWindow: 5, autoReplenishment: true));<\/code><\/pre>\n<p>Going back to the mention of metadata earlier, let&#8217;s show an example of where metadata might be useful.<\/p>\n<pre><code class=\"language-csharp\">class RateLimitedHandler : DelegatingHandler\r\n{\r\n    private readonly RateLimiter _rateLimiter;\r\n\r\n    public RateLimitedHandler(RateLimiter limiter) : base(new HttpClientHandler())\r\n    {\r\n        _rateLimiter = limiter;\r\n    
}\r\n\r\n    protected override async Task&lt;HttpResponseMessage&gt; SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\r\n    {\r\n        using RateLimitLease lease = await _rateLimiter.WaitAsync(1, cancellationToken);\r\n        if (lease.IsAcquired)\r\n        {\r\n            return await base.SendAsync(request, cancellationToken);\r\n        }\r\n        var response = new HttpResponseMessage(System.Net.HttpStatusCode.TooManyRequests);\r\n        if (lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))\r\n        {\r\n            response.Headers.Add(HeaderNames.RetryAfter, ((int)retryAfter.TotalSeconds).ToString(NumberFormatInfo.InvariantInfo));\r\n        }\r\n        return response;\r\n    }\r\n}\r\n\r\nRateLimiter limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions(tokenLimit: 5, queueProcessingOrder: QueueProcessingOrder.OldestFirst,\r\n    queueLimit: 1, replenishmentPeriod: TimeSpan.FromSeconds(5), tokensPerPeriod: 1, autoReplenishment: true));\r\nHttpClient client = new HttpClient(new RateLimitedHandler(limiter));\r\nawait client.GetAsync(\"https:\/\/example.com\");<\/code><\/pre>\n<p>In this example we are making a rate limited <code>HttpClient<\/code>, and if we fail to acquire the requested permit we want to return a failed HTTP response with a 429 status code (Too Many Requests) instead of making an HTTP request to our downstream resource. Additionally, 429 responses can contain a &#8220;Retry-After&#8221; header that lets the consumer know when a retry might be successful. We accomplish this by looking for metadata on the <code>RateLimitLease<\/code> using <code>TryGetMetadata<\/code> and <code>MetadataName.RetryAfter<\/code>. We also use the <code>TokenBucketRateLimiter<\/code> because it is able to calculate an estimate of when the number of requested tokens will be available as it knows how often it replenishes tokens. 
Whereas the <code>ConcurrencyLimiter<\/code> would have no way of knowing when permits would become available, so it wouldn&#8217;t provide any <code>RetryAfter<\/code> metadata.<\/p>\n<p><code>MetadataName<\/code> is a static class that provides a couple pre-created <code>MetadataName&lt;T&gt;<\/code> instances, the <code>MetadataName.RetryAfter<\/code> that we just saw, which is typed as <code>MetadataName&lt;TimeSpan&gt;<\/code>, and <code>MetadataName.ReasonPhrase<\/code>, which is typed as <code>MetadataName&lt;string&gt;<\/code>. There is also a static <code>MetadataName.Create&lt;T&gt;(string name)<\/code> method for creating your own strongly-typed named metadata keys. <code>RateLimitLease.TryGetMetadata<\/code> has 2 overloads, one for the strongly-typed <code>MetadataName&lt;T&gt;<\/code> which has an <code>out T<\/code> parameter, and the other accepts a string for the metadata name and has an <code>out object<\/code> parameter.<\/p>\n<p>Let&#8217;s now look at another API being introduced to help with more complicated scenarios, the <code>PartitionedRateLimiter<\/code>!<\/p>\n<h2>PartitionedRateLimiter<\/h2>\n<p>Also contained in the <a href=\"https:\/\/www.nuget.org\/packages\/System.Threading.RateLimiting\">System.Threading.RateLimiting<\/a> nuget package is <code>PartitionedRateLimiter&lt;TResource&gt;<\/code>. This is an abstraction that is very similar to the <code>RateLimiter<\/code> class except that it accepts a <code>TResource<\/code> instance as an argument to methods on it. For example <code>Acquire<\/code> is now: <code>Acquire(TResource resourceID, int permitCount = 1)<\/code>. This is useful for scenarios where you might want to change rate limiting behavior depending on the <code>TResource<\/code> that is passed in. 
This can be something such as independent concurrency limits for different <code>TResource<\/code>s or more complicated scenarios like grouping X and Y under the same concurrency limit, but having W and Z under a token bucket limit.<\/p>\n<p>To assist with common usages, we have included a way to construct a <code>PartitionedRateLimiter&lt;TResource&gt;<\/code> via <code>PartitionedRateLimiter.Create&lt;TResource, TPartitionKey&gt;(...)<\/code>.<\/p>\n<pre><code class=\"language-csharp\">enum MyPolicyEnum\r\n{\r\n    One,\r\n    Two,\r\n    Admin,\r\n    Default\r\n}\r\n\r\nPartitionedRateLimiter&lt;string&gt; limiter = PartitionedRateLimiter.Create&lt;string, MyPolicyEnum&gt;(resource =&gt;\r\n{\r\n    if (resource == \"Policy1\")\r\n    {\r\n        return RateLimitPartition.Create(MyPolicyEnum.One, key =&gt; new MyCustomLimiter());\r\n    }\r\n    else if (resource == \"Policy2\")\r\n    {\r\n        return RateLimitPartition.CreateConcurrencyLimiter(MyPolicyEnum.Two, key =&gt;\r\n            new ConcurrencyLimiterOptions(permitLimit: 2, queueProcessingOrder: QueueProcessingOrder.OldestFirst, queueLimit: 2));\r\n    }\r\n    else if (resource == \"Admin\")\r\n    {\r\n        return RateLimitPartition.CreateNoLimiter(MyPolicyEnum.Admin);\r\n    }\r\n    else\r\n    {\r\n        return RateLimitPartition.CreateTokenBucketLimiter(MyPolicyEnum.Default, key =&gt;\r\n            new TokenBucketRateLimiterOptions(tokenLimit: 5, queueProcessingOrder: QueueProcessingOrder.OldestFirst,\r\n                queueLimit: 1, replenishmentPeriod: TimeSpan.FromSeconds(5), tokensPerPeriod: 1, autoReplenishment: true));\r\n    }\r\n});\r\nRateLimitLease lease = limiter.Acquire(resourceID: \"Policy1\", permitCount: 1);\r\n\r\n\/\/ ...\r\n\r\nRateLimitLease lease = limiter.Acquire(resourceID: \"Policy2\", permitCount: 1);\r\n\r\n\/\/ ...\r\n\r\nRateLimitLease lease = limiter.Acquire(resourceID: \"Admin\", permitCount: 12345678);\r\n\r\n\/\/ ...\r\n\r\nRateLimitLease lease = 
limiter.Acquire(resourceID: \"other value\", permitCount: 1);<\/code><\/pre>\n<p><code>PartitionedRateLimiter.Create<\/code> has 2 generic type parameters, the first one represents the resource type which will also be the <code>TResource<\/code> in the returned <code>PartitionedRateLimiter&lt;TResource&gt;<\/code>. The second generic type is the partition key type, in the above example we use <code>MyPolicyEnum<\/code>\u00a0as our key type. The key is used to differentiate a group of <code>TResource<\/code> instances with the same limiter, which is what we are calling a partition. <code>PartitionedRateLimiter.Create<\/code> accepts a <code>Func&lt;TResource, RateLimitPartition&lt;TPartitionKey&gt;&gt;<\/code> which we call the partitioner. This function is called every time the <code>PartitionedRateLimiter<\/code> is interacted with via <code>Acquire<\/code> or <code>WaitAsync<\/code> and a <code>RateLimitPartition&lt;TKey&gt;<\/code> is returned from the function. <code>RateLimitPartition&lt;TKey&gt;<\/code> contains a <code>Create<\/code> method which is how the user specifies what identifier the partition will have and what limiter will be associated with that identifier.<\/p>\n<p>In our first block of code above, we are checking the resource for equality with &#8220;Policy1&#8221;, if they match we create a partition with the key <code>MyPolicyEnum.One<\/code> and return a factory for creating a custom <code>RateLimiter<\/code>. The factory is called once and then the rate limiter is cached so future accesses for the key <code>MyPolicyEnum.One<\/code> will use the same rate limiter instance.<\/p>\n<p>Looking at the first <code>else if<\/code> condition we similarly create a partition when the resource equals &#8220;Policy2&#8221;, this time we use the convenience method <code>CreateConcurrencyLimiter<\/code> to create a <code>ConcurrencyLimiter<\/code>. 
We use a new partition key of <code>MyPolicyEnum.Two<\/code> for this partition and specify the options for the <code>ConcurrencyLimiter<\/code> that will be generated. Now every <code>Acquire<\/code> or <code>WaitAsync<\/code> for &#8220;Policy2&#8221; will use the same instance of <code>ConcurrencyLimiter<\/code>.<\/p>\n<p>Our third condition is for our &#8220;Admin&#8221; resource. We don&#8217;t want to limit our admin(s), so we use <code>CreateNoLimiter<\/code>, which applies no limits. We also assign the partition key <code>MyPolicyEnum.Admin<\/code> for this partition.<\/p>\n<p>Finally, we have a fallback for all other resources to use a <code>TokenBucketRateLimiter<\/code> instance and we assign the key of <code>MyPolicyEnum.Default<\/code> to this partition. Any request to a resource not covered by our <code>if<\/code> conditions will use this <code>TokenBucketRateLimiter<\/code>. It&#8217;s generally a good practice to have a non-noop fallback limiter in case you didn&#8217;t cover all conditions or add new behavior to your application in the future.<\/p>\n<p>In the next example, let&#8217;s combine the <code>PartitionedRateLimiter<\/code> with our customized <code>HttpClient<\/code> from earlier. We&#8217;ll use <code>HttpRequestMessage<\/code> as our resource type for the <code>PartitionedRateLimiter<\/code>, which is the type we get in the <code>SendAsync<\/code> method of <code>DelegatingHandler<\/code>. We&#8217;ll use a <code>string<\/code> for our partition key, as we are going to be partitioning based on URL paths.<\/p>\n<pre><code class=\"language-csharp\">PartitionedRateLimiter&lt;HttpRequestMessage&gt; limiter = PartitionedRateLimiter.Create&lt;HttpRequestMessage, string&gt;(resource =&gt;\r\n{\r\n    if (resource.RequestUri?.IsLoopback == true)\r\n    {\r\n        return RateLimitPartition.CreateNoLimiter(\"loopback\");\r\n    }\r\n\r\n    string[]? 
segments = resource.RequestUri?.Segments;\r\n    if (segments?.Length &gt;= 3 &amp;&amp; segments[1] == \"api\/\")\r\n    {\r\n        \/\/ segments will be [] { \"\/\", \"api\/\", \"next_path_segment\", etc.. }\r\n        return RateLimitPartition.CreateConcurrencyLimiter(segments[2].Trim('\/'), key =&gt;\r\n            new ConcurrencyLimiterOptions(permitLimit: 2, queueProcessingOrder: QueueProcessingOrder.OldestFirst, queueLimit: 2));\r\n    }\r\n\r\n    return RateLimitPartition.Create(\"default\", key =&gt; new MyCustomLimiter());\r\n});\r\n\r\nclass RateLimitedHandler : DelegatingHandler\r\n{\r\n    private readonly PartitionedRateLimiter&lt;HttpRequestMessage&gt; _rateLimiter;\r\n\r\n    public RateLimitedHandler(PartitionedRateLimiter&lt;HttpRequestMessage&gt; limiter) : base(new HttpClientHandler())\r\n    {\r\n        _rateLimiter = limiter;\r\n    }\r\n\r\n    protected override async Task&lt;HttpResponseMessage&gt; SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)\r\n    {\r\n        using RateLimitLease lease = await _rateLimiter.WaitAsync(request, 1, cancellationToken);\r\n        if (lease.IsAcquired)\r\n        {\r\n            return await base.SendAsync(request, cancellationToken);\r\n        }\r\n        var response = new HttpResponseMessage(System.Net.HttpStatusCode.TooManyRequests);\r\n        if (lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))\r\n        {\r\n            response.Headers.Add(HeaderNames.RetryAfter, ((int)retryAfter.TotalSeconds).ToString(NumberFormatInfo.InvariantInfo));\r\n        }\r\n        return response;\r\n    }\r\n}<\/code><\/pre>\n<p>Looking closely at the <code>PartitionedRateLimiter<\/code> in the above example, our first check is for localhost. We&#8217;ve decided that if the user is doing things locally we don&#8217;t want to limit them, since they won&#8217;t be using the upstream resource that we are trying to protect. 
The next check is more interesting: we look at the URL path and find any requests to an <code>\/api\/&lt;something&gt;<\/code> endpoint. If the request matches we grab the <code>&lt;something&gt;<\/code> part of the path and create a partition for that specific path. What this means is that any requests to <code>\/api\/apple\/*<\/code> will use one instance of our <code>ConcurrencyLimiter<\/code> while any requests to <code>\/api\/orange\/*<\/code> will use a different instance of our <code>ConcurrencyLimiter<\/code>. This is because we use a different partition key for those requests and so our limiter factory generates a new limiter for the different partitions. And finally, we have a fallback limit for any requests that aren&#8217;t for localhost or an <code>\/api\/*<\/code> endpoint.<\/p>\n<p>Also shown is the updated <code>RateLimitedHandler<\/code>, which now accepts a <code>PartitionedRateLimiter&lt;HttpRequestMessage&gt;<\/code> instead of a <code>RateLimiter<\/code> and passes in <code>request<\/code> to the <code>WaitAsync<\/code> call. Otherwise the rest of the code remains the same.<\/p>\n<p>There are a few things worth pointing out in this example. We may potentially create many partitions if lots of unique <code>\/api\/*<\/code> requests are made, which would result in memory usage growing in our <code>PartitionedRateLimiter<\/code>. The <code>PartitionedRateLimiter<\/code> returned from <code>PartitionedRateLimiter.Create<\/code> does have some logic to remove limiters once they haven&#8217;t been used for a while to help mitigate this, but application developers should also be wary of creating unbounded partitions and try to avoid that when possible. 
Additionally, we have <code>segments[2].Trim('\/')<\/code> for our partition key, the <code>Trim<\/code> call is to avoid using a different limiter in the cases of <code>\/api\/apple<\/code> and <code>\/api\/apple\/<\/code> as those produce different segments when using <code>Uri.Segments<\/code>.<\/p>\n<p>Custom <code>PartitionedRateLimiter&lt;T&gt;<\/code> implementations can also be written without using the <code>PartitionedRateLimiter.Create<\/code> method. Below is an example of a custom implementation using a concurrency limit for each <code>int<\/code> resource. So resource <code>1<\/code> has its own limit, <code>2<\/code> has its own limit, etc. This has the advantage of being more flexible and potentially more efficient at the cost of higher maintenance.<\/p>\n<pre><code class=\"language-csharp\">public sealed class PartitionedConcurrencyLimiter : PartitionedRateLimiter&lt;int&gt;\r\n{\r\n    private ConcurrentDictionary&lt;int, int&gt; _keyLimits = new();\r\n    private int _permitLimit;\r\n\r\n    private static readonly RateLimitLease FailedLease = new Lease(null, 0, 0);\r\n\r\n    public PartitionedConcurrencyLimiter(int permitLimit)\r\n    {\r\n        _permitLimit = permitLimit;\r\n    }\r\n\r\n    public override int GetAvailablePermits(int resourceID)\r\n    {\r\n        if (_keyLimits.TryGetValue(resourceID, out int value))\r\n        {\r\n            return value;\r\n        }\r\n        return 0;\r\n    }\r\n\r\n    protected override RateLimitLease AcquireCore(int resourceID, int permitCount)\r\n    {\r\n        if (_permitLimit &lt; permitCount)\r\n        {\r\n            return FailedLease;\r\n        }\r\n\r\n        bool wasUpdated = false;\r\n        _keyLimits.AddOrUpdate(resourceID, (key) =&gt;\r\n        {\r\n            wasUpdated = true;\r\n            return _permitLimit - permitCount;\r\n        }, (key, currentValue) =&gt;\r\n        {\r\n            if (currentValue &gt;= permitCount)\r\n            {\r\n                
wasUpdated = true;\r\n                currentValue -= permitCount;\r\n            }\r\n            return currentValue;\r\n        });\r\n\r\n        if (wasUpdated)\r\n        {\r\n            return new Lease(this, resourceID, permitCount);\r\n        }\r\n        return FailedLease;\r\n    }\r\n\r\n    protected override ValueTask&lt;RateLimitLease&gt; WaitAsyncCore(int resourceID, int permitCount, CancellationToken cancellationToken)\r\n    {\r\n        return new ValueTask&lt;RateLimitLease&gt;(AcquireCore(resourceID, permitCount));\r\n    }\r\n\r\n    private void Release(int resourceID, int permitCount)\r\n    {\r\n        _keyLimits.AddOrUpdate(resourceID, _permitLimit, (key, currentValue) =&gt;\r\n        {\r\n            currentValue += permitCount;\r\n            return currentValue;\r\n        });\r\n    }\r\n\r\n    private sealed class Lease : RateLimitLease\r\n    {\r\n        private readonly int _permitCount;\r\n        private readonly int _resourceId;\r\n        private PartitionedConcurrencyLimiter? _limiter;\r\n\r\n        public Lease(PartitionedConcurrencyLimiter? limiter, int resourceId, int permitCount)\r\n        {\r\n            _limiter = limiter;\r\n            _resourceId = resourceId;\r\n            _permitCount = permitCount;\r\n        }\r\n\r\n        public override bool IsAcquired =&gt; _limiter is not null;\r\n\r\n        public override IEnumerable&lt;string&gt; MetadataNames =&gt; throw new NotImplementedException();\r\n\r\n        public override bool TryGetMetadata(string metadataName, out object? 
metadata)\r\n        {\r\n            throw new NotImplementedException();\r\n        }\r\n\r\n        protected override void Dispose(bool disposing)\r\n        {\r\n            if (_limiter is null)\r\n            {\r\n                return;\r\n            }\r\n\r\n            _limiter.Release(_resourceId, _permitCount);\r\n            _limiter = null;\r\n        }\r\n    }\r\n}\r\n\r\nPartitionedRateLimiter&lt;int&gt; limiter = new PartitionedConcurrencyLimiter(permitLimit: 10);\r\n\/\/ both will be successful acquisitions as they use different resource IDs\r\nRateLimitLease lease = limiter.Acquire(resourceID: 1, permitCount: 10);\r\nRateLimitLease lease2 = limiter.Acquire(resourceID: 2, permitCount: 7);<\/code><\/pre>\n<p>This implementation does have some issues such as never removing entries in the dictionary, not supporting queuing, and throwing when accessing metadata, so please use it as inspiration for implementing a custom <code>PartitionedRateLimiter&lt;T&gt;<\/code> and don&#8217;t copy without modifications into your code.<\/p>\n<p>Now that we&#8217;ve gone over the main APIs, let&#8217;s take a look at the RateLimiting middleware in ASP.NET Core that makes use of these primitives.<\/p>\n<h2>RateLimiting middleware<\/h2>\n<p>This middleware is provided via the <a href=\"https:\/\/www.nuget.org\/packages\/Microsoft.AspNetCore.RateLimiting\">Microsoft.AspNetCore.RateLimiting<\/a> NuGet package. The main usage pattern is to configure some rate limiting policies and then attach those policies to your endpoints. A policy is a named <code>Func&lt;HttpContext, RateLimitPartition&lt;TPartitionKey&gt;&gt;<\/code>, which is the same as what the <code>PartitionedRateLimiter.Create<\/code> method took, where <code>TResource<\/code> is now <code>HttpContext<\/code> and <code>TPartitionKey<\/code> is still a user defined key. 
There are also extension methods for the 4 built-in rate limiters when you want to configure a single limiter for a policy without needing different partitions.<\/p>\n<pre><code class=\"language-csharp\">var app = WebApplication.Create(args);\r\n\r\napp.UseRateLimiter(new RateLimiterOptions()\r\n    .AddConcurrencyLimiter(policyName: \"get\", new ConcurrencyLimiterOptions(permitLimit: 2, queueProcessingOrder: QueueProcessingOrder.OldestFirst, queueLimit: 2))\r\n    .AddNoLimiter(policyName: \"admin\")\r\n    .AddPolicy(policyName: \"post\", partitioner: httpContext =&gt;\r\n    {\r\n        if (!StringValues.IsNullOrEmpty(httpContext.Request.Headers[\"token\"]))\r\n        {\r\n            return RateLimitPartition.CreateTokenBucketLimiter(\"token\", key =&gt;\r\n                new TokenBucketRateLimiterOptions(tokenLimit: 5, queueProcessingOrder: QueueProcessingOrder.OldestFirst,\r\n                    queueLimit: 1, replenishmentPeriod: TimeSpan.FromSeconds(5), tokensPerPeriod: 1, autoReplenishment: true));\r\n        }\r\n        else\r\n        {\r\n            return RateLimitPartition.Create(\"default\", key =&gt; new MyCustomLimiter());\r\n        }\r\n    }));\r\n\r\napp.MapGet(\"\/get\", context =&gt; context.Response.WriteAsync(\"get\")).RequireRateLimiting(\"get\");\r\n\r\napp.MapGet(\"\/admin\", context =&gt; context.Response.WriteAsync(\"admin\")).RequireRateLimiting(\"admin\").RequireAuthorization(\"admin\");\r\n\r\napp.MapPost(\"\/post\", context =&gt; context.Response.WriteAsync(\"post\")).RequireRateLimiting(\"post\");\r\n\r\napp.Run();<\/code><\/pre>\n<p>This example shows how to add the middleware, configure some policies, and apply the different policies to different endpoints. Starting at the top, we add the middleware to our middleware pipeline using <code>UseRateLimiter<\/code>. 
Next we add some policies to our options using the convenience methods <code>AddConcurrencyLimiter<\/code> and <code>AddNoLimiter<\/code> for 2 of the policies, named <code>\"get\"<\/code> and <code>\"admin\"<\/code> respectively. Then we use the <code>AddPolicy<\/code> method that allows configuring different partitions based on the resource passed in (<code>HttpContext<\/code> for the middleware). Finally, we use the <code>RequireRateLimiting<\/code> method on our various endpoints to let the Rate Limiting middleware know what policy to run on what endpoint. (Note that the <code>RequireAuthorization<\/code> usage on the <code>\/admin<\/code> endpoint doesn&#8217;t do anything in this minimal sample; imagine that authentication and authorization are configured.)<\/p>\n<p>The <code>AddPolicy<\/code> method also has 2 more overloads that use <code>IRateLimiterPolicy&lt;TPartitionKey&gt;<\/code>. This interface exposes an <code>OnRejected<\/code> callback, the same as the one on <code>RateLimiterOptions<\/code> which I&#8217;ll describe below, and a <code>GetPartition<\/code> method that takes the <code>HttpContext<\/code> as an argument and returns a <code>RateLimitPartition&lt;TPartitionKey&gt;<\/code>. The first overload of <code>AddPolicy<\/code> takes an instance of <code>IRateLimiterPolicy<\/code> and the second takes an implementation of <code>IRateLimiterPolicy<\/code> as a generic argument. The generic overload uses dependency injection to call the constructor and instantiate the <code>IRateLimiterPolicy<\/code> for you.<\/p>\n<pre><code class=\"language-csharp\">public class CustomRateLimiterPolicy : IRateLimiterPolicy&lt;string&gt;\r\n{\r\n    private readonly ILogger _logger;\r\n\r\n    public CustomRateLimiterPolicy(ILogger&lt;CustomRateLimiterPolicy&gt; logger)\r\n    {\r\n        _logger = logger;\r\n    }\r\n\r\n    public Func&lt;OnRejectedContext, CancellationToken, ValueTask&gt;? 
OnRejected\r\n    {\r\n        get =&gt; (context, cancellationToken) =&gt;\r\n        {\r\n            context.HttpContext.Response.StatusCode = 429;\r\n            _logger.LogDebug(\"Request rejected\");\r\n            return new ValueTask();\r\n        };\r\n    }\r\n\r\n    public RateLimitPartition&lt;string&gt; GetPartition(HttpContext httpContext)\r\n    {\r\n        if (!StringValues.IsNullOrEmpty(httpContext.Request.Headers[\"token\"]))\r\n        {\r\n            return RateLimitPartition.CreateTokenBucketLimiter(\"token\", key =&gt;\r\n                new TokenBucketRateLimiterOptions(tokenLimit: 5, queueProcessingOrder: QueueProcessingOrder.OldestFirst,\r\n                    queueLimit: 1, replenishmentPeriod: TimeSpan.FromSeconds(5), tokensPerPeriod: 1, autoReplenishment: true));\r\n        }\r\n        else\r\n        {\r\n            return RateLimitPartition.Create(\"default\", key =&gt; new MyCustomLimiter());\r\n        }\r\n    }\r\n}\r\n\r\nvar app = WebApplication.Create(args);\r\nvar logger = app.Services.GetRequiredService&lt;ILogger&lt;CustomRateLimiterPolicy&gt;&gt;();\r\n\r\napp.UseRateLimiter(new RateLimiterOptions()\r\n    .AddPolicy(\"a\", new CustomRateLimiterPolicy(logger))\r\n    .AddPolicy&lt;string, CustomRateLimiterPolicy&gt;(\"b\"));<\/code><\/pre>\n<p>Other configuration on <code>RateLimiterOptions<\/code> includes <code>RejectionStatusCode<\/code>, which is the status code that will be returned if a lease fails to be acquired; by default a 503 is returned. 
For more advanced usage, there is also the <code>OnRejected<\/code> function, which is called after <code>RejectionStatusCode<\/code> is applied and receives an <code>OnRejectedContext<\/code> as an argument.<\/p>\n<pre><code class=\"language-csharp\">new RateLimiterOptions()\r\n{\r\n    OnRejected = (context, cancellationToken) =&gt;\r\n    {\r\n        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;\r\n        return new ValueTask();\r\n    }\r\n};<\/code><\/pre>\n<p>And last but not least, <code>RateLimiterOptions<\/code> allows configuring a global <code>PartitionedRateLimiter&lt;HttpContext&gt;<\/code> via <code>RateLimiterOptions.GlobalLimiter<\/code>. If a <code>GlobalLimiter<\/code> is provided, it will run before any policy specified on an endpoint. For example, if you wanted to limit your application to 1000 concurrent requests no matter what endpoint policies were specified, you could configure a <code>PartitionedRateLimiter<\/code> with those settings and set the <code>GlobalLimiter<\/code> property.<\/p>\n<h2>Summary<\/h2>\n<p>Please try Rate Limiting out and let us know what you think! For the RateLimiting APIs in the System.Threading.RateLimiting namespace, use the NuGet package <a href=\"https:\/\/www.nuget.org\/packages\/System.Threading.RateLimiting\">System.Threading.RateLimiting<\/a> and provide feedback in the <a href=\"https:\/\/github.com\/dotnet\/runtime\">Runtime<\/a> GitHub repo. For the RateLimiting middleware, use the NuGet package <a href=\"https:\/\/www.nuget.org\/packages\/Microsoft.AspNetCore.RateLimiting\">Microsoft.AspNetCore.RateLimiting<\/a> and provide feedback in the <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/\">AspNetCore<\/a> GitHub repo.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;re excited to announce built-in Rate Limiting support as part of .NET 7. 
Rate limiting provides a way to protect a resource in order to avoid overwhelming your app and keep traffic at a safe level.<\/p>\n","protected":false},"author":82107,"featured_media":40857,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,197,7509],"tags":[],"class_list":["post-40856","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-aspnet","category-aspnetcore"],"acf":[],"blog_post_summary":"<p>We&#8217;re excited to announce built-in Rate Limiting support as part of .NET 7. Rate limiting provides a way to protect a resource in order to avoid overwhelming your app and keep traffic at a safe level.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/40856","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/82107"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=40856"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/40856\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/40857"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=40856"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=40856"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=40856"}],"curies":[{"name":"wp","href":"https:\/\/
api.w.org\/{rel}","templated":true}]}}