Prevent throttling in your application by using RateLimit headers in SharePoint Online
SharePoint Online uses throttling to maintain optimal performance and reliability of the service. Throttling limits the number of API calls or operations within a time window to prevent overuse of resources. When your application gets throttled, SharePoint Online returns an HTTP status code 429 (“Too many requests”) or 503 (“Server Too Busy”) and the request fails. In both cases, the response includes a Retry-After header indicating how long the calling application should wait before retrying or making a new request.
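The following minimal Python sketch shows one way a client could honor the Retry-After header described above. The function name `retry_delay_seconds` and the 30-second fallback are illustrative assumptions, not part of any SharePoint SDK, and the sketch only handles a Retry-After value expressed in seconds.

```python
from typing import Mapping, Optional

# Status codes the article names as throttling responses.
THROTTLE_STATUS_CODES = {429, 503}


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    default: float = 30.0,  # assumed fallback when the header is missing or unparsable
) -> Optional[float]:
    """Return how long to wait before retrying, or None if not throttled."""
    if status not in THROTTLE_STATUS_CODES:
        return None
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default
    try:
        return max(0.0, float(raw.strip()))
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch does not parse that form.
        return default
```

A caller would sleep for the returned number of seconds before reissuing the request, and proceed normally when the function returns `None`.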
In addition to the Retry-After header in the response of throttled requests, SharePoint Online also returns the IETF RateLimit headers for selected limits in certain conditions to help applications manage rate limiting. We recommend that applications take advantage of these headers to avoid getting throttled and thereby achieve better overall throughput.
- These headers are currently in beta and subject to change. When the headers were adopted, the IETF specification was still a draft. The current implementation is based on draft-03 of the IETF specification. The final specification may introduce changes, and we will adapt to them in the future.
- RateLimit headers are only returned when using application permissions.
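As a rough illustration of reading these headers, the sketch below parses them from a response's header mapping. It assumes the field names defined in draft-03 of the IETF specification (`RateLimit-Limit`, `RateLimit-Remaining`, `RateLimit-Reset`, with the reset expressed in seconds); the `RateLimitInfo` type and `parse_rate_limit` function are hypothetical names introduced here. Because the headers are only returned in certain conditions, the parser returns `None` when they are absent.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int          # quota for the current window
    remaining: int      # units left in the current window
    reset_seconds: int  # seconds until the window resets


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Parse draft-03 style RateLimit headers; return None if absent or malformed."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        # The limit field may carry extra quota policies, e.g. "1200, 1200;w=60".
        # Only the leading number is used here.
        first_item = lowered["ratelimit-limit"].split(",")[0].split(";")[0]
        limit = int(first_item.strip())
        remaining = int(lowered["ratelimit-remaining"].strip())
        reset_seconds = int(lowered["ratelimit-reset"].strip())
    except (KeyError, ValueError):
        return None
    return RateLimitInfo(limit, remaining, reset_seconds)
```

Returning `None` instead of raising keeps the common case simple: most responses will not carry these headers, and the caller can continue at full speed.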
Benefits of using RateLimit headers
A typical pattern we see with throttled applications is the following: the application starts at full speed, gets throttled and is halted for some time, then ramps up again until it’s halted once more. This cycle of ramping up and halting is inefficient and yields lower throughput than using the RateLimit headers to avoid throttling altogether. Using RateLimit headers will slow down your application, but it won’t be halted. Furthermore, when a request is retried due to throttling, each retry itself counts towards the defined resource unit quota, resulting in fewer successful requests for a given resource unit quota.
- The highest application throughput is achieved by “just” staying within the application’s resource unit quota, so by “just” not getting throttled.
The graph below shows the request throughput for the RateLimit demo application and makes the difference between using and not using RateLimit headers clear. The blue line shows the number of requests when RateLimit headers are used: it’s relatively smooth and there are no application halts. The orange line shows the number of requests when RateLimit headers are not used: it has more horizontal segments, which indicate application halts, and after 5 minutes the throughput is considerably lower than with RateLimit headers.
RateLimit demo application
If you want to see RateLimit handling in action you can check out this demo application: it shows how to capture RateLimit headers for Microsoft Graph, SharePoint REST and SharePoint CSOM calls and how to process them.
When the demo application runs, it launches 5 parallel threads that each issue Microsoft Graph, SharePoint REST and SharePoint CSOM calls in a loop.
When the application has consumed 80% of its resource unit quota, SharePoint will start to send RateLimit headers, which the application displays. Once the application detects that only 10% of its resource units are left, it will automatically slow down to avoid getting throttled.
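The slow-down step can be sketched as a small pacing function. This is not the demo application's actual logic, only one possible strategy under stated assumptions: once the remaining share of the quota drops to the 10% threshold mentioned above, the client spreads its remaining units evenly over the time left until the window resets. The function name `pacing_delay` and its parameters are hypothetical.

```python
def pacing_delay(
    limit: int,
    remaining: int,
    reset_seconds: int,
    threshold: float = 0.10,  # slow down once 10% or less of the quota is left
) -> float:
    """Return the number of seconds to wait before sending the next request."""
    if limit <= 0:
        # No usable quota information, so do not delay.
        return 0.0
    if remaining <= 0:
        # Quota exhausted: wait for the window to reset.
        return float(reset_seconds)
    if remaining / limit > threshold:
        # Plenty of quota left: keep running at full speed.
        return 0.0
    # Assumed strategy: spread the remaining units evenly over the time left.
    return reset_seconds / remaining
```

For example, with a limit of 1000 units, 60 units remaining and 30 seconds until reset, the function suggests waiting 0.5 seconds between requests. With several parallel threads sharing one quota, the delay would need to be coordinated across threads, which this sketch does not attempt.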
For more information on how to set up and tailor the demo application, see the available documentation on GitHub.