{"id":31442,"date":"2021-01-14T09:00:27","date_gmt":"2021-01-14T16:00:27","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=31442"},"modified":"2022-05-03T21:00:11","modified_gmt":"2022-05-04T04:00:11","slug":"azure-active-directorys-gateway-service-is-on-net-core-3-1","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/azure-active-directorys-gateway-service-is-on-net-core-3-1\/","title":{"rendered":"Azure Active Directory&#8217;s gateway is on .NET Core 3.1!"},"content":{"rendered":"<p>Azure Active Directory&#8217;s gateway service is a reverse proxy that fronts hundreds of services that make up Azure Active Directory (Azure AD). If you&#8217;ve used services such as office.com, outlook.com, azure.com or xbox.live.com, then you&#8217;ve used Azure AD&#8217;s gateway. The gateway provides features such as TLS termination, automatic failovers\/retries, geo-proximity routing, throttling, and tarpitting to services in Azure AD. The gateway is present in more than 53 Azure datacenters worldwide and serves <strong>~115 Billion<\/strong> requests each day. Up until recently, Azure AD&#8217;s gateway was running on .NET Framework 4.6.2. As of September 2020, it&#8217;s running on .NET Core 3.1.<\/p>\n<h2>Motivation for porting to .NET Core<\/h2>\n<p>The gateway&#8217;s scale of execution results in significant consumption of compute resources, which in turn costs money. Finding ways to reduce the cost of executing the service has been a key goal for the team behind it. The buzz around .NET Core&#8217;s focus on performance caught our attention, especially since <a href=\"https:\/\/www.techempower.com\/benchmarks\/#section=data-r19&amp;hw=ph&amp;test=plaintext\">TechEmpower<\/a> listed ASP.NET Core as one of the fastest web frameworks on the planet. 
We ran our own benchmarks on gateway prototypes on .NET Core and the results made the decision very easy: we <strong>must<\/strong> port our service to .NET Core.<\/p>\n<h2>Does .NET Core performance translate to real-life cost savings?<\/h2>\n<p>It <em>absolutely<\/em> does. In Azure AD gateway&#8217;s case, we were able to cut our CPU costs by 50%.<\/p>\n<p>The gateway used to run on IIS with .NET Framework 4.6.2. Today, it runs on IIS with .NET Core 3.1. The image below shows that our CPU usage was reduced by half on .NET Core 3.1 compared to .NET Framework 4.6.2 (effectively doubling our throughput).<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-31443\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput.png\" alt=\"Azure Active Directory\u2019s gateway service is on .NET Core\" width=\"1689\" height=\"797\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput.png 1689w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput-300x142.png 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput-1024x483.png 1024w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput-768x362.png 768w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/throughput-1536x725.png 1536w\" sizes=\"(max-width: 1689px) 100vw, 1689px\" \/><\/a><\/p>\n<p>As a result of the gains in throughput, we were able to reduce our fleet size from ~40k cores to ~20k cores (50% reduction).<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/CoresReduction_2.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-31520\" 
src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/CoresReduction_2.png\" alt=\"Image CoresReduction 2\" width=\"864\" height=\"231\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/CoresReduction_2.png 864w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/CoresReduction_2-300x80.png 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2021\/01\/CoresReduction_2-768x205.png 768w\" sizes=\"(max-width: 864px) 100vw, 864px\" \/><\/a><\/p>\n<h2>How was the port to .NET Core achieved?<\/h2>\n<p>The porting was done in 3 phases.<\/p>\n<h3>Phase 1: Choosing an edge webserver.<\/h3>\n<p>When we started the porting effort, the first question we had to ask ourselves was which of the 3 webservers in .NET Core to pick.<\/p>\n<p>We ran our production scenarios on all 3 webservers and realized it all came down to TLS support. Given that the gateway is a reverse proxy, support for a wide range of TLS scenarios is critical.<\/p>\n<p><strong>Kestrel:<\/strong><\/p>\n<ul>\n<li>When we started our migration (November 2019), <a href=\"https:\/\/docs.microsoft.com\/aspnet\/core\/fundamentals\/servers\/kestrel?view=aspnetcore-5.0\">Kestrel<\/a> supported neither client certificate negotiation nor revocation on a per-hostname basis. Support for these features was <a href=\"https:\/\/github.com\/dotnet\/runtime\/issues\/31097\">added<\/a> in .NET 5.0.<\/li>\n<li>As of .NET 5.0, Kestrel (via its reliance on <code>SslStream<\/code>) does not support CTL stores on a per-hostname basis. 
Support is <a href=\"https:\/\/github.com\/dotnet\/runtime\/issues\/45456\">expected in .NET 6.0<\/a>.<\/li>\n<\/ul>\n<p><strong>HTTP.sys:<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/aspnet\/core\/fundamentals\/servers\/httpsys?view=aspnetcore-5.0\">HTTP.sys server<\/a> had a disconnect between the TLS configuration at the HTTP.sys layer and the .NET implementation: even when a binding is configured not to negotiate client certificates, accessing the <code>ClientCertificate<\/code> property in .NET Core triggers an unwanted TLS renegotiation.\n<p>For example, performing a simple null check in C# triggers a TLS renegotiation:<\/p>\n<pre><code class=\"csharp\">\/\/ On HTTP.sys before ASP.NET Core 3.1, merely reading this property triggered a TLS renegotiation.\r\nif (HttpContext.Connection.ClientCertificate != null)\r\n<\/code><\/pre>\n<p>This was addressed in ASP.NET Core 3.1: <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/issues\/14806\">https:\/\/github.com\/dotnet\/aspnetcore\/issues\/14806<\/a>. When we started the port in November 2019, we were on ASP.NET Core 2.2, so we did not pick this server.\n<\/li>\n<\/ul>\n<p><strong>IIS:<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/aspnet\/core\/host-and-deploy\/iis\/?view=aspnetcore-5.0\">IIS<\/a> met all our requirements for TLS, so that&#8217;s the webserver we chose.<\/li>\n<\/ul>\n<h3>Phase 2: Migrating the application and dependencies.<\/h3>\n<p>As with many large services and applications, Azure AD&#8217;s gateway has many dependencies. Some were written specifically for the service, and some were written by others inside and outside of Microsoft. In some cases, those libraries already targeted .NET Standard 2.0. For the others, we either updated them to support .NET Standard 2.0 or found alternative implementations, for example, replacing our legacy dependency injection library with .NET Core&#8217;s built-in support for dependency injection. 
The <a href=\"https:\/\/docs.microsoft.com\/dotnet\/standard\/analyzers\/portability-analyzer\">.NET Portability Analyzer<\/a> was of great help in this step.<\/p>\n<p>For the application itself:<\/p>\n<ul>\n<li>Azure AD&#8217;s gateway used to depend on <code>IHttpModule<\/code> and <code>IHttpHandler<\/code> from classic ASP.NET, which don&#8217;t exist in ASP.NET Core, so we rewrote the application using ASP.NET Core&#8217;s middleware constructs.<\/li>\n<li>One of the things that really helped throughout the migration was Azure Profiler (a service that collects performance traces at runtime on Azure VMs). We deployed our nightly builds to test beds, used <a href=\"https:\/\/github.com\/giltene\/wrk2\">wrk2<\/a> as a load agent to exercise the scenarios under stress, and collected Azure Profiler traces. These traces then told us which tweak to make next to extract peak performance from our application.<\/li>\n<\/ul>\n<h3>Phase 3: Rolling out gradually.<\/h3>\n<p>The philosophy we adopted during rollout was to discover as many issues as possible with little or no production impact.<\/p>\n<ul>\n<li>We deployed our initial builds to test, integration and DogFood environments. This led to <strong>early<\/strong> discovery of bugs and helped us fix them before they hit production.<\/li>\n<li>Once the code was complete, we deployed the .NET Core build to a <strong>single<\/strong> production machine in a scale unit. A scale unit is a load-balanced pool of machines.\n<ul>\n<li>The scale unit had ~100 machines: 99 were still running our existing .NET Framework build, and only 1 was running the new .NET Core build.<\/li>\n<li>All ~100 machines in this scale unit receive the same type and amount of traffic. 
Then we compared the status codes, error rates, functional scenarios and performance of the <strong>single<\/strong> machine against the remaining 99 machines to detect anomalies.<\/li>\n<li>We wanted this single machine to behave functionally the same as the remaining 99 machines but deliver much better performance and throughput, and that&#8217;s what we observed.<\/li>\n<\/ul>\n<\/li>\n<li>We also &#8220;forked&#8221; traffic from live production scale units (running the .NET Framework build) to .NET Core scale units to compare and contrast, as described above.<\/li>\n<li>Once we reached functional equivalence, we increased the number of scale units running .NET Core and gradually expanded to an entire datacenter.<\/li>\n<li>Once an entire datacenter was migrated, the last step was to gradually expand worldwide to all the Azure datacenters where Azure AD&#8217;s gateway service has a presence. <strong>Migration done!<\/strong><\/li>\n<\/ul>\n<h2>Learnings<\/h2>\n<ul>\n<li>ASP.NET Core is strict about RFCs. This is a very good thing, as it drives good practices across the board. However, classic ASP.NET and .NET Framework were quite a bit more lenient, which caused some backward compatibility issues:\n<ul>\n<li>By default, the webserver allows only ASCII values in HTTP headers. 
At our request, Latin1 support was added to IISHttpServer: <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/22798\">https:\/\/github.com\/dotnet\/aspnetcore\/pull\/22798<\/a><\/li>\n<li><code>HttpClient<\/code> on .NET Core used to support only ASCII values in HTTP headers.\n<ul>\n<li>The .NET Core team added Latin1 support in .NET Core 3.1: <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/42978\">https:\/\/github.com\/dotnet\/corefx\/pull\/42978<\/a><\/li>\n<li>The ability to select the encoding scheme was added in .NET 5.0: <a href=\"https:\/\/github.com\/dotnet\/runtime\/issues\/38711\">https:\/\/github.com\/dotnet\/runtime\/issues\/38711<\/a><\/li>\n<\/ul>\n<\/li>\n<li>Forms and cookies that are not RFC compliant result in validation exceptions, so we built &#8220;fallback&#8221; parsers based on the classic ASP.NET source code to maintain backward compatibility for customers.<\/li>\n<\/ul>\n<\/li>\n<li>There was a performance bottleneck in <code>FileBufferingReadStream<\/code>&#8217;s <code>CopyToAsync()<\/code> method caused by repeated 1-byte copies of an n-byte stream. This was addressed in .NET 5.0 by using a default buffer size of 4 KB: <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/issues\/24032\">https:\/\/github.com\/dotnet\/aspnetcore\/issues\/24032<\/a><\/li>\n<li>Be aware of classic ASP.NET quirks:\n<ul>\n<li>Trailing whitespace is auto-trimmed in the path:\n<ul>\n<li>On classic ASP.NET, <code>foo.com\/oauth \u00a0 ?client=abc<\/code> is trimmed to <code>foo.com\/oauth?client=abc<\/code>.<\/li>\n<li>Over the years, customers and downstream services have come to depend on this trimming, and ASP.NET Core does not auto-trim the path. 
So we had to trim trailing whitespace to mimic the classic ASP.NET behavior.<\/li>\n<\/ul>\n<\/li>\n<li>The <code>Content-Type<\/code> header is auto-generated if missing:\n<ul>\n<li>When the response body is non-empty but the <code>Content-Type<\/code> header is missing, classic ASP.NET generates a default <code>Content-Type: text\/html<\/code> header. ASP.NET Core does not generate a default <code>Content-Type<\/code> header, so clients that assume the header is always present in the response start having issues. We mimicked the classic ASP.NET behavior by adding a default <code>Content-Type<\/code> header when downstream services omit it.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Future<\/h2>\n<p>Porting to .NET Core doubled the throughput of our service, and moving was a great decision.\nOur .NET Core journey will not stop after porting. Going forward, we are looking at:<\/p>\n<ul>\n<li>Upgrading to .NET 5.0 for <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-5\/\">better performance<\/a>.<\/li>\n<li>Porting to Kestrel so that we can intercept connections at the TLS layer for better resiliency.<\/li>\n<li>Leveraging components and best practices from YARP (<a href=\"https:\/\/microsoft.github.io\/reverse-proxy\/\">https:\/\/microsoft.github.io\/reverse-proxy\/<\/a>) in our own reverse proxy, and contributing back.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Read about Azure Active Directory gateway service&#8217;s move from .NET Framework to .NET 
Core.<\/p>\n","protected":false},"author":47836,"featured_media":31443,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,196,327,7635],"tags":[9,7225,37],"class_list":["post-31442","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-dotnet-core","category-azure","category-developer-stories","tag-net-core","tag-net-core-3-1","tag-azure"],"acf":[],"blog_post_summary":"<p>Read about Azure Active Directory gateway service&#8217;s move from .NET Framework to .NET Core.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/31442","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/47836"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=31442"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/31442\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/31443"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=31442"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=31442"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=31442"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}