{"id":23615,"date":"2019-06-18T11:11:59","date_gmt":"2019-06-18T18:11:59","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=23615"},"modified":"2019-06-18T11:11:59","modified_gmt":"2019-06-18T18:11:59","slug":"the-evolving-infrastructure-of-net-core","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/the-evolving-infrastructure-of-net-core\/","title":{"rendered":"The Evolving Infrastructure of .NET Core"},"content":{"rendered":"<p class=\"code-line code-line code-line\" data-line=\"2\">With\u00a0<a title=\"https:\/\/devblogs.microsoft.com\/dotnet\/announcing-net-core-3-0-preview-6\/\" href=\"https:\/\/devblogs.microsoft.com\/dotnet\/announcing-net-core-3-0-preview-6\/\">.NET Core 3.0 Preview 6<\/a>\u00a0out the door, we thought it would be useful to take a brief look at the history of our infrastructure systems and the significant improvements that have been made in the last year or so.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"7\">This post will be interesting if you are interested in build infrastructure or want a behind-the-scenes look at how we build a product as big as .NET Core. It doesn&#8217;t describe new features or sample code that you should use in your next application. Please tell us if you like these types of posts. We have a few more like this planned, but would appreciate knowing if you find this type of information helpful.<\/p>\n<h2 id=\"a-little-history\" class=\"code-line code-line code-line\" data-line=\"13\">A little history<\/h2>\n<p class=\"code-line code-line code-line\" data-line=\"15\">Begun over 3 years ago now, the .NET Core project was a significant departure from traditional Microsoft projects.<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"17\">Developed publicly on GitHub<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"18\">Composed of isolated git repositories that integrate together vs. 
a monolithic repository.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"19\">Targets many platforms<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"20\">Its components may ship in more than one &#8216;vehicle&#8217; (e.g. Roslyn ships as a component of Visual Studio as well as the SDK)<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"23\">Our early infrastructure decisions were made around necessity and expediency. We used Jenkins for GitHub PR and CI validation because it supported cross-platform OSS development. Our official builds lived in Azure DevOps (called VSTS at the time) and TeamCity (used by\u00a0ASP.NET\u00a0Core), where signing and other critical shipping infrastructure exists. We integrated repositories together using a combination of manually updating package dependency versions and somewhat automated GitHub PRs. Teams independently built what tooling they needed to do packaging, layout, localization and all the rest of the usual tasks that show up in big development projects. While not ideal, on some level this worked well enough in the early days. As the project grew from .NET Core 1.0 and 1.1 into 2.0 and beyond we wanted to invest in a more integrated stack, faster shipping cadences and easier servicing. We wanted to produce a new SDK with the latest runtime multiple times per day. And we wanted all of this without reducing the development velocity of the isolated repositories.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"35\">Many of the infrastructure challenges .NET Core faces stem from the isolated, distributed nature of the repository structure. Although it&#8217;s varied quite a bit over the years, the product is made up of anywhere from 20-30 independent git repositories (ASP.NET\u00a0Core had many more until recently). 
On one hand, having many independent development silos tends to make development in those silos very efficient; a developer can iterate very quickly in the libraries without much worry about the rest of the stack. On the other hand, it makes innovation and integration of the overall project much less efficient. Some examples:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"42\">If we need to roll out new signing or packaging features, doing so across so many independent repos that use different tools is very costly.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"44\">Moving changes across the stack is slow and costly. Fixes and features in repositories &#8216;low&#8217; in the stack (e.g. corefx libraries) may not be seen in the SDK (the &#8216;top&#8217; of the stack) for several days. If we make a fix in dotnet\/corefx, that change must be built and the new version flowed into any up-stack components that reference it (e.g. dotnet\/core-setup and\u00a0ASP.NET\u00a0Core), where it will be tested, committed and built. Those new components will then need to flow those new outputs further up the stack, and so on and so forth until the head is reached.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"51\">In all of these cases, there is a chance of failure at many levels, further slowing down the process. 
As .NET Core 3.0 planning began in earnest, it became clear that we could not create a release of the scope that we wanted without significant changes in our infrastructure.<\/p>\n<h2 id=\"a-three-pronged-approach\" class=\"code-line code-line code-line\" data-line=\"55\">A three-pronged approach<\/h2>\n<p class=\"code-line code-line code-line\" data-line=\"57\">We developed a three-pronged approach to ease our pain:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"59\"><strong>Shared Tooling (aka\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\" href=\"https:\/\/github.com\/dotnet\/arcade\">Arcade<\/a>)<\/strong>\u00a0&#8211; Invest in shared tooling across our repositories.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"61\"><strong>System Consolidation (Azure DevOps)<\/strong>\u00a0&#8211; Move off of Jenkins and into Azure DevOps for our GitHub CI. Move our official builds from classic VSTS-era processes onto modern config-as-code.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"63\"><strong>Automated Dependency Flow and Discovery (Maestro)<\/strong>\u00a0&#8211; Explicitly track inter-repo dependencies and automatically update them on a fast cadence.<\/li>\n<\/ul>\n<h3 id=\"arcade\" class=\"code-line code-line code-line\" data-line=\"66\">Arcade<\/h3>\n<p class=\"code-line code-line code-line\" data-line=\"68\">Prior to .NET Core 3.0, there were 3-5 different tooling implementations scattered throughout various repositories, depending on how you counted.<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"70\">The core runtime repositories (<a title=\"https:\/\/github.com\/dotnet\/coreclr\" href=\"https:\/\/github.com\/dotnet\/coreclr\">dotnet\/coreclr<\/a>,\u00a0<a title=\"https:\/\/github.com\/dotnet\/corefx\" href=\"https:\/\/github.com\/dotnet\/corefx\">dotnet\/corefx<\/a>\u00a0and\u00a0<a title=\"https:\/\/github.com\/dotnet\/core-setup\" 
href=\"https:\/\/github.com\/dotnet\/core-setup\">dotnet\/core-setup<\/a>) had\u00a0<a title=\"https:\/\/github.com\/dotnet\/buildtools\" href=\"https:\/\/github.com\/dotnet\/buildtools\">dotnet\/buildtools<\/a>.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"74\">ASP.NET\u00a0Core&#8217;s repositories had\u00a0<a title=\"https:\/\/github.com\/aspnet\/KoreBuild\" href=\"https:\/\/github.com\/aspnet\/KoreBuild\">aspnet\/KoreBuild<\/a><\/li>\n<li class=\"code-line code-line code-line\" data-line=\"75\">Various repositories like\u00a0<a title=\"https:\/\/github.com\/dotnet\/symreader\" href=\"https:\/\/github.com\/dotnet\/symreader\">dotnet\/symreader<\/a>\u00a0used\u00a0<a title=\"https:\/\/github.com\/dotnet\/roslyn-tools\" href=\"https:\/\/github.com\/dotnet\/roslyn-tools\">Repo Toolset<\/a><\/li>\n<li class=\"code-line code-line code-line\" data-line=\"77\">A few other isolated repositories had independent implementations.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"79\">While in this world each team gets to customize their tooling and only build exactly what they need, it does have some significant downsides:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"81\">\n<p class=\"code-line code-line code-line\" data-line=\"81\"><strong>Developers move between repositories less efficiently<\/strong><\/p>\n<p class=\"code-line code-line code-line\" data-line=\"83\"><em>Example:<\/em>\u00a0When a developer moves from dotnet\/corefx into dotnet\/core-sdk, the &#8216;language&#8217; of the repository is different. What does she type to build and test? Where do the logs get placed? 
If she needs to add a new project to the repo, how is this done?<\/p>\n<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"86\">\n<p class=\"code-line code-line code-line\" data-line=\"86\"><strong>Each required feature gets built N times<\/strong><\/p>\n<p class=\"code-line code-line code-line\" data-line=\"88\"><em>Example:<\/em>\u00a0.NET Core produces tons of NuGet packages. While there is some variation (e.g. shared runtime packages like Microsoft.NETCore.App produced out of dotnet\/core-setup are built differently than &#8216;normal&#8217; packages like Microsoft.AspNet.WebApi.Client), the steps to produce them are fairly similar. Unfortunately, as repositories diverge in their layout, project structure, etc., differences arise in how these packaging tasks need to be implemented. How does a repository define what packages should be generated, what goes in those packages, their metadata, and so on? Without shared tooling, it is often easier for a team to just implement its own packaging task rather than reuse an existing one. This is of course a strain on resources.<\/p>\n<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"97\">With\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\" href=\"https:\/\/github.com\/dotnet\/arcade\">Arcade<\/a>, we endeavored to bring all our repos under a common layout, repository &#8216;language&#8217;, and set of tasks where possible. This is not without its pitfalls. Any kind of shared tooling ends up solving a bit of a &#8216;Goldilocks&#8217; problem. If the shared tooling is too prescriptive, then the kind of customization required within a project of any significant size becomes difficult, and updating that tooling becomes tough. It&#8217;s easy to break a repository with new updates. BuildTools suffered from this. 
The repositories that used it became so tightly coupled to it that it was not only unusable for other repositories, but making any changes in buildtools often broke consumers in unexpected ways. If shared tooling is not prescriptive enough, then repositories tend to diverge in their usage of the tooling, and rolling out updates often requires lots of work in each individual repository. At that point, why have shared tooling in the first place?<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"109\">Arcade actually tries to go with both approaches at the same time. It defines a common repository &#8216;language&#8217; as a set of scripts (see\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/eng\/common\" href=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/eng\/common\">eng\/common<\/a>), a common repository layout, and a common set of build targets rolled out as an MSBuild SDK. Repositories that choose to fully adopt Arcade have predictable behavior, making changes easy to roll out across repositories. Repositories that do not wish to do so can pick and choose from a variety of MSBuild task packages that provide basic functionality, like signing and packaging, that tend to look the same across all repositories. As we roll out changes to these tasks, we try our best to avoid breaking changes.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"118\">Let&#8217;s take a look at the primary features that Arcade provides and how they integrate into our larger infrastructure.<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"121\"><strong>Common build task packages<\/strong>\u00a0&#8211; These are a basic layer of MSBuild tasks which can either be utilized independently or as part of the Arcade SDK. They are &#8220;pay for play&#8221; (hence the name &#8216;Arcade&#8217;). 
They provide a common set of functionality that is needed in most .NET Core repositories:\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"125\">Signing:\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/src\/Microsoft.DotNet.SignTool\" href=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/src\/Microsoft.DotNet.SignTool\">Microsoft.DotNet.SignTool<\/a><\/li>\n<li class=\"code-line code-line code-line\" data-line=\"127\">Output publishing (to inter-repo feeds):\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/src\/Microsoft.DotNet.Build.Tasks.Feed\" href=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/src\/Microsoft.DotNet.Build.Tasks.Feed\">Microsoft.DotNet.Build.Tasks.Feed<\/a><\/li>\n<li class=\"code-line code-line code-line\" data-line=\"129\">Packaging\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/src\/Microsoft.DotNet.Build.Tasks.Packaging\" href=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/src\/Microsoft.DotNet.Build.Tasks.Packaging\">Microsoft.DotNet.Build.Tasks.Packaging<\/a><\/li>\n<\/ul>\n<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"131\"><strong>Common repo targets and behaviors<\/strong>\u00a0&#8211; These are provided as part of an MSBuild SDK called the &#8220;Arcade SDK&#8221;. By utilizing it, repositories opt-in to the default Arcade build behaviors, project and artifact layout, etc.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"134\"><strong>Common repository &#8216;language&#8217;<\/strong>\u00a0&#8211; A set of common\u00a0<a title=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/eng\/common\" href=\"https:\/\/github.com\/dotnet\/arcade\/tree\/master\/eng\/common\">script files<\/a>\u00a0that are synchronized between all the Arcade repositories using dependency flow (more on that later). These script files introduce a common &#8216;language&#8217; for repositories that have adopted Arcade. 
Moving between these repositories becomes more seamless for developers. Moreover, because these scripts are synced between repositories, rolling out new changes to the original copies located in the Arcade repo can quickly introduce new features or behavior into repositories that have fully adopted the shared tooling.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"142\"><strong>Shared Azure DevOps job and step templates<\/strong>\u00a0&#8211; While the scripts that define the common repository &#8216;language&#8217; are primarily targeted towards interfacing with humans, Arcade also has a set of Azure DevOps\u00a0<a title=\"https:\/\/docs.microsoft.com\/en-us\/azure\/devops\/pipelines\/process\/templates?view=azure-devops\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/devops\/pipelines\/process\/templates?view=azure-devops\">job and step templates<\/a>\u00a0that allow for Arcade repositories to interface with the Azure DevOps CI systems. Like the common build task packages, the step templates form a base layer that can be utilized by almost every repository (e.g. to send build telemetry). 
The job templates form more complete units, allowing repositories to worry less about the details of their CI process.<\/li>\n<\/ul>\n<h3 id=\"moving-to-azure-devops\" class=\"code-line code-line code-line\" data-line=\"151\">Moving to Azure DevOps<\/h3>\n<p class=\"code-line code-line code-line\" data-line=\"153\">As noted above, the larger team used a combination of CI systems through the 2.2 release:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"154\">AppVeyor and Travis for\u00a0ASP.NET\u00a0Core&#8217;s GitHub PRs<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"155\">TeamCity for\u00a0ASP.NET&#8217;s official builds<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"156\">Jenkins for the rest of .NET Core&#8217;s GitHub PRs and rolling validation.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"157\">Classic (non-YAML) Azure DevOps workflows for all\u00a0non-ASP.NET\u00a0Core official builds.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"159\">A lot of differentiation was simply from necessity. Azure DevOps did not support public GitHub PR\/CI validation, so\u00a0ASP.NET\u00a0Core turned to AppVeyor and Travis to fill the gap while .NET Core invested in Jenkins. Classic Azure DevOps did not have a lot of support for build orchestration, so the\u00a0ASP.NET\u00a0Core team turned to TeamCity while the .NET Core team built a tool called PipeBuild on top of Azure DevOps to help out. All of this divergence was very expensive, even in some non-obvious ways:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"165\">While Jenkins is flexible, maintaining a large (~6000-8000 jobs), stable installation is a serious undertaking.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"167\">Building our own orchestration on top of classic Azure DevOps required a lot of compromises. 
The checked-in pipeline job descriptions were not really human-readable (they were just exported json descriptions of manually created build definitions), secrets management was ugly, and they quickly became over-parameterized as we attempted to deal with the wide variance in build requirements.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"172\">When official build vs. nightly validation vs. PR validation processes are defined in different systems, sharing logic becomes difficult. Developers must take additional care when making process changes because breaks are common. We defined Jenkins PR jobs in a special script file, TeamCity had lots of manually configured jobs, AppVeyor and Travis used their own yaml formats, and Azure DevOps had the obscure custom system we built on top of it. It was easy to make a change to build logic in a PR and break the official CI build. To mitigate this, we did work to keep as much logic as possible in scripting common to official CI and PR builds, but invariably differences creep in over time. Some variance, like in build environments, is basically impossible to entirely remove.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"181\">Practices for making changes to workflows varied wildly and were often difficult to understand. What a developer learned about Jenkins&#8217;s netci.groovy files for updating PR logic did not translate over to the PipeBuild json files for official CI builds. As a result, knowledge of the systems was typically isolated to a few team members, which is less than ideal in large organizations.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"187\">When Azure DevOps began to roll out YAML-based build pipelines and support for public GitHub projects as .NET Core 3.0 began to get underway, we recognized we had a unique opportunity. 
With this new support, we could move all our existing workflows out of the separate systems and into modern Azure DevOps and also make some changes to how we deal with official CI vs. PR workflows. We started with the following rough outline of the effort:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"192\">Keep all our logic in code, in GitHub. Use the YAML pipelines everywhere.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"193\">Have a public and private project.\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"194\">The public project will run all the public CI via GitHub repos and PRs as we always have.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"195\">The private project will run official CI and be the home of any private changes we need to make, in repositories matching the public GitHub repositories.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"197\">Only the private project will have access to restricted resources.<\/li>\n<\/ul>\n<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"198\">Share the same YAML between official CI and PR builds. Use\u00a0<a title=\"https:\/\/docs.microsoft.com\/en-us\/azure\/devops\/pipelines\/process\/templates?view=azure-devops#template-expressions\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/devops\/pipelines\/process\/templates?view=azure-devops#template-expressions\">template expressions<\/a>\u00a0to differentiate between the public and private project where behavior must diverge or where resources available only in the private project are accessed. 
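For instance, a template expression can conditionally insert a step that should only run in the private project (a simplified sketch; the build script flags are illustrative, and the `System.TeamProject` check is one way to distinguish the two projects):

```yaml
# Simplified sketch: the signing step is inserted only when the pipeline
# runs outside the public project (i.e., in the internal/private project).
steps:
- script: ./build.sh --configuration Release
- ${{ if ne(variables['System.TeamProject'], 'public') }}:
  - script: ./build.sh --sign   # uses signing resources available only internally
```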
While this often makes the overall YAML definition a little messier, it means that:\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"203\">The likelihood of a build break when making a process change is lower.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"204\">A developer only really needs to change one set of places to change official CI and PR process.<\/li>\n<\/ul>\n<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"206\">Build up Azure DevOps templates for common tasks to keep duplication of boilerplate YAML to a minimum, and make rollout of updates (e.g. telemetry) easy using dependency flow.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"209\">As of now, all of the primary .NET Core 3.0 repositories are on Azure DevOps for their public PRs and official CI. A good example pipeline is the official build\/PR pipeline for <a title=\"https:\/\/github.com\/dotnet\/arcade\/blob\/master\/azure-pipelines.yml\" href=\"https:\/\/github.com\/dotnet\/arcade\/blob\/master\/azure-pipelines.yml\">dotnet\/arcade<\/a>\u00a0itself.<\/p>\n<h3 id=\"maestro-and-dependency-flow\" class=\"code-line code-line code-line\" data-line=\"213\">Maestro and Dependency Flow<\/h3>\n<p class=\"code-line code-line code-line\" data-line=\"215\">The final piece of the .NET Core 3.0 infrastructure puzzle is what we call dependency flow. This concept is not unique to .NET Core. Unless they are entirely self-contained, most software projects contain some kind of versioned reference to other software. In .NET Core, these are commonly represented as NuGet packages. When we want new features or fixes that libraries have shipped, we pull those new updates by updating the referenced version numbers in our projects. Of course, these packages may also have versioned references to other packages, those other packages may have more references, and so on and so forth. This creates a graph. 
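For instance, a repository might pin an input component in a project file like this (the version number below is illustrative); bumping that number is how a new fix "flows" in:

```xml
<!-- Illustrative: a pinned versioned reference. Dependency flow works by
     updating this version number when a new build of the input is available. -->
<ItemGroup>
  <PackageReference Include="Microsoft.NETCore.App" Version="3.0.0-preview6-27804-01" />
</ItemGroup>
```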
Changes flow through the graph as each repository pulls new versions of their input dependencies.<\/p>\n<h4 id=\"a-complex-graph\" class=\"code-line code-line code-line\" data-line=\"224\">A Complex Graph<\/h4>\n<p class=\"code-line code-line code-line\" data-line=\"226\">The primary development life-cycle (what developers regularly work on) of most software projects typically involves a small number of inter-related repositories. Input dependencies are typically stable and updates are sparse. When they do need to change, it&#8217;s usually a manual operation. A developer evaluates the available versions of the input package, chooses an appropriate one, and commits the update. This is not the case in .NET Core. The need for components to be independent, ship on different cadences and have efficient inner-loops development experiences has led to a fairly large number of repositories with a large amount of inter-dependency. The inter-dependencies also form a fairly deep graph:<\/p>\n<p><figure id=\"attachment_23617\" aria-labelledby=\"figcaption_attachment_23617\" class=\"wp-caption aligncenter\" ><a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-RepoGraph.png\"><img decoding=\"async\" class=\"wp-image-23617 size-large\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-RepoGraph-1024x504.png\" alt=\"\" width=\"640\" height=\"315\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-RepoGraph-1024x504.png 1024w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-RepoGraph-300x148.png 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-RepoGraph-768x378.png 768w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-RepoGraph.png 2048w\" 
sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"figcaption_attachment_23617\" class=\"wp-caption-text\">The dotnet\/core-sdk repository serves as the aggregation point for all sub-components. We ship a specific build of dotnet\/core-sdk, which describes all other referenced components.<\/figcaption><\/figure><\/p>\n<p>&nbsp;<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"239\">We also expect that new outputs will flow quickly through this graph so that the end product can be validated as often as possible. For instance, we expect the latest bits of\u00a0ASP.NET\u00a0Core or the .NET Core Runtime to express themselves in the SDK as often as possible. This essentially means updating dependencies in each repository on a regular, fast cadence. In a graph of sufficient size, like .NET Core has, this quickly becomes an impossible task to do manually. A software project of this size might go about solving this in a number of ways:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"245\"><strong>Auto-floating input versions<\/strong>\u00a0&#8211; In this model, dotnet\/core-sdk might reference the Microsoft.NETCore.App produced out of dotnet\/core-setup by allowing NuGet to float to the latest prerelease version. While this works, it suffers from major drawbacks. Builds become non-deterministic. Checking out an older git SHA and building will not necessarily use the same inputs or produce the same outputs. Reproducing bugs becomes difficult. A bad commit in dotnet\/core-setup can break any repository pulling in its outputs, outside of PR and CI checks. Orchestration of builds becomes a major undertaking, because separate machines in a build may restore packages at different times, yielding different inputs. 
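For illustration, such a floating reference might look like the following (the wildcard tells NuGet to resolve the newest matching prerelease at restore time):

```xml
<!-- Illustrative: a floating version. Each restore may resolve a different,
     newer package, which is what makes builds non-deterministic. -->
<PackageReference Include="Microsoft.NETCore.App" Version="3.0.0-preview6-*" />
```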
All of these problems are &#8216;solvable&#8217;, but require huge investment and unnecessary complication of the infrastructure.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"254\"><strong>&#8216;Composed&#8217; build<\/strong>\u00a0&#8211; In this model, the entire graph is built all at once in isolation, in dependency order, using the latest git SHAs from each of the input repositories. The outputs from each stage of the build are fed into the next stage. A repository effectively has its input dependency version numbers overwritten by its input stages. At the end of a successful build, the outputs are published and all the repositories update their input dependencies to match what was just built. This is a bit of an improvement over auto-floating version numbers in that individual repository builds aren&#8217;t automatically broken by bad check-ins in other repos, but it still has major drawbacks. Breaking changes are almost impossible to flow efficiently between repositories, and reproducing failures is still problematic because the source in a repository often doesn&#8217;t match what was actually built (since input versions were overwritten outside of source control).<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"264\"><strong>Automated dependency flow<\/strong>\u00a0&#8211; In this model, external infrastructure is used to automatically update dependencies in a deterministic, validated fashion between repositories. Repositories explicitly declare their input dependencies and associated versions in source, and &#8216;subscribe&#8217; to updates from other repositories. When new builds are produced, the system finds matching subscriptions, updates any of the declared input dependencies, and opens a PR with the changes. This method improves reproducibility, the ability to flow breaking changes, and allows a repository owner to have control over how updates are done. 
On the downside, it can be significantly slower than either of the other two methods. A change can only flow from the bottom of the stack to the top as fast as the total sum of the PR and Official CI times in each repository along the flow path.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"275\">.NET Core has tried all 3 methods. We floated versions early on in the 1.x cycle, had some level of automated dependency flow in 2.0 and went to a composed build for 2.1 and 2.2. With 3.0 we decided to invest heavily in automated dependency flow and abandon the other methods. We wanted to improve over our former 2.0 infrastructure in some significant ways:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"279\"><strong>Ease traceability of what is actually in the product<\/strong>\u00a0&#8211; At any given repository, it&#8217;s generally possible to determine what versions of what components are being used as inputs, but almost always hard to find out where those components were built, what git SHAs they came from, what their input dependencies were, etc.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"283\"><strong>Reduce required human interaction<\/strong>\u00a0&#8211; Most dependency updates are mundane. Auto-merge the update PRs as they pass validation to speed up flow.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"285\"><strong>Keep dependency flow information separate from repository state<\/strong>\u00a0&#8211; Repositories should only contain information about the current state of their node in the dependency graph. 
They should not contain information regarding transformation, like when updates should be taken, what sources they pull from, etc.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"289\"><strong>Flow dependencies based on &#8216;intent&#8217;, not branch<\/strong>\u00a0&#8211; Because .NET Core is made up of quite a few semi-autonomous teams with different branching philosophies, different component ship cadences, etc., we do not use branch as a proxy for intent. Teams should define what new dependencies they pull into their repositories based on the purpose of those inputs, not where they came from. Furthermore, the purpose of those inputs should be declared by the teams producing those inputs.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"295\"><strong>&#8216;Intent&#8217; should be deferred from the time of build<\/strong>\u00a0&#8211; To improve flexibility, avoid assigning the intent of a build until after the build is done, allowing for multiple intentions to be declared. At the time of build, the outputs are just a bucket of bits built at some git SHA. Just like running a release pipeline on the outputs of an Azure DevOps build essentially assigns a purpose for the outputs, assigning an intent to a build in the dependency flow system begins the process of flowing dependencies based on intent.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"302\">With these goals in mind, we created a service called Maestro++ and a tool called &#8216;darc&#8217; to handle our dependency flow. Maestro++ handles the data and automated movement of dependencies, while darc provides a human interface for Maestro++ as well as a window into the overall product dependency state. 
Dependency flow is based on four primary concepts: dependency information, builds, channels, and subscriptions.<\/p>\n<h4 id=\"builds-channels-and-subscriptions\" class=\"code-line code-line code-line\" data-line=\"308\">Builds, Channels, and Subscriptions<\/h4>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"310\"><strong>Dependency information<\/strong>\u00a0&#8211; In each repository, there is a declaration of the input dependencies of the repository along with source information about those input dependencies in the\u00a0<a title=\"https:\/\/github.com\/dotnet\/core-sdk\/blob\/master\/eng\/Version.Details.xml\" href=\"https:\/\/github.com\/dotnet\/core-sdk\/blob\/master\/eng\/Version.Details.xml\">eng\/Version.Details<\/a>\u00a0file. Reading this file, then transitively following the repository+sha combinations for each input dependency, yields the product dependency graph.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"315\"><strong>Builds<\/strong>\u00a0&#8211; A build is just the Maestro++ view on an Azure DevOps build. A build identifies the repository+sha, the overall version number, and the full set of assets and their locations that were produced from the build (e.g. NuGet packages, zip files, installers, etc.).<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"318\"><strong>Channels<\/strong>\u00a0&#8211; A channel represents intent. It may be useful to think of a channel as a cross-repository branch. Builds can be assigned to one or more channels to assign intent to the outputs. Channels can be associated with one or more release pipelines. Assignment of a build to a channel activates the release pipeline and causes publishing to happen. The asset locations of the build are updated based on release publishing activities.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"323\"><strong>Subscriptions<\/strong>\u00a0&#8211; A subscription represents a transform. 
It maps the outputs of a build placed on a specific channel onto another repository&#8217;s branch, with additional information about when those transforms should take place.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"327\">These concepts are designed so that repository owners do not need global knowledge of the stack or other teams&#8217; processes in order to participate in dependency flow. They basically just need to know three things:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"330\">The intent (if any) of the builds that they do, so that channels may be assigned.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"331\">Their input dependencies and what repositories they are produced from.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"332\">What channels they wish to update those dependencies from.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"334\">As an example, let&#8217;s say I own the dotnet\/core-setup repository. I know that my master branch produces bits for day to day .NET Core 3.0 development. I want to assign new builds to the pre-declared &#8216;.NET Core 3.0 Dev&#8217; channel. I also know that I have several dotnet\/coreclr and dotnet\/corefx package inputs. I don&#8217;t need to know how they were produced, or from what branch. All I need to know is that I want the newest dotnet\/coreclr inputs from the &#8216;.NET Core 3.0 Dev&#8217; channel on a daily basis, and the newest dotnet\/corefx inputs from the &#8216;.NET Core 3.0 Dev&#8217; channel every time they appear.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"342\">First, I onboard by adding an\u00a0<a title=\"https:\/\/github.com\/dotnet\/core-setup\/blob\/master\/eng\/Version.Details.xml\" href=\"https:\/\/github.com\/dotnet\/core-setup\/blob\/master\/eng\/Version.Details.xml\">eng\/Version.Details<\/a>\u00a0file. 
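Such a file looks roughly like the following sketch (the shape matches the Version.Details.xml format described above, but the versions and SHAs shown here are invented for illustration):

```xml
<?xml version="1.0" encoding="utf-8"?>
<Dependencies>
  <ProductDependencies>
    <!-- Hypothetical entry: version number and commit SHA are made up -->
    <Dependency Name="Microsoft.NETCore.Runtime.CoreCLR" Version="3.0.0-preview6.12345.1">
      <Uri>https://github.com/dotnet/coreclr</Uri>
      <Sha>0123456789abcdef0123456789abcdef01234567</Sha>
    </Dependency>
  </ProductDependencies>
  <ToolsetDependencies>
    <Dependency Name="Microsoft.DotNet.Arcade.Sdk" Version="1.0.0-beta.12345.6">
      <Uri>https://github.com/dotnet/arcade</Uri>
      <Sha>fedcba9876543210fedcba9876543210fedcba98</Sha>
    </Dependency>
  </ToolsetDependencies>
</Dependencies>
```

Each entry records the repository and commit a dependency came from, not just its version, which is what makes the transitive graph walk possible.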
I then use the &#8216;darc&#8217; tool to ensure that every new build of my repository on the master branch is assigned by default to the &#8216;.NET Core 3.0 Dev&#8217; channel. Next, I set up subscriptions to pull inputs from .NET Core 3.0 Dev for builds of dotnet\/corefx, dotnet\/coreclr, dotnet\/standard, etc. These subscriptions have a cadence and auto-merge policy (e.g. weekly or every build).<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"349\">As the trigger for each subscription is activated, Maestro++ updates files (eng\/Version.Details.xml, eng\/Versions.props, and a few others) in the core-setup repo based on the declared dependencies intersected with the newly produced outputs. It opens a PR, and once the configured checks are satisfied, will automatically merge the PR.<\/p>\n<p class=\"code-line\" data-line=\"346\"><a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-UpdatePR.jpg\"><img decoding=\"async\" class=\"aligncenter wp-image-23618 size-large\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-UpdatePR-1024x783.jpg\" alt=\"\" width=\"640\" height=\"489\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-UpdatePR-1024x783.jpg 1024w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-UpdatePR-300x230.jpg 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-UpdatePR-768x588.jpg 768w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlog-UpdatePR.jpg 1588w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p class=\"code-line\" data-line=\"348\">This in turn generates a new build of core-setup on the master branch. 
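The channel-assignment and subscription mechanics can be pictured with a toy model (a deliberately simplified sketch with invented names; the real Maestro++ service tracks far more state than this):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Build:
    repo: str   # repository the build came from
    sha: str    # commit the build was produced at

@dataclass(frozen=True)
class Subscription:
    source_channel: str   # channel this subscription watches
    target_repo: str      # repository to update when it fires
    target_branch: str

def assign_to_channel(build, channel, subscriptions):
    """Assigning a build to a channel fires every subscription watching
    that channel; each firing corresponds to a dependency-update PR."""
    return [(s.target_repo, s.target_branch)
            for s in subscriptions
            if s.source_channel == channel]

subs = [
    Subscription(".NET Core 3.0 Dev", "dotnet/core-setup", "master"),
    Subscription(".NET Core 3.0 Dev", "dotnet/core-sdk", "master"),
    Subscription(".NET Tools Latest", "dotnet/arcade", "master"),
]
build = Build("dotnet/coreclr", "0abc123")

# Assigning this coreclr build to '.NET Core 3.0 Dev' would open update
# PRs against core-setup and core-sdk, but not against arcade.
print(assign_to_channel(build, ".NET Core 3.0 Dev", subs))
```

Note that nothing in the model ties a subscription to a source branch: intent travels through channels, which is exactly the decoupling described earlier.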
Upon completion, automatic assignment of the build to the &#8216;.NET Core 3.0 Dev&#8217; channel is started. The &#8216;.NET Core 3.0 Dev&#8217; channel has an associated release pipeline which pushes the build&#8217;s output artifacts (e.g. packages and symbol files) to a set of target locations. Since this channel is intended for day to day public dev builds, packages and symbols are pushed to various public locations. Upon release pipeline completion, channel assignment is finalized and any subscriptions that activate on this event are fired. As more components are added, we build up a full flow graph representing all of the automatic flow between repositories.<\/p>\n<p><figure id=\"attachment_23616\" aria-labelledby=\"figcaption_attachment_23616\" class=\"wp-caption aligncenter\" ><a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlob-FlowGraph.png\"><img decoding=\"async\" class=\"wp-image-23616 size-large\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlob-FlowGraph-1024x502.png\" alt=\"\" width=\"640\" height=\"314\" srcset=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlob-FlowGraph-1024x502.png 1024w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlob-FlowGraph-300x147.png 300w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlob-FlowGraph-768x377.png 768w, https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2019\/06\/NetCoreInfraBlob-FlowGraph.png 2048w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><figcaption id=\"figcaption_attachment_23616\" class=\"wp-caption-text\">Flow graph for the .NET Core 3 Dev channel, including other channels that (e.g. 
Arcade&#8217;s &#8216;.NET Tools Latest&#8217;) contribute to the .NET Core 3 Dev flow.<\/figcaption><\/figure><\/p>\n<p>&nbsp;<\/p>\n<h4 id=\"coherency-and-incoherency\" class=\"code-line code-line code-line\" data-line=\"369\">Coherency and Incoherency<\/h4>\n<p class=\"code-line code-line code-line\" data-line=\"371\">The increased visibility into the state of .NET Core&#8217;s dependency graph highlighted an existing question:\u00a0<strong>What happens when multiple versions of the same component are referenced at various nodes in the graph?<\/strong>\u00a0Each node in .NET Core&#8217;s dependency graph may flow dependencies to more than one other node. For instance, the Microsoft.NETCore.App dependency, produced out of dotnet\/core-setup, flows to dotnet\/toolset, dotnet\/core-sdk, aspnet\/extensions and a number of other places. Updates of this dependency will be committed at different rates in each of those places, due to variations in pull request validation time, need for reaction to breaking changes, and desired subscription update frequencies. As those repositories then flow elsewhere and eventually coalesce under dotnet\/core-sdk, there may be a number of different versions of Microsoft.NETCore.App that have been transitively referenced throughout the graph. This is called\u00a0<strong>incoherency<\/strong>. When only a single version of each product dependency is referenced throughout the dependency graph, the graph is\u00a0<strong>coherent<\/strong>. We always strive to ship a coherent product if possible.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"385\"><em><strong>What kinds of problems does incoherency cause?<\/strong><\/em>\u00a0Incoherency represents a\u00a0<em>possible<\/em>\u00a0error state. For an example, let&#8217;s take a look at Microsoft.NETCore.App. This package represents a specific API surface area. 
While multiple versions of Microsoft.NETCore.App may be referenced in the repository dependency graph, the SDK ships with just one. This runtime must satisfy all of the demands of the transitively referenced components (e.g. WinForms and WPF) that may execute on that runtime. If the runtime does not satisfy those demands (e.g. a breaking API change), failures may occur. In an incoherent graph, because all repositories have not ingested the same version of Microsoft.NETCore.App, there is a possibility that a breaking change has been missed.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"394\"><em><strong>Does this mean that incoherency is always an error state?<\/strong><\/em>\u00a0No. For example, let&#8217;s say that the incoherency of Microsoft.NETCore.App in the graph only represents a single change in coreclr, a single non-breaking JIT bug fix. There would technically be no need to ingest the new Microsoft.NETCore.App at each point in the graph. Simply shipping the same components against the new runtime will suffice.<\/p>\n<p class=\"code-line code-line code-line\" data-line=\"400\"><em><strong>If incoherency only matters occasionally, why do we strive to ship a coherent product?<\/strong><\/em>\u00a0Because determining when incoherency does not matter is hard. It is easier to simply ship with coherency as the desired state than to attempt to understand the semantic effects that differences between incoherent components will have on the completed product. It can be done, but on a build-to-build basis it is time-intensive and prone to error. Enforcing coherency as the default state is safer.<\/p>\n<h4 id=\"dependency-flow-goodies\" class=\"code-line code-line code-line\" data-line=\"406\">Dependency Flow Goodies<\/h4>\n<p class=\"code-line code-line code-line\" data-line=\"408\">All this automation and tracking has a ton of advantages that become apparent as the repository graph gets bigger. 
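For instance, once the per-repository dependency data has been collected, checking coherency reduces to gathering the distinct versions recorded for each dependency across the graph. A toy sketch (the repositories are real, but the version snapshot is invented):

```python
from collections import defaultdict

# Invented snapshot of the versions referenced at a few nodes; real data
# would come from transitively walking each repo's eng/Version.Details.xml.
graph = {
    "dotnet/core-sdk":   {"Microsoft.NETCore.App": "3.0.0-preview6.100"},
    "dotnet/toolset":    {"Microsoft.NETCore.App": "3.0.0-preview6.100"},
    "aspnet/extensions": {"Microsoft.NETCore.App": "3.0.0-preview6.099"},
}

def incoherent_dependencies(graph):
    """Return each dependency referenced at more than one distinct version."""
    versions = defaultdict(set)
    for deps in graph.values():
        for name, version in deps.items():
            versions[name].add(version)
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}

# Microsoft.NETCore.App is incoherent here: two versions are in play.
print(incoherent_dependencies(graph))
```

An empty result would mean the graph is coherent for every tracked dependency.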
It opens up a lot of possibilities for solving real problems we have on a day-to-day basis. While we have just begun to explore this area, the system can begin to answer interesting questions and handle scenarios like:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"412\">What &#8216;real&#8217; changes happened between git SHA A and SHA B of dotnet\/core-sdk? &#8211; By building up a full dependency graph by walking the Version.Details.xml files, we can identify the non-dependency changes that happened in the graph.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"415\">How long will it take for a fix to appear in the product? &#8211; By combining the repository flow graph and per-repository telemetry, we can estimate how long it will take to move a fix from repo A to repo B in the graph. This is especially valuable late in a release, as it helps us make a more accurate cost\/benefit estimation when looking at whether to take specific changes. For example: Do we have enough time to flow this fix and complete our scenario testing?<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"420\">What are the locations of all assets produced by a build of core-sdk and all of its input builds?<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"421\">In servicing releases, we want to take specific fixes but hold off on others. Channels could be placed into modes where a specific fix is allowed to flow automatically through the graph, but others are blocked or require approval.<\/li>\n<\/ul>\n<h2 id=\"whats-next\" class=\"code-line code-line code-line\" data-line=\"425\">What&#8217;s next?<\/h2>\n<p class=\"code-line code-line code-line\" data-line=\"427\">As .NET Core 3.0 winds down, we&#8217;re looking for new areas to improve. 
While planning is still in the (very) early stages, we expect investments in some key areas:<\/p>\n<ul>\n<li class=\"code-line code-line code-line\" data-line=\"429\">Reduce the time to turn a fix into a shippable, coherent product &#8211; The number of hops in our dependency graph is significant. This allows repositories a lot of autonomy in their processes, but increases our end-to-end &#8216;build&#8217; time as each hop requires a commit and official build. We&#8217;d like to significantly reduce that end-to-end time.<\/li>\n<li class=\"code-line code-line code-line\" data-line=\"433\">Improve our infrastructure telemetry &#8211; If we can better track where we fail, what our resource usage looks like, what our dependency state looks like, etc., we can better determine where our investments need to be to ship a better product. In .NET Core 3.0 we took some steps in this direction but we have a ways to go.<\/li>\n<\/ul>\n<p class=\"code-line code-line code-line\" data-line=\"438\">We&#8217;ve evolved our infrastructure quite a bit over the years. From Jenkins to Azure DevOps, from manual dependency flow to Maestro++, and from many tooling implementations to one, the changes we&#8217;ve made to ship .NET Core 3.0 are a huge step forward. We&#8217;ve set ourselves up to develop and ship a more exciting product more reliably than ever before.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A deep dive into the infrastructure of .NET Core.  Challenges, changes and the future.<\/p>\n","protected":false},"author":5251,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[196],"tags":[],"class_list":["post-23615","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet-core"],"acf":[],"blog_post_summary":"<p>A deep dive into the infrastructure of .NET Core.  
Challenges, changes and the future.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/23615","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/5251"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=23615"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/23615\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=23615"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=23615"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=23615"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}