How the .NET Team uses Azure Pipelines to produce Docker Images
Producing Docker images for .NET might not seem like a big deal. Once you’ve got a Dockerfile defined, just run “docker build” and “docker push” and you’re done, right? Then rinse and repeat when new versions of .NET are released, and that should be all that’s needed. Well, it’s not quite that simple.
When you factor in the number of Linux distros and Windows versions, different processor architectures, and different .NET versions, you end up with a substantial matrix of images that need to be built and published. Then consider that some images have dependencies on others which implies a specific order in which to build the images. And on top of all that, we need to ensure the images are published as quickly as possible so that customers can get their hands on newly released product versions and security fixes. Oh, and by the way, in addition to the .NET Core images we also produce .NET Core nightly images for preview releases, images for developers of .NET Core, as well as images for .NET Framework. This is starting to look a little daunting. Let’s dive into what goes into producing the .NET Docker images.
To keep things “simple”, let’s just consider the Docker images for .NET Core. The same infrastructure is used amongst all the types of images we produce but keep in mind that the scope of the work is greater than described here.
The full set of .NET Core images is derived from the following matrix:
- Linux: 3 distros / 7 versions
- Windows: 4 versions
- Architectures: AMD64, ARM32, ARM64
- .NET Core: 3 versions
In total, 119 distinct images with 309 tags (281 simple and 28 shared) are being produced today. This matrix is constantly evolving as new OS and .NET versions are released.
Anatomy of our Pipeline
Our CI/CD pipeline is implemented using Azure Pipelines with the core YAML-based source located here. It’s divided into three stages: build, test, and publish. Build and test each run multiple jobs in parallel. This parallelism reduces the pipeline’s end-to-end execution time by an order of magnitude compared with running the jobs sequentially.
Since we’ve got jobs running in parallel, we also need enough build agents to execute those jobs. We use a self-hosted agent pool for producing .NET images. It consists of a variety of virtual machines and physical hardware to meet our platform and performance demands.
For Linux AMD64 builds, we use the Hosted Ubuntu 1604 pool provided by Azure DevOps. That pool meets our performance needs and makes things simple from an operations standpoint.
For Windows AMD64 builds, we have custom Azure VMs configured as Azure Pipelines self-hosted agents that are running four different Windows versions (five agents for each version).
For ARM builds, things get a bit trickier. We need to build and test the Docker images on ARM-based hardware. Since the Azure Pipelines agent software’s support for ARM is limited to Linux/ARM32, we use AMD64-based Linux machines as the agents that send commands to remote Linux and Windows ARM devices. Each of those devices runs a Docker daemon. The agent machines act as proxies to send Docker commands to the remote daemons running on the ARM devices. For Linux, we use NVIDIA Jetson devices that run on the AArch64 architecture and are capable of building images that target either ARM32 or ARM64. For Windows, we have SolidRun HummingBoard ARM devices.
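The proxy arrangement can be pictured with a small Python sketch. It assumes the remote daemons are reachable over TCP and relies on the Docker CLI's `-H` (`--host`) option to target them. The device name, port, and image tag are hypothetical, and the transport the team actually uses is not described in this post.

```python
import shlex


def remote_docker_command(daemon_host, docker_args, port=2376):
    """Build a docker CLI invocation that targets a remote daemon.

    The agent machine only issues the command; the work itself runs on
    the ARM device that hosts the daemon.
    """
    return ["docker", "-H", f"tcp://{daemon_host}:{port}", *docker_args]


# Hypothetical device name and image tag, for illustration only.
cmd = remote_docker_command(
    "jetson-01.example.internal",
    ["build", "-t", "example/runtime:arm64v8", "."],
)
print(shlex.join(cmd))
```

Securing the connection (for example with TLS client certificates) is left out of the sketch.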
Image Matrix Generation
One of the key features of Azure Pipelines that we rely on is the matrix strategy for build jobs. It allows a variable number of build jobs to be generated based on an image matrix that is defined by our pipeline. An illustration of a very simplified matrix is the following YAML:
3.0-runtime-deps-disco-graph:
  imageBuilderPaths: 3.0/runtime-deps/disco 3.0/runtime/disco 3.0/aspnet/disco
  osType: linux
  architecture: amd64
3.0-sdk-disco:
  imageBuilderPaths: --path 3.0/sdk/disco
  osType: linux
  architecture: amd64
This matrix would cause two build jobs to execute in parallel, each running the same set of steps but with different inputs. The inputs consist of variables defined by the matrix. The first job, identified by 3.0-runtime-deps-disco-graph, has a variable named imageBuilderPaths that tells the build steps to build the .NET Core 3.0 Docker images for runtime-deps, runtime, and aspnet on Ubuntu Disco. Those images are built in a single job because there are dependencies amongst them. The runtime image depends on runtime-deps and the aspnet image depends on the runtime image, so nothing within this graph can be built in parallel. The sdk image, however, can be built in parallel with the others because it doesn’t depend on them; it depends on buildpack-deps:disco-scm, an official Docker image.
The goal is to produce a matrix that splits things apart such that operations are executed in parallel whenever possible. You might be thinking that such a matrix has got to be a real headache to maintain. And you’d be right. That’s why we don’t maintain a statically defined matrix. It’s generated for us dynamically at build time by a multi-purpose tool we’ve created called Image Builder. With this tool, we can execute a command that consumes a custom manifest file and outputs a matrix that is consumed by Azure Pipelines. The manifest file contains metadata about all the images we need to produce, including the file paths to the Dockerfiles and the tags to be assigned to the images.
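One way to picture how such jobs could be derived is to treat the images as a graph and place every set of connected images in the same job. The following is a sketch under stated assumptions, not Image Builder's actual algorithm: the `group_into_jobs` function and the shape of the dependency map are hypothetical.

```python
def group_into_jobs(depends_on):
    """Group image paths into jobs: images linked by a dependency share a job.

    depends_on maps an image path to the path of the in-repo image it is
    built FROM, or None when its base is an external image. Every non-None
    base must itself be a key of the mapping.
    """
    parent = {path: path for path in depends_on}

    def find(path):
        # Follow parent links to the representative of the group.
        while parent[path] != path:
            parent[path] = parent[parent[path]]  # shorten the chain as we go
            path = parent[path]
        return path

    for path, base in depends_on.items():
        if base is not None:
            parent[find(path)] = find(base)

    jobs = {}
    for path in depends_on:
        jobs.setdefault(find(path), []).append(path)
    # Members keep the order of the input mapping.
    return list(jobs.values())


images = {
    "3.0/runtime-deps/disco": None,  # external base image
    "3.0/runtime/disco": "3.0/runtime-deps/disco",
    "3.0/aspnet/disco": "3.0/runtime/disco",
    "3.0/sdk/disco": None,  # based on buildpack-deps:disco-scm
}
jobs = group_into_jobs(images)
print(jobs)
```

With this input the three dependent images land in one job and the sdk image in another. The sketch preserves input order only; a real tool would also need to order the images inside a job so that each base is built before the images that depend on it.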
We don’t just generate one matrix either. Separate matrices are generated based on the platform and architecture. For example, there are separate matrices for Linux/AMD64, Linux/ARM32, Windows Nano Server 1809/ARM32, etc. The output from Image Builder labels each matrix with its corresponding platform/architecture identifier. This identifier determines which build agents will run that particular matrix. As an example, the pipeline is configured to run the Linux/AMD64 matrix on the Hosted Ubuntu 1604 agent pool.
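The idea of one matrix per platform/architecture can be sketched in Python. The job definitions, the `build_matrices` helper, and the identifier format are hypothetical stand-ins for whatever the manifest and Image Builder really use; the variable names inside each matrix entry mirror the YAML example shown earlier.

```python
import json
from collections import defaultdict


def build_matrices(jobs):
    """Split job definitions into one matrix per platform/architecture."""
    matrices = defaultdict(dict)
    for job in jobs:
        matrix_id = f"{job['osType']}-{job['architecture']}"
        matrices[matrix_id][job["name"]] = {
            "imageBuilderPaths": " ".join(f"--path {p}" for p in job["paths"]),
            "osType": job["osType"],
            "architecture": job["architecture"],
        }
    return dict(matrices)


# Hypothetical job definitions standing in for what a manifest describes.
jobs = [
    {"name": "3.0-sdk-disco", "osType": "linux", "architecture": "amd64",
     "paths": ["3.0/sdk/disco"]},
    {"name": "3.0-sdk-bionic-arm32v7", "osType": "linux", "architecture": "arm32",
     "paths": ["3.0/sdk/bionic/arm32v7"]},
]
matrices = build_matrices(jobs)
print(json.dumps(matrices, indent=2))
```

Keying the output by a platform/architecture identifier makes it straightforward to route each matrix to a matching agent pool.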
The build stage of the pipeline is responsible for building the Docker images. It executes 64 jobs, which account for the different platform and product version combinations as well as image dependencies. Examples of job names include “Build-2.2-aspnet-Windows-NanoServer1809-AMD64”, “Build-2.1-runtime-deps-graph-Linux-bionic-ARM32v7”, and “Build-3.0-sdk-Linux-bionic-AMD64”.
The first step of this process is to call Image Builder to generate the build matrices. Each matrix produces a set of jobs that build the set of Docker images as described by their portion of the matrix. Remember the imageBuilderPaths variable contained in the matrix example mentioned earlier? This value is fed into Image Builder so that it knows which Docker images it should build. It also uses the metadata in the manifest file to know which tags should be defined for these images. This includes the definition of simple tags (platform-specific and map to a single image) and shared tags (not platform-specific and can map to multiple images).
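The difference between simple and shared tags can be illustrated with a small Python sketch. The image identifiers, tag names, and the `index_tags` helper are hypothetical and only loosely modeled on the description above; the real manifest schema is not shown in this post.

```python
def index_tags(images):
    """Index tags by kind.

    Returns (simple, shared): simple maps each tag to exactly one image,
    shared maps each tag to the list of images it can resolve to.
    """
    simple, shared = {}, {}
    for image_id, tags in images.items():
        for tag in tags["simple"]:
            if tag in simple:
                raise ValueError(f"simple tag {tag!r} maps to more than one image")
            simple[tag] = image_id
        for tag in tags["shared"]:
            shared.setdefault(tag, []).append(image_id)
    return simple, shared


# Hypothetical images and tags, for illustration only.
images = {
    "sdk-linux-amd64": {"simple": ["3.0-disco"], "shared": ["3.0"]},
    "sdk-windows-amd64": {"simple": ["3.0-nanoserver-1809"], "shared": ["3.0"]},
}
simple, shared = index_tags(images)
print(simple)
print(shared)
```

Here the shared tag resolves to two platform-specific images, while each simple tag identifies exactly one.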
Because a build agent begins a job in a clean state and has no state from its previous run, there needs to be an external storage mechanism for the Docker images that are produced. For that reason, each job pushes the images it has built to a staging location in an Azure Container Registry (ACR) so they can later be pulled by the agents running in the test stage and eventually published. In some cases, a given image may be used by multiple test jobs so having it available to be pulled from an external source is necessary.
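A minimal sketch of the staging step, assuming the images are retagged under a per-build prefix before being pushed. The registry name, the `build-staging` prefix, and the build identifier are hypothetical; only `docker tag` and `docker push` are real commands.

```python
def staging_commands(local_tag, registry, build_id):
    """Return the docker commands that stage a locally built image."""
    staged = f"{registry}/build-staging/{build_id}/{local_tag}"
    return [
        ["docker", "tag", local_tag, staged],
        ["docker", "push", staged],
    ]


# Hypothetical registry and build identifier.
commands = staging_commands("dotnet/core/sdk:3.0-disco", "example.azurecr.io", "1234")
for command in commands:
    print(" ".join(command))
```

Including the build identifier in the staged name keeps images from concurrent pipeline runs from overwriting each other.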
Now that all the images have been built it’s time to test them. This is done with a set of smoke tests that verify the basics, such as being able to create and build a project with the SDK image and run it with the runtime image. Even though these tests are very basic, they have sometimes caught product issues and enabled us to halt publishing a .NET Core update.
Like the build stage, the test stage is split into a set of 34 jobs that run in parallel. Each test job is responsible for testing a specific .NET Core version on a specific operating system version on a specific architecture. Examples of job names include “Test-2.1-Windows-NanoServer1809-AMD64”, “Test-2.2-Linux-alpine3.9-AMD64”, and “Test-3.0-Linux-bionic-ARM64v8”. Notice that the breakdown of jobs differs from the build stage because the tests depend on different combinations of images than the build jobs do. For example, even though an SDK image can be built independently of the runtime image, both images are needed together in order to test them because of how our test scenarios are authored. There are no separate jobs that test just the runtime image or just the SDK image; rather, one job tests them both for a given platform/architecture/.NET version. That means each test job selectively pulls down only the images it requires from the staging location in ACR.
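The regrouping of images for testing can be sketched as follows. The image records and the `plan_test_jobs` helper are hypothetical; the job name format follows the examples above.

```python
from collections import defaultdict


def plan_test_jobs(images):
    """Group built images by (.NET version, OS, architecture).

    Each group becomes one test job, which pulls only the images in it.
    """
    groups = defaultdict(list)
    for image in images:
        key = (image["version"], image["os"], image["arch"])
        groups[key].append(image["tag"])
    return {
        f"Test-{version}-{os_name}-{arch}": tags
        for (version, os_name, arch), tags in groups.items()
    }


# Hypothetical image records, for illustration only.
images = [
    {"tag": "sdk:3.0-bionic", "version": "3.0", "os": "Linux-bionic", "arch": "AMD64"},
    {"tag": "runtime:3.0-bionic", "version": "3.0", "os": "Linux-bionic", "arch": "AMD64"},
    {"tag": "sdk:3.0-bionic-arm64v8", "version": "3.0", "os": "Linux-bionic", "arch": "ARM64v8"},
]
plan = plan_test_jobs(images)
print(plan)
```

The SDK and runtime images for the same platform end up in a single test job even if they were produced by different build jobs.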
Once it’s known that all the images are in a good state from the test stage, we can move on to publishing them to Microsoft Container Registry (MCR). Publishing runs relatively quickly (the entire stage only takes about 3 minutes) because the images are efficiently transferred from ACR to MCR within shared Azure infrastructure. MCR detects this transfer and makes the images available for public consumption.
Included with publishing the images are a few other supplemental steps. The first is to publish the image manifests to support multi-arch using the Docker manifest tool. Next, the README files on Docker Hub are updated to reflect the latest content from the repo’s README files. Lastly, a JSON file is updated that keeps track of metadata about the latest images that have been published. This file serves several purposes, one of which is to provide a way to determine when we need to re-build an image due to its base image being updated. More on that in a future blog post.
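Until that post, here is one plausible way such a metadata file could drive rebuild decisions: record the digest of the base image each published image was built from, then compare it with the digest currently available. The field names, image names, and digests are all hypothetical, and this is not a description of the team's actual file format or logic.

```python
def images_needing_rebuild(recorded, current_base_digests):
    """Return images whose recorded base image digest is out of date."""
    stale = []
    for image, info in recorded.items():
        latest = current_base_digests.get(info["baseImage"])
        if latest is not None and latest != info["baseImageDigest"]:
            stale.append(image)
    return stale


# Hypothetical metadata and digests, for illustration only.
recorded = {
    "sdk:3.0-disco": {"baseImage": "buildpack-deps:disco-scm",
                      "baseImageDigest": "sha256:aaa"},
    "runtime-deps:3.0-disco": {"baseImage": "ubuntu:disco",
                               "baseImageDigest": "sha256:bbb"},
}
current = {"buildpack-deps:disco-scm": "sha256:ccc", "ubuntu:disco": "sha256:bbb"}
stale = images_needing_rebuild(recorded, current)
print(stale)
```

Only the image whose base digest changed is reported, so unaffected images are left alone.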
That we can produce Docker images at the scale and breadth of platforms we require is a testament to the power and flexibility of Azure Pipelines. If you’re interested in the nitty-gritty details, check out our pipeline infrastructure.
What are the systems that you have in place for producing your organization’s Docker images? Did this post spark any ideas on changes you could make to your process? Let us know in the comments. And if you’re a consumer of our Docker images, let us know how we’re doing either in the comments or at our GitHub repo.