Azure SDK Release (July 2020) – Routing Failure Notification in Azure DevOps

Daniel Jurek

Welcome to the July release of the Azure SDK. We have updated the following libraries:

  • App Configuration
  • Cosmos DB (Java only)
  • Key Vault (.NET only)
  • Search

These are ready to use in your production applications. You can find details of all released libraries on our releases page.

New preview releases:

  • Event Hubs
  • Form Recognizer
  • Key Vault
  • Service Bus
  • Storage

We believe these are ready for you to use and experiment with, but not yet ready for production. Between now and the GA release, these libraries may undergo API changes. We’d love your feedback! If you use these libraries and like what you see, or you want to see changes, let us know in GitHub issues.

Getting Started

Use the links below to get started with your language of choice. You will notice that all the preview libraries are tagged with “preview”.

If you want to dive deep into the content, the release notes linked above and the change logs they point to give more details on what has changed.

Routing Failure Notification in Azure DevOps

The Azure SDK Engineering System runs roughly 480 Azure Pipelines to validate GitHub pull requests, run live test automation, and ensure that the libraries are ready to be released. If a build or test fails, our alerting system immediately notifies our SDK engineering teams and partner Azure service teams. In this post, I’ll show how the Azure SDK Engineering Systems team scaled Azure DevOps notifications to alert the right people at the right time when our automation discovers an issue.

That’s a lot of pipelines

We generally create one pipeline per service/language combination to build and test the code for that service. This keeps the time each pipeline spends on DevOps agent machines narrowly focused and lets engineers iterate more quickly on changes. For example, there's no need to build packages and run tests for Storage when you're making changes to Key Vault.

Keeping this many pipelines organized requires a standard approach to how we create, name, configure, and trigger them. Once the work follows a standard, it can be automated: we built tooling on top of the Azure DevOps API so that a small team of engineers can scale our Engineering System to meet the needs of an ever-growing matrix of services and languages.

Alert System Architecture

The Engineering Systems team looks for existing tools before building new ones. Azure Pipelines includes an alerting system that does most of the work for us, so rather than build our own, we automate the configuration of Azure DevOps notification groups.

Our alerting system makes use of the GitHub CODEOWNERS file that teams are already using to add reviewers to pull requests. These same owners are responsible for fixing code in the service SDK when a build or test fails. To route notifications, we need to do a few things:

  1. Get a list of pipelines with schedule triggers.
  2. Create notification groups for each pipeline.
  3. Subscribe the notification groups to receive failure notifications.
  4. Synchronize the code owners into the notification group.

Get a list of pipelines with schedule triggers

We use the .NET client libraries for Azure DevOps and TFS in our tools to simplify working with the Azure DevOps API.

The client library returns an enumerable collection of build definitions, and we use LINQ to filter the results down to what we need. This example shows how to filter pipeline definitions based on their trigger type:

using Microsoft.TeamFoundation.Build.WebApi;
using Microsoft.VisualStudio.Services.Common;
using Microsoft.VisualStudio.Services.WebApi;
using System;
using System.Linq;
using System.Threading.Tasks;

namespace DemoFilteringScheduledPipelines
{
    class Program
    {
        static async Task Main(string[] args)
        {
            const string devOpsToken = "<devops_pat>";
            const string organization = "<your_organization_name>";
            const string projectName = "<your_project_name>";

            // Create credentials using the PAT
            var devOpsCreds = new VssBasicCredential("nobody", devOpsToken);
            var devOpsConnection = new VssConnection(new Uri($"https://dev.azure.com/{organization}/"), devOpsCreds);

            // Create a client and fetch the full list of definitions for the
            // specified project. The .NET client library does not directly
            // support querying for build definitions by trigger type, so we
            // filter on the client side.
            var buildDefinitionClient = await devOpsConnection.GetClientAsync<BuildHttpClient>();
            var definitions = await buildDefinitionClient.GetFullDefinitionsAsync2(project: projectName);

            // Use LINQ to keep only the definitions that have a schedule trigger
            var targetDefinitions = definitions.Where(def =>
                def.Triggers.Any(trigger => trigger.TriggerType == DefinitionTriggerType.Schedule));

            foreach (var definition in targetDefinitions)
            {
                Console.WriteLine($"{definition.Name} ({definition.Id}) has a schedule");
            }
        }
    }
}

Create notification groups

For each of the filtered pipelines, we create two groups, or “Teams”, in Azure DevOps. The parent notification team serves as the central point of contact. We can add arbitrary members to the parent team, such as project or Engineering System administrators who want to know when a pipeline fails but aren't code owners on GitHub. The parent notification team is subscribed to alerts for its pipeline.

The synchronized notification team contains the contacts from the CODEOWNERS file and is a member of the parent notification team. When a GitHub alias is added to or removed from the CODEOWNERS entry for a given pipeline, the corresponding contact is added to or removed from the synchronized notification team.

[Figure: The synchronized notification team is a member of the parent notification team]

We keep track of these teams by storing YAML in each team's Description field, which removes the need to store data in another database. For example:

pipelineId: 123
purpose: ParentNotificationTeam

The pipelineId field holds the numeric ID of the Azure DevOps pipeline, and purpose is either ParentNotificationTeam or SynchronizedNotificationTeam.
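Because the Description field holds plain text, the tool has to write these two keys when it creates a team and read them back to find existing teams. The following is a minimal sketch of that round trip, assuming a hand-rolled parser for this flat two-key format rather than a YAML library. The TeamPurpose enum, the TeamMetadata record, and its methods are hypothetical names for illustration, not the types used in our tool.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical enum mirroring the two "purpose" values described above.
public enum TeamPurpose
{
    ParentNotificationTeam,
    SynchronizedNotificationTeam,
}

// Hypothetical model of the metadata stored in a team's Description field.
public sealed record TeamMetadata(int PipelineId, TeamPurpose Purpose)
{
    // Writes the flat two-key YAML document shown above.
    public string ToDescription() =>
        $"pipelineId: {PipelineId}\npurpose: {Purpose}";

    // Reads "key: value" lines. This handles only the flat format above;
    // it is not a general YAML parser.
    public static bool TryParse(string? description, out TeamMetadata? metadata)
    {
        metadata = null;
        if (string.IsNullOrWhiteSpace(description)) return false;

        var values = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var rawLine in description.Split('\n'))
        {
            var line = rawLine.Trim(); // also drops a trailing '\r'
            var colon = line.IndexOf(':');
            if (colon <= 0) continue;
            values[line[..colon].Trim()] = line[(colon + 1)..].Trim();
        }

        if (values.TryGetValue("pipelineId", out var idText)
            && int.TryParse(idText, out var pipelineId)
            && values.TryGetValue("purpose", out var purposeText)
            && Enum.TryParse<TeamPurpose>(purposeText, out var purpose)
            && Enum.IsDefined(typeof(TeamPurpose), purpose))
        {
            metadata = new TeamMetadata(pipelineId, purpose);
            return true;
        }
        return false;
    }
}
```

TryParse returns false for a description that doesn't match the format, such as a team someone created by hand, so a caller can skip teams the tool doesn't own. A YAML library would handle quoting and nesting that this sketch ignores.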

Synchronize CODEOWNERS

Here is a subset of the CODEOWNERS file in the Azure SDK for .NET repo.

# Catch all
/sdk/        @AlexGhiondea

# Core
/sdk/core/        @pakrym @KrzysztofCwalina
…
# Service teams
/sdk/appconfiguration/    @annelo-msft @AlexanderSher

In this example, we see two services under the /sdk/ folder (Core and App Configuration) and a catch-all contact who is alerted when no specific entry exists for a service. For example, if we added a service called Foo but did not give it an entry in the CODEOWNERS file, @AlexGhiondea would still receive notifications for Foo pipeline failures.

Our pipeline definition YAML files live at the service level (e.g. /sdk/appconfiguration/ci.yml). For each pipeline definition, we match the path leading up to the .yml file (e.g. /sdk/appconfiguration/) against the folder entries in the CODEOWNERS file.
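The lookup described above can be sketched as follows, assuming a simplified CODEOWNERS reader that treats each pattern as a literal folder prefix (GitHub's format also supports glob patterns, which this sketch does not). CodeOwnersEntry, CodeOwners, Parse, ServiceDirectory, and FindOwners are hypothetical names, not the API of our tool. GitHub gives precedence to the last matching line in a CODEOWNERS file, so the sketch picks the last entry whose prefix matches; that is what turns the /sdk/ entry into a catch-all fallback.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one CODEOWNERS rule, e.g. "/sdk/core/ @pakrym".
public sealed record CodeOwnersEntry(string PathPrefix, IReadOnlyList<string> Owners);

public static class CodeOwners
{
    // Parses path/owner lines, skipping blank lines and comments.
    // Simplification: patterns are literal path prefixes, not globs.
    public static List<CodeOwnersEntry> Parse(string contents)
    {
        var entries = new List<CodeOwnersEntry>();
        foreach (var rawLine in contents.Split('\n'))
        {
            var line = rawLine.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue;

            // A null separator splits on any whitespace.
            var parts = line.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2) continue; // a path without owners
            entries.Add(new CodeOwnersEntry(parts[0], parts.Skip(1).ToList()));
        }
        return entries;
    }

    // "/sdk/appconfiguration/ci.yml" -> "/sdk/appconfiguration/"
    public static string ServiceDirectory(string pipelineYamlPath) =>
        pipelineYamlPath[..(pipelineYamlPath.LastIndexOf('/') + 1)];

    // The last matching entry wins, so "/sdk/" only applies when no
    // later, more specific entry matches the service directory.
    public static IReadOnlyList<string> FindOwners(
        IEnumerable<CodeOwnersEntry> entries, string pipelineYamlPath)
    {
        var directory = ServiceDirectory(pipelineYamlPath);
        var match = entries.LastOrDefault(entry =>
            directory.StartsWith(entry.PathPrefix, StringComparison.Ordinal));
        return match?.Owners ?? Array.Empty<string>();
    }
}
```

With the sample file above, a pipeline at /sdk/appconfiguration/ci.yml resolves to @annelo-msft and @AlexanderSher, while a pipeline for the hypothetical Foo service at /sdk/foo/ci.yml falls back to @AlexGhiondea.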

We translate the GitHub aliases into contacts using an internal Microsoft database of employees who contribute on behalf of the company, then synchronize those contacts into the synchronized notification team.
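Synchronizing comes down to comparing the contacts resolved from CODEOWNERS with the team's current members and applying only the differences. A minimal sketch of that comparison, assuming contacts are represented as case-insensitive strings such as email addresses. MembershipChanges and TeamSync.Plan are hypothetical names, and the Azure DevOps calls that apply the changes are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: the changes needed to make a team match CODEOWNERS.
public sealed record MembershipChanges(
    IReadOnlyList<string> ToAdd, IReadOnlyList<string> ToRemove);

public static class TeamSync
{
    // Compares desired contacts (from CODEOWNERS) with current team members.
    // Contacts are compared case-insensitively, as email-style identifiers
    // usually are; swap the comparer if your identifiers are case-sensitive.
    public static MembershipChanges Plan(
        IEnumerable<string> desiredContacts, IEnumerable<string> currentMembers)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;
        var desired = new HashSet<string>(desiredContacts, comparer);
        var current = new HashSet<string>(currentMembers, comparer);

        // Sorted so the planned changes are deterministic and easy to log.
        var toAdd = desired.Where(c => !current.Contains(c)).OrderBy(c => c, comparer).ToList();
        var toRemove = current.Where(c => !desired.Contains(c)).OrderBy(c => c, comparer).ToList();
        return new MembershipChanges(toAdd, toRemove);
    }
}
```

Because only the synchronized notification team is reconciled this way, members added by hand to the parent notification team are never removed by the sync.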

Use our tools

The notification creation tool is open source, so you can use it to build your own notification strategy. It is designed to work with repositories that follow our repository layout and pipeline structure, but it can be modified and extended to work with other environments. You can find the code in our tools repository.

If you are working outside of Microsoft, you will need to adapt the GitHubNameResolver class to your own contact-resolution strategy. The process may be as simple as getting the user's email address from the GitHub Users API, provided the user has made that address public.
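One possible replacement for the internal lookup is the GitHub REST endpoint GET /users/{username}, whose email field is null unless the user has made an address public. A minimal sketch, assuming unauthenticated requests; GitHubEmailLookup and its methods are hypothetical and are not the GitHubNameResolver API. GitHub rejects API requests that lack a User-Agent header, so the sketch sets one (the product name in it is a placeholder).

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, not the GitHubNameResolver class from our tool.
public static class GitHubEmailLookup
{
    // Returns the "email" field of a GET /users/{username} response, or
    // null when the user has no public email address.
    public static string? ParseEmail(string userJson)
    {
        using var document = JsonDocument.Parse(userJson);
        return document.RootElement.TryGetProperty("email", out var email)
            && email.ValueKind == JsonValueKind.String
                ? email.GetString()
                : null;
    }

    // Looks up a CODEOWNERS alias such as "@annelo-msft".
    public static async Task<string?> GetPublicEmailAsync(HttpClient http, string alias)
    {
        var login = Uri.EscapeDataString(alias.TrimStart('@'));
        using var request = new HttpRequestMessage(
            HttpMethod.Get, $"https://api.github.com/users/{login}");
        // Placeholder product name; GitHub requires some User-Agent value.
        request.Headers.UserAgent.Add(new ProductInfoHeaderValue("notification-sync-sketch", "1.0"));
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/vnd.github.v3+json"));

        using var response = await http.SendAsync(request);
        if (!response.IsSuccessStatusCode) return null; // e.g. 404 for an unknown login
        return ParseEmail(await response.Content.ReadAsStringAsync());
    }
}
```

Unauthenticated requests are subject to a low rate limit, so a tool resolving many aliases would likely send an access token as well, and it still needs a fallback for users without a public email address.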

Other things to try

Some ideas we had when designing this project but have not implemented yet:

Alert Microsoft Teams – The Azure SDK team makes considerable use of Microsoft Teams. Posting these alerts to a Teams channel could suit some teams' workflows better than email.
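As an illustration of the Teams idea, which we have not implemented, a channel's incoming webhook accepts a JSON body with a text property. A minimal sketch, assuming a webhook URL has already been configured for the channel; TeamsAlert, BuildPayload, PostAsync, and the message wording are all hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper for posting pipeline failures to a Teams channel.
public static class TeamsAlert
{
    // Builds a simple {"text": "..."} payload; the wording is illustrative.
    public static string BuildPayload(string pipelineName, string runUrl) =>
        JsonSerializer.Serialize(new { text = $"Pipeline {pipelineName} failed: {runUrl}" });

    // Posts the payload to the channel's incoming webhook URL.
    public static async Task PostAsync(HttpClient http, Uri webhookUrl, string payloadJson)
    {
        using var content = new StringContent(payloadJson, Encoding.UTF8, "application/json");
        using var response = await http.PostAsync(webhookUrl, content);
        response.EnsureSuccessStatusCode();
    }
}
```

Passing in a shared HttpClient, rather than creating one per message, avoids exhausting sockets when many pipelines fail at once.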

Make use of error logs – Azure Pipelines supports logging commands: a specially formatted message written to the console is recorded as an error or warning on the pipeline run. These logged errors are the only failure details a developer sees in the notification email, so a little more effort here can help a product engineer quickly make sense of a failure and save time on investigations:

[Figure: An example of error message logs in an Azure DevOps failure email]
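As a sketch of the logging-command idea, a test harness could write one concise error line for each failure. This assumes the task.logissue logging command; PipelineLog and its methods are hypothetical names. The escaping of %, carriage return, and line feed follows the character-escaping rules in the Azure Pipelines logging commands documentation, so a multi-line message stays within a single command.

```csharp
using System;

// Hypothetical helper for emitting Azure Pipelines error annotations.
public static class PipelineLog
{
    // Escapes characters that would otherwise break the command's message.
    // '%' is escaped first so the later escape sequences are not re-escaped.
    public static string EscapeMessage(string message) =>
        message.Replace("%", "%AZP25").Replace("\r", "%0D").Replace("\n", "%0A");

    // Builds a "##vso[task.logissue type=error]..." logging command.
    public static string FormatError(string message) =>
        $"##vso[task.logissue type=error]{EscapeMessage(message)}";

    // Written to standard output from a pipeline step, the command is
    // recorded as an error on the run.
    public static void WriteError(string message) =>
        Console.WriteLine(FormatError(message));
}
```

Keeping the message short and specific (which test failed and why) matters more than volume, since these lines are what reach the failure email.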

Conclusion

Using and enforcing consistent pipeline and repo layouts means that our Engineering System will scale smoothly to meet the needs of our product engineers. Small tools like this one can tune your system in ways that help your product teams stay focused on their goals.

Working with us and giving feedback

So far, the community has filed hundreds of issues against these new SDKs with feedback ranging from documentation issues to API surface area change requests to pointing out failure cases. Please keep that coming. We work in the open on GitHub and you can submit issues here:

Finally, please keep up to date with all the news about the Azure developer experience programs and let us know how we are doing by following @AzureSDK on Twitter.
