March 28th, 2007

Stop piling on when the build breaks: Build checkin policy for Continuous Integration in Orcas

Buck Hodges
Director of Engineering

Last fall, Clark Sell wrote a blog post called Stop, the build is broken!! that introduced a checkin policy that reported errors when the build was broken.  If you are using continuous integration where every checkin starts a build, you want folks to stop and fix build breaks when they occur, rather than pile on more checkins and perhaps make the problem worse (or at least harder to sort out).

Since we’ve added support for continuous integration in Team Build for Orcas (screencasts, demo), we thought that it was a really great idea, and we’ve added a simple checkin policy in Orcas Team Build that does this (it will be in beta 1, but it is not in the March Orcas CTP).  It works differently from his (he was constrained to what v1 had to offer), and it only works with Orcas clients (not with TFS 2005 clients, which would see an error message about not finding the checkin policy).

Here’s what the policy does.

  1. Request from the server a list of build definitions affected by this checkin.
  2. For each build definition returned where the last build was not “good,” create a checkin policy error message containing the build definition’s name and the user who triggered the build.

If the policy detects a broken CI build, you’ll get a message like the following when you attempt to check in.

The last build of definition WebProjects_SimpleWebService, triggered by user buck, failed.

A “good” build is one where compilation and testing were successful.  If something goes wrong after the test phase, it’s still considered a good build.  This notion of a good build is the same as it was in v1, and it has some shortcomings.  We’re going to refine it and make it more flexible in the release after Orcas.

There’s nothing to configure for this checkin policy, so you aren’t stuck with maintaining a list of build definitions for the checkin policy to monitor.  The first step calls the same code on the server that is used by the continuous integration feature.  Based on the list of pending changes’ server paths involved in the checkin and the workspace mappings for each of the build definitions, the server is able to quickly determine which build definitions are affected by your changes.  It’s all automatic!
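To make the matching idea concrete, here is a minimal sketch of how changed server paths could be compared against workspace mappings. It is not the server's actual implementation: `BuildDefinitionMapping`, `AffectedDefinitions`, `IsUnder`, and `Find` are hypothetical names, and the sketch assumes that a definition is affected whenever a changed path lies at or under one of its mapped server folders (cloaked folders and other mapping details are ignored).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a build definition and the server folders
// that its workspace maps. Not part of the Team Build API.
public sealed class BuildDefinitionMapping
{
    public BuildDefinitionMapping(string name, params string[] mappedServerFolders)
    {
        Name = name;
        MappedServerFolders = mappedServerFolders;
    }

    public string Name { get; }
    public IReadOnlyList<string> MappedServerFolders { get; }
}

public static class AffectedDefinitions
{
    // True when serverPath is the mapped folder itself or lies beneath it.
    // Server paths are compared without regard to case.
    public static bool IsUnder(string serverPath, string mappedFolder)
    {
        string folder = mappedFolder.TrimEnd('/');
        return serverPath.Equals(folder, StringComparison.OrdinalIgnoreCase)
            || serverPath.StartsWith(folder + "/", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the names of the definitions that have at least one mapping
    // covering at least one of the changed server paths.
    public static List<string> Find(
        IEnumerable<BuildDefinitionMapping> definitions,
        IReadOnlyCollection<string> changedServerPaths)
    {
        return definitions
            .Where(d => d.MappedServerFolders.Any(
                folder => changedServerPaths.Any(path => IsUnder(path, folder))))
            .Select(d => d.Name)
            .ToList();
    }
}
```

Appending the slash before the prefix test keeps a folder such as $/Tools from matching a sibling such as $/ToolsExtra.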

See Walkthrough: Customizing Check-in Policies and Notes for how to enable a checkin policy for a team project.

We’re interested in your feedback, so post a comment and let us know what you think.

Low-level details

If you want to see how the policy works and get a glimpse of the new Orcas Build API, read on.  If you aren’t interested in the low-level details, feel free to skip this.

The listing at the end of this post is all of the code that isn’t just “boilerplate” checkin policy code.  The next few paragraphs walk through it.

To avoid hitting the server repeatedly in a short time span, the policy uses a timer to ensure that a minimum of 10 seconds elapses between calls.  There’s nothing special about 10 seconds, and we may even lengthen it to a minute.  The important part is that since this policy makes at least one web service call, it needs to make sure that being evaluated often doesn’t cause too many web service calls and create a performance problem.
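Pulled out on its own, the throttling idea looks roughly like the sketch below. `CallThrottle` and `TryEnter` are hypothetical names used only for illustration; the policy itself keeps the Stopwatch in a field and does the check inline.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: permits an expensive call only when enough time
// has passed since the previous attempt.
public sealed class CallThrottle
{
    private readonly Stopwatch m_timer = new Stopwatch();
    private readonly long m_minimumIntervalMs;

    public CallThrottle(long minimumIntervalMs)
    {
        m_minimumIntervalMs = minimumIntervalMs;
    }

    // Returns true if the caller may make the expensive call now.
    // The timer is restarted on every attempt, whether allowed or not.
    public bool TryEnter()
    {
        bool allowed = !m_timer.IsRunning
            || m_timer.ElapsedMilliseconds >= m_minimumIntervalMs;

        m_timer.Reset();
        m_timer.Start();
        return allowed;
    }
}
```

A caller would wrap the web service call in `if (throttle.TryEnter()) { ... }`.  Because every attempt restarts the timer, a burst of evaluations keeps postponing the next call; restarting the timer only when a call is allowed is a reasonable variation if that isn’t what you want.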

The first thing that the policy’s Evaluate() method does is get a reference to the central object in the Orcas Team Build API, IBuildServer.  Next it gets the list of pending changes that are going to be checked in.

Then it calls GetAffectedBuildDefinitions(), which does what I described in step 1 earlier.  It’s a new web service method on the Orcas server that determines which build definitions are affected by changes to a list of server paths.  Having the workspace mappings for the build definitions stored in the Orcas database, rather than in the old WorkspaceMapping.xml file, is what makes this and continuous integration efficient and automatic.  Otherwise, you’d have to manually specify what paths affect each build definition, which would be a maintenance headache.

After getting the affected build definitions, it checks to see if the artifact URI for the last build is the same as the artifact URI for the last good build.  If those are set to the same URI, the last build was good.  Otherwise, the most recent build was not a good build.  Here we also check to see whether the build type is a continuous integration build, either every checkin (Individual) or a set of checkins over some time period (Batch).
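The check itself is small enough to show in isolation. The sketch below uses stand-in types (`CiType`, `DefinitionStatus`, `BrokenBuildFilter`) rather than the real Team Build interfaces, so treat the names as assumptions; only the comparison logic mirrors what the paragraph above describes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for illustration only; they are not the Team Build API types.
public enum CiType { None, Individual, Batch }

public sealed class DefinitionStatus
{
    public string Name { get; set; }
    public CiType Trigger { get; set; }
    public Uri LastBuildUri { get; set; }
    public Uri LastGoodBuildUri { get; set; }
}

public static class BrokenBuildFilter
{
    // A definition counts as broken when it has a CI trigger and its most
    // recent build is not also its most recent good build.
    public static bool IsBrokenCiBuild(DefinitionStatus definition)
    {
        bool isCi = definition.Trigger == CiType.Individual
                 || definition.Trigger == CiType.Batch;

        // System.Uri overloads == and !=, so this compares the URIs by
        // value rather than by reference, and tolerates nulls.
        return isCi && definition.LastBuildUri != definition.LastGoodBuildUri;
    }

    // Returns only the definitions whose last CI build was not good.
    public static List<DefinitionStatus> Select(IEnumerable<DefinitionStatus> definitions)
    {
        return definitions.Where(IsBrokenCiBuild).ToList();
    }
}
```

Note that a definition that has never had a good build also shows up as broken here, because its last good build URI is empty while its last build URI is not.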

If we have any broken builds, we need to get the details for each build so that we can report who may have broken it.  For CI builds that build each checkin individually, the user who triggered the build really is the person who broke it (assuming this is the first broken build).  For CI builds that build the checkins from a period of time, such as the last 30 minutes, that user may or may not be the person who broke the build, since more than one person may have checked in.  Regardless, that’s a good person to start with when investigating the broken build.

        public override void Initialize(IPendingCheckin pendingCheckin)
        {
            base.Initialize(pendingCheckin);

            m_timer = new Stopwatch();
        }

        public override PolicyFailure[] Evaluate()
        {
            if (Disposed)
            {
                throw new ObjectDisposedException(null);
            }

            IBuildServer buildServer = (IBuildServer) PendingCheckin.GetService(typeof(IBuildServer));

            // If there are any pending changes, determine whether there are build definitions that are
            // affected for which the last build was not a good build.  Make sure that we don't call
            // this rapidly in succession.
            List<PolicyFailure> failures = new List<PolicyFailure>();
            PendingChange[] pendingChanges = PendingCheckin.PendingChanges.CheckedPendingChanges;
            if (pendingChanges.Length > 0 &&
                (!m_timer.IsRunning || m_timer.ElapsedMilliseconds >= 10000))
            {
                IBuildDefinition[] definitions = buildServer.GetAffectedBuildDefinitions(
                                                        PendingChange.ToServerItems(pendingChanges));

                List<Uri> brokenBuilds = new List<Uri>();
                List<IBuildDefinition> brokenBuildDefs = new List<IBuildDefinition>();
                foreach (IBuildDefinition definition in definitions)
                {
                    // Since this policy is geared toward folks using continuous integration, only fail
                    // for build definitions that have a CI trigger.
                    if (definition.LastBuildUri != definition.LastGoodBuildUri &&
                        (definition.ContinuousIntegrationType == ContinuousIntegrationType.Batch ||
                         definition.ContinuousIntegrationType == ContinuousIntegrationType.Individual))
                    {
                        brokenBuilds.Add(definition.LastBuildUri);
                        brokenBuildDefs.Add(definition);
                    }
                }

                if (brokenBuilds.Count > 0)
                {
                    // Look up the broken builds to see who triggered them.
                    IBuildDetail[] buildDetails = buildServer.QueryBuildsByUri(brokenBuilds.ToArray(),
                                                                               null, QueryOptions.None);

                    // Create a failure for each broken build, skipping any build that wasn't returned
                    // due to insufficient permissions or being deleted.
                    for (int i = 0; i < buildDetails.Length; i++)
                    {
                        if (buildDetails[i] != null)
                        {
                            String requestedFor = UserNameUtil.MakePartial(
                                buildDetails[i].RequestedFor,
                                PendingCheckin.PendingChanges.Workspace.VersionControlServer.AuthenticatedUser);
                            failures.Add(new PolicyFailure(
                                ResourceStrings.Format(ResourceStrings.BuildPolicyBuildBroken,
                                                       brokenBuildDefs[i].Name, requestedFor),
                                this));
                        }
                    }
                }
            }

            m_timer.Reset();
            m_timer.Start();

            return failures.ToArray();
        }

        [NonSerialized]
        private Stopwatch m_timer;
