Monitoring build resources with the TFS 2017 management pack

The Microsoft System Center Management Pack for Visual Studio Team Foundation Server 2017 (what a name!) has been available for about a month now. One important change to note in this version of the management pack is that it no longer supports monitoring of build resources. But don’t worry – you can still easily monitor these resources using other capabilities of System Center.

Team Foundation Server Management Packs for System Center have handled build resources (XAML controllers and agents; Build/Release agents) in various ways over the years.

Early versions discovered build resources by starting from TFS deployments, enumerating build resources for those deployments, and then reaching out to them. The advantage of this approach was that it was easy to get all build machines monitored without having to explicitly enumerate them. The disadvantage was that all build machines had to be monitored in order for any monitoring to work.

Later, we moved to an approach where build resources were discovered and monitored independently by System Center agents running on the machines where the build resources lived. This made it possible to monitor a subset of build resources by not installing management agents on the machines that did not need monitoring. Discovering build resources locally required using things like registry keys, however, which then had to be created by our various agents/controllers. This conflicted with our progress toward extremely lightweight agent installation/configuration, and the TFS 2015 management pack couldn’t discover agents running interactively. (Or agents on non-Windows machines, for that matter.)

The registry keys used to discover TFS 2015 build agents were versioned as well, which would have made it harder to monitor TFS deployments of multiple versions from the same System Center deployment.

In TFS 2017, we decided to go a different route with the management pack and abandon our attempts to monitor build resources. Partly this was because of the effort involved in properly monitoring all the different build resource types we support from TFS 2017 (XAML controllers and agents from 2010, 2013, and 2015, as well as our new agents from 2015 and 2017, which can run on Windows, OS X, and various Linux distributions). And partly this was because our build monitoring has never added any real value beyond the existing monitoring capabilities available within System Center.

System Center has built-in capabilities to monitor machine availability, processes and Windows Services on Windows computers, and processes on Linux computers. Our recommendation moving forward, for teams who wish to monitor their build resources in addition to their TFS deployments, is to do the following:

- Add all the machines where your critical build resources are running to the list of machines monitored by your System Center deployment. This will ensure that you notice when the machines themselves get into trouble, go offline, etc.
- For your critical XAML build resources, create a Windows Service monitor for the appropriate service name – TFSBuildServiceHost.2013, for example. Windows Service monitors are nice in that they only monitor those services that are set to start automatically when Windows starts, meaning that you can choose to apply this monitor to all machines in your System Center deployment without worrying that it will start generating alerts on machines that are not TFS XAML build machines.
- For your critical Build/Release resources, create process monitors for the appropriate OSes and agent versions – Agent.Listener.exe, for example, for TFS 2017 Windows agents. Process monitors are a bit trickier than Windows Service monitors in that they will apply to every machine in the group you select. As such, your best bet for these monitors is to create a group that includes all and only your critical Build/Release machines. This can be done either by explicitly specifying these machines, or by using naming conventions for your build resources in combination with dynamic rules in your System Center group definition.
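
To make the "all and only" idea behind a naming-convention group concrete, here is a minimal Python sketch. It is not System Center configuration and does not use any System Center API; it only illustrates how a pattern on machine names can select the build machines and nothing else. The `BUILD-<digits>` convention, the machine names, and the function name are hypothetical, so substitute whatever convention your team actually uses.

```python
import re

# Hypothetical naming convention (an assumption for this sketch):
# critical Build/Release machines are named BUILD-<digits>, in any letter case.
BUILD_MACHINE_PATTERN = re.compile(r"BUILD-\d+", re.IGNORECASE)


def select_build_machines(machine_names):
    """Return only the names that match the whole naming convention."""
    return [
        name for name in machine_names
        if BUILD_MACHINE_PATTERN.fullmatch(name)
    ]


# Hypothetical inventory of machines known to the monitoring deployment.
machines = ["BUILD-01", "build-02", "SQL-01", "BUILD-TEST", "WEB-03"]
print(select_build_machines(machines))  # prints ['BUILD-01', 'build-02']
```

Matching the whole name rather than a prefix is what keeps machines like BUILD-TEST out of the group, so a process monitor targeted at the group does not raise alerts on machines that never run an agent.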

We believe this combination of monitors will monitor your build resources for TFS 2017 deployments at least as well as the management pack did. Using this approach, you can monitor agents running interactively, agents running on Linux machines, etc. You can even get alerts if CPU or memory consumption by the agent process or service exceeds a custom threshold. None of this has ever been supported by the TFS management pack monitoring.
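
As a rough illustration of what such a process monitor evaluates, the Python sketch below checks a snapshot of process data for two conditions: the agent process is missing, or it exceeds a CPU or memory threshold. This is only a model of the logic, not how System Center implements it. The `ProcessSample` type, the `evaluate_agent` function, the sample data, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """One observed process: its name, CPU usage, and memory usage."""
    name: str
    cpu_percent: float
    memory_mb: float


def evaluate_agent(samples, process_name, max_cpu_percent, max_memory_mb):
    """Return alert messages for the named process in this snapshot."""
    matching = [s for s in samples if s.name.lower() == process_name.lower()]
    if not matching:
        # Availability check: the process is not running at all.
        return [f"{process_name} is not running"]

    alerts = []
    for sample in matching:
        # Threshold checks: alert only when usage is above the custom limit.
        if sample.cpu_percent > max_cpu_percent:
            alerts.append(f"{sample.name} CPU usage is above {max_cpu_percent}%")
        if sample.memory_mb > max_memory_mb:
            alerts.append(f"{sample.name} memory usage is above {max_memory_mb} MB")
    return alerts


# Hypothetical snapshot from a TFS 2017 Windows build machine.
snapshot = [
    ProcessSample("Agent.Listener.exe", cpu_percent=12.0, memory_mb=850.0),
    ProcessSample("svchost.exe", cpu_percent=1.0, memory_mb=40.0),
]
for alert in evaluate_agent(snapshot, "Agent.Listener.exe", 80, 512):
    print(alert)  # prints "Agent.Listener.exe memory usage is above 512 MB"
```

The same check applies unchanged to an interactively running agent or to an agent process on a Linux machine, since it depends only on the process name and its resource usage.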

Please let us know if you have feedback or encounter issues using this approach to monitor your TFS 2017 build resources.