How VSTS is Accelerating the Engineering Group Behind Windows
As part of our engineering processes at Microsoft, we often share best practices and stories of change across the company’s engineering teams. Listening in on sessions at our latest internal engineering conference, I was struck by the sheer scale of the effort the Windows and Devices Group (WDG) undertook and the problems they’ve solved using Visual Studio Team Services (VSTS), so I wanted to write up some of my key takeaways here.
WDG here at Microsoft powers the operating systems of computing devices across the planet. It looks after not only the Windows operating system, but also Xbox, Surface, HoloLens, the Microsoft Store and much more. With over 22,000 employees and 7,000 software developers in the group, it’s larger than many companies.
WDG was formed from divisions across Microsoft. When that many engineers came together from different areas, there were lots of ways of working and lots of different systems and ways to build and deploy software. There was duplication of effort and logistical difficulties in sharing code, processes and learnings. How do you get everyone to work together and across all the other disciplines within the team?
Four years ago, WDG started to adopt VSTS as part of the ‘One Engineering System’ (1ES) effort in Microsoft, an effort to bring together our engineering people, processes and tools across the whole company. The WDG team has been leading much of the performance work we’ve done to migrate teams to Git hosted in VSTS. The Windows core repo contains over 270 GB of source for Windows in a massive mono-repo. That repo alone has over 4,000 engineers working on it (meaning around 400 are actively making changes at any one time).
While Windows core is the largest Git repo, WDG has a lot of other repos and they try to make code more modular and self-contained when it makes sense. Large mono-repos can be good for developer productivity but they come with the cost of some additional complexity along with process and tooling limitations. In total the WDG team edits, reviews, builds and deploys from around 6,000 Git repos across the entire group. When you have that many people working in that many repos, pull requests become essential.
The recent Fall Creators Update to Windows 10 consisted of around 4 million individual commits grouped into around 500,000 pull requests. All changes to the Windows code are reviewed via pull requests, which has proven to be the best way to work in any Git team in Microsoft. It has taken significant engineering work to build a pull request system in VSTS that can handle pull requests at the scale of a group like WDG. We have made many improvements over the years based on what we learned. For example, when you press ‘Merge’ on your pull request, the merge now runs through a PR queue in the background, which prevents the merge race conditions and collisions that can otherwise happen. The WDG team has also published an extension to the VSTS marketplace that allows merge conflicts to be resolved directly in a pull request from VSTS, rather than having to resolve them locally and then push back to the server.
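To make the idea of a background PR queue concrete, here is a minimal sketch in Python. It is not how VSTS implements its queue; the class names, the file-based conflict check and the version counter are all assumptions made for illustration. The only point it shows is that pressing ‘Merge’ enqueues a request, and that queued requests are then applied one at a time against the latest state of the target branch, so two merges never race each other.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    pr_id: int
    files: frozenset  # paths this PR changes (hypothetical, simplified model)
    base: int         # target-branch version the PR was last validated against


@dataclass
class MergeQueue:
    head: int = 0                                    # current target-branch version
    changed_at: dict = field(default_factory=dict)   # path -> version that last changed it
    pending: deque = field(default_factory=deque)

    def press_merge(self, pr):
        """Called when someone presses 'Merge': enqueue only, never merge inline."""
        self.pending.append(pr)

    def drain(self):
        """Process queued PRs strictly one at a time, each against the latest head."""
        results = {}
        while self.pending:
            pr = self.pending.popleft()
            # Assumed conflict rule: a file the PR touches changed after the PR's base.
            conflict = any(self.changed_at.get(f, 0) > pr.base for f in pr.files)
            if conflict:
                results[pr.pr_id] = "conflict"
                continue
            self.head += 1
            for f in pr.files:
                self.changed_at[f] = self.head
            results[pr.pr_id] = "merged"
        return results


queue = MergeQueue()
queue.press_merge(PullRequest(1, frozenset({"a.c"}), base=0))
queue.press_merge(PullRequest(2, frozenset({"a.c", "b.c"}), base=0))
queue.press_merge(PullRequest(3, frozenset({"c.c"}), base=0))
results = queue.drain()
```

In this toy run the second request is reported as a conflict because the first one already changed a file it touches, while the third merges cleanly. Because everything goes through a single queue, that decision is always made against the branch as it actually is at merge time.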
WDG also puts a similar level of demand on the VSTS work tracking system. They currently have over 10 million work items tracking bugs, features, tasks and more. Note: If you ever get asked to send a crash report to Microsoft, please do! There is a high probability that data from that crash report will end up in a work item for the WDG team to triage and allocate to an engineer, who will investigate and create a fix. Those crash reports and the detailed diagnostic information they contain are great for helping us improve our products. Just know that when you send one in, you’ve made an engineer’s day easier by helping her track down something that might previously have been reported only anecdotally, without detailed diagnostics.
Bringing WDG together in VSTS wasn’t all plain sailing. As with most organizational changes, the people and the processes were the hardest parts to adjust. We like teams to have a high degree of autonomy here in Microsoft. But when WDG came together from separate groups, those groups would often have different names for very similar things for no good reason, just because that is how it had always been for them. To give just a couple of examples, one team’s ‘bug’ was another’s ‘defect’ or ‘issue’. Depending on where you worked in the org, ‘done’ could mean the work item had a status of ‘Complete’, ‘Completed’ or ‘Closed’. When you scale that up you get a lot of complexity, which makes it hard for anyone to know how to log a bug and have it flow across the responsible teams where necessary. After bringing together engineering leads to drive a process rationalization and bring some alignment across the group, the team was able to dramatically reduce the number of work item types, fields and states. This not only made the process simpler and much easier for engineers to use, it also helped improve the performance of their VSTS account, as the forms no longer had to render hundreds of fields that were rarely (if ever) used. Rationalizing work items also allowed for better communication and reporting about what was happening within the group.
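As a rough illustration of what this kind of rationalization means in practice, the sketch below maps the differing team vocabularies mentioned above onto a single set of names. The canonical choices (‘Bug’ and ‘Closed’), the dictionary layout and the function name are assumptions made for the example, not the actual WDG process template.

```python
# Hypothetical canonical vocabulary; the real WDG template is not described here.
TYPE_ALIASES = {"bug": "Bug", "defect": "Bug", "issue": "Bug"}
STATE_ALIASES = {"complete": "Closed", "completed": "Closed", "closed": "Closed"}


def normalize_work_item(item):
    """Return a copy of a work item dict with its type and state mapped to canonical names.

    Values with no known alias are left unchanged so they can be reviewed by hand.
    """
    normalized = dict(item)
    item_type = item.get("type", "")
    item_state = item.get("state", "")
    normalized["type"] = TYPE_ALIASES.get(item_type.strip().lower(), item_type)
    normalized["state"] = STATE_ALIASES.get(item_state.strip().lower(), item_state)
    return normalized


legacy_items = [
    {"id": 1, "type": "Defect", "state": "Completed"},
    {"id": 2, "type": "Issue", "state": "Complete"},
    {"id": 3, "type": "Bug", "state": "Active"},
]
normalized_items = [normalize_work_item(item) for item in legacy_items]
```

After normalization, all three items share one type, and the two finished items share one state, so a single query for closed bugs covers every team.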
The WDG experience led to direct improvements in VSTS for things like tag support. More subtle changes include how you assign a work item to someone. (A regular drop-down combo works fine for a small team, but not when you could potentially assign the work item to one of the 80,000 VSTS users in Microsoft.)
The WDG team has seen massive improvements in moving to VSTS, not just in pure throughput, but also in the satisfaction levels of engineers on the team. Engineer satisfaction is one of the most important management metrics for WDG. This also reflects the change of culture, not just in their group but across Microsoft as a whole, towards rewarding sharing and reuse. In turn, the WDG team has helped improve VSTS for all our other customers, either by making direct feature contributions as pull requests to the VSTS codebase, by creating extensions and making them publicly available in the VSTS marketplace, or by leading the way so that feature gaps and performance issues are identified well before customers outside Microsoft run into them.
As a best practice, WDG has also released a number of their tools and extensions as open source projects so that customers outside of Microsoft can make use of them. My personal favorites are the Work Item Migrator (a way to copy work item content from one VSTS account into another) and Mohit Bagra’s work item form extensions, which add one-click power user commands to the VSTS work item form.
It’s been an incredible four years; we’ve come further than we could have imagined together through the power of DevOps and the benefits of encouraging a culture of sharing inside and outside the company.