{"id":12615,"date":"2017-02-03T16:12:23","date_gmt":"2017-02-03T21:12:23","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/bharry\/?p=12615"},"modified":"2019-02-16T22:46:06","modified_gmt":"2019-02-16T22:46:06","slug":"scaling-git-and-some-back-story","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/bharry\/scaling-git-and-some-back-story\/","title":{"rendered":"Scaling Git (and some back story)"},"content":{"rendered":"<p>A couple of years ago,\u00a0Microsoft made the decision to begin a multi-year investment in revitalizing our engineering system across the company.\u00a0 We are a big company with tons of teams &#8211; each with their own products, priorities, processes and tools.\u00a0 There are some &#8220;common&#8221; tools but also a lot of diversity &#8211; with VERY MANY internally developed one-off tools (by team I kind of mean division &#8211; thousands of engineers).\nThere are a lot of downsides to this:<\/p>\n<ol>\n<li>Lots of redundant investments in teams building similar tooling<\/li>\n<li>Inability to fund any of the tooling to &#8220;critical mass&#8221;<\/li>\n<li>Difficulty for employees to move around the company due to different tools and process<\/li>\n<li>Difficulty in sharing code across\u00a0organizations<\/li>\n<li>Friction for new hires getting started due to an overabundance of &#8220;MS-only&#8221; tools<\/li>\n<li>And more&#8230;<\/li>\n<\/ol>\n<p>We set out on an effort we call the &#8220;One Engineering System&#8221; or &#8220;1ES&#8221;. 
\u00a0Just yesterday we had\u00a0a 1ES day where thousands of engineers\u00a0gathered to celebrate the progress we&#8217;ve made, to learn about the current state and to discuss the path forward.\u00a0 It was a surprisingly good event.\nAside&#8230; You might be asking yourself &#8211; hey, you&#8217;ve been telling us for years Microsoft uses TFS, have you\u00a0been lying to us?\u00a0 No, I haven&#8217;t.\u00a0 Over 50K\u00a0people have regularly used TFS but they don&#8217;t always use it for everything.\u00a0 Some use it for everything.\u00a0 Some use only work item tracking.\u00a0 Some only version control.\u00a0 Some build &#8230;\u00a0 We had internal versions (and in many cases more than one) of virtually everything TFS does and someone somewhere used them all.\u00a0 It was a bit of chaos, quite honestly.\u00a0 But, I think I can safely say, when aggregated and weighed &#8211; TFS had more adoption than any other set of tools.\nI also want to point\u00a0out that, when I say engineering system here, I am using the term VERY broadly.\u00a0 It includes but is not limited to:<\/p>\n<ol>\n<li>Source control<\/li>\n<li>Work management<\/li>\n<li>Builds<\/li>\n<li>Release<\/li>\n<li>Testing<\/li>\n<li>Package management<\/li>\n<li>Telemetry<\/li>\n<li>Flighting<\/li>\n<li>Incident management<\/li>\n<li>Localization<\/li>\n<li>Security scanning<\/li>\n<li>Accessibility<\/li>\n<li>Compliance management<\/li>\n<li>Code signing<\/li>\n<li>Static analysis<\/li>\n<li>and much, much more<\/li>\n<\/ol>\n<p>So, back to the story.\u00a0 When we embarked on this journey, we had\u00a0some heated debates about where we were going, what to prioritize, etc.\u00a0 You know, developers never have opinions. 
\ud83d\ude42\u00a0 There&#8217;s no way to try to address everything at once without failing miserably, so we agreed to start by tackling 3 problems:<\/p>\n<ul>\n<li>Work planning<\/li>\n<li>Source control<\/li>\n<li>Build<\/li>\n<\/ul>\n<p>I won&#8217;t go into detailed reasons other than to say those are foundational and so much else integrates with them, builds on them, etc., that they made sense.\u00a0 I&#8217;ll also observe that we had a HUGE amount of pain around build times and reliability\u00a0due to the size of our products &#8211; some hundreds of millions of lines of code.\nOver the intervening time those initial 3 investments have grown and, to varying degrees, the 1ES effort touches almost every aspect of our engineering process.\nWe put some interesting stakes in the ground.\u00a0 Some included:\n<strong>The cloud is the future<\/strong> &#8211;\u00a0Much of our infrastructure and tools were hosted internally (including TFS).\u00a0 We agreed that the cloud is the future &#8211; mobility, management, evolution, elasticity, all the reasons you can think of.\u00a0\u00a0A few\u00a0years ago, that was very controversial.\u00a0 How could\u00a0Microsoft put all our IP in the cloud?\u00a0 What about performance?\u00a0 What about security?\u00a0 What about reliability?\u00a0 What about compliance and control?\u00a0 What about&#8230;\u00a0 It took time but we eventually got a critical mass OK with the\u00a0idea and, as the years have passed, that decision has only made more and more sense and everyone is excited about moving to the cloud.\n<strong>1st party == 3rd party<\/strong> &#8211; This is an expression we use internally that means, as much as possible, we want to use what we ship and ship what we use.\u00a0 It&#8217;s not 100% and it&#8217;s not always concurrent but it&#8217;s the direction &#8211; the\u00a0default assumption, unless there&#8217;s a good reason to do something else.\n<strong>Visual Studio Team Services is the foundation<\/strong> &#8211; We 
made a bet on Team Services as the backbone.\u00a0 We need a\u00a0fabric that ties our engineering system together &#8211; a hub\u00a0from which you learn about and reach everything.\u00a0 That hub needs to be modern,\u00a0rich, extensible, etc.\u00a0 Every team needs to be able to contribute and share their distinctive contributions to the engineering system.\u00a0 Team Services fits the bill perfectly.\u00a0 Over the past year, usage of Team Services within Microsoft has grown from a couple of thousand to over 50,000\u00a0committed users.\u00a0 Like with TFS, not every team uses it for everything yet, but momentum in that direction is\u00a0strong.\n<strong>Team Services work planning<\/strong> &#8211; Having chosen\u00a0Team Services, it was pretty natural to choose the associated work management capabilities.\u00a0 We&#8217;ve on-boarded teams like the Windows group, with many thousands of users and many millions of work items, into a single Team Services account.\u00a0 We had to do a fair amount of performance and scale work to make that viable, BTW.\u00a0 At this point virtually every team at Microsoft has made this transition and all of our engineering work is being managed in Team\u00a0Services.\n<strong>Team Services Build orchestration &amp;\u00a0CloudBuild<\/strong> &#8211; I&#8217;m not going to drill into this topic too much because it&#8217;s a mammoth post in and of itself.\u00a0 I&#8217;ll summarize it to say we&#8217;ve chosen the Team Services Build service as our build orchestration system and the Team Services Build management experience\u00a0as our UI.\u00a0 We have also built a new &#8220;make engine&#8221; (that we don&#8217;t yet ship) for some of our largest code bases\u00a0that does extremely high scale and fine-grained caching, parallelization and incrementality.\u00a0 We&#8217;ve seen multi-hour builds drop sometimes to minutes.\u00a0 More on this\u00a0in a future post at\u00a0some point.\nAfter much backstory, on to the meat 
\ud83d\ude42\n<strong>Git for source control<\/strong>\nMaybe the most controversial decision was what to use for source control.\u00a0 We had an internal source control system called Source Depot that virtually everyone used in the early 2000&#8217;s.\u00a0 Over time, TFS and its Team Foundation Version Control solution won over much of the company but\u00a0never made progress with the biggest teams &#8211; like Windows and Office.\u00a0 Lots of reasons I think &#8211; some of it was just that the cost for such large teams to migrate was extremely high and the two systems (Source Depot and TFS) weren&#8217;t different enough to justify it.\nBut source\u00a0control systems generate intense loyalty &#8211; more so than just about any other developer tool.\u00a0 So the argument between TFVC, Source Depot, Git, Mercurial, and more was ferocious and, quite honestly, we made a decision without ever getting consensus &#8211; it just wasn&#8217;t going\u00a0to happen.\u00a0 We chose to standardize on Git for many reasons.\u00a0 Over time, that decision has gotten more and more\u00a0adherents.\nThere were many arguments\u00a0against choosing Git but the most concrete one was scale.\u00a0 There aren&#8217;t many companies with\u00a0code bases the size of some of ours. 
\u00a0Windows and Office, in particular (but there are others), are massive.\u00a0 Thousands of engineers, millions of files, thousands of build machines constantly building it &#8211; quite\u00a0honestly, it&#8217;s\u00a0mind-boggling.\u00a0 To be clear, when I refer to Windows in this post, I&#8217;m actually painting with a very broad brush &#8211; it&#8217;s Windows for PC, Mobile, Server, HoloLens, Xbox, IoT, and more.\u00a0 And Git is a distributed version control system (DVCS).\u00a0 It copies the entire repo and all its history to your local machine.\u00a0 Doing that with\u00a0Windows is laughable (and we got laughed at plenty).\u00a0 TFVC and Source Depot had both been\u00a0carefully optimized for huge code bases and teams.\u00a0 Git had *never* been applied to a problem like this (or probably even within an order of magnitude of this) and many asserted it would *never* work.\nThe first big debate was &#8211; how many repos\u00a0do you have &#8211;\u00a0one for the whole company at one extreme\u00a0or one for each small component?\u00a0 A big spectrum.\u00a0 Git is proven to work extremely well for a very large number of modest repos so we spent a bunch of time exploring what it would take to factor our\u00a0large codebases into lots of tenable repos.\u00a0 Hmm.\u00a0 Ever worked in a huge code base for 20 years?\u00a0 Ever tried to go back afterwards and decompose it into small repos?\u00a0 You can guess what we discovered.\u00a0 The code is very hard to decompose.\u00a0 The\u00a0cost would be very high.\u00a0 The risk from that level of churn would be enormous.\u00a0 And, we really do have scenarios where a single engineer needs to make sweeping changes across a very large swath of code.\u00a0 Trying to coordinate that across hundreds of repos would be\u00a0very problematic.\nAfter much hand-wringing, we decided\u00a0our strategy needed to be &#8220;the right number of repos based on the character of the code&#8221;.\u00a0 Some code is separable (like microservices) 
and is ideal for isolated repos.\u00a0 Some code is not (like Windows core) and needs to be treated like a single repo.\u00a0 And, I want\u00a0to emphasize, it&#8217;s not just about the difficulty of decomposing the code.\u00a0 Sometimes, in big highly related code bases, it really is better to treat the codebase as a whole.\u00a0 Maybe someday I&#8217;ll tell the story of Bing&#8217;s effort to componentize the core Bing platform into\u00a0packages and the versioning problems that caused for them.\u00a0 They are currently backing\u00a0away from\u00a0that strategy.\nThat meant we had to embark\u00a0upon scaling Git to work on codebases that are millions of files, hundreds of gigabytes and used by thousands of developers.\u00a0 As a contextual side note, even Source Depot did not scale to the entire Windows codebase.\u00a0 It had been split across 40+ depots so that we could scale it out but a layer was built over it so that, for most use cases, you could treat it like one.\u00a0 That abstraction wasn&#8217;t perfect and definitely created some friction.\nWe started down at least 2 failed paths\u00a0to scale Git.\u00a0 Probably the most extensive one was to use Git submodules to stitch together lots of repos into a single &#8220;super&#8221; repo.\u00a0 I won&#8217;t go into details but after 6 months of working on that we realized it wasn&#8217;t going to work &#8211; too many edge cases, too much complexity and fragility.\u00a0 We needed a bulletproof solution that would be well supported by almost all Git tooling.\nClose to a year ago we reset and focused on\u00a0how we\u00a0would actually get Git to scale to a single\u00a0repo that could hold the entire Windows codebase (including estimates of growth and history) and\u00a0support all the developers and build machines.\nWe tried an approach of &#8220;virtualizing&#8221; Git.\u00a0 Normally Git downloads *everything* when you clone. 
\u00a0But what if it didn&#8217;t?\u00a0 What if we virtualized the storage under it so that it only downloaded the things\u00a0you need?\u00a0 Then a clone of a massive 300GB repo becomes very fast.\u00a0 As I perform Git commands or\u00a0read\/write files in my enlistment, the system seamlessly fetches the content from the cloud (and then stores it locally so future\u00a0accesses to that data are all local).\u00a0 The one downside to this is that you lose offline support.\u00a0 If you want that, you have to &#8220;touch&#8221; everything to manifest it locally, but you don&#8217;t lose anything else &#8211; you still\u00a0get the 100% fidelity\u00a0Git experience.\u00a0 And for our huge code bases, that was OK.\nIt was a promising approach and we began to prototype it. \u00a0We called the effort Git Virtual File System or GVFS.\u00a0 We set out with the goal of making as few changes to git.exe as possible.\u00a0 For sure we didn&#8217;t want to fork Git &#8211; that would be a disaster.\u00a0 And we didn&#8217;t want to change it in a way that the community would never take our contributions back either.\u00a0 So we walked a fine line, doing as much &#8220;under&#8221; Git with a virtual file system driver as we could.\nThe file system driver basically\u00a0virtualizes 2 things:<\/p>\n<ol>\n<li>The .git\u00a0folder &#8211; This is where all your\u00a0pack files, history, etc. are stored.\u00a0 It&#8217;s the &#8220;whole thing&#8221; by default.\u00a0 We virtualized this to pull down only the\u00a0files we needed when we needed them.<\/li>\n<li>The &#8220;working directory&#8221; &#8211; the place you go to actually\u00a0edit your source, build it, etc. 
\u00a0GVFS monitors the working directory and automatically &#8220;checks out&#8221; any file that you touch, making it feel like all the files are there but not\u00a0paying the cost unless you actually access them.<\/li>\n<\/ol>\n<p>As we progressed, as you&#8217;d imagine, we learned a lot.\u00a0 Among other things, we learned the Git server has to be smart.\u00a0 It has to pack the\u00a0Git files in an optimal fashion\u00a0so that it doesn&#8217;t have to send more to the client than absolutely necessary &#8211; think of it as optimizing locality of reference.\u00a0 So we made lots of enhancements\u00a0to the\u00a0Team Services\/TFS\u00a0Git server.\u00a0 We also discovered that Git has lots of scenarios where it touches stuff it really doesn&#8217;t need to.\u00a0 This never really mattered before because it was all local and used for modestly sized repos so it was fast &#8211; but when touching it means\u00a0downloading it from the server or scanning 6,000,000 files, uh oh.\u00a0 So we&#8217;ve been investing heavily in performance optimizations to Git.\u00a0 Many of them also benefit &#8220;normal&#8221; repos to some degree but they are\u00a0critical for mega repos.\u00a0 We&#8217;ve been\u00a0submitting\u00a0many of these improvements\u00a0to the Git OSS project and have enjoyed a good working relationship with them.\nSo, fast forward to today.\u00a0 It works!\u00a0 We have all the code from 40+ Windows\u00a0Source Depot servers in a single Git repo hosted on VS Team\u00a0Services &#8211; and it&#8217;s very usable.\u00a0 You can enlist in a few minutes and do all your normal Git operations in seconds.\u00a0 And, for all intents and purposes, it&#8217;s transparent.\u00a0 It&#8217;s just Git.\u00a0 Your devs keep working the way they work, using the tools they use.\u00a0 Your builds just work.\u00a0 Etc.\u00a0 It&#8217;s pretty frick&#8217;n amazing.\u00a0 Magic!\nAs a side effect, this approach also has some very nice characteristics for large binary 
files.\u00a0 It doesn&#8217;t extend Git with a new mechanism like LFS does, no turds, etc. \u00a0It allows you to treat large binary files like any other file but it only downloads the blobs you actually touch.\n<strong>Git Merge<\/strong>\nToday, at the Git Merge conference in Brussels, Saeed Noursalehi shared the work we&#8217;ve been doing &#8211; going into excruciating detail on what we&#8217;ve done and what we&#8217;ve learned. \u00a0At the same time, we open sourced all our work.\u00a0 We&#8217;ve also included\u00a0some additional\u00a0server protocols we needed to introduce.\u00a0 You can find the <a href=\"https:\/\/github.com\/Microsoft\/gvfs\">GVFS project<\/a> and the changes we&#8217;ve made to <a href=\"https:\/\/github.com\/Microsoft\/git\">Git.exe<\/a> in the\u00a0Microsoft GitHub organization.\u00a0 GVFS relies on a new Windows filter driver (the moral equivalent of the FUSE driver in Linux) and we&#8217;ve worked with the Windows team to make an early drop of that available so you can try GVFS.\u00a0 You can read more and get more resources on <a href=\"https:\/\/blogs.msdn.microsoft.com\/visualstudioalm\/2017\/02\/03\/announcing-gvfs-git-virtual-file-system\/\">Saeed&#8217;s blog post<\/a>. 
\u00a0I encourage you to check it out.\u00a0 You can even install it and give it a try.\nWhile I&#8217;ll celebrate that it works, I also want to emphasize that it is still very much a work in progress.\u00a0 We aren&#8217;t done with any aspect of it.\u00a0 We think we have proven the concept\u00a0but there&#8217;s much\u00a0work to be done to make it a reality.\u00a0 The point of announcing this now and open sourcing it is to engage with the community to work together to help scale\u00a0Git to the largest code bases.\nSorry for the long post but I hope it was interesting.\u00a0 I&#8217;m very excited about\u00a0the work &#8211; both on 1ES at Microsoft and on scaling Git.\nBrian\n&nbsp;\n&nbsp;\n&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A couple of years ago,\u00a0Microsoft made the decision to begin a multi-year investment in revitalizing our engineering system across the company.\u00a0 We are a big company with tons of teams &#8211; each with their own products, priorities, processes and tools.\u00a0 There are some &#8220;common&#8221; tools but also a lot of diversity &#8211; with VERY MANY [&hellip;]<\/p>\n","protected":false},"author":244,"featured_media":14617,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[9],"class_list":["post-12615","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-vs-team-services"],"acf":[],"blog_post_summary":"<p>A couple of years ago,\u00a0Microsoft made the decision to begin a multi-year investment in revitalizing our engineering system across the company.\u00a0 We are a big company with tons of teams &#8211; each with their own products, priorities, processes and tools.\u00a0 There are some &#8220;common&#8221; tools but also a lot of diversity &#8211; with VERY MANY 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/12615","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/users\/244"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/comments?post=12615"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/12615\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media\/14617"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media?parent=12615"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/categories?post=12615"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/tags?post=12615"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}