{"id":12996,"date":"2017-05-24T11:11:44","date_gmt":"2017-05-24T16:11:44","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/bharry\/?p=12996"},"modified":"2019-02-27T23:17:29","modified_gmt":"2019-02-27T23:17:29","slug":"the-largest-git-repo-on-the-planet","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/bharry\/the-largest-git-repo-on-the-planet\/","title":{"rendered":"The largest Git repo on the planet"},"content":{"rendered":"<p>It&#8217;s been 3 months since I first wrote about <a href=\"https:\/\/devblogs.microsoft.com\/bharry\/scaling-git-and-some-back-story\/\">our efforts to scale Git to extremely large projects and teams <\/a>with an effort we called &#8220;Git Virtual File System&#8221;.\u00a0 As a reminder, GVFS, together with a set of enhancements to Git, enables Git to scale to VERY large repos by virtualizing both the .git folder and the working directory.\u00a0 Rather than download the entire repo and checkout all the files, it dynamically downloads only the portions you need based on what you use.<\/p>\n<p>A lot has happened and I wanted to give you an update.\u00a0\u00a0Three months ago, GVFS was still a dream.\u00a0 I don&#8217;t mean it didn&#8217;t exist &#8211; we had a concrete implementation, but rather, it was unproven.\u00a0 We had validated on some big repos but we hadn&#8217;t rolled it out to any meaningful number of engineers so we had only conviction that it was going to work.\u00a0 Now we have proof.<\/p>\n<p>Today,\u00a0I want\u00a0\u00a0to share our results.\u00a0 In addition, we\u2019re announcing the next steps in our GVFS journey for customers, including expanded open sourcing to start taking contributions and improving how it works for us at Microsoft, as well as for partners and customers.<\/p>\n<p><strong>Windows is live on Git<\/strong><\/p>\n<p>Over the past 3 months, we have largely completed the rollout of Git\/GVFS to the Windows team at Microsoft.<\/p>\n<p>As a refresher, the Windows code 
base is approximately 3.5M files and, when checked in to a Git repo, results in a repo of about 300GB.\u00a0 Further, the Windows team is about 4,000 engineers and the engineering system produces 1,760 daily &#8220;lab builds&#8221; across 440 branches\u00a0in addition to thousands of pull request validation builds.\u00a0 All 3 of the dimensions (file count, repo size and activity), independently, provide daunting scaling challenges and taken together they make it unbelievably challenging to create a great experience.\u00a0 Before the move to Git, in Source Depot, it was spread across 40+ depots and we had a tool to manage operations that spanned them.<\/p>\n<p>As of my writing 3 months ago, we had all the code in one Git repo, a few hundred engineers using it and a small fraction (&lt;10%) of the daily build load.\u00a0 Since then, we have rolled out in waves across the engineering team.<\/p>\n<p>The first, and largest, jump happened on March 22nd when we rolled out to the Windows OneCore team of about 2,000 engineers.\u00a0 Those 2,000 engineers worked in Source Depot on Friday, went home for the weekend and came back Monday morning to a new experience based on Git.\u00a0 People on my team\u00a0were holding their breath that whole weekend,\u00a0praying we weren&#8217;t going to be\u00a0pummeled by a mob of angry engineers who showed up Monday unable to get any work done.\u00a0 In truth, the Windows team had done a great job preparing backup plans in case of\u00a0mishap and, thankfully, we didn&#8217;t have to use any of them.<\/p>\n<p>Much to\u00a0my surprise, quite honestly, it went\u00a0very smoothly and engineers were productive from day one.\u00a0 We had some issues, no doubt.\u00a0 For instance, Windows, because of the size of the team and the nature of the work, often has VERY large merges across branches (10,000&#8217;s of changes with 1,000&#8217;s of conflicts).\u00a0 We discovered that first week that our UI for pull requests and merge conflict resolution 
simply didn&#8217;t scale to changes that large.\u00a0 We had to scramble to virtualize lists and incrementally fetch data so the UI didn&#8217;t just hang.\u00a0 We had it resolved within a couple of days and overall, sentiment that week was much better than we expected.<\/p>\n<p>One of the ways we measured our success was by doing surveys of the engineering team.\u00a0 The main question we asked was &#8220;How satisfied are you?&#8221; but, of course, we also mined a lot more detail.\u00a0\u00a0Two weeks into the rollout, our first survey resulted in:\n<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/GitSurvey.png\"><img decoding=\"async\" class=\"alignnone wp-image-13006\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/GitSurvey.png\" alt=\"gitsurvey\" width=\"430\" height=\"127\" \/><\/a><\/p>\n<p>I&#8217;m not going to jump up and down and celebrate those numbers, but for a team that had just had their whole life changed, had to learn a new way of working and were living through a transition that was very much a work in progress, I felt reasonably good about it.\u00a0 Yes, it&#8217;s only 251 survey responses out of 2,000 people but welcome to the world of trying to get people to respond to surveys. 
\ud83d\ude42<\/p>\n<p>Another way we measured success was to look at &#8220;engineering activity&#8221; to see if people were still getting their work done.\u00a0 For instance, we measured the number of &#8220;checkins&#8221; to official branches.\u00a0 Of course, half the team was still on Source Depot and half had moved to Git so we looked at combined activity over time.\u00a0 In the chart below you can see the big drop in Source Depot checkins and\u00a0the big jump in Git pull requests but overall the sum of the two stayed reasonably consistent.\u00a0 We felt that the data showed that the system was working and there were no major blockers.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Activity1.png\"><img decoding=\"async\" class=\"alignnone wp-image-13036\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Activity1.png\" alt=\"activity\" width=\"511\" height=\"309\" \/><\/a><\/p>\n<p>On April 22nd, we onboarded the next wave of about 1,000 engineers.\u00a0 And then on May 12th we onboarded another 300-400.\u00a0 Each successive wave followed roughly the same pattern and we now have about 3,500 of the roughly 4,000 Windows engineers on Git.\u00a0 The remaining teams are currently working to deadlines and trying to figure out when is the best time to schedule their move, but I expect that, in the next few months, we&#8217;ll complete the full engineering team.\nThe scale the system is operating at is really amazing.\u00a0 Let&#8217;s look at some numbers&#8230;<\/p>\n<ul>\n<li>There are over 250,000 reachable Git commits in the history for this repo, over the past 4 months.<\/li>\n<li>8,421 pushes per day (on average)<\/li>\n<li>2,500 pull requests, with 6,600 reviewers per work day\u00a0(on average)<\/li>\n<li>4,352 active topic branches<\/li>\n<li>1,760 official 
builds per day<\/li>\n<\/ul>\n<p>As you can see, it&#8217;s just a tremendous amount of activity over an immensely large codebase.\n<b><\/b><\/p>\n<p><b>GVFS performance at scale<\/b><\/p>\n<p>If you look at those satisfaction survey numbers, you&#8217;ll see there are\u00a0people who aren&#8217;t happy yet.\u00a0 We have lots of data on why and there are many reasons &#8211; from tooling that didn&#8217;t support Git yet to frustration at having to learn something new.\u00a0 But, the top issue is performance, and I want to drill into that.\u00a0 We knew when we rolled out Git that lots of our performance work wasn&#8217;t done yet and we also learned some new things along the way.\u00a0 We track the performance of some of the key Git operations.\u00a0 Here is data collected by telemetry systems for the ~3,500 engineers using GVFS.\n<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Performance.png\"><img decoding=\"async\" class=\"alignnone wp-image-13025\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Performance.png\" alt=\"performance\" width=\"317\" height=\"187\" \/><\/a><\/p>\n<p>You see the &#8220;goal&#8221; (which was designed to be a worst case, the system isn&#8217;t usable if it&#8217;s slower than this value, not a &#8220;this is where we want to be&#8221; value).\u00a0 You also see the 80th percentile result for the past 7 days and the delta from the previous 7 days (you&#8217;ll notice everything is getting slower &#8211; more on that in a minute).<\/p>\n<p>For context, if we tried this with &#8220;vanilla Git&#8221;, before we started our work, many of the commands would take 30 minutes up to hours and a few would never complete.\u00a0 The fact that most of them are less than 20 seconds is a huge step but it still sucks if you have to wait 10-15 seconds for everything.\nWhen we first rolled it out, the results were much better.\u00a0 That&#8217;s been one of our key 
learnings.\u00a0 If you read <a href=\"https:\/\/devblogs.microsoft.com\/bharry\/scaling-git-and-some-back-story\/\">my post that introduced GVFS<\/a>, you&#8217;ll see I talked about how we did work in Git and GVFS to change many operations from being proportional to the number of files in the repo to instead be proportional to the number of files &#8220;read&#8221;.\u00a0 It turns out that, over time, engineers crawl across the code base and touch more and more stuff leading to a problem we call &#8220;over hydration&#8221;.\u00a0 Basically, you end up with a bunch of files that were touched at some point but aren&#8217;t really used any longer and certainly never modified.\u00a0 This leads to a gradual degradation in performance.\u00a0 Individuals can &#8220;clean up&#8221; their enlistment but that&#8217;s a hassle and people don&#8217;t, so the system gets slower and slower.<\/p>\n<p>That led us to embark upon another round of performance improvements we call &#8220;O(modified)&#8221; which changes the proportionality of many key commands to instead be proportional to the number of files I&#8217;ve modified (meaning I have current, uncommitted edits on).\u00a0 We are rolling these changes out to the org over the next week so I don&#8217;t have broad statistical data on the results yet but we do have good results from some early pilot users.<\/p>\n<p>I don&#8217;t have all the data but I&#8217;ve picked a few examples from the table above and copied the performance results into the column called &#8220;O(hydrated)&#8221;.\u00a0 I&#8217;ve added another column called O(modified) with the results for the same commands using the performance enhancements we are rolling out next week.\u00a0 All the numbers are in seconds.\u00a0 As you can see we are getting performance improvements across the board &#8211; some are small, some are ~2X and status is almost 5X faster.\u00a0 We&#8217;re very optimistic these improvements are going to move the needle on perf 
perception.\u00a0 I&#8217;m still not fully satisfied (I won&#8217;t be until Status is under 1 second), but it&#8217;s\u00a0fantastic progress.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/BeforeAfter.png\"><img decoding=\"async\" class=\"alignnone wp-image-13035\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/BeforeAfter.png\" alt=\"beforeafter\" width=\"344\" height=\"119\" \/><\/a><\/p>\n<p>Another key performance area that I didn&#8217;t talk about in my last post is distributed teams.\u00a0 Windows has engineers scattered all over the globe &#8211; the US, Europe, the Middle East, India, China, etc.\u00a0 Pulling large amounts of data across very long distances, often over less than ideal bandwidth is a big problem.\u00a0 To tackle this problem, we invested in building a Git proxy solution for GVFS that allows us to cache Git data &#8220;at the edge&#8221;.\u00a0 We have also used proxies to offload very high volume traffic (like build servers) from the main Visual Studio Team Services service to avoid compromising end user&#8217;s experiences during peak loads.\u00a0 Overall, we have 20 Git proxies (which, BTW, we&#8217;ve just incorporated into the existing Team Foundation Server Proxy) scattered around the world.<\/p>\n<p>To give you an idea of the effect, let me give an example.\u00a0 The Windows Team Services account is located in an Azure data center on the west coast of the US.\u00a0 Above you saw that the 80th percentile for Clone for a Windows engineer is 127 seconds.\u00a0 Since a high percentage of our Windows engineers are in Redmond, that number is dominated by them.\u00a0 We ran a test from our North Carolina office (which is both further away and has a much lower bandwidth network).\u00a0 A clone from North Carolina with no proxy server took almost 25 minutes.\u00a0 With a proxy configured and up to date, it took 70 seconds (faster than Redmond because the 
Redmond team doesn&#8217;t use a proxy and they have to go hundreds of miles over the internet to the Azure data center).\u00a0 70 seconds vs almost 25 minutes is an almost 95% improvement.\u00a0 We see similar improvements when GVFS &#8220;faults in&#8221; files as they are accessed.<\/p>\n<p>Overall Git with GVFS is completely usable at crazy large scale and the results are proving that our engineers are effective.\u00a0 At the same time, we have a lot of work to do to get the performance to the point that our engineers are &#8220;happy&#8221; with it.\u00a0 The O(modified) work rolling out next week will be a big step but we have months of additional performance work still on the backlog before we can say we&#8217;re done.<\/p>\n<p>To learn more about the details of the technical challenges we&#8217;ve faced in scaling Git and getting good performance, check out\u00a0the series of articles that Saeed Noursalehi is writing on <a href=\"http:\/\/www.visualstudio.com\/learn\/git-at-scale\">scaling Git and GVFS<\/a>.\u00a0 It&#8217;s fascinating to read.<\/p>\n<p><strong>Trying GVFS yourself<\/strong><\/p>\n<p><a href=\"https:\/\/github.com\/Microsoft\/GVFS\">GVFS <\/a>is an open source project and you are welcome to try it out.\u00a0 All you need to do is download and install it, create a Visual Studio Team Services account with a Git repo in it and you are ready to go.\u00a0 Since we initially published GVFS, we&#8217;ve made some good progress.\u00a0 Some of the key changes include:<\/p>\n<ol>\n<li>We&#8217;ve started doing regular updates to the published code base &#8211; moving towards &#8220;development in the open&#8221;.\u00a0 As of now, all our latest changes (including the new O(modified) work) are published to the public repo and we will be updating it regularly.<\/li>\n<li>When we first published, we were not ready to start taking external contributions.\u00a0 With this milestone today, we are now, officially ready to start.\u00a0 We feel like enough of 
the basic infrastructure is in place that people can start picking it up and moving it forward with us.\u00a0 We welcome anyone who wants to pitch in and help.<\/li>\n<li>GVFS relies on a Windows filesystem driver we call GVFlt.\u00a0 Until now, the drop of that driver that we made available was unsigned (because it was very much a work in progress).\u00a0 That clearly creates some friction in trying it out.\u00a0 Today, we released a signed version of GVFlt that will eliminate that friction (for instance, you no longer need to disable BitLocker to install it).\u00a0 Although we have a signed GVFlt driver, that&#8217;s not the long term delivery method.\u00a0 We expect this functionality to be incorporated into a future shipping version of Windows and we are still working through those details.<\/li>\n<li>Starting with our talk at Git Merge, we&#8217;ve begun engaging with the broader Git community on the problem of scaling Git and GVFS, in particular.\u00a0 We&#8217;ve had some great conversations with other large tech companies (like Google and Facebook) who have similar scaling challenges and we are sharing our experiences and approaches.\u00a0 We have also worked with several of the popular Git clients to make sure they work well with GVFS.\u00a0 These include:\n<ol>\n<li><strong>Atlassian SourceTree<\/strong> &#8211; SourceTree was the first tool to validate with GVFS and has already released an update with a few changes to make it work well.<\/li>\n<li><strong>Tower<\/strong> &#8211; The Tower Git team is excited to add GVFS support and they are already working on including GVFS in the Windows version of their app.\u00a0 It will be available as a free update in the near future.<\/li>\n<li><strong>Visual Studio<\/strong> &#8211; Of course, it would be good for our own Visual Studio Git integration to work well with GVFS too.\u00a0 We are including GVFS support in VS 2017.3 and the first preview with the necessary support will be available in early 
June.<\/li>\n<li><strong>Git for Windows<\/strong> &#8211; As part of our effort to scale Git, we have also made a bunch of contributions to Git for Windows (the Git command line) and that includes support for GVFS.\u00a0 Right now, we still have a <a href=\"https:\/\/github.com\/Microsoft\/git\/\">private fork of Git for Windows <\/a>but, over time, we are working to get all of those changes contributed back to the mainline.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p><strong>Summary<\/strong><\/p>\n<p>We&#8217;re continuing to push hard on scaling Git to large teams and code bases at Microsoft.\u00a0 A lot has happened in the 3 months since we first talked about the effort.\u00a0 We&#8217;ve&#8230;<\/p>\n<ul>\n<li>Successfully rolled it out to 3,500 Windows engineers<\/li>\n<li>Made some significant performance improvements and introduced Git proxies<\/li>\n<li>Updated the open source projects with the latest code and opened it\u00a0for contributions<\/li>\n<li>Provided a signed GVFlt driver to make trying it out easier<\/li>\n<li>Worked with the community to begin to build support into popular tools &#8211; like SourceTree, Tower, Visual Studio, etc.<\/li>\n<li>Published some articles with more insights into the technical approach we are taking to <a href=\"http:\/\/www.visualstudio.com\/learn\/git-at-scale\">scale Git and GVFS<\/a>.<\/li>\n<\/ul>\n<p>This is an exciting transition for Microsoft and a challenging project for my team and the Windows team.\u00a0 I&#8217;m elated at the progress we&#8217;ve made and humbled by the work that remains.\u00a0 If you too find there are times where you need to work with very large codebases and yet you really want to move to Git, I encourage you to give GVFS a try.\u00a0 For now, Visual Studio Team Services is the only backend implementation that supports the GVFS protocol enhancements.\u00a0 We will add support in a future release of Team Foundation Server if we see enough interest and we have talked to other Git 
services who have some interest in adding support in the future.<\/p>\n<p>Thanks and enjoy.<\/p>\n<p>Brian<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s been 3 months since I first wrote about our efforts to scale Git to extremely large projects and teams with an effort we called &#8220;Git Virtual File System&#8221;.\u00a0 As a reminder, GVFS, together with a set of enhancements to Git, enables Git to scale to VERY large repos by virtualizing both the .git folder [&hellip;]<\/p>\n","protected":false},"author":244,"featured_media":14617,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[9],"class_list":["post-12996","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-vs-team-services"],"acf":[],"blog_post_summary":"<p>It&#8217;s been 3 months since I first wrote about our efforts to scale Git to extremely large projects and teams with an effort we called &#8220;Git Virtual File System&#8221;.\u00a0 As a reminder, GVFS, together with a set of enhancements to Git, enables Git to scale to VERY large repos by virtualizing both the .git folder 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/12996","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/users\/244"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/comments?post=12996"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/12996\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media\/14617"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media?parent=12996"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/categories?post=12996"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/tags?post=12996"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}