November 15th, 2017

GVFS Updates: More Performance, More Availability

Edward Thomson
Principal Program Manager

It’s been a few months since we last talked about GVFS, the technology that allows Git to support Enterprise-scale repositories. And it’s been a busy few months. Not only have we been working on a ton of performance improvements, we’ve also been getting GVFS ready for a wider audience so that we can bring modern version control and DevOps practices to everybody working in giant repositories, even those who aren’t hosting their repository in Visual Studio Team Services.

Performance

When it comes to GVFS, performance is job number one. The last time we talked about GVFS, we had just introduced a huge performance improvement that we call O(modified). As the name suggests, this work changed a number of Git commands so that their performance scales with the number of files that have been edited in a repository. This is a big improvement over the initial version of GVFS, which scaled with the number of files that had been downloaded in the repository. The problem is that as a developer works in a repository, they tend to read files, which causes them to be downloaded from the server. So even though your repository starts out fast with GVFS, the more you work in it, the slower it gets. With the O(modified) improvements, performance scales with the number of files you edit, which is a lot more stable, and thus, a lot faster.

But, as with most software projects, optimizing GVFS is usually more about making a lot of little improvements than about one big change. And that’s what we’ve been doing for the last several months. We’ve been relentlessly examining everything about Git and GVFS and shaving down every operation we can.

For example: we noticed that occasionally git status would become incredibly slow, and could even take a few minutes to execute. This shouldn’t happen: GVFS sets up Git as a “sparse repository”, and sets up its configuration files so that Git knows that it can ignore almost all of the repository. When you actually modify a file, GVFS updates the configuration so that Git will start examining that file for status. This is part of what makes Git with GVFS so performant. But investigation showed that Git was sometimes spending a lot of time looking for untracked files on disk, so we made a small design change to update the Git configuration with more granularity. This added precision means that Git avoids these slowdowns during status.

With another change, we were able to remove an entire network roundtrip every time you make a change to a file and want to stage the changes. When you run git add, it first puts the contents of your file into an object file, and calculates the object’s ID, which is its SHA-1 hash. But before Git actually puts that file into the database, it checks to see if the object already exists so that it doesn’t overwrite it.

Now, sometimes you do have the object in your repository already. You might have a common .gitignore file checked in to several folders in your repository. Or you might have a zero-byte file checked in as a placeholder. You might even be reverting the contents of a file to a previous revision. But typically, when you run git add, it’s new content that the repository has never seen.
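To make the object ID concrete, here is a minimal sketch of how Git derives the ID of a file's contents (a "blob"): it hashes a short header followed by the raw bytes. The function name `git_blob_id` is made up for this illustration; the header format and the use of SHA-1 are standard Git behavior. This is not GVFS code.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size in bytes>\0" followed by the file contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always produces the identical ID, which is why every
# zero-byte placeholder file in a repository shares a single object.
empty_id = git_blob_id(b"")
print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, two copies of the same .gitignore in different folders map to one object, while genuinely new content maps to an ID the repository has never stored.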

And since GVFS virtualizes the Git repository in conjunction with the server, it will first look locally to see if the object already exists and then it will query the server to see if the object exists there. This is incredibly inefficient in the most common case, when the file simply doesn’t exist. By disabling this check on the server, we were able to cut out an entire network round trip on calls to both git add and git commit.
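The idea can be sketched roughly as follows. This is a simplified model, not the actual GVFS or Git implementation: the `ObjectStore` class, its `check_remote` flag, and the `remote_has` callback are all hypothetical names invented for this example. Reads fall back to the server when an object is missing locally, while the write path consults only the local store.

```python
from typing import Callable, Set


class ObjectStore:
    """Toy model of a local object store backed by a remote server (hypothetical)."""

    def __init__(self, local_ids: Set[str], remote_has: Callable[[str], bool]):
        self.local_ids = local_ids
        self.remote_has = remote_has  # stands in for a network request
        self.remote_calls = 0         # counts simulated round trips

    def exists(self, object_id: str, check_remote: bool = True) -> bool:
        # Always look locally first; this is cheap.
        if object_id in self.local_ids:
            return True
        # Optionally skip the expensive server query.
        if not check_remote:
            return False
        self.remote_calls += 1
        return self.remote_has(object_id)

    def write(self, object_id: str) -> None:
        # On the write path, new content is the common case, so only the
        # local store is consulted before adding the object.
        if not self.exists(object_id, check_remote=False):
            self.local_ids.add(object_id)


store = ObjectStore(set(), remote_has=lambda oid: False)
store.write("3b18e5")           # no server round trip
store.exists("e69de2")          # a read still falls back to the server
print(store.remote_calls)       # 1
```

The trade-off in this sketch is that a write may store an object the server already has, which costs a little disk space but avoids a network request on every staged file.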

Another performance improvement is our own database implementation for storing metadata. When we started building GVFS, we used the Extensible Storage Engine (aka “JET Blue”) that ships with Windows. ESE was a no-brainer to get started with since it’s easy to use and it’s built in to Windows. But in our ongoing quest to get every ounce of performance out of GVFS, we decided to create our own database engine that was tuned for our needs.

Improvements Since June

Even if these changes are small compared to the O(modified) work that we did, these are all respectable performance improvements in their own right. But what’s truly important is taking the combination of them all working together. With all these modifications, we’ve been able to improve the experience of working in enormous repositories like the Windows repository even further.

[Figure: GVFS Performance November 2017]

Now that every Windows developer (over 3000 of them) is working in Git and VSTS, every little performance improvement adds up across the entire team. So we’re very excited by these numbers.

GVFS 1.0 with Pre-built Binaries

Even though the performance of GVFS is critical, all that speed is no good unless people can use it. So we’ve also been working to make it easier to get started working with GVFS. Until today, if you wanted to investigate GVFS or include support for it in your application, you would need to build it yourself from our GitHub repository. And that’s after carefully configuring a set of dependencies like a particular set of SDK versions.

Today we’re proud to announce that we have the first release of GVFS available: GVFS 1.0 includes pre-built binaries, so that you don’t need to compile it yourself. You’ll still need Windows 10 Creators Update (version 1703) or later, since the Windows team has made filesystem driver changes to support GVFS. But now you only need to run the installer to evaluate GVFS.

GVFS Across the Industry

These changes are amazing. But the GVFS news that I’m most excited about isn’t a technical improvement: it’s an addition to the ecosystem. We’re thrilled to announce that other Git hosting providers have chosen to adopt GVFS as the industry standard for hosting Enterprise-scale Git repositories.

Today at the Microsoft Connect() conference, GitHub’s Sam Lambert joined us to announce that GitHub was adding GVFS support to their development platform. It’s a natural fit, since GitHub.com is the world’s largest Git hosting provider, and GVFS supports the world’s largest Git repositories.

The GVFS ecosystem continues to grow, with GitHub joining Atlassian as industry partners for GVFS hosting; Bitbucket added experimental GVFS support earlier this fall, as a marketplace extension. Atlassian’s graphical client application, SourceTree, also supports GVFS, as do the Git clients Tower for Windows and gmaster.

This is another great example of how in open source development, cooperation triumphs over competition. GVFS brings modern version control practices to software engineers working in massive repositories, whether they host their projects in Visual Studio Team Services, in Bitbucket, or in GitHub.

We’re so proud of these next steps of the GVFS evolution. When we first envisioned GVFS, we were excited about the possibility of bringing modern version control and DevOps practices to some of Microsoft’s oldest and largest code bases. We were thrilled with the successes we had with bringing the Windows team to Git, hosted in Visual Studio Team Services. And now, VSTS is no longer the only hosting provider to support Enterprise-scale repositories with GVFS.

But we’re not done working yet, so we hope you’ll stay tuned for our next announcement.

Author

Edward Thomson is a Program Manager for Azure DevOps, where he ensures that customers are successful with Git, CI/CD and DevOps concepts. Before becoming a Program Manager, he was a Software Engineer at GitHub and Microsoft working on Git tools.