March 15th, 2018

GVFS for Mac

Over the last couple of years, we built GVFS to enable the Windows team to work in a Git repo that is orders of magnitude larger than any repo in existence before it.

GVFS is currently only supported on Windows 10. We recently announced that we’re investigating how to build GVFS on other platforms. I’m excited to say that we’ve made good progress on a prototype design for GVFS for Mac, and I’ll share some of those details here.

GVFS for Windows is made up of three main pieces:

  1. Patches to Git, to make it work efficiently with a virtualized file system.
  2. A file system filter driver that intercepts some file system operations and asks a user mode process how to respond to things like directory enumerations and requests for file contents.
  3. A user mode process that knows how to interpret the state of a Git repo to respond to file system queries, and how to respond to file system events to modify the state of the Git repo.

If you’re interested, you can read much more about how it works and peruse the code.

Items 1 and 3 are not inherently tied to any operating system, but item 2 is very much tied to the specific operating system, file system, and driver model. This is the piece that has to be rewritten for each OS, and we now have a decent idea of how we’ll build it for the Mac.

Design approach

Windows has a concept of file system filter drivers, which can stack on top of an existing file system driver and modify its behavior. macOS does not support stacking drivers in this way, so we explored different approaches. The option we’ve landed on uses a Kauth kernel extension to similar effect.

When an application accesses a file or directory, the kernel needs to decide whether the current user has access to that object. Kauth allows a custom kernel extension (kext) to register for these authorization requests and have a say in answering that question. This gives us some interesting capabilities:

  • We can block an application as it is about to enumerate a directory, so that we can fill in placeholder files/directories for its children. This gives us the ability to dynamically enumerate the structure of the file system, on demand.
  • We can block an application that is about to read a file, so that we can fill in its contents. This gives us the ability to download file contents on demand.
  • We can detect when the user is modifying files, so that we can inform Git that these files might be dirty.
  • We can reject certain operations, like deletes to special files, or requests from file system crawlers that would otherwise aggressively hydrate the entire virtualized file system and break any attempts at loading things on demand.

To make this all work, we’ve created a concept of placeholder directories and placeholder files. We indicate that a directory or file is a placeholder by setting a file flag on it, so that the kext can quickly check a vnode to see if it is a virtual item that it needs to worry about. If not, the kext quickly returns, to avoid adding any unnecessary overhead to system IO.
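To make the fast path and the capabilities listed above a bit more concrete, here is a minimal user-space sketch in Python. It is not the kext code and does not use the Kauth API. The flag value, the `Vnode` type, the `is_special` marker, the `from_crawler` parameter, and the decision names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical bit value: the post does not say which file flag marks a placeholder.
FLAG_PLACEHOLDER = 0x1


class Action(Enum):
    ENUMERATE_DIR = auto()
    READ_FILE = auto()
    WRITE_FILE = auto()
    DELETE = auto()


class Decision(Enum):
    DEFER = auto()              # not a virtual item: add no extra work
    FILL_THEN_ALLOW = auto()    # block, ask user mode to fill in, then allow
    NOTIFY_THEN_ALLOW = auto()  # allow, but tell user mode the file may be dirty
    DENY = auto()               # reject the operation outright


@dataclass
class Vnode:
    path: str
    flags: int = 0
    is_special: bool = False    # hypothetical marker for "special files"


def authorize(vnode: Vnode, action: Action, from_crawler: bool = False) -> Decision:
    """Decide how a (simulated) authorization request should be handled."""
    # Fast path: anything without the placeholder flag is none of our business.
    if not (vnode.flags & FLAG_PLACEHOLDER):
        return Decision.DEFER
    # Crawlers would otherwise hydrate the whole virtualized tree.
    if from_crawler:
        return Decision.DENY
    if action is Action.DELETE and vnode.is_special:
        return Decision.DENY
    if action in (Action.ENUMERATE_DIR, Action.READ_FILE):
        return Decision.FILL_THEN_ALLOW
    if action is Action.WRITE_FILE:
        return Decision.NOTIFY_THEN_ALLOW
    return Decision.DEFER
```

The point of the ordering is that the flag check comes first, so IO against ordinary files pays for a single bit test and nothing more.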

When a new virtualization root is created, we initially create an empty placeholder directory, with no children on disk. The first time our kext (GvKext) receives a request to authorize enumeration on this directory, the kext blocks the enumeration request and sends a message over a port to a user mode process (GVFS) and asks it to write out placeholder entries on disk for the children of that directory. Once those child placeholders have been written, GVFS sends a message to GvKext and lets it know that it can unblock the enumeration request. The application then completes its enumeration and, other than seeing a slight delay, doesn’t notice that there is anything special about this directory.
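The block-and-unblock handshake described above can be imitated entirely in user space. The sketch below is a simulation under stated assumptions, not the real GvKext/GVFS protocol: a `queue.Queue` stands in for the port, a `threading.Event` stands in for the unblock message, and the names `port`, `provider`, `enumerate_dir`, and `demo` are invented for this example.

```python
import os
import queue
import tempfile
import threading

port = queue.Queue()  # stands in for the kext -> user mode message port


def provider(children_by_dir):
    """User mode side: write placeholder children, then unblock the waiter."""
    while True:
        message = port.get()
        if message is None:  # shutdown signal for the demo
            break
        directory, done = message
        for name in children_by_dir.get(directory, []):
            # An empty file plays the role of a placeholder entry on disk.
            open(os.path.join(directory, name), "w").close()
        done.set()  # "you can unblock the enumeration now"


def enumerate_dir(directory, expanded):
    """Kext side plus the application's enumeration, rolled into one call."""
    if directory not in expanded:
        done = threading.Event()
        port.put((directory, done))
        # The enumeration is blocked here until user mode reports back.
        if not done.wait(timeout=5):
            raise TimeoutError("provider did not respond")
        expanded.add(directory)
    return sorted(os.listdir(directory))


def demo():
    with tempfile.TemporaryDirectory() as root:
        worker = threading.Thread(
            target=provider, args=({root: ["a.txt", "b.txt"]},), daemon=True
        )
        worker.start()
        expanded = set()
        first = enumerate_dir(root, expanded)   # triggers the handshake
        second = enumerate_dir(root, expanded)  # already expanded, no message
        port.put(None)
        worker.join()
        return first, second
```

Calling `demo()` enumerates the same directory twice. Only the first call goes through the queue, and both return the two placeholder names.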

Similarly, whenever GvKext receives a request to authorize read access to a file, it checks whether the file is still an empty placeholder. If it is, GvKext blocks the IO and asks GVFS to fill in the contents of the file. Once the contents are available, GVFS sends a message to GvKext, which then unblocks the read.
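The read path can be sketched the same way. In this simplified, single-threaded Python illustration, a set of paths stands in for the placeholder file flag and a callback stands in for the round trip to GVFS. The names `read_with_hydration`, `placeholders`, `fetch_contents`, and `demo` are assumptions made for this example.

```python
import os
import tempfile


def read_with_hydration(path, placeholders, fetch_contents):
    """Return the file's bytes, filling in a still-empty placeholder first."""
    if path in placeholders:
        # The real kext would block the read here and message user mode.
        data = fetch_contents(path)
        with open(path, "wb") as f:
            f.write(data)
        # Clear the marker so later reads skip straight to the file system.
        placeholders.discard(path)
    with open(path, "rb") as f:
        return f.read()


def demo():
    fetched = []

    def fetch(path):
        fetched.append(path)
        return b"contents from the backing store\n"

    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "README.md")
        open(path, "wb").close()  # empty placeholder on disk
        placeholders = {path}
        first = read_with_hydration(path, placeholders, fetch)
        second = read_with_hydration(path, placeholders, fetch)
    return first, second, len(fetched)
```

In `demo()`, the contents are fetched once. The second read finds the file already hydrated and goes straight to disk, which is the case the perf goal below is concerned with.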

Prototype results

Our current prototype GVFS doesn’t actually do anything with a Git repo. All it does for now is just mirror an existing physical folder on disk, and project it as a virtual folder. This simple setup has allowed us to validate the approach of building a Kauth kext to virtualize the file system and hit two critical goals:

  • App compat, especially for build tools.
  • Perf of accessing an already-hydrated file.

To validate these goals, we cloned some large, complex Mac codebases, created virtual projections of them, and built the codebases on top of the virtual projections. This helped us find and fix some interesting bugs, and we now have reliably passing builds on top of virtualized folders.

In addition, we have been able to demonstrate that a second full build finishes within about 10% of the time of a full build on a non-virtualized folder. We still need to work on bringing this number down, but it’s quite encouraging that a non-optimized prototype implementation is already within 10%.

For reference, with a FUSE-based prototype, we were never able to get better than 150% overhead for accessing an already-hydrated file.
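To put the two percentages side by side, it helps to spell out what "overhead" means. The Python snippet below uses the usual definition, relative to the non-virtualized baseline. The timings passed in are made-up round numbers chosen only to reproduce the 10% and 150% figures. They are not measurements, and the two figures describe different workloads (a full build versus access to an already-hydrated file).

```python
def overhead_percent(baseline, virtualized):
    """Extra time spent, as a percentage of the non-virtualized baseline."""
    return 100.0 * (virtualized - baseline) / baseline


# Hypothetical timings in arbitrary units, chosen to match the quoted figures.
kauth_build = overhead_percent(baseline=100, virtualized=110)
fuse_access = overhead_percent(baseline=100, virtualized=250)
```

Read this way, 10% overhead means the virtualized run takes 1.1 times as long as the baseline, while 150% overhead means it takes 2.5 times as long.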

What’s next?

Currently we’ve just built a prototype version of the kext. We still have a lot of work left to make it full-featured, robust, performant, diagnosable, and production ready. We also plan to do quite a bit of cleanup and refactoring in the GVFS user mode codebase, to allow that same code to run on Mac using .NET Core. As these pieces start to come together over the next few months, we’ll get some brave early adopters to start using it and send us feedback.

We plan to publish all of the code for GVFS for Mac in our public repo. We’re also planning to change our development process to code in the open, rather than do occasional code dumps to the public repo. More info will be available on this very soon.

Category
DevOps
