Git repo tokens for the security service
The VSTS platform offers a security REST endpoint which allows you to add and remove permissions on resources. (To understand the rest of this blog post, you’re going to want to skim those docs first.) Several of the security APIs, as well as TFSSecurity.exe, expect a token identifying the resource to operate on. The token format varies across resources. A Git repository’s token is different from a work item’s token since they have distinct needs. How do you determine the token for a resource you want to secure?
You can often use the VSTS web UI coupled with Fiddler to learn the token for a particular resource. But maybe the thing you want to protect doesn’t exist yet – for example, if you want to put permissions on a Git branch that doesn’t currently exist. It would be nice to be able to generate a token without snooping on the web.
For a historical reason we’ll discuss below, Git tokens are, shall we say, a bit inscrutable. Here’s an actual example from our main development repo “ (I’ve obfuscated the GUIDs):
The token is part of a hierarchical security namespace. It’s a sequence of token parts separated by “/”. Breaking down those components:
|token root – lets the security service know this is for a Git repo
|branch – this is “master”
Since the namespace is hierarchical, the token could have stopped at any separator. Any ACEs you added would then apply to everything below that point in the hierarchy. In this case, the token goes all the way to the master branch. The project and repo IDs are fairly straightforward – you can get them from several different APIs (repo, project, or both). But how the heck did we get “refs/heads/6d0061007300740065007200” from “master”?!
The historical reason I mentioned above is that security tokens are case-insensitive. But in Git, ref names like branches are case-sensitive, so “master” and “Master” are not the same branch. They should not be affected by the same security token! We needed a way to encode case-sensitive names in a case-insensitive transport. There are many possible ways to do this. We chose one that has a combination of good performance and relatively easy debuggability.
The algorithm looks a little complicated until you realize what’s going on. It’s encoding the hexadecimal value of each character as a string. Also, because we’re in C#, the source string is (and thus characters are) UTF-16 little-endian. For instance, in UTF-16 the character “m” is represented by the bytes 0x6d00. Thus, the first four characters of the encoded string are “6d00”. Only the non-slash characters are encoded this way; a slash in the source remains a slash in the output. So a branch “user/mattc/feature1” becomes “7500730065007200/6d006100740074006300/66006500610074007500720065003100/”. Also, “refs/heads/” is the place where Git stores branch refs. On VSTS that’s always lowercase and from a small range of characters, so we don’t have to encode it. Instead, we use it directly to save a bit of space.
Here are code snippets in two different languages which implement the algorithm. Note that these are like Raymond Chen’s “little programs” – they ignore a lot of error-checking and defensive coding practices in the name of being easier to understand.