Keyword expansion in TFS
Periodically, the topic of keyword expansion comes up. TFS (at least through 2008) does not support it. At one point during the v1 product cycle, it was a planned feature and was partially implemented. However, there are many challenges to getting it right in TFS version control, and it wasn't worth the cost to finish the feature. As a result, we ripped it out.
Since it's not supported in the product and not likely to be supported any time soon, folks gravitate toward the idea of using checkin policies to implement keyword expansion. The idea is appealing because checkin policies are evaluated before checkin, which would seem to provide the perfect opportunity to expand keywords.
Personally, I'm not fond of doing keyword expansion in a checkin policy. Any checkin policy that performs keyword expansion has to modify file contents, and that immediately runs into a number of issues with how checkin policies work.
- Checkin policies get called repeatedly. Every time the user clicks on the checkin policy channel in the pending changes tool window in Visual Studio, for instance, the checkin policies are evaluated.
- Whatever a policy does must be done really quickly. Otherwise, you are going to make VS painful to use. Checkin policy evaluation isn't done on a background thread, and moving it to one wouldn't really help anyway, since you wouldn't want to wait for a long policy evaluation to finish before the checkin process started.
- Checkin policies can be evaluated at any time. The user may or may not actually be checking in at the point that the checkin policies are evaluated. You even have the option of evaluating checkin policies prior to shelving.
- For applications using the version control API, checkin policies are only evaluated if the application chooses to evaluate them (see How to validate check-in policies, evaluate check-in notes, and check for conflicts). Some folks may read this and think it's a hole in checkin policy enforcement. However, since checkin policy evaluation is done on the client, you can't rely on it being done anyway (i.e., clients can lie and call the web service directly). The other reason has to do with performance: an application like Visual Studio controls when checkin policies are evaluated, and by the time it calls the version control API to check in, there's no need to evaluate them yet again. Some day there may be server-side checkin policies, but they don't exist yet (as of TFS 2008).
- You’ve got to get your checkin policy onto all of the client computers that are used to check in. The deployment story for checkin policies is probably the single biggest hole in the checkin policy feature in the product (the second biggest hole is the lack of built-in support for scoping the policy to something less than an entire team project, though there is a power tool checkin wrapper policy to do that now). Any computer without the checkin policy assembly on it and properly listed in the registry is not going to do keyword expansion.
If you read that and still want to do it, you would need to pend an edit on each file that does not already have the edit bit set (for example, a branch, merge, rename, or undelete without an edit) and is not a delete (you can't pend an edit on a pending delete). I'm pretty sure that VS and the command line will have problems with changes being pended during a checkin policy evaluation, because they've already queried for the pending changes and won't re-query after evaluating the checkin policies. As a result, the newly pended edits would not be uploaded. That pretty much makes pending edits from within a checkin policy impractical.
Alternatively, you could do keyword expansion only for changes where the edit bit is already set in the pending change. That’s sort of the “least evil solution.” You would just use the checkin policy for keyword expansion in files that already have pending edits (i.e., check to see that the Edit bit is set in the pending change’s ChangeType).
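A minimal sketch of that filter follows, in Python rather than the .NET version control API. The `ChangeType` flag values and the trimmed-down `PendingChange` class are hypothetical stand-ins for the real TFS types; the point is only the test for the Edit bit in a flags value, plus a defensive skip of anything marked as a delete.

```python
from dataclasses import dataclass
from enum import IntFlag


class ChangeType(IntFlag):
    # Hypothetical stand-in for TFS's ChangeType flags; values are illustrative.
    NONE = 0
    ADD = 1
    EDIT = 2
    RENAME = 4
    DELETE = 8
    UNDELETE = 16
    BRANCH = 32
    MERGE = 64


@dataclass
class PendingChange:
    # Hypothetical, trimmed-down model of a pending change.
    local_item: str
    change_type: ChangeType


def expansion_candidates(changes):
    """Keep changes whose Edit bit is already set, skipping any delete."""
    return [
        c
        for c in changes
        if (c.change_type & ChangeType.EDIT) and not (c.change_type & ChangeType.DELETE)
    ]
```

A rename or merge that also carries an edit qualifies; a plain branch or rename does not, which is exactly the gap in coverage this approach accepts.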
Some of the files with pending edits may not have actually been changed (e.g., you pended edits on all of the files in a directory as a convenience because you knew you would be editing at least half of them via a script). When the server detects that a file that’s being checked in with only a pending edit hasn’t been modified, it doesn’t commit a change for that file (i.e., create a new version of that file). You can read a bit about that in the post, VC API: CheckIn() may return 0. To detect for yourself whether this is the case, you can compute the MD5 hash of the file content and compare that to the HashValue property of the PendingChange class. If the two are equal, then the file didn’t change. For those of you doing government work, you’ll want to watch out for FIPS enforcement. When that’s turned on in Windows, MD5 hashes are unavailable because the MD5CryptoServiceProvider class in .NET throws when you try to create one. In that environment, the hash values are empty arrays.
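The unchanged-content check can be sketched like this, assuming you already have the local file's bytes and the server's hash (what HashValue holds) as raw bytes. `content_unchanged` is a hypothetical helper, and Python's `hashlib` stands in for MD5CryptoServiceProvider. It returns `None` when either hash is unavailable, which covers the FIPS case where the server's hash is an empty array; on some FIPS-enabled Python builds, constructing an MD5 object raises `ValueError`, which the sketch also treats as "no hash".

```python
import hashlib


def md5_or_empty(data: bytes) -> bytes:
    """MD5 digest of data, or b"" if MD5 is unavailable (e.g., FIPS mode)."""
    try:
        return hashlib.md5(data).digest()
    except ValueError:
        # Some FIPS-enabled builds refuse to construct an MD5 hash object.
        return b""


def content_unchanged(local_bytes: bytes, server_hash: bytes):
    """True/False when it can be decided, None when a hash is missing."""
    local_hash = md5_or_empty(local_bytes)
    if not server_hash or not local_hash:
        return None  # can't tell; treat the file as possibly changed
    return local_hash == server_hash
```

Treating `None` as "possibly changed" errs on the side of expanding keywords in a file the server might then skip, which is harmless.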
But wait, there's more! You would also have to make sure that you read and write the file in the correct encoding (e.g., reading a double-byte character set (DBCS) file, such as a Japanese or Chinese file, as ASCII or Unicode would be bad). There are probably more encoding issues to contend with. One thing that's probably on your side, though, is that if you do read the file in the wrong encoding, you likely won't find the markers indicating that the file needs keyword expansion. To avoid accidentally finding the tags where you know you don't want them, you'd likely want to skip all binary files that your expansion logic doesn't know how to handle (e.g., you could conceivably handle keyword expansion in JPEG file headers, but that doesn't seem too likely).
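The encoding and binary-file precautions can be sketched as follows. This assumes the caller already knows each file's correct encoding, and it uses a hypothetical `$Author$` / `$Author: value $` marker syntax, since TFS defines no keyword markers of its own. Rather than counting on a wrong decoding to hide the markers, the sketch decodes strictly and skips the file on any error; it also skips files containing NUL bytes unless the encoding is UTF-16 or UTF-32. The expansion is idempotent (expanding an already expanded file yields the same text), which matters because checkin policies can be evaluated many times before an actual checkin.

```python
import codecs
import re

# Hypothetical marker syntax ($Author$ or $Author: value $); TFS has none.
KEYWORD_RE = re.compile(r"\$(Author|Date)(?::[^$\r\n]*)?\$")


def expand_text(text: str, values: dict) -> str:
    """Replace each marker with its current value; safe to run repeatedly."""
    return KEYWORD_RE.sub(lambda m: f"${m.group(1)}: {values[m.group(1)]} $", text)


def expand_file_bytes(raw: bytes, encoding: str, values: dict):
    """Return the expanded file bytes, or None if the file should be left alone."""
    # Wide Unicode encodings legitimately contain NUL bytes.
    wide = codecs.lookup(encoding).name.startswith(("utf-16", "utf-32"))
    if not wide and b"\x00" in raw:
        return None  # probably binary: skip rather than risk corrupting it
    try:
        text = raw.decode(encoding)  # strict decoding: a wrong guess raises
    except UnicodeDecodeError:
        return None
    expanded = expand_text(text, values)
    if expanded == text:
        return None  # no markers, or already up to date: nothing to write
    try:
        return expanded.encode(encoding)  # write back in the same encoding
    except UnicodeEncodeError:
        return None  # a value can't be represented in this encoding
```

Writing the result back with the same encoding it was read with is what keeps a Shift-JIS or UTF-16 file intact; the NUL-byte check is only a heuristic for spotting binaries.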
The other thing to consider is how keyword expansion interacts with branching and merging. Imagine putting the date in every file in a keyword expansion system. It's going to be a merge conflict every time you merge branches. The same is true for log (history) information. You would need to write a tool to handle the content conflicts; otherwise, merging large branches would be a real bear.
Even with all of that, you are not going to get one of the things that often comes up (and this was true when we were considering putting it in the product, because it was all going to be done on the client prior to checking in), which is the ability to record the changeset number in the keyword expansion comment. The changeset number isn't assigned until the server commits the checkin, so the client can't know it beforehand.
So, to sum it all up, you are going to need to consider the following.
- Your checkin policy will get evaluated multiple times and often not when someone is actually checking in.
- You'll want to only modify the files that already have pending edits, as pending new changes from within a checkin policy may lead to new content changes not being uploaded with the rest of the checkin.
- You’ll want to test out how it’s going to interact when merging files from one branch to another. What’s it like to deal with the conflicts your keyword expansion introduces?
- There will be checkins where keyword expansion doesn't happen for whatever reason (for example, a client computer without the checkin policy installed), so it's not completely reliable. This is in addition to the fact that changes that do not already involve an edit won't get keyword expansion at all.
If we had done keyword expansion as a feature, what would have been different (other than the fact that you wouldn't have to think about all of this 🙂)? We wouldn't have the first limitation, since expansion wouldn't depend on when checkin policies happen to be evaluated. As with the second point, we'd have to think hard about whether every rename, branch, undelete, and merge should have an edit pended as well. At best, that would have been an option, and it wouldn't have been the default. Regarding the branch merging issue, doing something on the server would be a performance hit that may be excessive with large branches (keep in mind that the server stores content compressed and doesn't uncompress it; the client takes care of compressing and decompressing content), so the client would need some logic to make merging bearable (e.g., collapsing all of the expanded keywords before performing a three-way merge).
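Collapsing keywords before a merge can be sketched as a normalization step, reusing a hypothetical `$Keyword: value $` marker syntax. `collapse_keywords`, `differs_only_in_keywords`, and `normalize_for_merge` are illustrative helpers, not part of any TFS API; a real merge tool would feed the collapsed base, source, and target into its three-way merge and let the next checkin re-expand the keywords.

```python
import re

# Hypothetical marker syntax: $Date$ expands to $Date: value $.
KEYWORD_RE = re.compile(r"\$(Author|Date)(?::[^$\r\n]*)?\$")


def collapse_keywords(text: str) -> str:
    """Strip expanded values so '$Date: 2008-01-01 $' becomes '$Date$'."""
    return KEYWORD_RE.sub(lambda m: f"${m.group(1)}$", text)


def differs_only_in_keywords(a: str, b: str) -> bool:
    """True when two versions differ, but only in expanded keyword values."""
    return a != b and collapse_keywords(a) == collapse_keywords(b)


def normalize_for_merge(base: str, source: str, target: str):
    """Collapse all three inputs before handing them to a three-way merge."""
    return tuple(collapse_keywords(t) for t in (base, source, target))
```

With this normalization, two branches that differ only in an expanded date no longer produce a content conflict; real edits still show up as differences.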
Our conclusion was that the feature was too expensive to implement and test relative to the value provided, and that other features (e.g., making history better) could be implemented and tested with similar effort. Folks either strongly disagree (can't live without it) or agree (don't care about it at all) with that conclusion. There's rarely anyone on the fence. The feedback we've received so far indicates that we've made the right tradeoff for the vast majority of our customers (and potential customers).