Finding the changes between two labels in TFS version control
Carl Daniel (code) and Robert Downey (code) each wrote and posted code to show the changesets between two labels in TFS version control. They each took the same basic approach: call QueryLabels() to get the set of items in each label, find the highest changeset version for each set of items, and then call QueryHistory() with the range of versions specified as the two highest changeset versions found.
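The highest-changeset approach described above can be sketched in a few lines. This is only an illustrative in-memory model in Python, not the TFS client API: a label is assumed to be a dictionary mapping an item path to the changeset version of that item, history is assumed to be a list of (changeset id, changed paths) pairs, and the function names are hypothetical. Treating the lower bound of the range as exclusive is also an assumption of this sketch.

```python
# Toy model, NOT the TFS API: all names and data shapes here are hypothetical.
# A label maps item path -> changeset version of that item in the label.
# history is a list of (changeset_id, set_of_changed_paths), oldest first.

def highest_changeset(label):
    """Return the highest changeset version among the items in a label."""
    return max(label.values())


def changesets_between_fast(label_a, label_b, history):
    """Approximate the changes between two labels by a single changeset range."""
    low, high = sorted((highest_changeset(label_a), highest_changeset(label_b)))
    # Assumption: changesets after the lower bound, up to and including the upper.
    return [cs for cs, _paths in history if low < cs <= high]


history = [
    (10, {"$/Proj/a.cs"}),
    (11, {"$/Proj/b.cs"}),
    (12, {"$/Proj/a.cs"}),
    (13, {"$/Proj/b.cs"}),
]
label_1 = {"$/Proj/a.cs": 10, "$/Proj/b.cs": 11}
label_2 = {"$/Proj/a.cs": 12, "$/Proj/b.cs": 13}

print(changesets_between_fast(label_1, label_2, history))  # [12, 13]
```

In this example both labels happen to be true points in time (changesets 11 and 13), so the single range gives the right answer.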
There’s one technical flaw with this approach. A label in Team Foundation Server is not a point in time. In TFS, a label is a collection of files, each at a particular version. Brian Harry wrote about this difference in a post titled Why TFS labels aren’t like SourceSafe labels.
In Team Foundation, labels are more powerful. Instead of being a single point in time, they are able to have versions of each file in the label from different points in time. The canonical scenario is that you label a build and then find some bugs and want to go back change the versions of a few of the files (either omitting changes that introduced bugs or adding changes that fixed bugs). Now the label does not represent a point in time, but rather a collection of points in time. This makes it very hard to display it in a list mixed with change sets because there is not “correct” ordering of the list. As a result we treat the list of changesets and the list of labels separately.
Another very common approach is creating a label using the versions of files in a workspace. A workspace is also a collection of files, each potentially from a different point in time. Team Build labels the files in its workspace as part of the build process. By doing this Team Build guarantees that it is labeling what it’s building. Steve St. Jean talks about this in his post, Why I Like Labels in TFS Version Control.
Finding the highest changeset in each label to compute the differences inherently treats each label as a point in time, because a changeset is a precise point in time with respect to the state of the repository. As a result, the report produced by these apps won’t necessarily be completely accurate if the label has been changed, as Brian describes in his example, or if the label was based on the versions in a workspace where not all versions are from the same point in time.
Computing the actual set of changes between two labels, taking into account that labels are not points in time, is a much more expensive computation: it involves computing the history of each item between its versions in the two labels (and handling adds and deletes). In fact, the GenCheckinNotesUpdateWorkItems task in Team Build takes this fully accurate approach, and that’s why the task is so slow (see GenCheckinNotesUpdateWorkItems task is expensive and Measuring Performance of Team Build Build Process). By the way, if you find that task too expensive, you can prevent it from running by setting SkipPostBuild to true in your TfsBuild.proj file (add this line: <SkipPostBuild>true</SkipPostBuild>), provided you don’t need the changesets and work items recorded in the build information or the “fixed in” field updated in those same work items.
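To make the per-item computation concrete, here is a minimal sketch in Python. It is a toy in-memory model under stated assumptions, not the TFS API and not how GenCheckinNotesUpdateWorkItems is actually implemented: a label is assumed to be a dictionary of item path to changeset version, item_history is a hypothetical dictionary of item path to the sorted changesets that touched that item, and an item present in only one label is simply treated as an add or delete whose whole history up to the labeled version counts as changed.

```python
# Toy model, NOT the TFS API: all names and data shapes here are hypothetical.
# A label maps item path -> changeset version of that item in the label.
# item_history maps item path -> sorted list of changesets that touched it.

def changesets_between_accurate(label_a, label_b, item_history):
    """Collect changesets item by item instead of using one global range."""
    result = set()
    for path in set(label_a) | set(label_b):
        va, vb = label_a.get(path), label_b.get(path)
        if va == vb:
            continue  # same version in both labels: nothing changed
        touched = item_history.get(path, [])
        if va is None or vb is None:
            # Simplification: item exists in only one label (add or delete),
            # so count every changeset up to the labeled version.
            present = va if va is not None else vb
            result.update(cs for cs in touched if cs <= present)
        else:
            low, high = sorted((va, vb))
            result.update(cs for cs in touched if low < cs <= high)
    return sorted(result)


item_history = {"$/Proj/a.cs": [10, 12], "$/Proj/b.cs": [11, 13]}
# label_1 mixes points in time: a.cs is at changeset 10, b.cs at changeset 13.
label_1 = {"$/Proj/a.cs": 10, "$/Proj/b.cs": 13}
label_2 = {"$/Proj/a.cs": 12, "$/Proj/b.cs": 13}

print(changesets_between_accurate(label_1, label_2, item_history))  # [12]
```

Both labels in this example have 13 as their highest changeset, so a range built from the highest changesets would be empty and would miss changeset 12. The cost is visible in the loop: one history lookup per labeled item rather than a single history query.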
So, should you go with the fast approach used by Carl and Robert or take the slow approach used by Team Build? The answer depends on whether your labels are points in time and what level of accuracy you need when determining what’s changed between two labels. The approach used by Carl and Robert is completely accurate if your labels are strictly points in time. If you need complete accuracy and your labels are not points in time, you’ll need the more expensive approach. If you don’t need complete accuracy, the fast approach may give an answer that’s “close enough” for what you are doing.