November 13th, 2005

A Branching and Merging Primer

Brian Harry
Corporate Vice President

Hmm, OK, I guess I jumped too quickly into using unfamiliar terminology.  Let me step back and define some of the concepts/terms a little more and then hopefully that last post will make more sense.

The Source Tree

Let’s start with what the source tree in the Developer Division looks like.  It has the following top level folders (not a complete list but a relevant subset).  Each of these folders has its own subtree of subfolders and files.

CSharp
DDSuites
Public
Tools
VB
VC
VSCommon
VSET
CSharp – is a tree that contains all of the code for the CSharp compiler, project system and related components.
DDSuites – is a very large tree of tests for all of the components in the system.  Any developer can get this folder and run the tests.  This is where we put our unit tests.
Public – This is all of the .h files, import libraries, .NET Framework assemblies, etc. that are needed to build.  For example we check in the .NET Framework assemblies, Windows SDK, etc.  Source code elsewhere in the tree references standard assemblies and header files from this directory.
Tools – A big tree that contains all of our compilers, linkers, source control tools, build configuration files, etc.  Basically everything needed to build the system.  It does not include the IDE – just command line tools.
VB – All the source code that the VB team has written.
VC – All of the source code that the VC team has written.
VSCommon – A set of shared utility library source code, midl files and the like that are shared across many components in VS.
VSET – The source code for the Team System tools.

And of course there’s more – probably 30 or 40 top level folders in our tree, but this is a good representative sample.

Why do we check in all of our tools and includes?  Developers already have VS installed on their machine, right?  Well, yes, but there are several advantages.  First, by versioning them with the source code we can ensure that we always have a consistent set.  If we need to go back and reconstruct a build from 6 months ago, we also have the tools we used to build it at that time.  One thing to keep in mind is that we are building the tools too, so every few months we check in a new version of the compilers, libraries, etc.  Using the version control system is a great way to distribute the tools to everyone.  Another benefit of doing it this way is that the system is self-contained.  You can walk up to a newly installed machine (just the OS), create a Team Foundation workspace, do a get, build, and everything works.

How a Developer Uses the Tree

I create a workspace on my machine (a workspace is a mapping construct that describes what folders to get and where to put them).  Very few developers put the entire tree in their workspace because it is so big.  Pretty much everyone includes VSCommon, Public and Tools.  Beyond that, developers include the folders they need.

To give an example (from Team Foundation – which I enlist in), my workspace looks something like this:

$/Main/tools -> d:\dd\tools
$/Main/public -> d:\dd\public
$/Main/vscommon -> d:\dd\vscommon
$/Main/ddsuites -> d:\dd\ddsuites
$/Main/vset -> d:\dd\vset
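
To make the mapping idea concrete, here is a minimal sketch in Python of how a workspace like the one above translates a server path into a local path.  This is not the Team Foundation API; the function name to_local_path is made up for illustration, and the case-insensitive matching is an assumption.

```python
from typing import Optional

# Workspace mappings from the post: server folder -> local folder.
WORKSPACE = {
    "$/Main/tools": r"d:\dd\tools",
    "$/Main/public": r"d:\dd\public",
    "$/Main/vscommon": r"d:\dd\vscommon",
    "$/Main/ddsuites": r"d:\dd\ddsuites",
    "$/Main/vset": r"d:\dd\vset",
}


def to_local_path(server_path: str) -> Optional[str]:
    """Hypothetical helper: return the local path for a server path,
    or None if no workspace mapping covers it."""
    lowered = server_path.lower()  # assumption: matching ignores case
    for server_root, local_root in WORKSPACE.items():
        root = server_root.lower()
        # Match the mapped folder itself or anything beneath it.
        if lowered == root or lowered.startswith(root + "/"):
            remainder = server_path[len(server_root):]
            return local_root + remainder.replace("/", "\\")
    return None


print(to_local_path("$/Main/VSET/SCM/SourceControl"))
# d:\dd\vset\SCM\SourceControl
print(to_local_path("$/Main/VB"))
# None (VB is not in this workspace)
```

Folders that are not mapped, such as VB here, never reach the local disk, which is how a developer avoids pulling down the entire tree.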

One of the nice things about our build system is that it allows me to build at any level in the tree.  I can build everything in my workspace by going to d:\dd and typing “build”.  Or I can build just the Team System components by going to d:\dd\vset and typing “build” – or just Version control from d:\dd\vset\scm\SourceControl, etc.

After I build, all my built binaries end up in d:\binaries.<cpu><build type>.  For example, if I build x86 debug then they end up in d:\binaries.x86dbg.  If I build retail they end up in d:\binaries.x86fre (don’t ask me why retail is called fre :)).

On to Branching and Merging

OK, hopefully with a little background on what the “tree” is, the branching part will be a little easier to understand.  What I’ve described above is how it would all work if all of the developers worked together on the same source at the same time.  As I described in my last blog post, this is impractical – too many developers changing things.

So we created branches off of Main and, in fact, no developer actually works in Main – they all work in some branch.  So let’s start at the beginning.  We checked all of our source into the tree under $/Main.  We then created branches of Main.  Using the Source Control Explorer in Team Foundation you can do this by selecting $/Main and choosing File -> Source Control -> Branch.  When the dialog comes up we’d choose $/Lab21.

This would create a whole new copy of the source tree that would look something like:

$/Main
      CSharp
      DDSuites
      Public
      Tools
      VB
      VC
      VSCommon
      VSET
$/Lab21
      CSharp
      DDSuites
      Public
      Tools
      VB
      VC
      VSCommon
      VSET

Right after the branch, $/Main and $/Lab21 are basically exact copies.  Fortunately, however, it doesn’t double your disk space usage.  The new branch ($/Lab21) references the same copies of the files that the original, or parent, branch ($/Main) contains.  Only when you actually modify (check in) a file in one of the branches is another copy of the modified file made.

Let’s talk for a second about branch lineage, or parenting.  It’s very complicated and takes a while to internalize.  Let’s take a file in the system as an example.  There is a file (you can probably guess what it is :)):

$/Main/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs

When we branched the $/Main folder above, a whole copy of the tree was created.  So there is now another “copy” (remember we don’t actually duplicate the contents until you change it, but looking at the tree you can’t tell the difference) of CommandCheckin.cs at:

$/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs

From the perspective of the “folder hierarchy” the two files are in very different parts of the tree – one is deep down under $/Main and the other is deep down under $/Lab21.  However, when speaking from a “branch hierarchy” perspective rather than a “folder hierarchy” perspective, $/Main/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs is the parent of $/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs.  When the Lab21 tree was created, each file was branched from the corresponding file in the Main tree, and this relationship is maintained.

Having this relationship allows changes in CommandCheckin.cs to be easily merged back and forth between the two branches.  So, imagine I had a change to $/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs that I want to move to the Main branch.  I can do this by (again using the Source Control Explorer) selecting CommandCheckin.cs under $/Lab21 and choosing File -> Source Control -> Merge.  This will give me a choice of files to merge with, and one of them will be the CommandCheckin.cs under $/Main.  After I hit OK, the changes to CommandCheckin.cs under $/Lab21 will be incorporated into the CommandCheckin.cs under $/Main.

To make managing branches easier, you can do this merging at a higher level too.  For example, using the Source Control Explorer as above, I can select the $/Lab21 folder itself rather than the CommandCheckin.cs file way down below it.  When picking the merge target, I pick the $/Main folder.  Because the system tracks all of the relationships, it looks down the tree, finds that CommandCheckin.cs has been changed in the $/Lab21 tree and merges it with the corresponding CommandCheckin.cs in the $/Main tree.  Being able to do this makes managing merges between branches dramatically easier.

Because changes can be merged in either direction, it’s confusing which one you mean.  If I say I’m merging Lab21 and Main, what do I mean?  To resolve this, we coined some terminology to indicate the direction of the merge.  A “Reverse Integration” (abbreviated RI) is a merge from the “Branch child” to the “Branch parent”.  And a “Forward Integration” (abbreviated FI) is a merge from the “Branch parent” into a “Branch child”.  So, using the example from above, if I said I’m going to RI Lab21, we know that means we are going to merge changes that have been made to files under $/Lab21 into the corresponding files under $/Main.

Hopefully that helps you understand the difference between the folder hierarchy and the branch hierarchy.  We can talk meaningfully about both.  When we do this I represent them as follows.  Repeating the folder hierarchy from above:

$/Main
      CSharp
      DDSuites
      Public
      Tools
      VB
      VC
      VSCommon
      VSET
$/Lab21
      CSharp
      DDSuites
      Public
      Tools
      VB
      VC
      VSCommon
      VSET

However, I’d represent the Branch Hierarchy as follows:

Main
      Lab21

What this means is that Lab21 was created by branching from Main.  Even though in the folder hierarchy they are peers, in the branch hierarchy Lab21 is a “child” of Main.  All of the files in Lab21 were branched from the corresponding files in Main.  So looking at a more complex example from my incomprehensible blog post:

Main
      Lab21
            Lab21dev
                  Clr
                  …
      Lab22
            Lab22dev
                  VB
                  …
      Lab23
            Lab23dev
                  TeamFoundation
                  …
      RTM
            Servicing
                  VSTFRTM
      …

This says that Lab21, Lab22, Lab23 and RTM were created by branching from Main.  Lab21dev was created by branching from Lab21.  Clr was created by branching from Lab21dev and so forth.  When it gets this complicated it becomes even more useful to be able to talk about RI’s and FI’s.  For example, I RI changes from Clr into Main (done by merging Clr into Lab21dev, then merging Lab21dev into Lab21, then merging Lab21 into Main).  And I FI Main into TeamFoundation (done by merging Main into Lab23, then Lab23 into Lab23dev and finally Lab23dev into TeamFoundation).
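
The chains of merges described above can be worked out mechanically from the branch hierarchy.  Here is a minimal sketch in Python, assuming the hierarchy is stored as a simple child-to-parent table.  The function names are made up for illustration and are not part of Team Foundation, and the branches elided with “…” in the listing are left out.

```python
# Branch hierarchy from the post: child branch -> parent branch.
PARENT = {
    "Lab21": "Main",
    "Lab21dev": "Lab21",
    "Clr": "Lab21dev",
    "Lab22": "Main",
    "Lab22dev": "Lab22",
    "VB": "Lab22dev",
    "Lab23": "Main",
    "Lab23dev": "Lab23",
    "TeamFoundation": "Lab23dev",
    "RTM": "Main",
    "Servicing": "RTM",
    "VSTFRTM": "Servicing",
}


def reverse_integration(branch, ancestor):
    """List the (source, target) merges needed to RI `branch` up to `ancestor`."""
    steps = []
    current = branch
    while current != ancestor:
        if current not in PARENT:
            raise ValueError(f"{ancestor} is not an ancestor of {branch}")
        parent = PARENT[current]
        steps.append((current, parent))  # merge child into its parent
        current = parent
    return steps


def forward_integration(ancestor, branch):
    """List the (source, target) merges needed to FI `ancestor` down to `branch`."""
    # An FI walks the same chain as an RI, top-down, with each merge reversed.
    chain = reverse_integration(branch, ancestor)
    return [(parent, child) for child, parent in reversed(chain)]


print(reverse_integration("Clr", "Main"))
# [('Clr', 'Lab21dev'), ('Lab21dev', 'Lab21'), ('Lab21', 'Main')]
print(forward_integration("Main", "TeamFoundation"))
# [('Main', 'Lab23'), ('Lab23', 'Lab23dev'), ('Lab23dev', 'TeamFoundation')]
```

Each step is a single parent/child merge, so a change made in Clr only reaches TeamFoundation by going all the way up to Main and then back down.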

Closing

I hope that’s enough background to understand the previous blog post.  That post is more about the end result and the rationale behind it than the mechanics and concepts behind it.  I hope this has enough of those to make the former comprehensible.  If not, let me know and I’ll try again.

Brian
