Why perform all these complex git machinations when you can just tweak the command line to get what you want?
The recent series on splitting out files (and the earlier series on combining files) went through complex git machinations to get the desired effect. Are we just doing things the hard way? Why not just make a single commit that contains all of the desired changes in one go: no fuss, no muss, no bother? By fiddling with command line options, say by using -C -C -C, we can get git blame to produce the desired results.
But that’s not practical for various reasons.
First of all, everybody on the team¹ has to remember to use the correct command line switches. The people over in QA who field the bug are probably not going to remember to use Greg’s special command line options when running git blame on this specific file. They’re just going to use the default options, so if you skip the fancy git commit tricks and rely on special command line options, all of the tickets will end up assigned to Greg.
Greg could try to correct every person who blames him for a line of code by saying, “No, I’m not responsible for that line of code. If you run git blame with this special command line, you’ll see that it’s not really me. It was Bob!”
This solution makes nobody happy. Greg is frustrated at having to explain this over and over again. And all of Greg’s teammates are frustrated that every time they assign a ticket to Greg, there’s a good chance Greg is going to get all annoyed with them and say, “But if you run git blame in this very specific way, then you can see that I’m not the one to blame.”
From the point of view of everybody who isn’t Greg, it looks like Greg is just shirking his responsibility and trying to shift blame. He found a very special command line that causes his name to disappear from git blame, and he insists that everyone use it. “Greg probably spent days trying out different command line options until he found the magic one that causes his name to disappear.”
Furthermore, Greg’s preferred command line switches may conflict with somebody else’s preferred command line switches. Greg eventually strong-arms everybody into running git blame with the -C20% switch. A few weeks later, Hannah does another split-out and says, “Oh, in order for this to work, you need to pass -C15%.” But passing -C15% causes Isaac’s change from two weeks ago to start producing false positives: git treats a file as having been copied, even though it wasn’t.
Basically, this is another case of using a global solution to a local problem. The local problem is “This brief sequence of commits needs special treatment in order to produce the results I want.” The command line is a shared resource, and for each command, there is only one. If different sequences of commits require different settings, then you’ll never find a setting that works for everyone.
What’s more, the preferred command line switches may not be suitable for your repo. For example, the Windows repo has 3 million files, and it is not unusual to see tens of thousands of files modified by a single commit. Any nonzero value for -M would cause git to perform millions of file-to-file comparisons. Instead, you’ll probably hit the renameLimit, and the -M options will be ignored entirely. When your change merges into another branch as part of a large payload, no special command line will save you.
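To get a feel for why the comparison count blows up, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which every unmatched deleted path is scored against every unmatched added path, and in which the search is abandoned once that grid is bigger than limit × limit. The function names and the file counts are made up for illustration, and the limit check is my approximation, not a statement of what git does internally.

```python
def candidate_pairs(num_sources, num_destinations):
    """Number of file-to-file similarity scores an exhaustive search would need."""
    return num_sources * num_destinations


def within_limit(num_sources, num_destinations, limit):
    # Assumption: simplified model in which the exhaustive search is skipped
    # when the source x destination grid exceeds limit x limit.
    return candidate_pairs(num_sources, num_destinations) <= limit * limit


# Hypothetical large commit: 20,000 paths removed and 20,000 paths added.
pairs = candidate_pairs(20_000, 20_000)
print(pairs)                                # 400000000
print(within_limit(20_000, 20_000, 1_000))  # False
```

Under this model, the work grows with the product of the two counts, so a commit that looks merely large turns into hundreds of millions of comparisons.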
Finally, you may not have control over the command line switches at all. When you view the git log or git blame on a Web-hosted repo, the server decides what command line options to use, not you. (Maybe there’s an administrative setting to control the options used by the Web server, but then you’re back to the “global solution” problem.)
The techniques I’ve been using will get the desired results independent of the command line options. Even in the face of -M0 -C0, the special commit sequences will still work because they provide perfect hash matches, which means that git will detect the rename without having to do any file-by-file comparison.
In other words, these special commit sequences are, in a sense, universal: They will work on any repo, no matter how it is configured.
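Here is a minimal sketch of why a perfect hash match needs no content comparison. A git blob ID is the hash of a short header followed by the file contents, so two paths with byte-identical contents get the same ID regardless of their names. The sketch assumes the default SHA-1 object format; exact_renames is a hypothetical helper for illustration, not git’s actual rename detection code.

```python
import hashlib


def git_blob_id(content):
    """Blob ID for file contents, assuming the SHA-1 object format."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def exact_renames(deleted, added):
    """Pair deleted paths with added paths whose blob IDs are identical.

    Hypothetical helper: both arguments map path -> file contents (bytes).
    Only IDs are compared; the contents are never diffed against each other.
    """
    by_id = {git_blob_id(data): path for path, data in deleted.items()}
    renames = {}
    for new_path, data in added.items():
        old_path = by_id.get(git_blob_id(data))
        if old_path is not None:
            renames[old_path] = new_path
    return renames


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(exact_renames({"old.txt": b"abc\n"},
                    {"new.txt": b"abc\n", "other.txt": b"xyz\n"}))
# {'old.txt': 'new.txt'}
```

Matching by ID is a lookup rather than a similarity score, which is why no threshold needs tuning for this case.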
¹ And there is definitely a team here, because if there is only one developer working in the repo, then there would be no need to do fancy blame games. Everything in the repo is blamed on one person!
The above all makes sense. However, I am still left with the feeling that while you’re right that it’s a bad idea to use a global solution for a local problem, the broader issue is that git makes you jump through all these hoops (mentioned earlier this week) just to get it to accurately reflect the history of a file.
Sure, other source repository software may at times struggle with similar things, but I have had unusually frequent difficulty with git in this respect. It seems fundamentally broken in a number of ways, requiring extraordinary user hoop-jumping to get it to do things that are just run-of-the-mill operations in other environments.
On top of that, while not surprising at all, I find that this makes it a poster-child for the counter-argument to the claim that open-source is inherently better because it will enjoy broad community support to make it better. No, instead in practice there’s no economic incentive for git to get better, and no one has enough personal incentive to spend the significant amount of time it would take to fix git to handle these scenarios better. But, since it’s “free” it becomes the industry standard and everyone gets to live with its limitations, regardless, while commercially-available competitors can’t compete with “free” and so also can’t justify investing resources in improving themselves.
To be clear: I’m not against the fundamental idea of open-source. But it’s clear that its use should be approached much more carefully than it is.
I agree that it’s frustrating to have to jump through these hoops, but the point of these articles is not to complain about hoop-jumping, but rather to show that with the correct sequence of hoop-jumps, it is at least possible to get what you want with what you have today. You’re not blocked on a feature request being approved and implemented.
Most “proprietary” version control systems simply can’t keep line history in this situation at all – they don’t even try.
So yes, git is objectively better – while imperfect, of course.
For example, the well-known proprietary centralised system currently used at my workplace needs multiple commands, with the output of each fed into complex PowerShell/bash/Python etc. scripts, to give you the username and commit message that probably most recently changed a line.
(And only a line. Not a range or a function)
Worse, it requires special incantations to rename a file. It cannot detect renames: the user renaming or moving the file must explicitly say it’s a rename/move, or it will be treated as a commit with a delete-file and an unrelated create-file.
It has no way of dealing with a file split or join at all. The part that was split off or added to that filename simply vanishes or appears.
So it’s always Greg’s fault, no matter what.
Worst, its tracking across branches is strangely broken, often giving clearly wrong answers for no obvious reason. (Presumably if it were obvious then the vendor would have fixed it by now.)
Which tends to make everything the fault of whoever accepts changes into the “master”/“release” branch. Sometimes.
I’m led to believe that Mercurial is the only proprietary system that can do “XXX blame” properly, though I’ve never tried it myself.
Hmm… I think SVN also has a “Copy” command that helps you copy a file or folder and its history to another location within the repository (or even to another branch).
Mercurial is actually open source as well (GPL 2+).
Even if there’s only one person working in the repo, it might be desirable to play the blame game, because one might want to know when a line was last changed.