September 19th, 2019

How to duplicate a file while preserving git line history

Today, we’re going to duplicate a file while preserving git line history.

This could be useful if you want two copies of a component, say, one where you are doing a bunch of disruptive work, and another that remains largely unchanged. The project continues to use the old, stable version, but there’s a feature flag to switch to the new, exciting one. Eventually, you’ll make the new, exciting one the default version.

When you do this, you want the line history of the new version to be the same as the line history of the old version, because the new version is basically a fork of the old version.

Again, let’s use the same scratch repo as we did for the last few days. You can follow the same copy/paste script, or you can take your existing scratch repo and git reset --hard ready to get it back into its “ready to start experimenting” state.

Let’s set up a scratch repo to demonstrate. I’ve omitted the command prompts so you can copy-paste this into your shell of choice and play along at home. (The timestamps and commit hashes will naturally be different.)

git init

>foods echo apple
git add foods
git commit --author="Alice <alice>" -m created

>>foods echo orange
git commit --author="Bob <bob>"    -am orange

git blame foods

^62ef37c (Alice 2019-09-19 07:00:00 -0700 1) apple
335acb1b (Bob   2019-09-19 07:00:01 -0700 2) orange

We employ our standard trick: Create a branch where the desired new file appears to have been created via a rename of the original file. And then restore the original file.

git checkout -b dup

git mv foods foods-new
git commit --author="Greg <greg>" -m "duplicate foods to foods-new"

git checkout HEAD~ foods
git commit --author="Greg <greg>" -m "restore foods"

git checkout -

On this branch, we renamed foods to foods-new. When git traces the history of the foods-new file, it’ll see that the file was created via rename from foods, so git will use foods’s history to build the line history.

And then we bring back the original foods file. We use the git checkout HEAD~ foods command to restore the file from a specific commit, namely the commit before we renamed it away.
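If you'd like to confirm that git really does see the first commit on the branch as a rename, one way is to ask for that commit's name-status summary with rename detection turned on. This is only a sketch, and the helper name rename_status is made up for illustration.

```shell
# Hypothetical helper: print the name-status summary of a single commit,
# with rename detection (-M) enabled and the commit header suppressed.
rename_status() {
    git show --name-status --format= -M "$1"
}
```

In the scratch repo, rename_status dup~1 should report a line starting with R100, followed by foods and foods-new. The R100 means "rename, 100% similar", which is what lets git carry the line history across.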

git merge --no-ff dup

Merge made by the 'recursive' strategy.
 foods-new | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 foods-new

The dup branch deleted the foods file, and then restored it. That means there was no net change to the file in the dup branch, and even git log won’t notice it by default. If you do a log of the foods file, the merge doesn’t even show up.

git log --oneline foods

                ← the merge doesn't appear
335acb1 orange
62ef37c created
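
The commits from the dup branch are still in the repo. The default log simply skips the side branch because the merge left foods unchanged. If you want to see them, you can ask git not to prune that side of the history. A small sketch, with a function name (full_history) invented for this example:

```shell
# Hypothetical helper: list every commit that changed a path, walking
# all parents of each merge instead of only the one the merge result
# matches.
full_history() {
    git log --oneline --full-history -- "$1"
}
```

In the scratch repo, full_history foods should list the "duplicate" and "restore" commits from the dup branch in addition to the two original commits.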

The line histories of the two files are identical, because foods-new was created at the same time an identical foods file disappeared, which made git consider the operation to be a rename for the purpose of history tracking.

git blame foods

^62ef37c (Alice 2019-09-19 07:00:00 -0700 1) apple
335acb1b (Bob   2019-09-19 07:00:01 -0700 2) orange

git blame foods-new

^62ef37c foods (Alice 2019-09-19 07:00:00 -0700 1) apple
335acb1b foods (Bob   2019-09-19 07:00:01 -0700 2) orange
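
If you find yourself doing this a lot, the steps can be wrapped up in a shell function. What follows is a sketch under some assumptions, not a polished tool: the names dup_with_history and blame_commits and the temporary branch name dup-tmp are all made up, the commits use your configured identity rather than an explicit --author, and it expects a clean working tree.

```shell
# Hypothetical helper: duplicate SRC to DST so that DST inherits SRC's
# line history. Assumes a clean working tree and that no branch named
# dup-tmp (an arbitrary name) already exists.
dup_with_history() {
    src=$1
    dst=$2
    git checkout -q -b dup-tmp &&
    git mv -- "$src" "$dst" &&
    git commit -q -m "duplicate $src to $dst" &&
    git checkout -q HEAD~ -- "$src" &&        # bring back the original
    git commit -q -m "restore $src" &&
    git checkout -q - &&                      # back to the starting branch
    git merge -q --no-ff --no-edit dup-tmp &&
    git branch -q -d dup-tmp
}

# Hypothetical helper: print the commit hash that each line of a file
# is attributed to, one hash per line, in line order.
blame_commits() {
    git blame --porcelain -- "$1" |
        grep -E '^[0-9a-f]{40,64} ' |
        cut -d' ' -f1
}
```

After dup_with_history foods foods-new, comparing the output of blame_commits foods with that of blame_commits foods-new is a quick way to check that the two files share the same line history.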

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

4 comments


  • Neil Rashbrook

    > When git traces the history of the foods-new file, it’ll see that the file was created via rename from foods, so git will use foods’s history to build the line history.

    If only it would. Just this week, I wanted to merge master into a branch which had been forked 443 commits ago. The merge failed because of a modify/delete conflict, although this was really a rename. To save time I simply merged in both...

    • Daniel Sturm

      You probably ran into the changed file limit where git uses a simpler algorithm to figure out renames. You can increase the limit to give git more time to figure it out.

      Don’t remember the command-line argument for that anymore

      • Neil Rashbrook

        Actually the file hadn’t been changed, but just to be sure, I tried it with find-renames=100 and that worked…

  • cheong00

    I suggest adding a “git” tag to this series. These posts contain useful tricks that I would very likely want to revisit later.