{"id":102493,"date":"2019-05-14T07:00:00","date_gmt":"2019-05-14T14:00:00","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/oldnewthing\/?p=102493"},"modified":"2020-07-02T08:23:47","modified_gmt":"2020-07-02T15:23:47","slug":"20190514-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190514-00\/?p=102493","title":{"rendered":"Mundane git tricks: Combining two files into one while preserving line history"},"content":{"rendered":"<p>Suppose you have two files that you want to combine into one. Let&#8217;s set up a scratch repo to demonstrate. I&#8217;ve omitted the command prompts so you can copy-paste this into your shell of choice and play along at home. (The timestamps and commit hashes will naturally be different.)<\/p>\n<pre>git init\r\n\r\n&gt;fruits echo apple\r\ngit add fruits\r\ngit commit --author=\"Alice &lt;alice&gt;\" -m \"create fruits\"\r\n&gt;&gt;fruits echo grape\r\ngit commit --author=\"Bob &lt;bob&gt;\"     -am \"add grape\"\r\n&gt;&gt;fruits echo orange\r\ngit commit --author=\"Carol &lt;carol&gt;\" -am \"add orange\"\r\n\r\n&gt;veggies echo celery\r\ngit add veggies\r\ngit commit --author=\"David &lt;david&gt;\" -m \"create veggies\"\r\n&gt;&gt;veggies echo lettuce\r\ngit commit --author=\"Eve &lt;eve&gt;\"     -am \"add lettuce\"\r\n&gt;&gt;veggies echo peas\r\ngit commit --author=\"Frank &lt;frank&gt;\" -am \"add peas\"\r\n\r\ngit tag ready\r\n<\/pre>\n<p>We now have two files, one with fruits and one with vegetables. 
Each has its own history, and the <code>git blame<\/code> command can attribute each line to the commit that introduced it.<\/p>\n<pre>git blame fruits\r\n\r\n^adbef3a (Alice 2019-05-14 07:00:00 -0700 1) apple\r\n8312990f (Bob   2019-05-14 07:00:01 -0700 2) grape\r\n2259ff53 (Carol 2019-05-14 07:00:02 -0700 3) orange\r\n\r\ngit blame veggies\r\n\r\n2f11bacc (David 2019-05-14 07:00:03 -0700 1) celery\r\n2d7b11e8 (Eve   2019-05-14 07:00:04 -0700 2) lettuce\r\n8c8cf113 (Frank 2019-05-14 07:00:05 -0700 3) peas\r\n<\/pre>\n<p>Now you decide that <code>fruits<\/code> and <code>veggies<\/code> should be combined into a single file called <code>produce<\/code>. How do you do this while still preserving the commit histories of the contributing files?<\/p>\n<p>The na\u00efve way of combining the files would be to do it in a single commit:<\/p>\n<pre>cat fruits veggies &gt; produce\r\ngit rm fruits veggies\r\ngit add produce\r\ngit commit --author=\"Greg &lt;greg&gt;\" -m \"combine\"\r\n<\/pre>\n<p>The resulting file gets blamed like this:<\/p>\n<pre>eefddfb1 produce (Greg  2019-05-14 07:01:00 -0700 1) apple\r\neefddfb1 produce (Greg  2019-05-14 07:01:00 -0700 2) grape\r\neefddfb1 produce (Greg  2019-05-14 07:01:00 -0700 3) orange\r\n7a542f13 veggies (David 2019-05-14 07:00:03 -0700 4) celery\r\n2c258db0 veggies (Eve   2019-05-14 07:00:04 -0700 5) lettuce\r\n87296161 veggies (Frank 2019-05-14 07:00:05 -0700 6) peas\r\n<\/pre>\n<p>The history from <code>veggies<\/code> was preserved, but the history from <code>fruits<\/code> was not. What git saw in the commit was that one file appeared and two files vanished. 
The rename detection machinery kicked in and decided that, since the majority of the <code>produce<\/code> file matches the <code>veggies<\/code> file, what you did was delete the <code>fruits<\/code> file, rename the <code>veggies<\/code> file to <code>produce<\/code>, and then add three new lines to the top of <code>produce<\/code>.<\/p>\n<p>You can tweak the <code>git blame<\/code> algorithms with options like <code>-M<\/code> and <code>-C<\/code> to get it to try harder, but in practice, you don&#8217;t often have control over those options: The <code>git blame<\/code> may be performed on a server, and the results reported back to you on a web page. Or the <code>git blame<\/code> is performed by a developer sitting at another desk (whose command line options you don&#8217;t get to control), and poor Greg has to deal with all the tickets that get assigned to him from people who used the <code>git blame<\/code> output to figure out who introduced the line that&#8217;s causing problems.<\/p>\n<p>What we want is a way to get <code>git blame<\/code> to report the correct histories for both the fruits and the vegetables.<\/p>\n<p>The trick is to use a merge. Let&#8217;s reset back to the original state.<\/p>\n<pre>git reset --hard ready\r\n<\/pre>\n<p>We set up two branches. In one branch, we rename <code>veggies<\/code> to <code>produce<\/code>. 
In the other branch, we rename <code>fruits<\/code> to <code>produce<\/code>.<\/p>\n<pre>git checkout -b rename-veggies\r\ngit mv veggies produce\r\ngit commit --author=\"Greg &lt;greg&gt;\" -m \"rename veggies to produce\"\r\n\r\ngit checkout -\r\ngit mv fruits produce\r\ngit commit --author=\"Greg &lt;greg&gt;\" -m \"rename fruits to produce\"\r\n\r\ngit merge -m \"combine fruits and veggies\" rename-veggies\r\n<\/pre>\n<p>The merge fails with a rename-rename conflict:<\/p>\n<pre>CONFLICT (rename\/rename):\r\nRename fruits-&gt;produce in HEAD.\r\nRename veggies-&gt;produce in rename-veggies\r\n\r\nRenaming fruits to produce~HEAD\r\nand veggies to produce~rename-veggies instead\r\n\r\nAutomatic merge failed; fix conflicts and then commit the result.\r\n<\/pre>\n<p><b>Update<\/b>: Version 2.25.1 <a href=\"https:\/\/github.com\/git\/git\/commit\/d1075adfdf2d2008d665dc57b37c1f027f4ffd42\">changed what happens in the case of a rename\/rename conflict<\/a>.<\/p>\n<pre>CONFLICT (rename\/rename):\r\nRename fruits-&gt;produce in HEAD.\r\nRename veggies-&gt;produce in rename-veggies\r\n\r\nAuto-merging produce\r\n\r\nAutomatic merge failed; fix conflicts and then commit the result.\r\n<\/pre>\n<p>At this point, you create the resulting <code>produce<\/code> file by combining the two originals.<\/p>\n<p>If running pre-2.25.1:<\/p>\n<pre>cat \"produce~HEAD\" \"produce~rename-veggies\" &gt;produce\r\n<\/pre>\n<p>If running post-2.25.1:<\/p>\n<pre>git cat-file --filters HEAD:produce &gt;produce\r\ngit cat-file --filters rename-veggies:produce &gt;&gt;produce\r\n<\/pre>\n<p>Once you&#8217;ve generated the combined file, you can treat the file as resolved.<\/p>\n<pre>git add produce\r\ngit merge --continue\r\n<\/pre>\n<p>The resulting <code>produce<\/code> file was created by a merge, so git knows to look in both parents of the merge to learn what happened. 
And that&#8217;s where it sees that each parent contributed half of the file, and it also sees that the files in each branch were themselves created via renames of other files, so it can chase the history back into both of the original files.<\/p>\n<pre>^fa19403 fruits  (Alice 2019-05-14 07:00:00 -0700 1) apple\r\n00ef7240 fruits  (Bob   2019-05-14 07:00:01 -0700 2) grape\r\n10e90730 fruits  (Carol 2019-05-14 07:00:02 -0700 3) orange\r\n7a542f13 veggies (David 2019-05-14 07:00:03 -0700 4) celery\r\n2c258db0 veggies (Eve   2019-05-14 07:00:04 -0700 5) lettuce\r\n87296161 veggies (Frank 2019-05-14 07:00:05 -0700 6) peas\r\n<\/pre>\n<p>Magic! Greg is nowhere to be found in the blame history. Each line is correctly attributed to the person who introduced it in the original file, whether it&#8217;s <code>fruits<\/code> or <code>veggies<\/code>. People investigating the <code>produce<\/code> file get a more accurate history of who last touched each line of the file.<\/p>\n<p>Greg might need to do some editing to the two files before committing. Maybe the results need to be sorted, and maybe Greg figures he should add a header to remind people to keep it sorted.<\/p>\n<pre>&gt;produce echo # keep sorted\r\ngit cat-file --filters HEAD:produce &gt;&gt;produce\r\ngit cat-file --filters rename-veggies:produce &gt;&gt;produce\r\nsort -o produce produce\r\ngit add produce\r\ngit merge --continue\r\n\r\ngit blame produce\r\n\r\n057507c7 produce (Greg  2019-05-14 07:01:00 -0700 1) # keep sorted\r\n^943c65d fruits  (Alice 2019-05-14 07:00:00 -0700 2) apple\r\ncfce62ae veggies (David 2019-05-14 07:00:03 -0700 3) celery\r\n43c9aeb6 fruits  (Bob   2019-05-14 07:00:01 -0700 4) grape\r\n5f60490e veggies (Eve   2019-05-14 07:00:04 -0700 5) lettuce\r\n143eb20f fruits  (Carol 2019-05-14 07:00:02 -0700 6) orange\r\n75a1ad0c veggies (Frank 2019-05-14 07:00:05 -0700 7) peas\r\n<\/pre>\n<p>For best results, your rename commit should be a pure rename. 
Resist the temptation to edit the file&#8217;s contents at the same time you rename it. A pure rename ensures that git&#8217;s rename detection will find the match. If you edit the file in the same commit as the rename, then whether the rename is detected as such will depend on git&#8217;s &#8220;similar files&#8221; heuristic.\u00b9 If you need to edit the file as well as rename it, do it in two separate commits: One for the rename, another for the edit.<\/p>\n<p>Wait, we didn&#8217;t use <code>git commit-tree<\/code> yet. What&#8217;s this doing in the <i>Mundane git commit-tree tricks<\/i> series?<\/p>\n<p>We&#8217;ll add <code>commit-tree<\/code> to the mix next time. Today was groundwork, but this is a handy technique to keep in your bag of tricks, even if you never get around to the <code>commit-tree<\/code> part.<\/p>\n<p>\u00b9 If you cross the <code>merge.renameLimit<\/code>, then git won&#8217;t look for similar files; it requires exact matches. The Windows repo is so large that the rename limit is easily exceeded. 
The &#8220;similar files&#8221; detector is <var>O<\/var>(<var>m<\/var> \u00d7 <var>n<\/var>) in the number of files changed in the two parents, and when your repo has 3 million files, that quadratic growth becomes a problem.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Starting with the two-file case.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-102493","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>Starting with the two-file case.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/102493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=102493"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/102493\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=102493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=102493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=102493"}],"
curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}