{"id":110692,"date":"2024-12-30T07:00:00","date_gmt":"2024-12-30T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110692"},"modified":"2024-12-30T07:46:00","modified_gmt":"2024-12-30T15:46:00","slug":"20241230-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241230-00\/?p=110692","title":{"rendered":"How various git diff viewers represent file encoding changes in pull requests"},"content":{"rendered":"<p>In addition to the git command line tool, there are other tools or services that let you view changes in git history. The most interesting cases are those which present changes as part of a pull request, since those are changes you are reviewing and approving. But a common problem is that what they show you might not be what actually changed.<\/p>\n<p>I&#8217;ll limit my discussion to services and tools I have experience with, which means that it&#8217;s the git command line, Azure DevOps, GitHub, and Visual Studio. You are welcome to share details for other services that you use, particularly those used for code reviews.<\/p>\n<p>First, let&#8217;s consider a commit that changes the encoding of a file. For concreteness, let&#8217;s say that the file is this:<\/p>\n<pre>I just checked.\r\nIt costs <span style=\"border: solid 1px currentcolor;\">A3<\/span>1.\r\n<\/pre>\n<p>where <tt><span style=\"border: solid 1px currentcolor;\">A3<\/span><\/tt> represents a single byte with hex value <tt>0xA3<\/tt>. 
This is the representation of \u00a3 in the Windows 1252 code page.<\/p>\n<p>Suppose you change the encoding of this file to UTF-8:<\/p>\n<pre>It costs <span style=\"border: solid 1px currentcolor;\">C2<\/span><span style=\"border: solid 1px currentcolor;\">A3<\/span>1.\r\n<\/pre>\n<p>If you view this in the command line with <tt>git show<\/tt> you get<\/p>\n<div style=\"border: solid 1px currentcolor; width: 20em;\"><tt>\u00a0\u00a0I just checked.<\/tt><\/p>\n<div style=\"color: red;\"><tt>- It costs <span style=\"background-color: red; color: black;\">&lt;A3&gt;<\/span>1.<\/tt><\/div>\n<div style=\"color: green;\"><tt>+ It costs \u00a31.<\/tt><\/div>\n<\/div>\n<p>The command line version shows you that there used to be a byte <tt>0xA3<\/tt> but now there is a \u00a3 character.<\/p>\n<p>Next up is GitHub. Its diff says<\/p>\n<div style=\"border: solid 1px black; width: 20em;\">\n<div style=\"background-color: #f61acfa; color: #1f2328;\"><tt>\u00a0\u00a0I just checked.<\/tt><\/div>\n<div style=\"background-color: #ffebe9; color: #1f2328;\"><tt>- It costs \ufffd1.<\/tt><\/div>\n<div style=\"background-color: #d1f8d9; color: #1f2328;\"><tt>+ It costs \u00a31.<\/tt><\/div>\n<\/div>\n<p>GitHub assumes that all files are in UTF-8, so it interprets the <span style=\"border: solid 1px currentcolor;\">A3<\/span> as an illegal UTF-8 code unit sequence and represents it with U+FFFD REPLACEMENT CHARACTER.<\/p>\n<p>Next up is <span style=\"text-decoration: line-through;\">Team Foundation Services<\/span> <span style=\"text-decoration: line-through;\">Visual Studio Online<\/span> <span style=\"text-decoration: line-through;\">Visual Studio Team Services<\/span> Azure DevOps. Azure DevOps. That&#8217;s the name. 
Azure DevOps.<\/p>\n<p>Here&#8217;s what Azure DevOps shows:<\/p>\n<div style=\"border: solid 1px black; width: 20em; background-color: rgba(251,242,236,1); color: black;\">\u26a0 The file differs only in whitespace.<\/div>\n<p>And if you expand the file and enable &#8220;Show whitespace changes&#8221;, it shows you no changes, not even whitespace changes!<\/p>\n<div style=\"border: solid 1px black; width: 20em; background-color: white; color: black;\">\n<div><tt>I just checked.<\/tt><\/div>\n<div><tt>It costs \u00a31.<\/tt><\/div>\n<\/div>\n<p>This is quite concerning, because it means that if you made a change to the text of a file and also changed the encoding, Azure DevOps highlights the text changes, but does not give any indication that the encoding changed!<\/p>\n<p>For example, maybe somebody changed the first line of text and accidentally changed the encoding from 1252 to UTF-8. Azure DevOps shows this as<\/p>\n<div style=\"border: solid 1px black; width: 20em; background-color: white; color: black;\">\n<div style=\"background-color: #fed2d0; color: black;\"><tt>I just <span style=\"background-color: #fdaeab;\">checked<\/span>.<\/tt><\/div>\n<div style=\"background-color: #ebf1de; color: black;\"><tt>I just <span style=\"background-color: #dbe6c5;\">looked<\/span>.<\/tt><\/div>\n<p><tt>It costs \u00a31.<\/tt><\/div>\n<p>It happily shows you the text change, but completely ignores the encoding change.<\/p>\n<p>That encoding change might have caused you to <a title=\"The Resource Compiler defaults to CP_ACP, even in the face of subtle hints that the file is UTF-8\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190607-00\/?p=102569\"> inadvertently change a bunch of strings in a Resource Script<\/a>, resulting in mojibake.<\/p>\n<p>If you ask Visual Studio to view the diff, it indicates that the file has been modified (M), but when you ask to see the diff, it says &#8220;0 changes&#8221;, and nothing is highlighted.<\/p>\n<p>Now let&#8217;s consider 
a commit that inserted a UTF-8 BOM at the start of a file.<\/p>\n<p>From the command line with <tt>git<\/tt>, you get this:<\/p>\n<div style=\"border: solid 1px currentcolor; width: 20em;\">\n<div style=\"color: red;\"><tt>- I just checked.<\/tt><\/div>\n<div style=\"color: green;\"><tt>+\u00a0 I just checked.<\/tt><\/div>\n<\/div>\n<p>The BOM displays as a space. Not great, but at least there is a +\/\u2212 to show you that <i>something<\/i> changed, and if the first line is not otherwise blank, the shifted contents tell you that <i>something<\/i> got inserted at the start of the file.<\/p>\n<p>For GitHub, the diff shows up like this:<\/p>\n<div style=\"border: solid 1px black; width: 20em;\">\n<div style=\"background-color: #ffebe9; color: #1f2328;\"><tt>- I just checked.<\/tt><\/div>\n<div style=\"background-color: #d1f8d9; color: #1f2328;\"><tt>+ I just checked.<\/tt><\/div>\n<\/div>\n<p>The highlights tell you that something changed on that line, but squint all you want, you don&#8217;t see any change. 
The change must be invisible, but at least you&#8217;re told that there&#8217;s a change <i>somewhere<\/i> on that line; you just can&#8217;t see it.<\/p>\n<p>And finally, we have Azure DevOps:<\/p>\n<div style=\"border: solid 1px black; width: 20em; background-color: rgba(251,242,236,1); color: black;\">\u26a0 The file differs only in whitespace.<\/div>\n<p>As before, even if you expand the file and enable &#8220;Show whitespace changes&#8221;, you get no changes.<\/p>\n<div style=\"border: solid 1px black; width: 20em; background-color: white; color: black;\">\n<div><tt>I just checked.<\/tt><\/div>\n<div><tt>It costs \u00a31.<\/tt><\/div>\n<\/div>\n<p>So Azure DevOps tells you that the file changed in whitespace, but when you ask to see it, you are shown no changes.<\/p>\n<p>If you ask Visual Studio to view the diff, it once again indicates that the file has been modified (M), but when you ask to see the diff, it says &#8220;0 changes&#8221;, and nothing is highlighted.<\/p>\n<p>I suspect that in the cases where GitHub, Azure DevOps, or Visual Studio show no visible changes, most users will just conclude, &#8220;Must be a bug,&#8221; and not realize that no really, there&#8217;s a change in there that you can&#8217;t see.<\/p>\n<p>So let&#8217;s summarize these results in a table.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>\u00a0<\/th>\n<th>git command line<\/th>\n<th>GitHub<\/th>\n<th>Azure DevOps<\/th>\n<th>Visual Studio<\/th>\n<\/tr>\n<tr>\n<th>Code page<\/th>\n<td>UTF-8<\/td>\n<td>UTF-8<\/td>\n<td>Guess<\/td>\n<td>Guess<\/td>\n<\/tr>\n<tr>\n<th>Encoding changes<\/th>\n<td>Shown in diff<\/td>\n<td>Shown in diff<\/td>\n<td>No change shown<\/td>\n<td>No change shown<\/td>\n<\/tr>\n<tr>\n<th>BOM change<\/th>\n<td>Shown as space<\/td>\n<td>Invisible<\/td>\n<td>No change shown<\/td>\n<td>No change shown<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>My take-away from this table is 
that if you do your work with any of these systems, you need to pay close attention when dealing with files that contain characters outside the 7-bit ASCII set because changes to encoding or the presence of a BOM can be hard to spot, or even become outright invisible, even though it drastically changes what the contents of the file mean.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The invisible UTF-8 BOM, and sometimes invisible encoding changes.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-110692","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>The invisible UTF-8 BOM, and sometimes invisible encoding changes.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110692","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110692"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110692\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110692"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthi
ng\/wp-json\/wp\/v2\/categories?post=110692"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110692"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}