{"id":110131,"date":"2024-08-15T07:00:00","date_gmt":"2024-08-15T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110131"},"modified":"2024-08-15T07:39:46","modified_gmt":"2024-08-15T14:39:46","slug":"20240815-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20240815-00\/?p=110131","title":{"rendered":"Instead of putting a hash in the Portable Executable timestamp field, why not create a separate field for the hash?"},"content":{"rendered":"<p>Some time ago, we learned <a title=\"Why are the module timestamps in Windows 10 so nonsensical?\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180103-00\/?p=97705\"> why the module timestamps in Windows 10 are so nonsensical<\/a>: Because they aren&#8217;t timestamps any more. They are a hash of the resulting binary.<\/p>\n<p>But why not invent a new field called, say, <tt>UniqueValue<\/tt> for the hash, rather than putting it in the timestamp field?<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\"><a href=\"https:\/\/t.co\/iPc0RdM9vc\"> https:\/\/t.co\/iPc0RdM9vc<\/a> <br \/>\nyes, stupid decision imho; could use a diff. field for that<\/p>\n<p>\u2014 Adam (@Hexacorn) <a href=\"https:\/\/twitter.com\/Hexacorn\/status\/1758107579782221952?ref_src=twsrc%5Etfw\">February 15, 2024<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Well, for one thing, that would be a breaking change. If you take a binary produced by a linker that puts the hash in a new field and run it on an older system, the older system will ignore the hash and use the timestamp, so the hash does nothing.<\/p>\n<p>But wait, why are we gathered here in the first place? 
The reason for using a hash instead of a timestamp is to permit <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reproducible_builds\"> reproducible builds<\/a>, and the Wikipedia page specifically <a href=\"https:\/\/en.wikipedia.org\/w\/index.php?title=Reproducible_builds&amp;oldid=1209753618#Challenges\"> calls out timestamps for scorn<\/a>:<\/p>\n<blockquote class=\"q\"><p>According to the Reproducible Builds project, timestamps are &#8220;<a href=\"https:\/\/reproducible-builds.org\/docs\/timestamps\/\">the biggest source of reproducibility issues<\/a>.&#8221;<\/p><\/blockquote>\n<p>If you put a timestamp in the binary, then it&#8217;s no longer reproducible: Making no changes and rebuilding will produce a different binary because the timestamp will be different.<\/p>\n<p>If we want a reproducible build, we simply have to get rid of the timestamp.<\/p>\n<p>Remember what the timestamp is used for: It&#8217;s used by the module loader to detect whether precalculated addresses of imported functions should be trusted: When the addresses are precalculated (by <a title=\"What is DLL import binding?\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20100318-00\/?p=14563\">binding<\/a>), the timestamp of the module that was used as the basis of the precalculation is recorded by the importing module. When the loader loads the importing module, it checks whether that timestamp matches the timestamp recorded in the module from which the functions are being imported. If it matches, then the precalculated values are used. If it doesn&#8217;t match, then the precalculated values are ignored and new values are calculated from scratch.<\/p>\n<p>Okay, so maybe we can use some other source as the timestamp, rather than the timestamp of the build itself. How about the timestamp of the most recent commit?<\/p>\n<p>That still doesn&#8217;t work because you can build multiple binaries from the same source code. 
Any precalculated values from a debug build will not be correct for a release build, and vice versa. Any switches that affect code generation must change the timestamp because the resulting binary is different and in particular the addresses of exported functions may change.<\/p>\n<p>Okay, but maybe we can start with the timestamp and, say, hash the compiler switches into a 16-bit value that gets added to the original timestamp. That way, you still get a pseudo-timestamp that is within a day of the actual timestamp.<\/p>\n<p>But now you&#8217;ve swung the pendulum too far the other way. Previously, the problem was that the timestamp didn&#8217;t change when it should have. Now the problem is that the timestamp changes when it didn&#8217;t need to. Maybe you made a commit to a <tt>README.md<\/tt> file. This isn&#8217;t even part of the source code, but it&#8217;ll change the &#8220;most recent commit&#8221; timestamp. Okay, so maybe you look only at commits that modify source code. But now you add a new <tt>enum<\/tt> to a header file (say, <tt>windows.h<\/tt>) that is included by every component, but only one component actually takes advantage of it. The change to the header file will update the &#8220;most recent source code commit&#8221; timestamp of every component, even though only one of the components actually changed as a result of the new <tt>enum<\/tt>. The other components are binary identical, or would be if it weren&#8217;t for the timestamp.<\/p>\n<p>The way to get the timestamp to change when the binary changes, but only when the binary changes, is to make the timestamp depend only on the binary itself (minus the timestamp field).\u00b9<\/p>\n<p><b>Bonus chatter<\/b>: Making the timestamp a hash of the binary contents simplifies the process of determining which binaries were affected by a change: Look for binaries whose timestamp hashes changed. 
Not only does this make things easier for the servicing team (to identify which binaries need to be included in the next monthly update), it&#8217;s also handy as part of your regular workflow: If you change a header file with the intention of fixing an issue in one component, and several dozen files changed timestamps, then that&#8217;s a signal that what you thought was a change with very limited scope turned out to have a much larger scope than you thought, and maybe you should figure out what unintended consequences your change precipitated.\u00b2<\/p>\n<p>\u00b9 This is a tautology, but sometimes it helps to state the tautology explicitly.<\/p>\n<p>\u00b2 One common example of this is adding a new method to a COM interface. This causes the IID to change, which in turn causes every module that produces or consumes that interface to change. What you thought was a simple change to one binary ended up pulling a dozen binaries into the next monthly patch. Instead, you should create a new interface for your new method and leave the original interface alone.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>That would defeat the purpose.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-110131","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>That would defeat the 
purpose.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110131","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110131"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110131\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110131"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110131"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110131"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}