Meet me here on Savvyday 29 Oatmeal 94

Raymond Chen

Raymond

The Internet­Time­To­System­Time function takes an HTTP time/date string and converts it to a windows SYSTEM­TIME structure. A customer noticed that the Internet­Time­To­System­Time returns strange results when given strange data.

InputResultNotes
Sat, 29 Oct 1994 09:43:31 GMTOctober 29, 1994 at 09:43:31 GMTAs expected
Sat 29 Oct 1994 09:43:31 GMTOctober 29, 1994 at 09:43:31 GMTMissing comma
Sat 29 Oct 1994 9:43:31 GMTOctober 29, 1994 at 09:43:31 GMTMissing leading zero
Sat Oct 29 9:43:31 1994October 29, 1994 at 09:43:31 GMTFlipped month/day and trailing year
Sat 29 Oct 1994 9:43October 29, 1994 at 09:43:00 GMTMissing seconds and time zone
Sat 29 Oct 1994October 29, 1994 at 00:00:00 GMTMissing time
Sat 29 Oct 94October 29, 1994 at 00:00:00 GMTTwo-digit year
Savvyday 29 Oatmeal 94October 29, 1994 at 00:00:00 GMTHorrible spelling errors
1March 4, 2020 at 15:00:00 GMTReturns current time

What’s going on?

The Internet­Time­To­System­Time function tries really hard to make sense out of what you give it. This sometimes leads to the absurd, like treating Savvyday as if it were a misspelling of Saturday and Oatmeal as if it were a misspelling of October.

The Internet­Time­To­System­Time is not a validator. It’s a best-guess parser. The expectation is that you are giving Internet­Time­To­System­Time a string that you got from an HTTP header, and you need to make as much sense out of it as you can, per Postel’s Principle.¹

Back in Windows 7, the feature team tried to make the parser more strict. This effort was a total disaster, because applications in practice were using the function to parse all sorts of things that didn’t even pretend to adhere to the RFC. For example, a photo processing shell extension used this function to parse dates, and the attempt to enforce strict validation caused the shell extension to stop working entirely.

Consequently, all the changes were backed out, and the parser to this day remains as tolerant of malformed dates as it was when it was originally written. The generous parsing is now a required feature.

¹ There are those who believe that Postel’s Principle is wrong.

16 comments

Comments are closed. Login to edit/delete your existing comments

  • Avatar
    Mason Wheeler

    > There are those who believe that Postel’s Principle is wrong.

    There definitely are, for exactly the reason you described in the article: if you tolerate malformed input, people will come to rely on it, and then you’ll never be able to fix it without breaking their workflow. (It gets even worse when you have multiple different implementations tolerating malformed input in different ways. Then you get the hideous mess that is cross-browser HTML compatibility!)

    • Avatar
      Antonio Rodríguez

      > Then you get the hideous mess that is cross-browser HTML compatibility!

      Sadly, that is one of the clearest cases of the usefulness of Postel’s Principle. Sometimes you have to write software which deals with existing, less-than-ideal implementations of a specification, protocol or API. You can’t be strict, because if you were, your program wouldn’t work with half the products on the market. And guess who the customer is going to place the blame on?

      This is, IMHO, both the bane and the main advantage of Windows: unlike MacOS and Linux, Windows is tolerant with old software which missuses its API or calls undocumented functions, and because of that, it’s often able to run 30-year-old software without a hiccup.

      Of course, both MacOS and Linux have reasons not to do that. In Linux, updating the distribution installs current versions of everything, and software is usually open source, so if something gets broken and is important enough for you, you can fix it yourself (most users do not have knowledge on how to do that, but technically it’s possible). About MacOS… well, its maker is in the hardware market, so it’s more than plausible to see that as a planned obsolescence strategy (note that MacOS and iOS are the only 64-bit OSes that ban 32-bit applications).

      • Avatar
        Zander

        > You can’t be strict

        Of course most of the non-strictness of HTML parsing is actually in the spec now (and some of it always was) so a compliant parser is strictly non-strict

        The whole streaming parser idea so browsers can paint early is neat, but boy does it add complexity

      • Avatar
        Mason Wheeler

        Yes, you’re making my point for me. Modern browsers have to use broken parsing because early browsers used it. If early browser makers had understood that Postel’s Principle was a terrible idea and had implemented strict parsing instead, things would likely be a lot better today, but because they didn’t, we can’t fix it now without breaking the Web.

        • Avatar
          Wayne Venables

          If browsers had implemented strict parsing from the start the modern web would never have possible. There would be no IMG tags, no JavaScript, no CSS, no AJAX, nothing. The loose parsing allowed new features to be created completely undreamed of by the creators of HTML. Those features could be created without breaking existing clients.

      • Avatar
        Antonio Rodríguez

        Which is why date handling is so difficult to get right. It’s no coincidence that the article’s focus is in a misused date parsing function: if you need to parse a date in any kind of human-readable format, is almost impossible to make it fool-proof, and very easy to slip some bug and make it fail in some corner case. I bet most developers think “why spend two weeks developing and testing a parser when you can abuse an OS API?”

    • Avatar
      Pierre Baillargeon

      No, it’s a choice and Postel’s Principles get applied for all the right reasons.

      Every system has hidden assumptions, and every widely used system will have user unwittingly depending on them. This is entirely independent of Postel’s Principle. It does not depend on how well defined and strict your implementation is.

      Given that, there are two effects at play.

      The first effect is that you can choose as a system provider to either stay backward compatible forever. Or you can choose to let some users be broken at every release. Contrary to your claims, this is independent of Postel’s Principle. The only consequence of ignoring Postel is that you can take the moral high ground that “my system is working according to spec”. This is like saying “I don’t care that you crash”. Users have little sympathy for that when things don’t work. They will interpret your insistence of being right according to spec as being pedant and ignoring their complaints. They’ll move elsewhere.

      The second effect is that as a system implementer, you can choose to maximize ease of use and compatibility with the world at large or not. This is Postel’s Principle. The usual result is that being strict will cost you market share as your system will be deemed unusable or giving bad results. It even has secondary effect: every competitor that is less flexible will be seen as being worse. Postel’s Principle result in a Darwinian’s selection for the most flexible specie.

      • Jake Boeckerman
        Jake Boeckerman

        A citation for the second paragraph:

        With a sufficient number of users of an API,
        it does not matter what you promise in the contract:
        all observable behaviors of your system
        will be depended on by somebody.
        Hyrum’s Law

  • Avatar
    Adam Rosenfield

    Does “Savvyday 29 Oatmeal 94” get parsed as Saturday+October due to those being the nearest day+month according to Levenshtein distance in English, or is there some other language where “Sav” and “Oat” are valid 3-letter abbreviations for that day+month? (Or some other reason?)

    I’m guessing it’s due to the Levenshtein distance, since RFC 822 specifies only the English day and month names, but I wouldn’t be surprised if some servers used localized names for those and Internet­Time­To­System­Time() was trying to parse those.

    • Avatar
      Mike Morrison

      I’m interested to know what the parser does with the name-of-day in the HTTP string since the name-of-day doesn’t appear in the systemtime. Does it ignore words that don’t even come close to the English words for days of the week?

      • Avatar
        Ivan Kljajic

        I was also curious. Omitting a day name (real or savvy) seems to produce error_invalid_parameter, and the day name is used to set wDayOfWeek in the systemtime. An invalid day name / day of month combination (eg: Mon, 05 Mar 2020) does not produce an error: the wDayOfWeek is just set to what it interprets from the given string (ie: 1 for Monday).
        And from my experience (heh), conversion functions that take a systemtime as input tend to ignore wDayOfWeek and use the other fields to generate the converted time. At any rate, my curiosity waned after reading / observing the behaviour of SystemTimeToFileTime / GetDateFormat (which correctly outputs Thursday for long time format).