Meet me here on Savvyday 29 Oatmeal 94

Raymond Chen

March 4th, 202016 0

The InternetTimeToSystemTime function takes an HTTP time/date string and converts it to a windows SYSTEMTIME structure. A customer noticed that the InternetTimeToSystemTime returns strange results when given strange data.

Input	Result	Notes
`Sat, 29 Oct 1994 09:43:31 GMT`	October 29, 1994 at 09:43:31 GMT	As expected
`Sat 29 Oct 1994 09:43:31 GMT`	October 29, 1994 at 09:43:31 GMT	Missing comma
`Sat 29 Oct 1994 9:43:31 GMT`	October 29, 1994 at 09:43:31 GMT	Missing leading zero
`Sat Oct 29 9:43:31 1994`	October 29, 1994 at 09:43:31 GMT	Flipped month/day and trailing year
`Sat 29 Oct 1994 9:43`	October 29, 1994 at 09:43:00 GMT	Missing seconds and time zone
`Sat 29 Oct 1994`	October 29, 1994 at 00:00:00 GMT	Missing time
`Sat 29 Oct 94`	October 29, 1994 at 00:00:00 GMT	Two-digit year
`Savvyday 29 Oatmeal 94`	October 29, 1994 at 00:00:00 GMT	Horrible spelling errors
`1`	March 4, 2020 at 15:00:00 GMT	Returns current time

What’s going on?

The InternetTimeToSystemTime function tries really hard to make sense out of what you give it. This sometimes leads to the absurd, like treating Savvyday as if it were a misspelling of Saturday and Oatmeal as if it were a misspelling of October.

The InternetTimeToSystemTime is not a validator. It’s a best-guess parser. The expectation is that you are giving InternetTimeToSystemTime a string that you got from an HTTP header, and you need to make as much sense out of it as you can, per Postel’s Principle.¹

Back in Windows 7, the feature team tried to make the parser more strict. This effort was a total disaster, because applications in practice were using the function to parse all sorts of things that didn’t even pretend to adhere to the RFC. For example, a photo processing shell extension used this function to parse dates, and the attempt to enforce strict validation caused the shell extension to stop working entirely.

Consequently, all the changes were backed out, and the parser to this day remains as tolerant of malformed dates as it was when it was originally written. The generous parsing is now a required feature.

¹ There are those who believe that Postel’s Principle is wrong.

Raymond Chen

16 comments

Discussion is closed. Login to edit/delete existing comments.

Mason Wheeler March 4, 2020 7:05 am 0

> There are those who believe that Postel’s Principle is wrong.

There definitely are, for exactly the reason you described in the article: if you tolerate malformed input, people will come to rely on it, and then you’ll never be able to fix it without breaking their workflow. (It gets even worse when you have multiple different implementations tolerating malformed input in different ways. Then you get the hideous mess that is cross-browser HTML compatibility!)
- Antonio Rodríguez March 4, 2020 7:54 am 0
  
  > Then you get the hideous mess that is cross-browser HTML compatibility!
  
  Sadly, that is one of the clearest cases of the usefulness of Postel’s Principle. Sometimes you have to write software which deals with existing, less-than-ideal implementations of a specification, protocol or API. You can’t be strict, because if you were, your program wouldn’t work with half the products on the market. And guess who the customer is going to place the blame on?
  
  This is, IMHO, both the bane and the main advantage of Windows: unlike MacOS and Linux, Windows is tolerant with old software which missuses its API or calls undocumented functions, and because of that, it’s often able to run 30-year-old software without a hiccup.
  
  Of course, both MacOS and Linux have reasons not to do that. In Linux, updating the distribution installs current versions of everything, and software is usually open source, so if something gets broken and is important enough for you, you can fix it yourself (most users do not have knowledge on how to do that, but technically it’s possible). About MacOS… well, its maker is in the hardware market, so it’s more than plausible to see that as a planned obsolescence strategy (note that MacOS and iOS are the only 64-bit OSes that ban 32-bit applications).
  - Zander March 5, 2020 6:18 am 0
    
    > You can’t be strict
    
    Of course most of the non-strictness of HTML parsing is actually in the spec now (and some of it always was) so a compliant parser is strictly non-strict
    
    The whole streaming parser idea so browsers can paint early is neat, but boy does it add complexity
  - Mason Wheeler March 5, 2020 6:59 am 0
    
    Yes, you’re making my point for me. Modern browsers have to use broken parsing because early browsers used it. If early browser makers had understood that Postel’s Principle was a terrible idea and had implemented strict parsing instead, things would likely be a lot better today, but because they didn’t, we can’t fix it now without breaking the Web.
    - Wayne Venables March 12, 2020 8:25 pm 0
      
      If browsers had implemented strict parsing from the start the modern web would never have possible. There would be no IMG tags, no JavaScript, no CSS, no AJAX, nothing. The loose parsing allowed new features to be created completely undreamed of by the creators of HTML. Those features could be created without breaking existing clients.
- David Lowndes March 4, 2020 8:56 am 0
  
  And it almost always happens when software uses human readable strings.
  - Antonio Rodríguez March 4, 2020 10:29 am 0
    
    Which is why date handling is so difficult to get right. It’s no coincidence that the article’s focus is in a misused date parsing function: if you need to parse a date in any kind of human-readable format, is almost impossible to make it fool-proof, and very easy to slip some bug and make it fail in some corner case. I bet most developers think “why spend two weeks developing and testing a parser when you can abuse an OS API?”
    - David Lowndes March 5, 2020 1:57 am 0
      
      It’s not only dates, it’s any string that needs parsing – a great source of bugs, ambiguity, and the waste of so many cpu cycles.
- Pierre Baillargeon March 5, 2020 10:50 am 0
  
  No, it’s a choice and Postel’s Principles get applied for all the right reasons.
  
  Every system has hidden assumptions, and every widely used system will have user unwittingly depending on them. This is entirely independent of Postel’s Principle. It does not depend on how well defined and strict your implementation is.
  
  Given that, there are two effects at play.
  
  The first effect is that you can choose as a system provider to either stay backward compatible forever. Or you can choose to let some users be broken at every release. Contrary to your claims, this is independent of Postel’s Principle. The only consequence of ignoring Postel is that you can take the moral high ground that “my system is working according to spec”. This is like saying “I don’t care that you crash”. Users have little sympathy for that when things don’t work. They will interpret your insistence of being right according to spec as being pedant and ignoring their complaints. They’ll move elsewhere.
  
  The second effect is that as a system implementer, you can choose to maximize ease of use and compatibility with the world at large or not. This is Postel’s Principle. The usual result is that being strict will cost you market share as your system will be deemed unusable or giving bad results. It even has secondary effect: every competitor that is less flexible will be seen as being worse. Postel’s Principle result in a Darwinian’s selection for the most flexible specie.
  - Jake Boeckerman March 5, 2020 11:59 am 0
    
    A citation for the second paragraph:
    
    With a sufficient number of users of an API,
    it does not matter what you promise in the contract:
    all observable behaviors of your system
    will be depended on by somebody.
    –Hyrum’s Law
Adam Rosenfield March 4, 2020 8:47 am 0

Does “Savvyday 29 Oatmeal 94” get parsed as Saturday+October due to those being the nearest day+month according to Levenshtein distance in English, or is there some other language where “Sav” and “Oat” are valid 3-letter abbreviations for that day+month? (Or some other reason?)

I’m guessing it’s due to the Levenshtein distance, since RFC 822 specifies only the English day and month names, but I wouldn’t be surprised if some servers used localized names for those and InternetTimeToSystemTime() was trying to parse those.
- Mike Morrison March 4, 2020 10:59 am 0
  
  I’m interested to know what the parser does with the name-of-day in the HTTP string since the name-of-day doesn’t appear in the systemtime. Does it ignore words that don’t even come close to the English words for days of the week?
  - Ivan Kljajic March 4, 2020 9:39 pm 0
    
    I was also curious. Omitting a day name (real or savvy) seems to produce error_invalid_parameter, and the day name is used to set wDayOfWeek in the systemtime. An invalid day name / day of month combination (eg: Mon, 05 Mar 2020) does not produce an error: the wDayOfWeek is just set to what it interprets from the given string (ie: 1 for Monday).
    And from my experience (heh), conversion functions that take a systemtime as input tend to ignore wDayOfWeek and use the other fields to generate the converted time. At any rate, my curiosity waned after reading / observing the behaviour of SystemTimeToFileTime / GetDateFormat (which correctly outputs Thursday for long time format).
Paul Jackson March 4, 2020 2:36 pm 0

“Missing seconds and time zone” – the timezone went missing one row earlier
Yuhong Bao March 4, 2020 6:10 pm 0

You mean in IE8, right? (this is a function in WinInet)
John Elliott March 5, 2020 3:08 pm 0

At first I thought this was going to be about parsing Discordian dates.

Meet me here on Savvyday 29 Oatmeal 94

Raymond Chen

Read next

16 comments