The InternetTimeToSystemTime function takes an HTTP time/date string and converts it to a Windows SYSTEMTIME structure. A customer noticed that the InternetTimeToSystemTime function returns strange results when given strange data.
Input | Result | Notes |
---|---|---|
Sat, 29 Oct 1994 09:43:31 GMT | October 29, 1994 at 09:43:31 GMT | As expected |
Sat 29 Oct 1994 09:43:31 GMT | October 29, 1994 at 09:43:31 GMT | Missing comma |
Sat 29 Oct 1994 9:43:31 GMT | October 29, 1994 at 09:43:31 GMT | Missing leading zero |
Sat Oct 29 9:43:31 1994 | October 29, 1994 at 09:43:31 GMT | Flipped month/day and trailing year |
Sat 29 Oct 1994 9:43 | October 29, 1994 at 09:43:00 GMT | Missing seconds and time zone |
Sat 29 Oct 1994 | October 29, 1994 at 00:00:00 GMT | Missing time |
Sat 29 Oct 94 | October 29, 1994 at 00:00:00 GMT | Two-digit year |
Savvyday 29 Oatmeal 94 | October 29, 1994 at 00:00:00 GMT | Horrible spelling errors |
1 | March 4, 2020 at 15:00:00 GMT | Returns current time |
What’s going on?
The InternetTimeToSystemTime function tries really hard to make sense out of what you give it. This sometimes leads to the absurd, like treating Savvyday as if it were a misspelling of Saturday and Oatmeal as if it were a misspelling of October.
The InternetTimeToSystemTime function is not a validator. It’s a best-guess parser. The expectation is that you are giving InternetTimeToSystemTime a string that you got from an HTTP header, and you need to make as much sense out of it as you can, per Postel’s Principle.¹
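If you do need validation, you have to do it yourself before trusting the result. Here is a minimal sketch in Python (not the Win32 API) of what a strict check could look like. It assumes the only acceptable form is the fixed-length layout used by the first row of the table, and it also insists that the day name agree with the date. The names `parse_strict_http_date`, `HTTP_DATE`, `DAYS`, and `MONTHS` are made up for this illustration and say nothing about how WinInet works internally.

```python
import re
from datetime import datetime, timezone

# Hypothetical helper tables; DAYS is in datetime.weekday() order (Monday = 0).
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Accept only the exact layout "Sat, 29 Oct 1994 09:43:31 GMT".
HTTP_DATE = re.compile(
    r"(?P<dow>" + "|".join(DAYS) + r"), "
    r"(?P<day>[0-9]{2}) (?P<mon>" + "|".join(MONTHS) + r") (?P<year>[0-9]{4}) "
    r"(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):(?P<second>[0-9]{2}) GMT"
)


def parse_strict_http_date(text):
    """Return an aware UTC datetime, or None if text is not strictly valid."""
    match = HTTP_DATE.fullmatch(text)
    if match is None:
        return None
    try:
        parsed = datetime(
            int(match["year"]),
            MONTHS.index(match["mon"]) + 1,
            int(match["day"]),
            int(match["hour"]),
            int(match["minute"]),
            int(match["second"]),
            tzinfo=timezone.utc,
        )
    except ValueError:
        # Out-of-range fields such as day 31 in a 30-day month, or hour 25.
        return None
    # Reject a day name that contradicts the calendar date.
    if DAYS[parsed.weekday()] != match["dow"]:
        return None
    return parsed
```

Of the inputs in the table above, only the first one would pass this check. Everything else, from the missing comma down to Savvyday and Oatmeal, is rejected, which is the opposite trade-off from the one InternetTimeToSystemTime makes.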
Back in Windows 7, the feature team tried to make the parser more strict. This effort was a total disaster, because applications in practice were using the function to parse all sorts of things that didn’t even pretend to adhere to the RFC. For example, a photo processing shell extension used this function to parse dates, and the attempt to enforce strict validation caused the shell extension to stop working entirely.
Consequently, all the changes were backed out, and the parser to this day remains as tolerant of malformed dates as it was when it was originally written. The generous parsing is now a required feature.
¹ There are those who believe that Postel’s Principle is wrong.
At first I thought this was going to be about parsing Discordian dates.
You mean in IE8, right? (this is a function in WinInet)
“Missing seconds and time zone” – the timezone went missing one row earlier
Does "Savvyday 29 Oatmeal 94" get parsed as Saturday+October due to those being the nearest day+month according to Levenshtein distance in English, or is there some other language where "Sav" and "Oat" are valid 3-letter abbreviations for that day+month? (Or some other reason?)
I'm guessing it's due to the Levenshtein distance, since RFC 822 specifies only the English day and month names, but I wouldn't be surprised if some servers used localized names for those...
I’m interested to know what the parser does with the name-of-day in the HTTP string since the name-of-day doesn’t appear in the SYSTEMTIME. Does it ignore words that don’t even come close to the English words for days of the week?
I was also curious. Omitting a day name (real or savvy) seems to produce ERROR_INVALID_PARAMETER, and the day name is used to set wDayOfWeek in the SYSTEMTIME. An invalid day name / day of month combination (e.g., Mon, 05 Mar 2020) does not produce an error: the wDayOfWeek is just set to what it interprets from the given string (i.e., 1 for Monday).
And from my experience (heh), conversion functions that take a SYSTEMTIME as...
> There are those who believe that Postel’s Principle is wrong.
There definitely are, for exactly the reason you described in the article: if you tolerate malformed input, people will come to rely on it, and then you'll never be able to fix it without breaking their workflow. (It gets even worse when you have multiple different implementations tolerating malformed input in different ways. Then you get the hideous mess that is cross-browser HTML...
No, it's a choice and Postel's Principles get applied for all the right reasons.
Every system has hidden assumptions, and every widely used system will have users unwittingly depending on them. This is entirely independent of Postel's Principle. It does not depend on how well defined and strict your implementation is.
Given that, there are two effects at play.
The first effect is that you can choose as a system provider to either stay backward compatible forever. Or...
A citation for the second paragraph:
With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
–Hyrum’s Law
And it almost always happens when software uses human readable strings.
Which is why date handling is so difficult to get right. It's no coincidence that the article's focus is on a misused date parsing function: if you need to parse a date in any kind of human-readable format, it is almost impossible to make it fool-proof, and very easy to slip in some bug and make it fail in some corner case. I bet most developers think "why spend two weeks developing and testing a parser when...
It’s not only dates, it’s any string that needs parsing – a great source of bugs, ambiguity, and the waste of so many CPU cycles.
> Then you get the hideous mess that is cross-browser HTML compatibility!
Sadly, that is one of the clearest cases of the usefulness of Postel's Principle. Sometimes you have to write software which deals with existing, less-than-ideal implementations of a specification, protocol or API. You can't be strict, because if you were, your program wouldn't work with half the products on the market. And guess who the customer is going to place the blame on?
This is,...
Yes, you’re making my point for me. Modern browsers have to use broken parsing because early browsers used it. If early browser makers had understood that Postel’s Principle was a terrible idea and had implemented strict parsing instead, things would likely be a lot better today, but because they didn’t, we can’t fix it now without breaking the Web.
If browsers had implemented strict parsing from the start, the modern web would never have been possible. There would be no IMG tags, no JavaScript, no CSS, no AJAX, nothing. The loose parsing allowed new features to be created completely undreamed of by the creators of HTML. Those features could be created without breaking existing clients.
> You can’t be strict
Of course, most of the non-strictness of HTML parsing is actually in the spec now (and some of it always was), so a compliant parser is strictly non-strict.
The whole streaming parser idea so browsers can paint early is neat, but boy does it add complexity.