March 4th, 2020

Meet me here on Savvyday 29 Oatmeal 94

The Internet­Time­To­System­Time function takes an HTTP time/date string and converts it to a Windows SYSTEM­TIME structure. A customer noticed that Internet­Time­To­System­Time returns strange results when given strange data.

| Input | Result | Notes |
| --- | --- | --- |
| Sat, 29 Oct 1994 09:43:31 GMT | October 29, 1994 at 09:43:31 GMT | As expected |
| Sat 29 Oct 1994 09:43:31 GMT | October 29, 1994 at 09:43:31 GMT | Missing comma |
| Sat 29 Oct 1994 9:43:31 GMT | October 29, 1994 at 09:43:31 GMT | Missing leading zero |
| Sat Oct 29 9:43:31 1994 | October 29, 1994 at 09:43:31 GMT | Flipped month/day and trailing year |
| Sat 29 Oct 1994 9:43 | October 29, 1994 at 09:43:00 GMT | Missing seconds and time zone |
| Sat 29 Oct 1994 | October 29, 1994 at 00:00:00 GMT | Missing time |
| Sat 29 Oct 94 | October 29, 1994 at 00:00:00 GMT | Two-digit year |
| Savvyday 29 Oatmeal 94 | October 29, 1994 at 00:00:00 GMT | Horrible spelling errors |
| 1 | March 4, 2020 at 15:00:00 GMT | Returns current time |

What’s going on?

The Internet­Time­To­System­Time function tries really hard to make sense out of what you give it. This sometimes leads to the absurd, like treating Savvyday as if it were a misspelling of Saturday and Oatmeal as if it were a misspelling of October.
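The article does not say how the function decides that Savvyday means Saturday. As a purely illustrative sketch, and not the actual WinINet algorithm, here is one way a tolerant parser could arrive at the same answers: pick the English name that shares the longest case-insensitive prefix with the input. The helper names `best_prefix_match`, `guess_weekday`, and `guess_month` are hypothetical. The return values follow the SYSTEMTIME conventions of Sunday = 0 for wDayOfWeek and January = 1 for wMonth.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: index of the name sharing the longest
   case-insensitive prefix with `word`, or -1 if not even the
   first character matches any name. Ties keep the earlier name. */
int best_prefix_match(const char *word, const char *const *names, int count)
{
    int best = -1;
    size_t best_len = 0;

    assert(word != NULL && names != NULL);
    for (int i = 0; i < count; i++) {
        size_t n = 0;
        while (word[n] != '\0' && names[i][n] != '\0' &&
               tolower((unsigned char)word[n]) ==
               tolower((unsigned char)names[i][n])) {
            n++;
        }
        if (n > best_len) {
            best_len = n;
            best = i;
        }
    }
    return best;
}

/* Returns 0 (Sunday) through 6 (Saturday), or -1 for no match. */
int guess_weekday(const char *word)
{
    static const char *const days[7] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"
    };
    return best_prefix_match(word, days, 7);
}

/* Returns 1 (January) through 12 (December), or 0 for no match. */
int guess_month(const char *word)
{
    static const char *const months[12] = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };
    int i = best_prefix_match(word, months, 12);
    return i < 0 ? 0 : i + 1;
}
```

Under this scheme, `guess_weekday("Savvyday")` picks Saturday because "Sa" is a longer shared prefix than the "S" of Sunday, and `guess_month("Oatmeal")` picks October because it is the only month that starts with "O". The real function may well use a different rule.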

Internet­Time­To­System­Time is not a validator. It’s a best-guess parser. The expectation is that you are giving Internet­Time­To­System­Time a string that you got from an HTTP header, and you need to make as much sense out of it as you can, per Postel’s Principle.¹

Back in Windows 7, the feature team tried to make the parser more strict. This effort was a total disaster, because applications in practice were using the function to parse all sorts of things that didn’t even pretend to adhere to the RFC. For example, a photo processing shell extension used this function to parse dates, and the attempt to enforce strict validation caused the shell extension to stop working entirely.

Consequently, all the changes were backed out, and the parser to this day remains as tolerant of malformed dates as it was when it was originally written. The generous parsing is now a required feature.
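Since the parser will not reject malformed input for you, a caller that needs validation has to do it separately. Here is a minimal sketch of such a check, assuming you only want to accept the fixed-length form used in the first row of the table (for example, "Sat, 29 Oct 1994 09:43:31 GMT"). The function name `is_strict_http_date` is hypothetical and is not part of any Windows API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the three characters at `three` appear in `table`,
   a string of concatenated three-letter abbreviations. */
static bool in_table(const char *three, const char *table)
{
    for (size_t i = 0; table[i] != '\0'; i += 3) {
        if (strncmp(three, table + i, 3) == 0) {
            return true;
        }
    }
    return false;
}

/* Convert two ASCII digits to their numeric value. */
static int two_digits(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Hypothetical strict check. It does not verify that the weekday
   agrees with the date or that the day exists in that month. */
bool is_strict_http_date(const char *s)
{
    /* 'A' = letter of a name (checked later), '9' = digit,
       anything else must match literally. */
    static const char shape[] = "AAA, 99 AAA 9999 99:99:99 GMT";
    const size_t len = sizeof shape - 1;

    assert(s != NULL);
    if (strlen(s) != len) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        if (shape[i] == '9') {
            if (!isdigit((unsigned char)s[i])) return false;
        } else if (shape[i] != 'A' && s[i] != shape[i]) {
            return false;
        }
    }
    if (!in_table(s, "SunMonTueWedThuFriSat")) return false;
    if (!in_table(s + 8, "JanFebMarAprMayJunJulAugSepOctNovDec")) return false;

    int day = two_digits(s + 5);
    return day >= 1 && day <= 31 &&
           two_digits(s + 17) <= 23 &&   /* hour */
           two_digits(s + 20) <= 59 &&   /* minute */
           two_digits(s + 23) <= 60;     /* second, allowing a leap second */
}
```

A caller could run this check first and pass the string to Internet­Time­To­System­Time only when it succeeds, which keeps the strictness in the application instead of in the shared parser.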

¹ There are those who believe that Postel’s Principle is wrong.

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

16 comments

  • John Elliott

    At first I thought this was going to be about parsing Discordian dates.

  • Yuhong Bao

    You mean in IE8, right? (this is a function in WinInet)

  • Paul Jackson

    “Missing seconds and time zone” – the timezone went missing one row earlier

  • Adam Rosenfield

    Does "Savvyday 29 Oatmeal 94" get parsed as Saturday+October due to those being the nearest day+month according to Levenshtein distance in English, or is there some other language where "Sav" and "Oat" are valid 3-letter abbreviations for that day+month? (Or some other reason?)

    I'm guessing it's due to the Levenshtein distance, since RFC 822 specifies only the English day and month names, but I wouldn't be surprised if some servers used localized names for those...

    • Mike Morrison

      I’m interested to know what the parser does with the name-of-day in the HTTP string since the name-of-day doesn’t appear in the systemtime. Does it ignore words that don’t even come close to the English words for days of the week?

      • Ivan Kljajic

        I was also curious. Omitting a day name (real or savvy) seems to produce ERROR_INVALID_PARAMETER, and the day name is used to set wDayOfWeek in the SYSTEMTIME. An invalid day name / day of month combination (e.g., Mon, 05 Mar 2020) does not produce an error: wDayOfWeek is just set to what the function interprets from the given string (i.e., 1 for Monday).
        And from my experience (heh), conversion functions that take a systemtime as...

  • Mason Wheeler

    > There are those who believe that Postel’s Principle is wrong.

    There definitely are, for exactly the reason you described in the article: if you tolerate malformed input, people will come to rely on it, and then you'll never be able to fix it without breaking their workflow. (It gets even worse when you have multiple different implementations tolerating malformed input in different ways. Then you get the hideous mess that is cross-browser HTML...

    • Pierre Baillargeon

      No, it's a choice, and Postel's Principle gets applied for all the right reasons.

      Every system has hidden assumptions, and every widely used system will have users unwittingly depending on them. This is entirely independent of Postel's Principle. It does not depend on how well defined and strict your implementation is.

      Given that, there are two effects at play.

      The first effect is that you can choose as a system provider to either stay backward compatible forever. Or...

      • Jake Boeckerman

        A citation for the second paragraph:

        With a sufficient number of users of an API,
        it does not matter what you promise in the contract:
        all observable behaviors of your system
        will be depended on by somebody.
        Hyrum’s Law

    • David Lowndes

      And it almost always happens when software uses human readable strings.

      • Antonio Rodríguez

        Which is why date handling is so difficult to get right. It's no coincidence that the article's focus is on a misused date parsing function: if you need to parse a date in any kind of human-readable format, it is almost impossible to make it foolproof, and very easy to slip in some bug and make it fail in some corner case. I bet most developers think "why spend two weeks developing and testing a parser when...

      • David Lowndes

        It’s not only dates, it’s any string that needs parsing – a great source of bugs, ambiguity, and the waste of so many cpu cycles.

    • Antonio Rodríguez

      > Then you get the hideous mess that is cross-browser HTML compatibility!

      Sadly, that is one of the clearest cases of the usefulness of Postel's Principle. Sometimes you have to write software which deals with existing, less-than-ideal implementations of a specification, protocol or API. You can't be strict, because if you were, your program wouldn't work with half the products on the market. And guess who the customer is going to place the blame on?

      This is,...

      • Mason Wheeler

        Yes, you’re making my point for me. Modern browsers have to use broken parsing because early browsers used it. If early browser makers had understood that Postel’s Principle was a terrible idea and had implemented strict parsing instead, things would likely be a lot better today, but because they didn’t, we can’t fix it now without breaking the Web.

      • Wayne Venables

        If browsers had implemented strict parsing from the start, the modern web would never have been possible. There would be no IMG tags, no JavaScript, no CSS, no AJAX, nothing. The loose parsing allowed new features to be created that were completely undreamed of by the creators of HTML. Those features could be created without breaking existing clients.

      • Zander

        > You can’t be strict

        Of course, most of the non-strictness of HTML parsing is actually in the spec now (and some of it always was), so a compliant parser is strictly non-strict.

        The whole streaming parser idea, so browsers can paint early, is neat, but boy does it add complexity.