June 6th, 2024

Can INI files be Unicode? Yes, they can, but it has to be your idea

INI files were introduced by 16-bit Windows, and 16-bit Windows predates Unicode, so INI files naturally did not support Unicode at the time they were introduced. The relatively simple format of INI files means that many people parse (and sometimes even modify) them directly, without using the INI file manager. This in turn means that the format of INI files is pretty much locked and cannot be extended, since there is no mechanism for extending them in a way that won’t break those manual INI file parsers.

This “locked in” nature of the INI file format means that even if you call the Unicode version, WritePrivateProfileStringW, the resulting INI file will not be Unicode. It will be a best approximation of your Unicode data in the ambient ANSI code page. The system doesn’t know whether the INI file is going to be processed by somebody’s homemade INI file parser, and writing it out in Unicode would break such a parser.
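
To get a feel for what a “best approximation” in an ANSI code page costs you, here is a small sketch. It assumes code page 1252 as a stand-in for the ambient ANSI code page, and it uses Python's `errors="replace"` as a stand-in for the system's conversion. The real conversion may apply best-fit mappings, so the exact substitutions can differ.

```python
# Illustration only: cp1252 stands in for the ambient ANSI code page, and
# errors="replace" stands in for the system's lossy conversion (an assumption;
# the real conversion may pick different substitute characters).
text = "café 日本語"

ansi_bytes = text.encode("cp1252", errors="replace")
round_tripped = ansi_bytes.decode("cp1252")

# Characters that exist in cp1252 survive; the rest become "?".
print(round_tripped)  # prints "café ???"
```

The accented letter survives because code page 1252 has a slot for it. The Japanese characters do not, and once they are written out as question marks there is no way to get them back.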

You might think, “Aw, c’mon. If you use the Unicode WritePrivateProfileStringW function, then clearly the resulting INI file can be Unicode. After all, this is a new function, so there’s no need to preserve legacy behavior.” However, Michael Kaplan noted that this would mean that converting a program from ANSI to Unicode (which was a frequent occurrence back in the day) would invisibly modify file formats, and your program may not be ready for that.

But that doesn’t mean that INI files could never support Unicode.

Because if the INI file was already Unicode, then there would be no harm in keeping it in Unicode. The decision to create a Unicode INI file came from somewhere else, and we’re just following somebody else’s decision.

So the rule is that the INI file functions will preserve Unicode-ness, but will never take it upon themselves to create a Unicode INI file. In particular, if you use a Write function to create an INI file, that INI file will be created as ANSI, for backward compatibility.

This behavior is called out in the documentation:

If the file was created using Unicode characters, the function writes Unicode characters to the file. Otherwise, the function writes ANSI characters.

Michael says, “I have almost no idea what this text is trying to say, but I am 100% sure it is wrong.”

What it’s trying to say is what Michael inferred: The function writes Unicode characters to the file if the file is already Unicode.

I think what confused Michael was the phrase “If the file was created”. This is not referring to the creation of the file by the WritePrivateProfileStringW function itself, but rather to whether the file had already been created as a Unicode file before WritePrivateProfileStringW was called.

Arguably, the text could be made a little clearer:

If the file already exists and consists of Unicode characters, the function writes Unicode characters to the file. Otherwise, the function writes ANSI characters.
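
In practice, “it has to be your idea” means creating the file as Unicode yourself before the first write. The following is a minimal sketch of that, not a definitive recipe. It assumes that a file containing only a UTF-16LE byte order mark is enough for the system to treat it as Unicode, which is worth verifying on your target system. The helper names `create_unicode_ini` and `write_profile_string` and the file name `settings.ini` are made up for illustration. The call to WritePrivateProfileStringW goes through ctypes and runs only on Windows.

```python
import os
import sys
import tempfile

UTF16_LE_BOM = b"\xff\xfe"

def create_unicode_ini(path):
    """Create a file holding only a UTF-16LE BOM. Returns False if it already exists."""
    try:
        # Mode "xb" fails if the file exists, so an existing INI file
        # (ANSI or Unicode) is never overwritten.
        with open(path, "xb") as f:
            f.write(UTF16_LE_BOM)
        return True
    except FileExistsError:
        return False

def write_profile_string(path, section, key, value):
    """Hypothetical wrapper: call WritePrivateProfileStringW (Windows only)."""
    import ctypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.WritePrivateProfileStringW
    fn.argtypes = [ctypes.c_wchar_p] * 4
    fn.restype = ctypes.c_int
    # Pass a full path; a bare file name is looked up in the Windows directory.
    if not fn(section, key, value, os.path.abspath(path)):
        raise ctypes.WinError(ctypes.get_last_error())

demo_path = os.path.join(tempfile.mkdtemp(), "settings.ini")
created = create_unicode_ini(demo_path)

if sys.platform == "win32":
    write_profile_string(demo_path, "General", "Greeting", "こんにちは")

# Whatever happened above, the file should still start with the BOM we wrote.
with open(demo_path, "rb") as f:
    head = f.read(2)
```

Creating the file with exclusive mode keeps the decision explicit: the helper only ever makes a new file Unicode, and it leaves an existing ANSI file alone rather than silently changing its format.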


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

8 comments

  • Simon Geard

    Java had a similar problem with their "properties" format, which despite being used for localisation since the beginning, was defined as 7-bit ASCII... requiring either a build step to transform sensibly-encoded files into a mess of escape characters, or writing your own frontend to deal with IO and encodings, bypassing the actual ResourceBundle classes.

    Fortunately in later Java versions, they've taken the intelligent step of declaring that the encoding is actually UTF8... retaining compatibility, while bringing...

  • Paul Jackson

    How does the API determine that the file “consists of Unicode characters”? I assume the answer is BOM, but you didn’t mention it.

    • Bill Godfrey · Edited

      (If I may speculate...)

      An INI file will always have to start with an ASCII character, either ';' marking a comment or '[' introducing a section. (I'd be interested to learn if that assertion I made is actually correct.)

      Read the first two bytes and look for NUL bytes. If neither is NUL, then it's ASCII-only. If there's one, it's UTF-16 and you know the byte order.

      (Edited. The original stated that '#' marked a comment.)

      • Paul Jackson

        I see that Raymond answered it in the linked page.

        > the code determines this is through our favorite dodgy API – IsTextUnicode. (The BOM serves as a big hint.)

  • Marco Comerci

    What do you think about UTF8? I developed a programmable VST plugin and supported UTF8 for configuration and instrument files, that can store some strings (the language tokens are ANSI characters only). I developed UTF8 helpers like SendMessageU8, that, unsurprisingly, converts to UTF16 and calls SendMessageW.

  • Rutger Ellen

    RIP Michael Kaplan. I read most of his blog back in the day; it's nice to see him being remembered.

    • Ian Boyd

      RIP. His blog was, like this one is, an excellent source of knowledge.

      Once you understand the why, the what makes sense. And Michael was great for that. I read it in real time, and reference it all the time still.

  • IS4

    So if I create an “empty” file consisting of just the UTF-8 BOM, is that enough to make all INI operations on the file use Unicode and preserve the BOM?
