December 10th, 2021

It’s okay to be contrary, but you need to be consistently contrary: Going against the ambient character set

In Windows, you declare your character set preference implicitly by defining or not defining the symbol UNICODE before including the windows.h header file. (Related: TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE.) This determines whether undecorated function names redirect to the ANSI version or the Unicode version, but it doesn’t make the opposite version inaccessible. You just have to call those functions by their explicit names. And it’s important that you be consistent about it. If you miss a spot, the characters get all messed up.

// UNICODE not defined
#include <windows.h>

void UpdateTitle(HWND hwnd, PCWSTR title)
{
    SetWindowTextW(hwnd, title);
}

In the above example, we did not define the symbol UNICODE, so the ambient character set is ANSI. Since we want to call the Unicode version of SetWindowText, we must use its explicit Unicode name SetWindowTextW.
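The redirection itself is nothing more than a macro. Here is a minimal sketch of the pattern that can be compiled anywhere. The names ShowTitleA, ShowTitleW, and ShowTitle are hypothetical stand-ins invented for illustration; they are not real Windows functions.

```cpp
// UNICODE not defined
#include <string>

// Hypothetical stand-ins for an A/W function pair (not real Windows APIs).
// Each one reports which version was called.
inline std::string ShowTitleA(const char*) { return "ANSI"; }
inline std::string ShowTitleW(const wchar_t*) { return "Unicode"; }

// The same pattern windows.h uses: the undecorated name is a macro
// that expands to one of the two explicit names.
#ifdef UNICODE
#define ShowTitle ShowTitleW
#else
#define ShowTitle ShowTitleA
#endif

// With UNICODE not defined, the undecorated name means the A version.
inline std::string CallAmbient() { return ShowTitle("hello"); }

// The W version is still reachable by its explicit name.
inline std::string CallExplicitWide() { return ShowTitleW(L"hello"); }
```

Since UNICODE is not defined, CallAmbient() ends up in the A version, while CallExplicitWide() goes against the ambient character set simply by spelling out the W.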

Most of the time, these errors are detected at compile time due to type mismatches. For example, if we forgot to put the trailing W on the function name, we would get the error

error C2664: 'BOOL SetWindowTextA(HWND,const char *)': cannot convert argument 2 from 'const wchar_t *' to 'const char *'
note: Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

And that’s your clue that you forgot to W-ize the SetWindowText call. You should have called the W version explicitly: SetWindowTextW.

However, there’s a category of functions that elude this compile-time detection: The functions that have separate ANSI and Unicode versions but take only character-set-independent parameters. Common examples are DispatchMessage, TranslateMessage, TranslateAccelerator, CreateAcceleratorTable, and most notably, DefWindowProc.
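To see why the compiler stays silent, here is a portable sketch with made-up names. FakeHwnd, FakeParam, FakeDefProcA, FakeDefProcW, and ForgotTheW are all hypothetical stand-ins and not the real Windows declarations; the only thing they are meant to share with the real functions is that the A and W versions take identical parameter types.

```cpp
// UNICODE not defined
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for HWND and WPARAM/LPARAM (not the real definitions).
using FakeHwnd = void*;
using FakeParam = std::intptr_t;

// Hypothetical A/W pair whose parameters carry no character type.
// Each one reports which version ran.
inline char FakeDefProcA(FakeHwnd, unsigned, FakeParam, FakeParam) { return 'A'; }
inline char FakeDefProcW(FakeHwnd, unsigned, FakeParam, FakeParam) { return 'W'; }

// The two functions have exactly the same type, so the compiler has
// nothing to complain about when one is used in place of the other.
static_assert(std::is_same<decltype(&FakeDefProcA), decltype(&FakeDefProcW)>::value,
              "A and W versions are indistinguishable by signature");

// The undecorated name follows the ambient character set.
#ifdef UNICODE
#define FakeDefProc FakeDefProcW
#else
#define FakeDefProc FakeDefProcA
#endif

// Code written for a Unicode window that forgot the trailing W.
// It compiles without any diagnostic, yet it runs the ANSI version.
inline char ForgotTheW(FakeHwnd hwnd, unsigned msg, FakeParam wParam, FakeParam lParam)
{
    return FakeDefProc(hwnd, msg, wParam, lParam);
}
```

Unlike the SetWindowText case, there is no char-versus-wchar_t parameter to trip over, so the mistake survives until run time.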

For some reason, when I get called in to investigate this sort of problem, it’s usually the DefWindowProc that is the source of the problem.

But I don’t think it’s because people get the others right and miss the DefWindowProc. I think it’s because the mistakes in the other functions are much less noticeable. The mistakes are still there, and maybe you’ll get a bug report from a user in Japan when they run into it, but that’s not something that is going to be noticed in English-based testing as much as a string that is truncated down to its first letter.
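The first-letter truncation is what you would expect when UTF-16 text is read by code that wants a null-terminated 8-bit string. A small sketch of that effect, assuming a little-endian machine (which is what Windows runs on); AsSeenByAnsiCode is a hypothetical helper invented for this illustration, not a Windows function.

```cpp
#include <string>

// Hypothetical helper: reads UTF-16 text the way code expecting a
// null-terminated 8-bit string would if handed the pointer by mistake.
// Assumes a little-endian machine.
inline std::string AsSeenByAnsiCode(const char16_t* text)
{
    // Viewing an object through a char pointer is permitted. The narrow
    // string ends at the first zero byte it encounters.
    return std::string(reinterpret_cast<const char*>(text));
}
```

For plain English text such as u"Hello", each UTF-16 code unit is an ASCII byte followed by a zero byte, so the ANSI reader sees the first letter, then what looks like a terminator, and stops. AsSeenByAnsiCode(u"Hello") yields the one-character string "H".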

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

10 comments

  • Paul Topping

    I prefer to use all explicit names like FunctionNameA or FunctionNameW. It's too bad windows.h doesn't have a macro that forces this by suppressing all the unadorned names like FunctionName. I understand that the unadorned names allow one to switch between Unicode and non-Unicode versions by changing a single macro definition but that's rarely practical anyway. Of course, my new macro (#define NO_UNICODE_NAMES?) wouldn't prevent anyone from doing that if that's what they wanted.

  • Neil Rashbrook

    When you invent that time machine, change LPARAM into an LPTSTR. Problem solved. While you’re there, make the header detect C++ and use overloaded functions to automatically select the right A or W function depending on the type of the arguments.

    • Raymond Chen (Microsoft employee, Author)

      C++ is not an ABI, however. Different compilers decorate differently. The overloads would have to be inline functions, but that creates function identity problems.

      • Me Gusta

        If I remember correctly, changing LPARAM and overloading that wouldn’t really fix the problem anyway because the window expecting ANSI/UNICODE is a property of the HWND. For example, if you register the class using RegisterClass(Ex)A and then use W functions after that, you would still have problems.

  • Harold H

    The ANSI version of a function is called “FunctionNameA”. Why is the Unicode version of a function called “FunctionNameW” and not “FunctionNameU” ?

    • MNGoldenEagle

      The A/W dichotomy existed back in Windows 3.1, with the W functions being failing stubs (for most regions). Given that Windows 3.1 predated Unicode’s existence as a standard, that’s probably part of the reason why. Microsoft knew they wanted “wide character” support, but what charset they were going to use wasn’t defined yet, and wouldn’t be until the release of Windows NT which used UCS-2.

    • Me Gusta

      Having a bit of a prod around the Windows NT SDK (yes, the SDK for the original version of Windows NT), UCHAR is defined as a typedef for unsigned char. So it may have been a bit iffy to use U in that case.
      But IIRC, this kind of naming came from the fact that characters that were made up of single byte units like ASCII (ISO 646), ISO 2022, ISO 8859 and the like...

    • Michael Taylor

      Yes the `W` stands for wide since the character set is known as the `wide character` and hence Windows uses `WCHAR` as the type. Why didn't they use `U` and `UCHAR` then? Couple of possibilities.

      1) While most people think of Unicode as 16-bits it is in fact either 8 or 16-bits depending upon whether you're using UTF-8 or UTF-16. `WCHAR` is for UTF-16, hence wide. A `FuncU` function could potentially be called with either a...

      • Solomon Ucko

        Or 32 bits per code unit for UTF-32.

    • Solomon Ucko

      I’m pretty sure it stands for “wide” or “wchar” but I’m not sure why the inconsistency.
