It’s okay to be contrary, but you need to be consistently contrary: Going against the ambient character set

Raymond Chen

In Windows, you declare your character set preference implicitly by defining or not defining the symbol UNICODE before including the windows.h header file. (Related: TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE.) This determines whether undecorated function names redirect to the ANSI version or the Unicode version, but it doesn’t make the opposite-version inaccessible. You just have to call them by their explicit names. And it’s important that you be consistent about it. If you miss a spot, the characters get all messed up.

// UNICODE not defined
#include <windows.h>

void UpdateTitle(HWND hwnd, PCWSTR title)
{
    SetWindowTextW(hwnd, title);
}

In the above example, we did not define the symbol UNICODE, so the ambient character set is ANSI. Since we want to call the Unicode version of SetWindowText, we must use its explicit Unicode name SetWindowTextW.

Most of the time, these errors are detected at compile time due to type mismatches. For example, if we forgot to put the trailing W on the function name, we would get the error

error C2664: 'BOOL SetWindowTextA(HWND,const char *)': cannot convert argument 2 from 'const wchar_t *' to 'const char *'
note: Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

And that’s your clue that you forgot to W-ize the SetWindowText call. You should have called the W version explicitly: SetWindowTextW.

However, there’s a category of functions that elude this compile-time detection: The functions that have separate ANSI and Unicode versions but take only character-set-independent parameters. Common examples are DispatchMessage, TranslateMessage, TranslateAccelerator, CreateAcceleratorTable, and most notably, DefWindowProc.

For some reason, when I get called in to investigate this sort of problem, it’s usually the DefWindowProc that is the source of the problem.

But I don’t think it’s because people get the others right and miss the DefWindowProc. I think it’s because the mistakes in the other functions are much less noticeable. The mistakes are still there, and maybe you’ll get a bug report from a user in Japan when they run into it, but that’s not something that is going to be noticed in English-based testing as much as a string that is truncated down to its first letter.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

10 comments

Discussion is closed. Login to edit/delete existing comments.

Paul Topping December 13, 2021

I prefer to use all explicit names line FunctionNameA or FunctionNameW. It’s too bad windows.h doesn’t have a macro that forces this by suppressing all the unadorned names like FunctionName. I understand that the unadorned names allow one to switch between Unicode and non-Unicode versions by changing a single macro definition but that’s rarely practical anyway. Of course, my new macro (#define NO_UNICODE_NAMES?) wouldn’t prevent anyone from doing that if that’s what they wanted.
Neil Rashbrook December 11, 2021

When you invent that time machine, change LPARAM into an LPTSTR. Problem solved. While you’re there, make the header detect C++ and use overloaded functions to automatically select the right A or W function depending on the type of the arguments.
- Raymond Chen Author December 13, 2021
  
  C++ is not an ABI, however. Different compilers decorate differently. The overloads would have to be inline functions, but that creates function identity problems.
  - Me Gusta December 13, 2021
    
    If I remember correctly, changing LPARAM and overloading that wouldn’t really fix the problem anyway because the window expecting ANSI/UNICODE is a property of the HWND. For example, if you register the class using RegisterClass(Ex)A and then use W functions after that, you would still have problems.
Harold H December 10, 2021

The ANSI version of a function is called “FunctionNameA”. Why is the Unicode version of a function called “FunctionNameW” and not “FunctionNameU” ?
- MNGoldenEagle December 14, 2021
  
  The A/W dichotomy existed back in Windows 3.1, with the W functions being failing stubs (for most regions). Given that Windows 3.1 predated Unicode’s existence as a standard, that’s probably part of the reason why. Microsoft knew they wanted “wide character” support, but what charset they were going to use wasn’t defined yet, and wouldn’t be until the release of Windows NT which used UCS-2.
- Me Gusta December 11, 2021 · Edited
  
  Having a bit of a prod around the Windows NT SDK (yes, the SDK for the original version of Windows NT), UCHAR is defined as a typedef for unsigned char. So it may have been a bit iffy to use U in that case.
  But IIRC, this kind of naming came from the fact that characters that were made up of single byte units like ASCII (ISO 646), ISO 2022, ISO 8859 and the like were (maybe retroactively) seen as narrow characters, and characters that are made up of two or more bytes are seen as wide characters.
  Now, while...
  Read more
  Having a bit of a prod around the Windows NT SDK (yes, the SDK for the original version of Windows NT), UCHAR is defined as a typedef for unsigned char. So it may have been a bit iffy to use U in that case.
  But IIRC, this kind of naming came from the fact that characters that were made up of single byte units like ASCII (ISO 646), ISO 2022, ISO 8859 and the like were (maybe retroactively) seen as narrow characters, and characters that are made up of two or more bytes are seen as wide characters.
  Now, while it was officially added to the C standard library in 1995, the wchar_t type was being informally used for quite a while. Again that Windows NT SDK from 1992 had the wchar.h and wctype.h headers. Since this is actually getting into the region where I am too young to remember anything more than this, I would guess that wchar_t was informally added prior to NT and it was just repurposed for UCS-2 since it was convenient.
  But I would have to do a bit more digging to see, and that would finding and downloading the Microsoft C compiler from the late 80s to see how things were.
  
  Read less
- Michael Taylor December 10, 2021
  
  Yes the `W` stands for wide since the character set is known as the `wide character` and hence Windows uses `WCHAR` as the type. Why didn’t they use `U` and `UCHAR` then? Couple of possibilities.
  
  1) While most people think of Unicode as 16-bits it is in fact either 8 or 16-bits depending upon whether you’re using UTF-8 or UTF-16. `WCHAR` is for UTF-16, hence wide. A `FuncU` function could potentially be called with either a UTF-8 or UTF-16 string if it meant “Unicode string”.
  2) `unsigned char` is a valid type in C/C++. `UCHAR` might be misinterpreted as unsigned char.
  - Solomon Ucko December 14, 2021
    
    Or 32 bits per code unit for UTF-32.
- Solomon Ucko December 10, 2021
  
  I’m pretty sure it stands for “wide” or “wchar” but I’m not sure why the inconsistency.

It’s okay to be contrary, but you need to be consistently contrary: Going against the ambient character set

Author

10 comments

Read next

What does “Do not launch, but debug my code when it launches” mean?

Why does the precise point at which I get a stack overflow exception change from run to run?

Author

10 comments

Read next

What does “Do not launch, but debug my code when it launches” mean?

Why does the precise point at which I get a stack overflow exception change from run to run?

Stay informed