In Windows, you declare your character set preference implicitly by defining or not defining the symbol UNICODE before including the windows.h header file. (Related: TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE.) This determines whether undecorated function names redirect to the ANSI version or the Unicode version, but it doesn’t make the opposite version inaccessible. You just have to call those functions by their explicit names. And it’s important that you be consistent about it. If you miss a spot, the characters get all messed up.
// UNICODE not defined
#include <windows.h>
void UpdateTitle(HWND hwnd, PCWSTR title)
{
    SetWindowTextW(hwnd, title);
}
In the above example, we did not define the symbol UNICODE, so the ambient character set is ANSI. Since we want to call the Unicode version of SetWindowText, we must use its explicit Unicode name SetWindowTextW.
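The redirection itself is just a macro. Here is a portable mock of the pattern, assuming made-up names (MockSetTitleA, MockSetTitleW, MOCK_UNICODE) that stand in for the real ones. Each function returns a letter to show which version was reached instead of touching any window.

```cpp
#include <string>

// Hypothetical stand-ins for an A/W function pair. Each one just reports
// which version ran.
inline std::string MockSetTitleA(const char*)    { return "A"; }
inline std::string MockSetTitleW(const wchar_t*) { return "W"; }

// The header-style switch: the undecorated name is a macro that expands to
// one explicit name or the other. MOCK_UNICODE plays the role of UNICODE.
#ifdef MOCK_UNICODE
#define MockSetTitle MockSetTitleW
#else
#define MockSetTitle MockSetTitleA
#endif

// Written for the case where MOCK_UNICODE is not defined: the undecorated
// name reaches the A version, and the W version is still available under
// its explicit name.
inline std::string CallBoth()
{
    return MockSetTitle("title") + MockSetTitleW(L"title");
}
```

With MOCK_UNICODE left undefined, CallBoth() returns "AW". Passing a wide string to the undecorated MockSetTitle would fail to compile, which is the same kind of type mismatch the compiler reports for real A/W pairs that take string parameters.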
Most of the time, these errors are detected at compile time due to type mismatches. For example, if we forgot to put the trailing W on the function name, we would get the error
error C2664: 'BOOL SetWindowTextA(HWND,const char *)': cannot convert argument 2 from 'const wchar_t *' to 'const char *'
note: Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
And that’s your clue that you forgot to W-ize the SetWindowText call. You should have called the W version explicitly: SetWindowTextW.
However, there’s a category of functions that eludes this compile-time detection: the functions that have separate ANSI and Unicode versions but take only character-set-independent parameters. Common examples are DispatchMessage, TranslateMessage, TranslateAccelerator, CreateAcceleratorTable, and most notably, DefWindowProc.
For some reason, when I get called in to investigate this sort of problem, it’s usually the DefWindowProc that is the source of the problem.
But I don’t think it’s because people get the others right and miss the DefWindowProc. I think it’s because the mistakes in the other functions are much less noticeable. The mistakes are still there, and maybe you’ll get a bug report from a user in Japan when they run into it, but that’s not something that is going to be noticed in English-based testing as much as a string that is truncated down to its first letter.
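To see where the single letter comes from, here is a small portable sketch (no windows.h needed). It assumes a little-endian machine, and the helper name AnsiLengthOfUtf16 is made up for illustration. It hands the raw bytes of a UTF-16 string to code that expects a char string, which is roughly what happens when an ANSI function receives a Unicode string.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: reports how many characters a char-based (ANSI)
// consumer would see if it were handed the raw bytes of a UTF-16 string.
// Assumes a little-endian machine, where the low byte of each 16-bit
// code unit comes first in memory.
inline std::size_t AnsiLengthOfUtf16(const char16_t* text)
{
    // Reading any object through a char pointer is permitted; strlen
    // stops at the first zero byte it finds.
    return std::strlen(reinterpret_cast<const char*>(text));
}
```

On such a machine, AnsiLengthOfUtf16(u"Hello") is 1: the H is stored as the bytes 48 00, and the 00 looks like a string terminator. A Japanese string such as 日本 has no zero byte in its first code unit, so it comes out as garbage of a different length rather than as a tidy one-letter string.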
I prefer to use all explicit names like FunctionNameA or FunctionNameW. It's too bad windows.h doesn't have a macro that forces this by suppressing all the unadorned names like FunctionName. I understand that the unadorned names allow one to switch between Unicode and non-Unicode versions by changing a single macro definition, but that's rarely practical anyway. Of course, my new macro (#define NO_UNICODE_NAMES?) wouldn't prevent anyone from doing that if that's what they wanted.
When you invent that time machine, change LPARAM into an LPTSTR. Problem solved. While you’re there, make the header detect C++ and use overloaded functions to automatically select the right A or W function depending on the type of the arguments.
C++ is not an ABI, however. Different compilers decorate differently. The overloads would have to be inline functions, but that creates function identity problems.
If I remember correctly, changing LPARAM and overloading on it wouldn’t really fix the problem anyway, because whether a window expects ANSI or Unicode is a property of the HWND. For example, if you register the class using RegisterClass(Ex)A and then use W functions after that, you would still have problems.
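For what it's worth, the overload idea might look something like the sketch below. The names are made up, and plain inline wrappers are assumed. It also shows the catch for the functions this article is about: when the A and W versions take identical parameter types, overload resolution has nothing to go on.

```cpp
#include <string>

// Hypothetical A/W pair whose parameter types differ by character set.
inline std::string MockSetTextA(const char*)    { return "A"; }
inline std::string MockSetTextW(const wchar_t*) { return "W"; }

// Inline overloads pick the version from the argument type.
inline std::string MockSetText(const char* s)    { return MockSetTextA(s); }
inline std::string MockSetText(const wchar_t* s) { return MockSetTextW(s); }

// Hypothetical A/W pair with character-set-independent parameters, in the
// style of DefWindowProc. Both take the same types, so two overloads named
// MockDefProc could not be declared to tell them apart.
inline std::string MockDefProcA(long) { return "A"; }
inline std::string MockDefProcW(long) { return "W"; }
```

Here MockSetText("x") reaches the A version and MockSetText(L"x") reaches the W version, but the caller of the MockDefProc pair still has to choose by name.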
The ANSI version of a function is called “FunctionNameA”. Why is the Unicode version of a function called “FunctionNameW” and not “FunctionNameU”?
The A/W dichotomy existed back in Windows 3.1, with the W functions being failing stubs (for most regions). Given that Windows 3.1 predated Unicode’s existence as a standard, that’s probably part of the reason why. Microsoft knew they wanted “wide character” support, but what charset they were going to use wasn’t defined yet, and wouldn’t be until the release of Windows NT which used UCS-2.
Having a bit of a prod around the Windows NT SDK (yes, the SDK for the original version of Windows NT), UCHAR is defined as a typedef for unsigned char. So it may have been a bit iffy to use U in that case.
But IIRC, this kind of naming came from the fact that characters that were made up of single byte units like ASCII (ISO 646), ISO 2022, ISO 8859 and the like...
Yes the `W` stands for wide since the character set is known as the `wide character` and hence Windows uses `WCHAR` as the type. Why didn't they use `U` and `UCHAR` then? Couple of possibilities.
1) While most people think of Unicode as 16-bits it is in fact either 8 or 16-bits depending upon whether you're using UTF-8 or UTF-16. `WCHAR` is for UTF-16, hence wide. A `FuncU` function could potentially be called with either a...
Or 32 bits per code unit for UTF-32.
I’m pretty sure it stands for “wide” or “wchar” but I’m not sure why the inconsistency.