Some time ago, I discussed how the Resource Compiler defaults to CP_ACP, even in the face of subtle hints that the file is UTF-8.
After yet another incident of Visual Studio secretly changing the file encoding from code page 1252 to UTF-8 and breaking all non-ASCII strings, combined with Azure DevOps and Visual Studio simply ignoring encoding changes when showing diffs, a colleague decided to solve the problem once and for all by using explicit Unicode escapes of the form \x#### to represent non-ASCII characters. That way, it doesn’t matter whether the file encoding is 1252 or UTF-8, because the two encodings agree on the common ASCII subset.
What used to be
IDS_AWESOME "That’s great!"
was changed to
IDS_AWESOME "That\x2019s great!"
Unfortunately, the resulting string that appeared on screen was
That 19s great!
What went wrong?
If you are encoding Unicode into your string, you have to put an L prefix on the quoted string. Otherwise, the \xABCD sequence is interpreted as an 8-bit \xAB escape sequence, followed by two literal characters CD. In this case, the \x2019 was interpreted as \x20 (which encodes a space) followed by the literal characters 19, resulting in the string That␣19s great!.
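To make the difference concrete, here is a toy model in Python of the escape handling described above. It is not the Resource Compiler's actual parser: the function name decode_hex_escapes is made up for illustration, and the assumption that a wide string consumes up to four hex digits is inferred from the behavior described here.

```python
import re

def decode_hex_escapes(body: str, wide: bool) -> str:
    """Toy model (hypothetical helper, not the real RC parser).

    Assumes a \\x escape consumes at most 2 hex digits in a narrow
    string and at most 4 hex digits in an L-prefixed wide string.
    For simplicity, a narrow escape value is mapped straight to the
    character with that code, ignoring code page translation.
    """
    max_digits = 4 if wide else 2
    pattern = re.compile(r"\\x([0-9A-Fa-f]{1,%d})" % max_digits)
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), body)

source = r"That\x2019s great!"
narrow = decode_hex_escapes(source, wide=False)  # \x20 then literal "19"
wide = decode_hex_escapes(source, wide=True)     # \x2019 as one character

print(repr(narrow))
print(repr(wide))
```

The narrow result contains a space followed by the characters 19, and the wide result contains the single character U+2019.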
The correct conversion includes the L prefix.
IDS_AWESOME L"That\x2019s great!"
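If you have many strings to convert, the rewrite can be automated. The following Python sketch shows one way to produce the escaped form. The helper name to_rc_wide_literal is hypothetical, and the sketch assumes that each UTF-16 code unit can be written as its own four-digit \x escape. It does not escape quotation marks, backslashes, or control characters, so a real conversion script would need to handle those as well.

```python
import struct

def to_rc_wide_literal(text: str) -> str:
    """Hypothetical helper: render text as an L"..." literal in which
    every non-ASCII UTF-16 code unit becomes a four-digit \\x escape.

    Quotes, backslashes, and control characters are passed through
    unchanged; this sketch does not attempt to escape them.
    """
    data = text.encode("utf-16-be")
    units = struct.unpack(">%dH" % (len(data) // 2), data)
    body = "".join(
        chr(unit) if unit < 0x80 else "\\x%04X" % unit
        for unit in units
    )
    return 'L"' + body + '"'

# The source text uses a \u escape so that this script is itself pure ASCII.
print(to_rc_wide_literal("That\u2019s great!"))
```

Always emitting four digits matters: under the four-digit assumption, a character such as é followed by the digit 1 comes out as \x00E91, where the escape ends after 00E9 and the 1 stays a literal character.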
This is how Java ended up with a "native2ascii" command. The actual localisation framework required properties files to be in ISO 8859-1, which is stupid for a localisation framework, since people actually wanted their localisations (especially non-Roman ones like CJK) to be readable and maintainable in source form. So they gave us a command-line tool for converting properties files in arbitrary encodings to 8859-1, encoding the remaining characters with \uxxxx sequences that the Java runtime could read. Meanwhile, various frameworks added their own replacements for the built-in APIs, ones which supported UTF-8, etc.
Fortunately, they eventually agreed that this was stupid, and changed...