Last time, we saw that the default C locale is not a very interesting one. So how do you get a locale that does something better?
One way to get functions like _strlwr
and _wcslwr
to follow a specific locale is to set that other locale as the current C runtime locale.
// Set the C runtime locale for character // classification (which includes case mapping) // to the user's default locale _wsetlocale(LC_CTYPE, L""); // Now you can convert to lowercase in a locale-aware manner wchar_t example[] = L"\x00C0" L"BC"; // ÀBC _wcslwr_s(example); // Result: probably à bc
It is convenient that an empty string is interpreted by _wsetlocale()
to mean “the user’s default locale”, as determined by GetÂUserÂDefaultÂLocaleÂName
.¹
A major problem with this approach is that it is using global state to solve a local problem. The C runtime locale is a process-wide setting, so you changed the locale not just for your call to _wcslwr_s
, but for everybody else’s call to _wcslwr_s
as well.
Better would be to leave the global locale alone and just say “For this call to _wcslwr
, use the user’s default locale.”
// Create a locale that represents the user's default locale auto l = _wcreate_locale(LC_CTYPE, L""); // Convert to lowercase according to that locale wchar_t example[] = L"\x00C0" L"BC"; // ÀBC _wcslwr_s_l(example, l); // Result: probably à bc
Even if you go all this trouble, you are still failing to handle the case where changing the case of a string changes its length. For that, you have to go to LCMapStringEx
or the corresponding ICU function u_strToLower
or u_strToUpper
.
wchar_t example[] = L"\x00C0" L"BC"; // ÀBC // Error checking elided for expository purposes wchar_t lowercase[256]; LCMapStringEx(LOCALE_NAME_USER_DEFAULT, LCMAP_LOWERCASE, example, ARRAYSIZE(example), lowercase, ARRAYSIZE(lowercase), nullptr, 0); // Result: probably à bc
Here’s a dirty little secret: When you call _wcslwr
and the locale is not the C locale, then the Visual C++ runtime just calls LCMapStringEx
. So you’re doing the same thing at the end of the day, just with the ability to accommodate strings that change length during a change of case.
Bonus chatter: Not all implementations of wcslwr
or towlower
are high quality.
¹ The user default locale may not be the best locale for your thread because the caller may have called a function like SetÂThreadÂLocale
or SetÂThreadÂPreferredÂUILanguages
to change the thread’s preferred locale to something other than the user’s default. You need to call a function like GetÂThreadÂPreferredÂUILanguages
to see those thread custom locales and pick the one (probably the first one) to use for case mapping.
I don’t like C locales. I prefer Windows API localization functions. That way, if someone runs my program in 20 years in some new modern environment, it has better chance working correctly than anything the contemporary compiler could compile in today.