February 6th, 2025

The default C locale is not a very interesting one

Although the C and C++ languages provide facilities for localization, the default locale is the so-called “C” locale, which barely understands anything.

In the “C” locale, the uppercase characters are “A” through “Z”, the lowercase characters are “a” through “z”, the decimal separator is “.”, and there is no thousands separator.

The “C” locale is designed to be minimal. But that also means that unless you’ve made a special effort to change your process’s locale to something else, functions like towupper and _wcslwr produce only extremely rudimentary results. All they know about are the characters in the 7-bit ASCII set. They don’t even know that the uppercase version of ä is Ä.

Support for any locales beyond the “C” locale is implementation-defined, and the standard considers it a quality-of-implementation issue. Microsoft’s Visual C++ runtime uses BCP 47 for locale names, like sr-Cyrl-BA for “Serbian, Cyrillic script, as used in Bosnia and Herzegovina.” The GNU C library used with gcc appears to use a custom format, such as de_AT.iso885915@euro for “German, as used in Austria, using the ISO-8859-15 character set and the Euro as the currency.”

This means that if you just dive in and call towlower without doing any locale preparation, all you’re going to get support for is characters U+0041 (LATIN CAPITAL LETTER A) through U+005A (LATIN CAPITAL LETTER Z) mapping to U+0061 (LATIN SMALL LETTER A) through U+007A (LATIN SMALL LETTER Z).

The Microsoft Visual C++ compiler standard library comes with bonus functions like _strlwr and _wcslwr for converting strings to lowercase. By default, these follow the current C runtime locale, so again, if you don’t do any locale preparation, you’re going to get the naïve case mapping.

wchar_t example[] = L"\x00C0" L"BC"; // ÀBC
_wcslwr_s(example); // Result: Àbc

Next time, we’ll look at how to get _wcslwr to operate on more interesting locales than the C locale.

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

2 comments

  • Dmitry

    While quite understandable for C in 1970, I’ve always wondered who are those guys trying to economize on vowels today. Like if they get punished for every single letter omitting which doesn’t prevent one from guessing right the full words.

  • Paul Jackson

    In a recent project, I needed a function like wcrtomb_l, but MSVC only has wcrtomb, which forced me to use setlocale with it, which is uglier and more error-prone.