April 30th, 2025

Why does Windows have trouble finding my Win32 resource if it contains an accented character?

Maurice Kayser reported an issue with Win32 API loading of PE resources containing lowercase letters. Maurice did some experiments adding resources named MyIcon, MyIcÖn, and MyIcön, then trying to load them using various names, and built up a table of results. I’ve broken it up into three tables depending on the nature of the accented character.

Arg       Can load MyIcon   Can load MyIcÖn   Can load MyIcön
myicon    Yes               No                No
MyIcon    Yes               No                No
mYiCoN    Yes               No                No
MYICON    Yes               No                No

This table shouldn’t be surprising. The name passed to FindResource is compared case-insensitively with the name of the resource, and accented characters are treated as different from their unaccented versions.

Here’s the next batch.

Arg       Can load MyIcon   Can load MyIcÖn   Can load MyIcön
myicÖn    No                Yes               No (!)
MyIcÖn    No                Yes               No (!)
mYiCÖN    No                Yes               No (!)
MYICÖN    No                Yes               No (!)

The first column is consistent with our previous result, namely that unaccented characters are treated as different from accented characters.

The second column is not surprising either, since the strings do match according to a case-insensitive comparison.

The third column is surprising. It seems that accented characters are case-sensitive, even though the documentation says that the comparison is case-insensitive.

Okay, here’s the third block.

Arg       Can load MyIcon   Can load MyIcÖn   Can load MyIcön
myicön    No                Yes (?)           No (!)
MyIcön    No                Yes (?)           No (!)
mYiCöN    No                Yes (?)           No (!)
MYICöN    No                Yes (?)           No (!)

The PE specification says that the resources are sorted “in ascending order”, and the names are sorted “by case-sensitive string.”¹

That’s all it says. The rest is left to interpretation.

First of all, even though the file format specification says that the resource names can be in any case, the FindResource function converts all names to uppercase before searching, so any names with lowercase characters are effectively unfindable. Fortunately, the Resource Compiler also converts names to uppercase before storing them in the resources, so it all cancels out, right?

Well, it cancels out only if the Resource Compiler and the FindResource function agree on how the names are converted to uppercase.

The Resource Compiler uses _wcsupr to convert the names to uppercase, and _wcsupr uses the default C locale,² which, as we noted before, is not a very interesting locale. It converts the unaccented Latin lowercase letters a-z to the unaccented Latin uppercase letters A-Z, and that’s all.
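To make the Resource Compiler's side concrete, here is a minimal sketch of what uppercasing in the C locale amounts to. It is written in portable C11 over UTF-16 code units (char16_t) rather than calling _wcsupr itself. The names c_locale_upper and u16_equal are hypothetical helpers for illustration. The only behavior taken from the article is that a-z become A-Z and everything else is left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical sketch of C-locale uppercasing as the article describes
   it: only ASCII a-z are changed. Every other UTF-16 code unit,
   including accented letters such as U+00F6 (o with diaeresis), is
   copied through unchanged. */
void c_locale_upper(const char16_t *src, char16_t *dst, size_t dst_len)
{
    size_t i = 0;
    assert(dst_len > 0);  /* need room for at least the terminator */
    for (; src[i] != 0 && i + 1 < dst_len; i++) {
        char16_t c = src[i];
        dst[i] = (c >= u'a' && c <= u'z') ? (char16_t)(c - u'a' + u'A') : c;
    }
    dst[i] = 0;
}

/* Hypothetical helper: exact code-unit equality of two
   null-terminated UTF-16 strings. */
int u16_equal(const char16_t *a, const char16_t *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}
```

With this mapping, MyIcön comes out as MYICöN. The ö survives because the C locale does not uppercase it.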

Let’s update the top row of the table by converting the names to uppercase according to the C locale.

Arg       Can load MYICON   Can load MYICÖN   Can load MYICöN

How does the FindResource function convert strings to uppercase? It uses the uppercase table that corresponds to the system default language. It is almost certain that Ö and ö are uppercase and lowercase partners in the system default language. That means the names in the left column are all effectively MYICON in the first table, and they are all effectively MYICÖN in the second and third tables.
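Here is a sketch of the FindResource side under stated assumptions. It is not the real implementation. system_upper is a hypothetical stand-in for the system default language's uppercase table. It is assumed, as the article says is almost certain, to pair ö with Ö. find_resource_sketch, also hypothetical, uppercases the query and looks for an exact code-unit match among the three names as the Resource Compiler stored them.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* ASSUMPTION: a tiny stand-in for the system default language's
   uppercase table. It maps a-z to A-Z and pairs the Latin-1 lowercase
   letters U+00E0..U+00FE (except U+00F7, the division sign) with their
   uppercase partners 0x20 below. That covers U+00F6 -> U+00D6.
   Real tables are much larger. */
char16_t system_upper(char16_t c)
{
    if (c >= u'a' && c <= u'z')
        return (char16_t)(c - 0x20);
    if (c >= 0x00E0 && c <= 0x00FE && c != 0x00F7)
        return (char16_t)(c - 0x20);
    return c;
}

/* Names as the Resource Compiler stored them, per the article:
   MYICON, MYICÖN, and MYICöN (the lowercase ö survived the C locale). */
static const char16_t *const stored_names[] = {
    u"MYICON", u"MYIC\u00D6N", u"MYIC\u00F6N",
};

/* Hypothetical lookup: uppercase each query code unit with
   system_upper, then require an exact code-unit match against a stored
   name. Returns the index of the match, or -1 if nothing matches. */
int find_resource_sketch(const char16_t *query)
{
    assert(query != NULL);
    for (size_t n = 0; n < sizeof stored_names / sizeof stored_names[0]; n++) {
        const char16_t *s = stored_names[n];
        const char16_t *q = query;
        while (*q != 0 && system_upper(*q) == *s) {
            q++;
            s++;
        }
        if (*q == 0 && *s == 0)
            return (int)n;
    }
    return -1;
}
```

Every spelling of the accented name lands on MYICÖN (index 1), and no query can ever reach MYICöN (index 2), because system_upper always turns ö into Ö before comparing. That is the "No (!)" column from the earlier tables.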

With these adjustments, the tables make more sense.

Arg                 Loaded as   Can load MyIcon      Can load MyIcÖn      Can load MyIcön
                                (stored as MYICON)   (stored as MYICÖN)   (stored as MYICöN)
myicon, MyIcon,     MYICON      Yes                  No                   No
mYiCoN, MYICON
myicÖn, MyIcÖn,     MYICÖN      No                   Yes                  No
mYiCÖN, MYICÖN,
myicön, MyIcön,
mYiCöN, MYICöN

Okay, so after we have accounted for how the Resource Compiler stores names and how FindResource searches for names, the table looks less bonkers.

The moral of the story, I think, is that you should just stick to ASCII characters for resource names. Everybody agrees on that subset.
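If you want to enforce that advice mechanically, a pre-build check could reject any resource name that contains a code unit outside printable ASCII. is_ascii_resource_name is a hypothetical helper, not part of any Windows tool. Treating U+0020 through U+007E as the allowed set, and rejecting empty names, are policy assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <uchar.h>

/* Hypothetical pre-build check: true only if every UTF-16 code unit in
   the name is printable ASCII (U+0020..U+007E), the subset on which the
   Resource Compiler and FindResource agree about uppercasing. */
bool is_ascii_resource_name(const char16_t *name)
{
    assert(name != NULL);
    if (*name == 0)
        return false;            /* policy choice: reject empty names */
    for (; *name != 0; name++) {
        if (*name < 0x20 || *name > 0x7E)
            return false;        /* control character or non-ASCII */
    }
    return true;
}
```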

¹ Note that the specification is incomplete: It doesn’t say what collation to use for sorting. Does it use a locale-sensitive sort, so that Ö comes before P in German, but after P in Swedish?³ Does it use a case-sensitive sort where all punctuation comes before all alphabetics? The FindResource function assumes that the resources are sorted lexicographically by code unit (not code point) numerical value. Which is a good thing, because you don’t want a file compiled on a German system to be considered corrupted by a Swedish system.
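Here is a sketch of the ordering described in footnote 1. Names are compared by the numeric value of each UTF-16 code unit, with no locale involved, and the sorted list is searched with a binary search. compare_code_units and sorted_lookup are hypothetical helpers. The qsort call stands in for whatever sorting the Resource Compiler actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <uchar.h>

/* Locale-free comparison by UTF-16 code unit value. A string that is a
   prefix of another sorts first. Returns <0, 0, or >0. */
int compare_code_units(const char16_t *a, const char16_t *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}

/* qsort/bsearch adapter: elements are pointers to UTF-16 strings. */
static int cmp_entries(const void *x, const void *y)
{
    return compare_code_units(*(const char16_t *const *)x,
                              *(const char16_t *const *)y);
}

/* Hypothetical helper: sorts names in place by code unit, then
   binary-searches for key. Returns its index in the sorted array, or -1. */
int sorted_lookup(const char16_t **names, size_t count, const char16_t *key)
{
    qsort(names, count, sizeof names[0], cmp_entries);
    const char16_t **hit = bsearch(&key, names, count, sizeof names[0], cmp_entries);
    return hit ? (int)(hit - names) : -1;
}
```

Sorted this way, MYICON (O is U+004F) comes before MYICÖN (U+00D6), which comes before MYICöN (U+00F6), on every machine regardless of locale. Code unit order also differs from code point order above U+FFFF. U+FF21 sorts after U+1F600 here, because U+1F600 is stored as the surrogate pair D83D DE00, even though its code point is larger.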

² But what about the #pragma code_page() directive? That directive tells the Resource Compiler how to convert quoted strings to Unicode, but it does not affect character mapping or collation.

³ In German dictionary sorting, the letter Ö is sorted as if it had no accent mark. But in German phone book sorting, the letter Ö is sorted as if it were two characters O + e. And in Austrian phone book sorting, the letter Ö is sorted as if it were two characters O + ¨, where the ¨ is treated as a character that comes after Z. And in Swedish, the letter Ö is treated as one of the three accented characters that come after Z.


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

6 comments


  • Jan Ringoš

    Could this be, IDK, fixed?
    Have FindResource search for the C locale uppercased string first, and after that fall back to the current behavior?
    You (Microsoft) have the source code after all. I don’t think it would break compatibility.

    Or even better: Give RC a new switch, perhaps /modern, that’d store the resource names unchanged (perhaps alongside an uppercased copy for backward compatibility), and have FindResource try to find the exact string first.

    Ah, I miss the era of 2k/XP/7 when these things actually evolved.

    • Kevin Norris

      If you want to fix it, fix it properly. Unicode specifies multiple algorithms for caseless string matching in section 3.13.5 - for something like this, you probably want an "identifier caseless match," but any of them would entirely solve the problem Raymond describes (albeit, in some cases, the non-default case folds might have backwards compatibility issues due to the use of Unicode normalization forms). None of these algorithms are locale-dependent, although the standard does vaguely gesture at Turkish dotted and dotless i as a potential source of issues.

      ICU, as you might expect, implements the default case fold operation (and so...

      • Jan Ringoš

        I get where you’re coming from, but that’d be way too dangerous a change in terms of backwards compatibility. And the desire is to simply load the correct resource, even if there are several whose names differ only in case.

  • Maurice Kayser

    Thank you so much for taking the time to look into and write about it!

  • Dmitry

    @Joshua Hudson At least it’s a good way for zoomers trying to follow your advice to learn why early Basic programmers used line numbers like 10, 20, 30, etc., right? 🙂
    P.S. Commenting to the right branch still doesn’t work right in mobile Chrome.

  • Joshua Hudson

    Lesson of the day: use numeric resource names and macros to get your identifiers. Everybody will thank you.