Why does Windows have trouble finding my Win32 resource if it contains an accented character?

Maurice Kayser reported an issue with Win32 API loading of PE resources containing lowercase letters. Maurice did some experiments adding resources named MyIcon, MyIcÖn, and MyIcön, then trying to load them using various names, and built up a table of results. I’ve broken it up into three tables depending on the nature of the accented character.

Arg	Can load `MyIcon`	Can load `MyIcÖn`	Can load `MyIcön`
`myicon`	Yes	No	No
`MyIcon`	Yes	No	No
`mYiCoN`	Yes	No	No
`MYICON`	Yes	No	No

This table shouldn’t be surprising. The argument passed to FindResource is compared case-insensitively with the name of the resource, treating accented characters as different from their unaccented versions.

Here’s the next batch.

Arg	Can load `MyIcon`	Can load `MyIcÖn`	Can load `MyIcön`
`myicÖn`	No	Yes	No (!)
`MyIcÖn`	No	Yes	No (!)
`mYiCÖN`	No	Yes	No (!)
`MYICÖN`	No	Yes	No (!)

The first column is consistent with our previous result, namely that unaccented characters are treated as not the same as accented characters.

The second column is not surprising either, since the strings do match according to a case-insensitive comparison.

The third column is surprising. It seems that accented characters are case-sensitive, even though the documentation says that the comparison is case-insensitive.

Okay, here’s the third block.

Arg	Can load `MyIcon`	Can load `MyIcÖn`	Can load `MyIcön`
`myicön`	No	Yes (?)	No (!)
`MyIcön`	No	Yes (?)	No (!)
`mYiCöN`	No	Yes (?)	No (!)
`MYICöN`	No	Yes (?)	No (!)

The PE specification says that the resources are sorted “in ascending order”, and the names are sorted “by case-sensitive string.”¹

That’s all it says. The rest is left to interpretation.

First of all, even though the file format specification says that the resource names can be in any case, the FindResource function converts all names to uppercase before searching, so any names with lowercase characters are effectively unfindable. Fortunately, the Resource Compiler also converts names to uppercase before storing them in the resources, so it all cancels out, right?

Well, it cancels out only if the Resource Compiler and the FindResource function agree on how the names are converted to uppercase.

The Resource Compiler uses _wcsupr to convert the names to uppercase, and _wcsupr uses the default C locale,² which as we noted before, is not a very interesting locale. It converts Latin unaccented lowercase letters a-z to Latin unaccented uppercase letters A-Z, and that’s all.
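To make the C-locale behavior concrete, here is a minimal Python sketch that models it. The helper name `c_locale_upper` is hypothetical and this is not the Resource Compiler's actual code; it only assumes, as described above, that a-z map to A-Z and every other character is left untouched.

```python
import string

# Translation table covering only ASCII a-z -> A-Z. This models uppercasing
# in the default "C" locale as described in the text (an assumption of this
# sketch, not the real _wcsupr implementation).
_C_LOCALE_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def c_locale_upper(name):
    """Uppercase unaccented Latin letters only; leave everything else alone."""
    return name.translate(_C_LOCALE_UPPER)


for name in ("MyIcon", "MyIcÖn", "MyIcön"):
    print(name, "->", c_locale_upper(name))
```

Note that the lowercase ö passes through unchanged, so a resource named MyIcön keeps a lowercase character in its stored name.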

Let’s update the top row of the table by converting the names to uppercase according to the C locale.

Arg	Can load `MYICON`	Can load `MYICÖN`	Can load `MYICöN`

How does the FindResource function convert strings to uppercase? It uses the uppercase table corresponding to the system default language. It is almost certain that Ö and ö are uppercase and lowercase partners in the system default language. That means that the left columns are all effectively MYICON in the first table, and that they are all effectively MYICÖN in the second and third tables.

With these adjustments, the tables make more sense.

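Arg	Loaded as	Can load `MyIcon` (stored as `MYICON`)	Can load `MyIcÖn` (stored as `MYICÖN`)	Can load `MyIcön` (stored as `MYICöN`)
`myicon`	`MYICON`	Yes	No	No
`MyIcon`	`MYICON`	Yes	No	No
`mYiCoN`	`MYICON`	Yes	No	No
`MYICON`	`MYICON`	Yes	No	No
`myicÖn`	`MYICÖN`	No	Yes	No
`MyIcÖn`	`MYICÖN`	No	Yes	No
`mYiCÖN`	`MYICÖN`	No	Yes	No
`MYICÖN`	`MYICÖN`	No	Yes	No
`myicön`	`MYICÖN`	No	Yes	No
`MyIcön`	`MYICÖN`	No	Yes	No
`mYiCöN`	`MYICÖN`	No	Yes	No
`MYICöN`	`MYICÖN`	No	Yes	No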
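The whole table can be reproduced with a small model of the two mismatched conversions. This is only a sketch under stated assumptions: the function names are hypothetical, `stored_name` models the ASCII-only uppercasing on the Resource Compiler side, and Python's `str.upper()` stands in for the system default language's uppercase table on the FindResource side (the real table is not Python's, but both pair ö with Ö).

```python
import string

_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def stored_name(resource_name):
    """Model of the compile-time side: uppercase a-z only (C locale)."""
    return resource_name.translate(_ASCII_UPPER)


def lookup_name(arg):
    """Model of the lookup side: str.upper() is a stand-in (assumption)
    for the system default language's uppercase table."""
    return arg.upper()


def can_load(arg, resource_name):
    """A lookup succeeds only if both conversions yield the same string."""
    return lookup_name(arg) == stored_name(resource_name)


RESOURCES = ("MyIcon", "MyIcÖn", "MyIcön")
ARGS = ("myicon", "MYICON", "myicÖn", "MYICÖN", "myicön", "MYICöN")

for arg in ARGS:
    results = ["Yes" if can_load(arg, r) else "No" for r in RESOURCES]
    print(arg, lookup_name(arg), *results, sep="\t")
```

In this model, nothing can ever load MyIcön: its stored name contains a lowercase ö, and the lookup side never produces one.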

Okay, so after we have accounted for how the Resource Compiler stores names and how FindResource searches for names, the table looks less bonkers.

The moral of the story, I think, is that you should just stick to ASCII characters for resource names. Everybody agrees on that subset.

¹ Note that the specification is incomplete: It doesn’t say what collation to use for sorting. Does it use a locale-sensitive sort, so that Ö comes before P in German, but after P in Swedish?³ Does it use a case-sensitive sort where all punctuation comes before all alphabetics? The FindResource function assumes that the resources are sorted lexicographically by code unit (not code point) numerical value. Which is a good thing, because you don’t want a file compiled on a German system to be considered corrupted by a Swedish system.
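The difference between code unit order and code point order only shows up for characters outside the Basic Multilingual Plane, which UTF-16 encodes as surrogate pairs. The following Python sketch illustrates the distinction; the helper `utf16_code_units` and the sample names are made up for illustration and say nothing about how FindResource is actually implemented.

```python
def utf16_code_units(name):
    """Return the UTF-16 code units of name as a list of integers."""
    data = name.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A is a single code unit, 0xFF21.
# U+1D400 MATHEMATICAL BOLD CAPITAL A is a surrogate pair, 0xD835 0xDC00.
names = ["\uff21", "\U0001d400"]

# Python compares strings by code point, so U+FF21 sorts first.
by_code_point = sorted(names)

# Comparing by code unit puts the surrogate pair first (0xD835 < 0xFF21).
by_code_unit = sorted(names, key=utf16_code_units)

print([hex(ord(n)) for n in by_code_point])
print([hex(ord(n)) for n in by_code_unit])
```

Either ordering is locale-independent, which is the property the footnote cares about: the same file sorts the same way on every system.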

² But what about the #pragma code_page() directive? That directive tells the Resource Compiler how to convert quoted strings to Unicode, but it does not affect character mapping or collation.

³ In German dictionary sorting, the letter Ö is sorted as if it had no accent mark. But in German phone book sorting, the letter Ö is sorted as if it were two characters O + e. And in Austrian phone book sorting, the letter Ö is sorted as if it were two characters O + ¨, where the ¨ is treated as a character that comes after Z. And in Swedish, the letter Ö is treated as one of the three accented characters that come after Z.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

7 comments

Jan Ringoš May 1, 2025

Could this be, IDK, fixed?
Have FindResource search for the C locale uppercased string first, and after that fall back to the current behavior?
You (Microsoft) have the source codes after all. I don’t think it would break compatibility.

Or even better: Give RC new switch, perhaps /modern, that’d store the resource names unchanged (perhaps alongside with uppercased copy for backward compatibility), and have FindResource try to find the exact string first.

Ah, I miss the era of 2k/XP/7 when these things actually evolved.

Kevin Norris May 1, 2025

If you want to fix it, fix it properly. Unicode specifies multiple algorithms for caseless string matching in section 3.13.5 - for something like this, you probably want an "identifier caseless match," but any of them would entirely solve the problem Raymond describes (albeit, in some cases, the non-default case folds might have backwards compatibility issues due to the use of Unicode normalization forms). None of these algorithms are locale-dependent, although the standard does vaguely gesture at Turkish dotted and dotless i as a potential source of issues.

ICU, as you might expect, implements the default case fold operation (and so should any other self-respecting Unicode string library). The other case folds are defined in terms of combining the default case fold with various other standard transforms, and it does not appear that ICU has bothered to provide convenience wrappers for those. Either way, Unicode case folding is not hard – it is a matter of either linking against a good Unicode library, or using a language that already comes with reasonable Unicode support out of the box.

- Brian Boorman May 2, 2025
  
  Interestingly, the topic of Unicode case folding came up in LKML this past week. Let’s say that Linus has views on the topic. Re: [GIT PULL] bcachefs fixes for 6.15-rc4
- Jan Ringoš May 2, 2025
  
  I get where you’re coming from, but that’d be way too dangerous change in terms of backwards compatibility. And the desire is to simply load the correct resource, even if there are several whose names differ only in case.

Maurice Kayser May 1, 2025

Thank you so much for taking the time to look into and write about it!

Dmitry April 30, 2025 · Edited

@Joshua Hudson At least it’s a good way for zoomers trying to follow your advice to learn why early Basic programmers used line numbers like 10, 20, 30, etc., right? 🙂
P.S. Commenting to the right branch still doesn’t work right in mobile Chrome.

Joshua Hudson April 30, 2025

Lesson of the day: use numeric resource names and macros to get your identifiers. Everybody will thank you.
