Resolving an ambiguity in the Windows clipboard automated text conversion table

Raymond Chen

Last time, we encountered a mystery where the synthesis of CF_OEMTEXT from CF_TEXT did not use AnsiToOem. Today we will begin the investigation.

Recall that we have a table showing how Windows synthesizes each of the various text formats from the other two. But in the case where the clipboard has two formats available, and you ask for the third, there are two ways that the third format could be synthesized: It could convert the first, or it could convert the second. How does Windows decide?

The preference table is as follows:

To get	First try	Then try	And then try
CF_TEXT	CF_TEXT	CF_UNICODETEXT	CF_OEMTEXT
CF_OEMTEXT	CF_OEMTEXT	CF_UNICODETEXT	CF_TEXT
CF_UNICODETEXT	CF_UNICODETEXT	CF_TEXT	CF_OEMTEXT

In words, first look for a perfect match. If that’s not available, then try (in order) CF_UNICODETEXT, then CF_TEXT, then CF_OEMTEXT. (One of those last three checks is redundant with the perfect match check.)
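
As a rough illustration, the rule above can be written as a flat lookup with no recursion. This is a sketch in portable C++, not the window manager's actual code: the `TextFormat` enum and the `PickSourceFormat` function are hypothetical names standing in for the real CF_* constants and for whatever the system does internally.

```cpp
#include <array>
#include <optional>
#include <set>

// Hypothetical stand-ins for CF_TEXT, CF_OEMTEXT and CF_UNICODETEXT.
enum class TextFormat { Text, OemText, UnicodeText };

// Returns the format that would be used as the source for a request
// for `wanted`, or nullopt if no text format is available at all.
std::optional<TextFormat> PickSourceFormat(
    TextFormat wanted, const std::set<TextFormat>& available)
{
    // First look for a perfect match.
    if (available.count(wanted)) return wanted;

    // Otherwise try one fixed order. The entry equal to `wanted` is
    // the redundant check: it was already ruled out above.
    constexpr std::array<TextFormat, 3> fallbackOrder = {
        TextFormat::UnicodeText, TextFormat::Text, TextFormat::OemText };
    for (TextFormat candidate : fallbackOrder) {
        if (available.count(candidate)) return candidate;
    }
    return std::nullopt;
}
```

With only `TextFormat::OemText` available, a request for `TextFormat::Text` falls through the first two checks and lands on `TextFormat::OemText`. With both `Text` and `UnicodeText` available, a request for `OemText` picks `UnicodeText`.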

Combining that with our previous table produces this conversion table with priorities:

To get	First try	Then try	And then try
CF_TEXT	CF_TEXT	CF_UNICODETEXT + WC2MB(ANSI CP)	CF_OEMTEXT + OemToAnsi
CF_OEMTEXT	CF_OEMTEXT	CF_UNICODETEXT + WC2MB(OEM CP)	CF_TEXT + AnsiToOem
CF_UNICODETEXT	CF_UNICODETEXT	CF_TEXT + MB2WC(ANSI CP)	CF_OEMTEXT + MB2WC(OEM CP)

Again, “ANSI CP” means “the code page reported by calling GetLocaleInfo with the LCID in the CF_LOCALE clipboard format, and the LOCALE_IDEFAULTANSICODEPAGE locale attribute”. Similarly for “OEM CP”, using LOCALE_IDEFAULTCODEPAGE instead of LOCALE_IDEFAULTANSICODEPAGE.
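
For readers who prefer data to prose, the prioritized table can be transcribed into a small lookup. This is only a sketch: `TextFormat`, `Attempt` and `AttemptsFor` are hypothetical names, and the conversion strings are the labels from the table, not calls the code actually makes.

```cpp
#include <array>
#include <string>

// Hypothetical stand-ins for CF_TEXT, CF_OEMTEXT and CF_UNICODETEXT.
enum class TextFormat { Text, OemText, UnicodeText };

// One cell of the table: which format to read, and the conversion
// label that the table attaches to it.
struct Attempt {
    TextFormat source;
    std::string conversion;
};

// Returns the three attempts for `wanted`, in priority order,
// transcribed row by row from the table.
std::array<Attempt, 3> AttemptsFor(TextFormat wanted)
{
    switch (wanted) {
    case TextFormat::Text:
        return {{ {TextFormat::Text, "none"},
                  {TextFormat::UnicodeText, "WC2MB(ANSI CP)"},
                  {TextFormat::OemText, "OemToAnsi"} }};
    case TextFormat::OemText:
        return {{ {TextFormat::OemText, "none"},
                  {TextFormat::UnicodeText, "WC2MB(OEM CP)"},
                  {TextFormat::Text, "AnsiToOem"} }};
    case TextFormat::UnicodeText:
        return {{ {TextFormat::UnicodeText, "none"},
                  {TextFormat::Text, "MB2WC(ANSI CP)"},
                  {TextFormat::OemText, "MB2WC(OEM CP)"} }};
    }
    return {};  // Not reached for valid enum values.
}
```

Walking the returned array in order and stopping at the first entry whose `source` is actually on the clipboard gives the conversion that the table says will be applied.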

If you stare at this table, you might notice something odd, possibly even disturbing. And that is part of the answer to the mystery. We’ll talk about it next time.


8 comments


Igor Levicki December 15, 2025
@Raymond Chen

Instead of moving the cheese around, maybe, just maybe, Microsoft developers could actually add something new and useful (at least for new programs) like, I don’t know, `CF_UTF8TEXT`?
- Raymond Chen Author December 15, 2025
  
  “Perfect is the enemy of good.” CF_UTF8TEXT support would have been nice, but would you say “You can’t ship UTF-8 support until you support CF_UTF8TEXT”? There are so many corners that if you insisted that all of them be identified and cleaned up, the feature would probably never meet your standards for shipping.
  - Raymond Chen Author December 18, 2025 · Edited
    
    “Microsoft developers had plenty of time from Windows 10 1803 release until 2025 to implement and ship proper UTF-8 support for a major component of user workflow such as Clipboard. Instead, they dicked around making Clipboard History…”
    
    There are three teams involved here. There’s the NLS team (which is doing activeCodePage to get per-process CP_ACP). Then there’s the window manager team (which doesn’t want to change the clipboard behavior because it is heavily used and carries lots of compatibility baggage). And there’s the Emoji Panel team (which decided that a Clipboard History feature would be a neat thing to add to the Emoji Panel). Three different teams with different priorities. There is no single pool of “Microsoft developers” that can be freely deployed to any feature.
    
  - Igor Levicki December 17, 2025
    
    No I wouldn’t have said that.
    
    What I would have said is “Microsoft developers had plenty of time from Windows 10 1803 release until 2025 to implement and ship proper UTF-8 support for a major component of user workflow such as Clipboard. Instead, they dicked around making Clipboard History which increases OS attack surface by adding new background services, compromises user privacy, and as it turns out from your recent post even prevents expected codepage conversion flow while running.”
    
    And Clipboard History is just one of a myriad of things that were added in the meantime which should’ve had lower priority than proper UTF-8 support for the clipboard.
    
Simon Farnsworth December 14, 2025

The question that I’d be asking, given the table, is “what guarantees that ‘MB2WC(ANSI CP) followed by WC2MB(OEM CP)’ or ‘MB2WC(OEM CP) followed by WC2MB(ANSI CP)’ is the same as AnsiToOem or OemToAnsi?”.

I’m guessing that the answer is not only “nothing”, but that AnsiToOem/OemToAnsi sometimes does clever things based on the locale in use that MB2WC followed by WC2MB does not.
Vadim Zeitlin December 13, 2025 · Edited

If the table were accurate as shown, it could easily result in infinite recursion, e.g. if there is (only) `CF_OEMTEXT` on the clipboard and the program wants to get `CF_TEXT`, it would try getting `CF_UNICODETEXT` which would fall back to `CF_TEXT` again.
- Chris Iverson December 16, 2025
  
  That WOULD be a problem, if it recursed.
  
  It doesn’t. It’s not a recursive function, it’s a flat lookup table.
  
  If there’s only CF_OEMTEXT on the clipboard, and a program wants CF_TEXT, it will check if each format is available, in the order:
  
  1. CF_TEXT
  2. CF_UNICODETEXT
  3. CF_OEMTEXT
  
  If a program ONLY put OEMTEXT on the clipboard, then the first two checks will show that the requested format is not available, and it will perform the conversion from OEMTEXT.
Joshua Hudson December 12, 2025

Different processes can definitely be in different OEM code pages. Not sure if that’s what you are looking for.