December 12th, 2025

Resolving an ambiguity in the Windows clipboard automated text conversion table

Last time, we encountered a mystery where the synthesis of CF_OEMTEXT from CF_TEXT did not use AnsiToOem. Today we will begin the investigation.

Recall that we have a table showing how Windows synthesizes each of the various text formats from the other two. But in the case where the clipboard has two formats available, and you ask for the third, there are two ways that the third format could be synthesized: It could convert the first, or it could convert the second. How does Windows decide?

The preference table is:

To get           First try        Then try         And then try
CF_TEXT          CF_TEXT          CF_UNICODETEXT   CF_OEMTEXT
CF_OEMTEXT       CF_OEMTEXT       CF_UNICODETEXT   CF_TEXT
CF_UNICODETEXT   CF_UNICODETEXT   CF_TEXT          CF_OEMTEXT

In words, first look for a perfect match. If that’s not available, then try (in order) CF_UNICODETEXT, then CF_TEXT, then CF_OEMTEXT. (One of those last three checks is redundant with the perfect match check.)
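The rule in words amounts to a flat lookup over the formats actually present, with no recursion. Here is a minimal C sketch of just that selection step, assuming the available formats arrive as a hypothetical bit mask. The `FMT_*` values and `pick_source` are invented stand-ins (the real winuser.h constants CF_TEXT, CF_OEMTEXT, and CF_UNICODETEXT are not bit flags), and this is not the actual Windows implementation.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CF_TEXT, CF_OEMTEXT, and CF_UNICODETEXT,
   chosen as bit flags so a set of available formats fits in one mask.
   These are NOT the real clipboard format values. */
enum text_format {
    FMT_NONE        = 0,
    FMT_TEXT        = 1 << 0,
    FMT_OEMTEXT     = 1 << 1,
    FMT_UNICODETEXT = 1 << 2,
};

/* Chooses which available format would be used to produce 'wanted':
   an exact match wins; otherwise try CF_UNICODETEXT, then CF_TEXT,
   then CF_OEMTEXT. One of the three fallback checks repeats the exact
   match check, which is harmless. Returns FMT_NONE if no text format
   is available. */
static enum text_format pick_source(enum text_format wanted, unsigned available)
{
    static const enum text_format fallback[] = {
        FMT_UNICODETEXT, FMT_TEXT, FMT_OEMTEXT
    };

    if (available & wanted) {
        return wanted;
    }
    for (size_t i = 0; i < sizeof fallback / sizeof fallback[0]; i++) {
        if (available & fallback[i]) {
            return fallback[i];
        }
    }
    return FMT_NONE;
}
```

With only CF_OEMTEXT present, `pick_source(FMT_TEXT, FMT_OEMTEXT)` returns `FMT_OEMTEXT`; with CF_TEXT and CF_UNICODETEXT present, `pick_source(FMT_OEMTEXT, FMT_TEXT | FMT_UNICODETEXT)` returns `FMT_UNICODETEXT`.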

Combining that with our previous table produces this conversion table with priorities:

To get           First try        Then try                          And then try
CF_TEXT          CF_TEXT          CF_UNICODETEXT + WC2MB(ANSI CP)   CF_OEMTEXT + OemToAnsi
CF_OEMTEXT       CF_OEMTEXT       CF_UNICODETEXT + WC2MB(OEM CP)    CF_TEXT + AnsiToOem
CF_UNICODETEXT   CF_UNICODETEXT   CF_TEXT + MB2WC(ANSI CP)          CF_OEMTEXT + MB2WC(OEM CP)

Again, “ANSI CP” means “the code page reported by calling GetLocaleInfo with the LCID in the CF_LOCALE clipboard format, and the LOCALE_IDEFAULTANSICODEPAGE locale attribute”. Similarly for “OEM CP”, using LOCALE_IDEFAULTCODEPAGE instead of LOCALE_IDEFAULTANSICODEPAGE. WC2MB and MB2WC are shorthand for WideCharToMultiByte and MultiByteToWideChar.
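The combined table can also be kept as data: one row per requested format, listing candidate sources in priority order together with the conversion each one needs. The following C sketch assumes a hypothetical bit mask of available formats; `find_route`, the struct types, and the `FMT_*` values are invented for illustration, and the conversion column holds only a label, not a call to WideCharToMultiByte, MultiByteToWideChar, AnsiToOem, or OemToAnsi.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CF_TEXT, CF_OEMTEXT, and CF_UNICODETEXT,
   used as bit flags in an "available formats" mask (not the real values). */
enum text_format {
    FMT_TEXT        = 1 << 0,
    FMT_OEMTEXT     = 1 << 1,
    FMT_UNICODETEXT = 1 << 2,
};

/* One candidate: which format to read, and a label for the conversion. */
struct route {
    enum text_format source;
    const char *conversion;   /* "" means no conversion is needed */
};

/* One row per requested format, candidates in priority order,
   transcribed from the conversion table with priorities. */
struct synthesis_row {
    enum text_format wanted;
    struct route candidates[3];
};

static const struct synthesis_row synthesis_table[] = {
    { FMT_TEXT,        { { FMT_TEXT,        "" },
                         { FMT_UNICODETEXT, "WC2MB(ANSI CP)" },
                         { FMT_OEMTEXT,     "OemToAnsi" } } },
    { FMT_OEMTEXT,     { { FMT_OEMTEXT,     "" },
                         { FMT_UNICODETEXT, "WC2MB(OEM CP)" },
                         { FMT_TEXT,        "AnsiToOem" } } },
    { FMT_UNICODETEXT, { { FMT_UNICODETEXT, "" },
                         { FMT_TEXT,        "MB2WC(ANSI CP)" },
                         { FMT_OEMTEXT,     "MB2WC(OEM CP)" } } },
};

/* Scans the row for 'wanted' and returns the first candidate whose source
   format is present in 'available', or NULL if none of them is. */
static const struct route *find_route(enum text_format wanted, unsigned available)
{
    size_t rows = sizeof synthesis_table / sizeof synthesis_table[0];
    for (size_t r = 0; r < rows; r++) {
        if (synthesis_table[r].wanted != wanted) {
            continue;
        }
        for (size_t c = 0; c < 3; c++) {
            const struct route *candidate = &synthesis_table[r].candidates[c];
            if (available & candidate->source) {
                return candidate;
            }
        }
    }
    return NULL;
}
```

For example, with only CF_OEMTEXT on the clipboard, `find_route(FMT_TEXT, FMT_OEMTEXT)` returns the CF_OEMTEXT entry labeled `OemToAnsi`. Because each row spells out its own fallback order, a lookup in this sketch scans one row and stops.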

If you stare at this table, you might notice something odd, possibly even disturbing. And that is part of the answer to the mystery. We’ll talk about it next time.

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

8 comments

  • Igor Levicki 2 weeks ago

    @Raymond Chen

    Instead of moving the cheese around, maybe, just maybe, Microsoft developers could actually add something new and useful (at least for new programs) like, I don’t know, CF_UTF8TEXT?

    • Raymond Chen (Microsoft employee, author) 2 weeks ago

      “Perfect is the enemy of good.” CF_UTF8TEXT support would have been nice, but would you say “You can’t ship UTF-8 support until you support CF_UTF8TEXT”? There are so many corners that if you insisted that all of them be identified and cleaned up, the feature would probably never meet your standards for shipping.

      • Raymond Chen (Microsoft employee, author) 1 week ago · Edited

        "Microsoft developers had plenty of time from Windows 10 1803 release until 2025 to implement and ship proper UTF-8 support for a major component of user workflow such as Clipboard. Instead, they dicked around making Clipboard History..."

        There are three teams involved here. There's the NLS team (which is doing activeCodePage to get per-process CP_ACP). Then there's the window manager team (which doesn't want to change the clipboard behavior because it is heavily used and carries lots of compatibility baggage). And there's the Emoji Panel team (which decided that a Clipboard History feature would be a neat thing to add to...

      • Igor Levicki 1 week ago

        No I wouldn't have said that.

        What I would have said is "Microsoft developers had plenty of time from Windows 10 1803 release until 2025 to implement and ship proper UTF-8 support for a major component of user workflow such as Clipboard. Instead, they dicked around making Clipboard History which increases OS attack surface by adding new background services, compromises user privacy, and as it turns out from your recent post even prevents expected codepage conversion flow while running."

        And Clipboard History is just one of a myriad of things that were added in the meantime which should've had lower priority than...

  • Simon Farnsworth 2 weeks ago

    The question that I’d be asking, given the table, is “what guarantees that ‘MB2WC(ANSI CP) followed by WC2MB(OEM CP)’ or ‘MB2WC(OEM CP) followed by WC2MB(ANSI CP)’ is the same as AnsiToOem or OemToAnsi?”.

    I’m guessing that the answer is not only “nothing”, but that AnsiToOem/OemToAnsi sometimes does clever things based on the locale in use that MB2WC followed by WC2MB does not.

  • Vadim Zeitlin · Edited

    If the table were accurate as shown, it could easily result in infinite recursion, e.g. if there is (only) `CF_OEMTEXT` on the clipboard and the program wants to get `CF_TEXT`, it would try getting `CF_UNICODETEXT` which would fall back to `CF_TEXT` again.

    • Chris Iverson

      That WOULD be a problem, if it recursed.

      It doesn’t. It’s not a recursive function, it’s a flat lookup table.

      If there’s only CF_OEMTEXT on the clipboard, and a program wants CF_TEXT, it will check if each format is available, in the order:

      1. CF_TEXT
      2. CF_UNICODETEXT
      3. CF_OEMTEXT

      If a program ONLY put OEMTEXT on the clipboard, then the first two checks will show that the requested format is not available, and it will perform the conversion from OEMTEXT.

  • Joshua Hudson

    Different processes can definitely be in different OEM code pages. Not sure if that’s what you are looking for.