December 9th, 2025

How does Windows synthesize CF_UNICODETEXT from CF_TEXT and vice versa?

Last time, we started our exploration of how Windows synthesizes text clipboard formats by looking at the conversion between CF_OEMTEXT and CF_TEXT. Today, we’ll look at what happens when CF_UNICODETEXT enters the picture.

The introduction of CF_UNICODETEXT means that we now have three clipboard text formats, and therefore six possible conversions. Two of them (between CF_TEXT and CF_OEMTEXT) were covered last time. The four new conversions are:

  • CF_UNICODETEXT to/from CF_TEXT.
  • CF_UNICODETEXT to/from CF_OEMTEXT.

These conversions are done with the assistance of the CF_LOCALE clipboard format, which contains an LCID. An LCID is a 32-bit integer that encodes a primary language (such as German), a sublanguage (such as Swiss-German), and a sort rule (such as phone book). None of these details is directly relevant to character set conversion. The locale is used because both the ANSI and OEM code pages can be derived from it, so only one value needs to be recorded.¹
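
To make the LCID layout concrete, here is a minimal Python sketch of the bit arithmetic performed by the MAKELANGID, MAKELCID, PRIMARYLANGID, SUBLANGID, and SORTIDFROMLCID macros in winnt.h. The helper names make_lcid and decode_lcid are made up for this illustration. The Swiss-German-plus-phone-book combination is used only to show the encoding, not because it is necessarily a locale the system supports.

```python
# LCID bit layout, following the winnt.h macros:
#   bits  0-9   primary language ID
#   bits 10-15  sublanguage ID
#   bits 16-19  sort ID
#   bits 20-31  reserved

LANG_GERMAN = 0x07
SUBLANG_GERMAN = 0x01
SUBLANG_GERMAN_SWISS = 0x02
SORT_DEFAULT = 0x0
SORT_GERMAN_PHONE_BOOK = 0x1


def make_lcid(primary: int, sub: int, sort_id: int = SORT_DEFAULT) -> int:
    """Illustrative equivalent of MAKELCID(MAKELANGID(primary, sub), sort_id)."""
    lang_id = (sub << 10) | primary
    return (sort_id << 16) | lang_id


def decode_lcid(lcid: int) -> dict:
    """Split an LCID into the three fields described in the text."""
    lang_id = lcid & 0xFFFF
    return {
        "primary_language": lang_id & 0x3FF,
        "sublanguage": lang_id >> 10,
        "sort_id": (lcid >> 16) & 0xF,
    }


lcid = make_lcid(LANG_GERMAN, SUBLANG_GERMAN_SWISS, SORT_GERMAN_PHONE_BOOK)
print(hex(lcid))          # 0x10807
print(decode_lcid(lcid))
```

Only the low 16 bits (the language identifier) matter for picking a code page. The sort ID comes along for the ride.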

The system converts to/from CF_UNICODETEXT via the code page obtained from the LCID:

  • LOCALE_IDEFAULTANSICODEPAGE when converting to/from CF_TEXT.
  • LOCALE_IDEFAULTCODEPAGE when converting to/from CF_OEMTEXT.
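
Here is a rough Python model of that code page selection. It is a sketch under several assumptions: the CODE_PAGES dictionary is a hand-written stand-in for what GetLocaleInfo would report (only four locales are listed), the helper names are invented, and Python's codecs stand in for the real conversion functions. Python's handling of unmappable characters does not match Windows exactly, so treat the byte values as illustrative.

```python
# Hypothetical stand-in for GetLocaleInfo with LOCALE_IDEFAULTANSICODEPAGE
# and LOCALE_IDEFAULTCODEPAGE. Maps language ID -> (ANSI CP, OEM CP).
CODE_PAGES = {
    0x0409: (1252, 437),  # English (United States)
    0x0407: (1252, 850),  # German (Germany)
    0x0419: (1251, 866),  # Russian (Russia)
    0x0411: (932, 932),   # Japanese (Japan)
}


def code_page_for(lcid: int, clipboard_format: str) -> int:
    """Pick the code page that goes with an 8-bit clipboard text format."""
    ansi_cp, oem_cp = CODE_PAGES[lcid & 0xFFFF]  # the sort ID is irrelevant
    if clipboard_format == "CF_TEXT":
        return ansi_cp
    if clipboard_format == "CF_OEMTEXT":
        return oem_cp
    raise ValueError(f"not an 8-bit text format: {clipboard_format}")


def unicode_to_8bit(text: str, lcid: int, clipboard_format: str) -> bytes:
    """Model of converting CF_UNICODETEXT to CF_TEXT or CF_OEMTEXT."""
    codec = f"cp{code_page_for(lcid, clipboard_format)}"
    return text.encode(codec, errors="replace")


def eight_bit_to_unicode(data: bytes, lcid: int, clipboard_format: str) -> str:
    """Model of converting CF_TEXT or CF_OEMTEXT to CF_UNICODETEXT."""
    codec = f"cp{code_page_for(lcid, clipboard_format)}"
    return data.decode(codec, errors="replace")


# The same Unicode string produces different bytes for each 8-bit format.
print(unicode_to_8bit("Grüße", 0x0407, "CF_TEXT"))
print(unicode_to_8bit("Grüße", 0x0407, "CF_OEMTEXT"))

# The same CF_TEXT byte means different things under different locales.
print(eight_bit_to_unicode(b"\xcf", 0x0419, "CF_TEXT"))
print(eight_bit_to_unicode(b"\xcf", 0x0409, "CF_TEXT"))
```

The last two lines show why the locale has to travel with the text: without it, there is no way to know which character a byte above 0x7F was meant to be.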

Putting all of this into a chart gives us:

                    From
To                  CF_TEXT           CF_OEMTEXT        CF_UNICODETEXT
CF_TEXT             nop               OemToAnsi         WC2MB(ANSI CP)
CF_OEMTEXT          AnsiToOem         nop               WC2MB(OEM CP)
CF_UNICODETEXT      MB2WC(ANSI CP)    MB2WC(OEM CP)     nop

In the above table, “ANSI CP” means “the code page reported by calling GetLocaleInfo with the LCID in the CF_LOCALE clipboard format and the LOCALE_IDEFAULTANSICODEPAGE locale attribute”. “OEM CP” is the same thing, but with LOCALE_IDEFAULTCODEPAGE instead of LOCALE_IDEFAULTANSICODEPAGE. MB2WC and WC2MB are shorthand for MultiByteToWideChar and WideCharToMultiByte, and “nop” means that no conversion is needed.
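
The chart can be read as a dispatch table keyed by (to, from). The Python sketch below models it that way. The function names are invented, and the sketch leans on a few simplifications: Python codecs stand in for MultiByteToWideChar and WideCharToMultiByte, the code pages are passed in directly rather than looked up from CF_LOCALE, and OemToAnsi and AnsiToOem are approximated by a round trip through Unicode with those same code pages, which is not how the real functions choose their code pages or their mappings.

```python
def mb2wc(data: bytes, codec: str) -> str:
    """Model of MultiByteToWideChar: 8-bit text -> Unicode."""
    return data.decode(codec, errors="replace")


def wc2mb(text: str, codec: str) -> bytes:
    """Model of WideCharToMultiByte: Unicode -> 8-bit text."""
    return text.encode(codec, errors="replace")


def synthesize(to_fmt, from_fmt, data, ansi_cp="cp1252", oem_cp="cp437"):
    """Convert clipboard text from from_fmt to to_fmt, following the chart.

    In this model CF_TEXT and CF_OEMTEXT data are bytes, and
    CF_UNICODETEXT data is a Python str.
    """
    if to_fmt == from_fmt:
        return data  # nop

    chart = {
        # (to, from): conversion
        ("CF_TEXT", "CF_OEMTEXT"):
            lambda d: wc2mb(mb2wc(d, oem_cp), ansi_cp),   # stand-in for OemToAnsi
        ("CF_TEXT", "CF_UNICODETEXT"):
            lambda d: wc2mb(d, ansi_cp),                  # WC2MB(ANSI CP)
        ("CF_OEMTEXT", "CF_TEXT"):
            lambda d: wc2mb(mb2wc(d, ansi_cp), oem_cp),   # stand-in for AnsiToOem
        ("CF_OEMTEXT", "CF_UNICODETEXT"):
            lambda d: wc2mb(d, oem_cp),                   # WC2MB(OEM CP)
        ("CF_UNICODETEXT", "CF_TEXT"):
            lambda d: mb2wc(d, ansi_cp),                  # MB2WC(ANSI CP)
        ("CF_UNICODETEXT", "CF_OEMTEXT"):
            lambda d: mb2wc(d, oem_cp),                   # MB2WC(OEM CP)
    }
    return chart[(to_fmt, from_fmt)](data)


ansi = b"caf\xe9"  # "café" in code page 1252
print(synthesize("CF_UNICODETEXT", "CF_TEXT", ansi))
print(synthesize("CF_OEMTEXT", "CF_UNICODETEXT", "café"))
print(synthesize("CF_TEXT", "CF_OEMTEXT", b"caf\x82"))
```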

That’s great, we have all the answers in a table. But that table raises more questions!

We’ll start answering questions next time.

¹ This CF_LOCALE clipboard format existed in 16-bit Windows as well, but it wasn’t really used for anything. The people who added Unicode support to the clipboard realized, “Hey, the thing we need is already here! We just have to start using it.”


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

1 comment

  • skSdnW

    And if an application sets CF_UNICODETEXT on the clipboard, how is CF_LOCALE filled? GetThreadLocale? GetThreadUILanguage?