The Windows clipboard automatic text conversion algorithm is path-dependent

We closed last time with this table:

To get	First try	Then try	And then try
CF_TEXT	CF_TEXT	CF_UNICODETEXT + WC2MB(ANSI CP)	CF_OEMTEXT + OemToAnsi
CF_OEMTEXT	CF_OEMTEXT	CF_UNICODETEXT + WC2MB(OEM CP)	CF_TEXT + AnsiToOem
CF_UNICODETEXT	CF_UNICODETEXT	CF_TEXT + MB2WC(ANSI CP)	CF_OEMTEXT + MB2WC(OEM CP)
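The following small Python model makes the lookup order concrete. It is a sketch, not anything taken from user32. The format names are the real Win32 constant names, and WC2MB and MB2WC are spelled out as `WideCharToMultiByte` and `MultiByteToWideChar`. `FALLBACKS` and `choose_source` are hypothetical helpers. Given the requested format and the set of formats actually on the clipboard, `choose_source` returns the first entry in that row that can satisfy the request, along with the conversion the table applies.

```python
# Hypothetical model of the fallback table. Each row lists (source format,
# conversion) pairs in the order they are tried; None means "no conversion".
FALLBACKS = {
    "CF_TEXT": [
        ("CF_TEXT", None),
        ("CF_UNICODETEXT", "WideCharToMultiByte(ANSI code page)"),
        ("CF_OEMTEXT", "OemToAnsi"),
    ],
    "CF_OEMTEXT": [
        ("CF_OEMTEXT", None),
        ("CF_UNICODETEXT", "WideCharToMultiByte(OEM code page)"),
        ("CF_TEXT", "AnsiToOem"),
    ],
    "CF_UNICODETEXT": [
        ("CF_UNICODETEXT", None),
        ("CF_TEXT", "MultiByteToWideChar(ANSI code page)"),
        ("CF_OEMTEXT", "MultiByteToWideChar(OEM code page)"),
    ],
}


def choose_source(target, available):
    """Return (source_format, conversion) for the first entry in the target's
    row whose source format is in `available`, or None if none is."""
    for source, conversion in FALLBACKS[target]:
        if source in available:
            return source, conversion
    return None


# Asking for OEM text when both ANSI and Unicode text are present:
print(choose_source("CF_OEMTEXT", {"CF_TEXT", "CF_UNICODETEXT"}))
# ('CF_UNICODETEXT', 'WideCharToMultiByte(OEM code page)')
```

Reading the table this way shows that when both CF_TEXT and CF_UNICODETEXT are present, a request for CF_OEMTEXT is satisfied from CF_UNICODETEXT, because that entry comes earlier in its row.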

I noted that there is something odd, possibly even disturbing, about this table.

Let’s redraw the table as a diagram.

                 CF_TEXT
    (CF_LOCALE)     ⇅            ↑
             CF_UNICODETEXT      |   (LOCALE_USER_DEFAULT)
                           ↖↘    ↓
               (CF_LOCALE)  CF_OEMTEXT

Each of the three boxes represents a clipboard format: CF_UNICODETEXT, CF_TEXT, or CF_OEMTEXT.

The lengths of the arrows connecting the boxes represent the priorities: Shorter arrows are preferred over longer arrows. The shortest arrow is the one connecting CF_UNICODETEXT to CF_TEXT. In the middle is the arrow connecting CF_UNICODETEXT to CF_OEMTEXT. And the longest arrow is the one connecting CF_TEXT to CF_OEMTEXT.

Finally, the label on each arrow represents the code page that is used for the conversion. The conversions to and from CF_UNICODETEXT use the CF_LOCALE clipboard format to tell them what locale to use, whereas the conversion between CF_TEXT and CF_OEMTEXT uses LOCALE_USER_DEFAULT.
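To check that the diagram carries the same information as the table, you can encode the arrows as data and rebuild each row from them. In this hypothetical Python sketch, the arrow lengths are the ranks 1 (shortest) through 3 (longest). Sorting a format's neighbors by arrow length reproduces the fallback order in the table, and each entry carries the locale label from its arrow. `EDGES` and `fallback_order` are illustrative names, not Windows APIs.

```python
# Hypothetical encoding of the diagram: each undirected arrow has a length
# (smaller = preferred) and the locale that supplies the code page.
EDGES = {
    frozenset({"CF_UNICODETEXT", "CF_TEXT"}): (1, "CF_LOCALE"),
    frozenset({"CF_UNICODETEXT", "CF_OEMTEXT"}): (2, "CF_LOCALE"),
    frozenset({"CF_TEXT", "CF_OEMTEXT"}): (3, "LOCALE_USER_DEFAULT"),
}


def fallback_order(target):
    """List the other formats in the order they are tried for `target`,
    shortest arrow first, each paired with the locale used to convert."""
    neighbors = []
    for edge, (length, locale) in EDGES.items():
        if target in edge:
            (other,) = edge - {target}  # the format at the other end
            neighbors.append((length, other, locale))
    neighbors.sort()  # lengths are distinct, so this sorts by length
    return [(other, locale) for _, other, locale in neighbors]


print(fallback_order("CF_OEMTEXT"))
# [('CF_UNICODETEXT', 'CF_LOCALE'), ('CF_TEXT', 'LOCALE_USER_DEFAULT')]
```

Each of the three rows of the table falls out of the same three arrows, which is why the diagram and the table are interchangeable.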

What’s interesting is that if you want to get from one box to another, say from CF_TEXT to CF_OEMTEXT, you have two options. You can either use the direct line from CF_TEXT to CF_OEMTEXT, or you can take the scenic route from CF_TEXT to CF_UNICODETEXT to CF_OEMTEXT. And the two options produce different results! (In category theory, you would say that the diagram is not commutative.)

If you take the direct route from CF_TEXT to CF_OEMTEXT, the conversion uses LOCALE_USER_DEFAULT. But if you take the scenic route, the conversion to CF_UNICODETEXT uses the locale specified by CF_LOCALE, and so does the conversion from CF_UNICODETEXT to CF_OEMTEXT. If the locale specified by CF_LOCALE is different from LOCALE_USER_DEFAULT, you could very well get different results!

In my test program, I wrote the string "\xD0" to the clipboard as ANSI. When I read it back as OEM, I expected to receive "\x44", because my system is running with US-English, and the character D0 in code page 1252 is Ð (U+00D0), whose best fit in code page 437 is D (U+0044). Instead, I received "\x90".

I set the CF_LOCALE clipboard format to 0x0419, which is the locale ID for ru-ru. Receiving character 90 would make sense if the ANSI and OEM code pages were taken from the ru-ru locale: Character D0 in the ru-ru ANSI code page 1251 is Р (U+0420), and that maps neatly to character 90 in the ru-ru OEM code page 866, which is also Р (U+0420).
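The two results can be reproduced without a clipboard by using Python's built-in codecs (`cp1252`, `cp437`, `cp1251`, `cp866`). This is an approximation, not the Windows implementation, and it rests on two assumptions. First, Python's codecs have no best-fit fallback, so the hypothetical `HYPOTHETICAL_BEST_FIT` table hard-codes the single Ð to D mapping described above. Second, `AnsiToOem` is modeled as an ANSI to Unicode to OEM round trip using the user-default code pages.

```python
# Simulates the two routes from CF_TEXT to CF_OEMTEXT.
# Assumed code pages: 1252/437 for the US-English user default and
# 1251/866 for the ru-ru locale named by CF_LOCALE.

# Hypothetical stand-in for the best-fit behavior of WideCharToMultiByte;
# it covers only the one character used in this example.
HYPOTHETICAL_BEST_FIT = {"\u00d0": "D"}  # Ð -> D


def to_oem(text, oem_cp):
    """Encode `text` in `oem_cp`, substituting a best-fit character
    (or '?') for any character the code page lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(oem_cp)
        except UnicodeEncodeError:
            out += HYPOTHETICAL_BEST_FIT.get(ch, "?").encode(oem_cp)
    return bytes(out)


def direct_route(ansi_bytes, user_ansi_cp="cp1252", user_oem_cp="cp437"):
    # Modeled AnsiToOem: both code pages come from LOCALE_USER_DEFAULT.
    return to_oem(ansi_bytes.decode(user_ansi_cp), user_oem_cp)


def scenic_route(ansi_bytes, clip_ansi_cp="cp1251", clip_oem_cp="cp866"):
    # CF_TEXT -> CF_UNICODETEXT -> CF_OEMTEXT: both hops use CF_LOCALE.
    unicode_text = ansi_bytes.decode(clip_ansi_cp)
    return to_oem(unicode_text, clip_oem_cp)


print(direct_route(b"\xd0").hex())  # 44
print(scenic_route(b"\xd0").hex())  # 90
```

The same input byte produces two different outputs depending on which path the conversion takes, which is exactly the non-commutativity described above.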

So it seems that Windows is taking the scenic route, and rather than using AnsiToOem, it’s going through CF_UNICODETEXT. Is the table wrong?

No, the table is correct.

We’ll study the problem some more next time.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

1 comment

Tudor Zagreanu December 15, 2025

My guess is Clipboard History is reading the clipboard in between and asking for CF_UNICODETEXT, causing Windows to take the scenic route.
