For the past few articles (starting with the conversion between CF_TEXT and CF_UNICODETEXT), we’ve been looking at how Windows performs text conversion among its three clipboard text formats: CF_TEXT, CF_OEMTEXT, and CF_UNICODETEXT. A lot of the weirdness dates back to adding Unicode support to a system that originally supported only 8-bit, code page-based encodings.
You might take away from this that the clipboard text conversion system is a mess, and that you should simply avoid putting text on the clipboard. But really, all the problems boil down to inconsistent conversions to and from the 8-bit formats. If you stick with CF_UNICODETEXT, then everything works great!
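For concreteness, here is a minimal sketch of that happy path (the function name is illustrative, and error handling is abbreviated): allocate a movable global block, copy in the UTF-16LE text, and hand it to the clipboard as CF_UNICODETEXT, letting the system synthesize the 8-bit formats on demand.

```cpp
#include <windows.h>
#include <string.h>
#include <wchar.h>

// Sketch: place UTF-16LE text on the clipboard as CF_UNICODETEXT.
// The system synthesizes CF_TEXT and CF_OEMTEXT on demand.
void CopyUnicodeText(HWND hwnd, const wchar_t* text)
{
    SIZE_T cb = (wcslen(text) + 1) * sizeof(wchar_t);
    HGLOBAL hmem = GlobalAlloc(GMEM_MOVEABLE, cb);
    if (!hmem) return;

    memcpy(GlobalLock(hmem), text, cb);
    GlobalUnlock(hmem);

    if (OpenClipboard(hwnd)) {
        EmptyClipboard();
        // On success, the clipboard owns hmem; on failure, we must free it.
        if (!SetClipboardData(CF_UNICODETEXT, hmem)) GlobalFree(hmem);
        CloseClipboard();
    } else {
        GlobalFree(hmem);
    }
}
```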
For over two decades, Windows has been pushing application developers to move to Unicode, with support for 8-bit code pages being retained for backward compatibility with old programs that haven’t had a chance to update.
So don’t be an old program. Be a new program that uses Unicode, specifically the UTF-16LE encoding, which is what “Unicode” typically means in the context of Windows.
If you prefer to use UTF-8 internally, that’s fine, but convert to UTF-16LE when interacting with the clipboard. If you try to put 8-bit UTF-8 data on the clipboard as CF_TEXT, you are jumping into the ugly mess that is 8-bit code pages.
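The conversion itself is a pair of calls to MultiByteToWideChar with CP_UTF8. A sketch, with a helper name of my own and error handling abbreviated:

```cpp
#include <windows.h>
#include <string>

// Sketch: convert an internal UTF-8 string to the UTF-16LE that
// CF_UNICODETEXT expects. (Returns an empty string on failure.)
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    // First call sizes the buffer; the -1 length includes the terminator.
    int cch = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    if (cch == 0) return {};

    std::wstring wide(cch, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], cch);
    wide.resize(cch - 1); // drop the terminating null counted by -1
    return wide;
}
```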
Bonus chatter: “But why didn’t they fix this when they added support for UTF-8 as CP_ACP?”
This is a case of perfect being the enemy of good.
The ability to specify a custom activeCodePage was scoped primarily to allowing CP_ACP to be customized on a per-process basis. This magically takes care of functions like MultiByteToWideChar(CP_ACP) and WideCharToMultiByte(CP_ACP), as well as any functions built on top of those functions. In particular, the magic extends to functions that have both A and W versions, since they internally use MultiByteToWideChar to convert the 8-bit string to UTF-16LE before passing it to the W version.
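As an illustration, assuming a manifest that sets activeCodePage to UTF-8, the process’s ANSI code page reports as CP_UTF8 (65001), and the A entry points pick that up automatically:

```cpp
#include <windows.h>
#include <assert.h>

// Illustration (assumes the application manifest sets activeCodePage to
// UTF-8): the process-wide ANSI code page is now UTF-8, so the A entry
// points interpret their 8-bit arguments as UTF-8.
void DemonstrateUtf8Acp(HWND hwnd)
{
    assert(GetACP() == CP_UTF8); // 65001: the manifest setting took effect

    // SetWindowTextA converts via MultiByteToWideChar(CP_ACP) and then
    // forwards to SetWindowTextW, so these UTF-8 bytes ("café") round-trip.
    SetWindowTextA(hwnd, "caf\xC3\xA9");
}
```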
But there are lots of other places with hidden dependencies on weird quirks of the code page system, such as the clipboard. Chasing down every last one of them would have taken a long time, and then the activeCodePage team would also have had to convince all the affected components to add additional code to support a dynamic CP_ACP, which in turn could force a larger redesign of that component, a redesign the team felt was too risky.
At least the current version of activeCodePage is clear about what it does: It lets you customize the value of CP_ACP.
It’s often better to have a simple set of easy-to-remember rules, even if they don’t cover all the cases, rather than to have a complex set of rules that tries to cover more cases but inevitably still fails to get them all. At least with the simple set of rules, you can predict where it will work and where it will fall short.
Given the prevalence of UTF-8 nowadays, it honestly seems like a new CF_UTF8 clipboard format would solve many of the pain points, with implicit conversion to CF_UNICODETEXT, of course. (I wouldn't add additional conversions to CF_TEXT and CF_OEMTEXT — go through CF_UNICODETEXT and let the existing tested code handle the remainder of the journey. Converting out of Unicode is already painful enough as it is without having slight incompatibilities arising from having two ways to do so.)
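(For context: an application can already register a private UTF-8 clipboard format today, but the system synthesizes conversions only among the built-in text trio, which is why the implicit conversion described above would need OS support. The format name below is made up.)

```cpp
#include <windows.h>

// Sketch: a private UTF-8 clipboard format is possible today, but no
// implicit conversion to CF_UNICODETEXT is synthesized for it; only
// CF_TEXT/CF_OEMTEXT/CF_UNICODETEXT convert among each other.
UINT cfUtf8 = RegisterClipboardFormatW(L"Private UTF-8 Text");
```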
UTF-8 is a problem for modern apps, not for legacy software from the early 2000s, so a new API (in the form of a...
Or perhaps the clipboard could just remember which actual code page was used when setting CF_TEXT, and use that when converting to CF_UNICODETEXT, or to CF_TEXT in a different process (with a different ACP).
No, that's not a sensible argument. UTF-16 is bad, everything using it is obsolete, and there's no way around it. I've been around a long time. As soon as UTF-16 broke its promise of constant-time indexing by adding surrogate pairs, it became obsolete, and the W variant of the Windows API surface with it.
The UTF-8 everywhere team is right. I can sit here and measure this. Unless you're literally dealing with a wall of Chinese, it's faster to handle everything internally as UTF-8 and convert to UTF-16 a few hundred codepoints at a time at screen draw...
It is what it is. Windows internals aren’t going to change.
If you store and do a lot of text manipulation, sure, use UTF-8… but if I get a UTF-16 string from one API, append something, and pass it to another UTF-16 API, adding two extra conversions seems like a waste of both performance and engineering time.
> If you store and do a lot of text manipulation...
You mean like:
1. Web services which send and receive UTF-8
2. Databases which store and retrieve UTF-8
3. Text editors which load, edit, and save UTF-8
4. Loggers which log UTF-8 so your logs aren't pointlessly 2x the size and can be read on other platforms
5. Browsers which use UTF-8 to request the URL you typed, fetch pages in UTF-8, and show them to you
6. Email clients which use UTF-8 (not Outlook; Outlook is a mess, especially the new one, which doesn't even register the mailto: protocol properly)
7. Hardware devices...
> If you prefer to use UTF-8 internally, that’s fine, but convert to UTF-16LE when interacting with the clipboard.
This is not just a matter of preference -- it is a matter of cross-platform compatibility.
Try writing code which is supposed to work on Windows and on Linux, where everything is UTF-8, and you will see what I mean.
> This is a case of perfect being the enemy of good. ...
Keep telling yourself that if it makes you sleep better at night, but it is still a lame excuse to keep the status quo instead of actually modernizing the API surface AND...
> actually modernizing the API surface AND underlying behavior.
Oh sure, let's break literally every application that currently exists on Windows just to allow easier cross-platform development. That makes a ton of sense.
"That's not what I'm saying. Make the core of Windows UTF-8, like the rest of the world, and add a compatibility layer for anything that needs UTF-16."
One, that would involve rewriting everything inside of Windows that even touches strings TWICE. ONCE to get the core to UTF-8, and AGAIN to make the MASSIVE compatibility layer just to let UTF-8 Windows pretend to be what it already is today!
Two, compatibility...
Pro-tip: Next time you respond to someone, try addressing what they actually said.
1. I never suggested breaking anything OS-wide—that’s a hallucination on your part.
2. The only change I proposed was adding a CF_UTF8 clipboard format for new programs. Nothing else.
As for rewriting all APIs to UTF-8, I never suggested that either. But if you want to discuss it: the real work would be updating console, file I/O, and CRT APIs first, since they’re the most visible surfaces for most cross-platform apps. Everything else could come later—or never—considering GUI desktop apps are mostly just skinned web browsers today.
So no, it’s not...
It also seems like a minor issue - AFAICT, it would be trivial to write a SetClipboardUtf8 function that takes UTF-8 as input, checks that CP_ACP is set to UTF-8, uses MultiByteToWideChar(CP_ACP) to convert the input, and then calls SetClipboardData(CF_UNICODETEXT, converted). This then works both for delayed render (SetClipboardData(CF_UNICODETEXT, NULL), then call this function in response to a WM_RENDERFORMAT) and for immediate render (just call the function).
The same applies in reverse to a GetClipboardUtf8 function - GetClipboardData(CF_UNICODETEXT) and convert.
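Here's a minimal sketch of the pair as described (SetClipboardUtf8/GetClipboardUtf8 are hypothetical names; error handling is abbreviated, and the caller is assumed to have the clipboard open, with EmptyClipboard already called on the set path for immediate render). It passes CP_UTF8 explicitly rather than relying on CP_ACP:

```cpp
#include <windows.h>
#include <string>
#include <utility>

// Sketch: convert UTF-8 to UTF-16LE and hand it to the clipboard
// as CF_UNICODETEXT.
bool SetClipboardUtf8(const std::string& utf8)
{
    int cch = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    if (cch == 0) return false;

    HGLOBAL hmem = GlobalAlloc(GMEM_MOVEABLE, cch * sizeof(wchar_t));
    if (!hmem) return false;

    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1,
                        static_cast<wchar_t*>(GlobalLock(hmem)), cch);
    GlobalUnlock(hmem);

    if (!SetClipboardData(CF_UNICODETEXT, hmem)) {
        GlobalFree(hmem); // the clipboard takes ownership only on success
        return false;
    }
    return true;
}

// Sketch: fetch CF_UNICODETEXT and convert it to UTF-8.
bool GetClipboardUtf8(std::string& utf8)
{
    HANDLE hmem = GetClipboardData(CF_UNICODETEXT);
    if (!hmem) return false;

    auto* wide = static_cast<const wchar_t*>(GlobalLock(hmem));
    int cb = WideCharToMultiByte(CP_UTF8, 0, wide, -1,
                                 nullptr, 0, nullptr, nullptr);
    if (cb > 0) {
        std::string buf(cb, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide, -1, &buf[0], cb,
                            nullptr, nullptr);
        buf.resize(cb - 1); // drop the terminating null counted by -1
        utf8 = std::move(buf);
    }
    GlobalUnlock(hmem);
    return cb > 0;
}
```

For delayed render, publish SetClipboardData(CF_UNICODETEXT, NULL) up front and call SetClipboardUtf8 from the WM_RENDERFORMAT handler, as described above.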
The lack of such a function in Win32 implies either that it's not worth Microsoft supplying because it's so rarely needed (e.g....
Indeed, the helper doesn’t even need to check whether CP_ACP is UTF-8. It can just call MultiByteToWideChar(CP_UTF8). The point about this being reducible to an open-source helper function is important, though. Making something more convenient is nice, but it’s more important to make it possible.