{"id":111882,"date":"2025-12-18T07:00:00","date_gmt":"2025-12-18T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=111882"},"modified":"2025-12-19T08:27:14","modified_gmt":"2025-12-19T16:27:14","slug":"20251218-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20251218-00\/?p=111882","title":{"rendered":"Concluding thoughts on our deep dive into Windows clipboard text conversion"},"content":{"rendered":"<p>For the past few articles (starting with <a title=\"How does Windows synthesize CF_OEMTEXT from CF_TEXT and vice versa?\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20251208-00\/?p=111849\"> conversion between <code>CF_<wbr \/>OEM\u00adTEXT<\/code> and <code>CF_<wbr \/>TEXT<\/code><\/a>), we&#8217;ve been looking at how Windows performs text conversion among its three clipboard text formats: <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code>, <code>CF_<wbr \/>TEXT<\/code>, and <code>CF_<wbr \/>OEM\u00adTEXT<\/code>. A lot of the weirdness dates back to adding Unicode support to what originally supported only 8-bit code page-based encodings.<\/p>\n<p>You might take away from this that the clipboard text conversion system is a mess, and you should simply avoid putting text on the clipboard. But really, all the problems boil down to inconsistent conversions to and from the 8-bit formats. If you stick with <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code>, then everything works great!<\/p>\n<p>For over two decades, Windows has been pushing application developers to move to Unicode, with support for 8-bit code pages being retained for backward compatibility with old programs that haven&#8217;t had a chance to update.<\/p>\n<p>So don&#8217;t be an old program. Be a new program that uses Unicode, specifically the UTF-16LE encoding, which is what &#8220;Unicode&#8221; typically means in the context of Windows.<\/p>\n<p>If you prefer to use UTF-8 internally, that&#8217;s fine, but convert to UTF-16LE when interacting with the clipboard. If you try to put 8-bit UTF-8 data on the clipboard as <code>CF_<wbr \/>TEXT<\/code>, you are jumping into the ugly mess that is 8-bit code pages.<\/p>\n<p><b>Bonus chatter<\/b>: &#8220;But why didn&#8217;t they fix this when they added support for UTF-8 as <code>CP_<wbr \/>ACP<\/code>?&#8221;<\/p>\n<p>This is a case of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Perfect_is_the_enemy_of_good\"> perfect being the enemy of good<\/a>.<\/p>\n<p>The ability to specify a custom <code>activeCodePage<\/code> as <code>CP_<wbr \/>ACP<\/code> was scoped primarily to allowing <code>CP_<wbr \/>ACP<\/code> to be customized on a per-process basis. This magically takes care of functions like <code>Multi\u00adByte\u00adTo\u00adWide\u00adChar(<wbr \/><span style=\"border: solid 1px currentcolor;\">CP_<wbr \/>ACP<\/span>, ...)<\/code>, as well as any functions built on top of those functions. In particular, the magic extends to functions that have both A and W versions since they internally use <code>Multi\u00adByte\u00adTo\u00adWide\u00adChar<\/code> to convert the 8-bit string to UTF-16LE before passing it to the W version.<\/p>\n<p>But there are lots of other places with hidden dependencies on weird quirks of the code page system, such as the clipboard. Chasing down every last one of them would have taken a long time, and then the <code>activeCodePage<\/code> team would also have to convince all the affected components to add additional code to support dynamic <code>CP_<wbr \/>ACP<\/code>, which in turn could force a larger redesign of that component that the team felt was too risky.<\/p>\n<p>At least the current version of <code>activeCodePage<\/code> is clear about what it does: It lets you customize the value of <code>CP_<wbr \/>ACP<\/code>.<\/p>\n<p>It&#8217;s often better to have a simple set of easy-to-remember rules, even if they don&#8217;t cover all the cases, rather than to have a complex set of rules that tries to cover more cases but inevitably still fails to get them all. At least with the simple set of rules, you can predict where it will work and where it will fall short.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Stick to Unicode and you&#8217;ll be fine.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-111882","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Stick to Unicode and you&#8217;ll be fine.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111882","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=111882"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111882\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=111882"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=111882"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=111882"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}