{"id":111869,"date":"2025-12-15T07:00:00","date_gmt":"2025-12-15T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=111869"},"modified":"2025-12-18T10:18:10","modified_gmt":"2025-12-18T18:18:10","slug":"20251215-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20251215-00\/?p=111869","title":{"rendered":"The Windows clipboard automatic text conversion algorithm is path-dependent"},"content":{"rendered":"<p><a title=\"Resolving an ambiguity in the Windows clipboard automated text conversion table\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20251212-00\/?p=111862\"> We closed last time with this table<\/a>:<\/p>\n<table style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>To get<\/th>\n<th>First try<\/th>\n<th>Then try<\/th>\n<th>And then try<\/th>\n<\/tr>\n<tr>\n<th>CF_TEXT<\/th>\n<td>CF_TEXT<\/td>\n<td>CF_UNICODETEXT + WC2MB(ANSI CP)<\/td>\n<td>CF_OEMTEXT + OemToAnsi<\/td>\n<\/tr>\n<tr>\n<th>CF_OEMTEXT<\/th>\n<td>CF_OEMTEXT<\/td>\n<td>CF_UNICODETEXT + WC2MB(OEM CP)<\/td>\n<td>CF_TEXT + AnsiToOem<\/td>\n<\/tr>\n<tr>\n<th>CF_UNICODETEXT<\/th>\n<td>CF_UNICODETEXT<\/td>\n<td>CF_TEXT + MB2WC(ANSI CP)<\/td>\n<td>CF_OEMTEXT + MB2WC(OEM CP)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>I noted that there is something odd, possibly even disturbing, about this table.<\/p>\n<p>Let&#8217;s redraw the table as a diagram.<\/p>\n<table style=\"border-collapse: collapse; text-align: center;\" title=\"See description in body.\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px currentcolor;\" colspan=\"3\">CF_TEXT<\/td>\n<\/tr>\n<tr>\n<td>(CF_LOCALE)<\/td>\n<td>\u21c5<\/td>\n<td>&nbsp;<\/td>\n<td>\u2191<\/td>\n<td>(LOCALE_<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px 
CF_UNICODETEXT">
currentcolor;\">CF_UNICODETEXT<\/td>\n<td>&nbsp;<\/td>\n<td>|<\/td>\n<td>USER_<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"width: 2em;\">\u2196\u2198<\/td>\n<td>\u2193<\/td>\n<td>DEFAULT)<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>(CF_LOCALE)<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px currentcolor;\">CF_OEMTEXT<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Each of the three boxes represents a clipboard format: <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code>, <code>CF_<wbr \/>TEXT<\/code>, or <code>CF_<wbr \/>OEM\u00adTEXT<\/code>.<\/p>\n<p>The lengths of the arrows connecting the boxes represent the priorities: Shorter arrows are preferred over longer arrows. The shortest arrow is the one connecting <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> to <code>CF_<wbr \/>TEXT<\/code>. In the middle is the arrow connecting <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>. And the longest arrow is the one connecting <code>CF_<wbr \/>TEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>.<\/p>\n<p>Finally, the label on each arrow identifies the locale whose code pages are used for the conversion. The conversions to and from <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> use the <code>CF_<wbr \/>LOCALE<\/code> clipboard format to tell them what locale to use, whereas the conversion between <code>CF_<wbr \/>TEXT<\/code> and <code>CF_<wbr \/>OEM\u00adTEXT<\/code> uses <code>LOCALE_<wbr \/>USER_<wbr \/>DEFAULT<\/code>.<\/p>\n<p>What&#8217;s interesting is that if you want to get from one box to another, say from <code>CF_<wbr \/>TEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>, you have two options. You can either use the direct line from <code>CF_<wbr \/>TEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>, or you can take the scenic route from <code>CF_<wbr \/>TEXT<\/code> to <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>. And the two options can produce different results! 
(In category theory, you would say that <a href=\"https:\/\/en.wikipedia.org\/wiki\/Commutative_diagram\"> the diagram is not commutative<\/a>.)<\/p>\n<p>If you take the direct route from <code>CF_<wbr \/>TEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>, then the conversion uses <code>LOCALE_<wbr \/>USER_<wbr \/>DEFAULT<\/code>. But if you take the scenic route, then the conversion to <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> uses the locale specified by <code>CF_<wbr \/>LOCALE<\/code>, as does the conversion from <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> to <code>CF_<wbr \/>OEM\u00adTEXT<\/code>. If the locale specified by <code>CF_<wbr \/>LOCALE<\/code> is different from <code>LOCALE_<wbr \/>USER_<wbr \/>DEFAULT<\/code>, then you could very well get different results!<\/p>\n<p>In my test program, I wrote the string <tt>\"\\xD0\"<\/tt> to the clipboard as ANSI, and when I read it back as OEM, I expected to receive <tt>\"\\x44\"<\/tt>. My system is running with US-English, and the character <tt>D0<\/tt> in code page 1252 is \u00d0 (U+00D0), whose best fit in code page 437 is D (U+0044). Instead, I received <tt>\"\\x90\"<\/tt>.<\/p>\n<p>I had also set the <code>CF_<wbr \/>LOCALE<\/code> clipboard format to <code>0x0419<\/code>, which is the locale ID for ru-ru. Receiving character <tt>90<\/tt> would make sense if the ANSI and OEM code pages were taken from the ru-ru locale: Character <tt>D0<\/tt> in the ru-ru ANSI code page 1251 is \u0420 (U+0420), and that maps neatly to character <tt>90<\/tt> in the ru-ru OEM code page 866, which is also \u0420 (U+0420).<\/p>\n<p>So it seems that Windows is taking the scenic route, and rather than using <code>Ansi\u00adTo\u00adOem<\/code>, it&#8217;s going through <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code>. 
Is the table wrong?<\/p>\n<p>No, the table is correct.<\/p>\n<p>We&#8217;ll study the problem some more next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When the journey is not half of the fun.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-111869","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>When the journey is not half of the fun.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111869","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=111869"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111869\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=111869"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=111869"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=111869"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}