{"id":111854,"date":"2025-12-09T07:00:00","date_gmt":"2025-12-09T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=111854"},"modified":"2025-12-09T09:16:22","modified_gmt":"2025-12-09T17:16:22","slug":"20251209-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20251209-00\/?p=111854","title":{"rendered":"How does Windows synthesize <CODE>CF_<WBR>UNICODE&shy;TEXT<\/CODE> from <CODE>CF_<WBR>TEXT<\/CODE> and vice versa?"},"content":{"rendered":"<p>Last time, we started our exploration of how Windows synthesizes text clipboard formats by <a title=\"How does Windows synthesize CF_OEMTEXT from CF_TEXT and vice versa?\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20251208-00\/?p=111849\"> looking at the conversion between <code>CF_<wbr \/>OEM\u00adTEXT<\/code> and <code>CF_<wbr \/>TEXT<\/code><\/a>. Today, we&#8217;ll look at what happens when <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> enters the picture.<\/p>\n<p>The introduction of <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> means that we now have three clipboard text formats, and therefore six possible conversions. The four new conversions are<\/p>\n<ul>\n<li><code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> to\/from <code>CF_<wbr \/>TEXT<\/code>.<\/li>\n<li><code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> to\/from <code>CF_<wbr \/>OEM\u00adTEXT<\/code>.<\/li>\n<\/ul>\n<p>These conversions are done with the assistance of the <code>CF_<wbr \/>LOCALE<\/code> clipboard format, which contains an <code>LCID<\/code>, which is a 32-bit integer that encodes a primary language (such as German), a sublanguage (such as Swiss-German), and a sort rule (such as phone book). None of these details are directly relevant to character set conversion. 
The locale is used because both the ANSI and OEM code pages can be derived from the locale, so it&#8217;s only one value that needs to be recorded.\u00b9<\/p>\n<p>The system converts to\/from <code>CF_<wbr \/>UNICODE\u00adTEXT<\/code> via the code page obtained from the LCID:<\/p>\n<ul>\n<li><code>LOCALE_<wbr \/>IDEFAULT\u00adANSI\u00adCODE\u00adPAGE<\/code> when converting to\/from <code>CF_<wbr \/>TEXT<\/code>.<\/li>\n<li><code>LOCALE_<wbr \/>IDEFAULT\u00adCODE\u00adPAGE<\/code> when converting to\/from <code>CF_<wbr \/>OEM\u00adTEXT<\/code>.<\/li>\n<\/ul>\n<p>Putting all of this into a chart gives us<\/p>\n<table style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th rowspan=\"2\">To<\/th>\n<th colspan=\"3\">From<\/th>\n<\/tr>\n<tr>\n<th>CF_TEXT<\/th>\n<th>CF_OEMTEXT<\/th>\n<th>CF_UNICODETEXT<\/th>\n<\/tr>\n<tr>\n<th>CF_TEXT<\/th>\n<td>nop<\/td>\n<td>OemToAnsi<\/td>\n<td>WC2MB(ANSI CP)<\/td>\n<\/tr>\n<tr>\n<th>CF_OEMTEXT<\/th>\n<td>AnsiToOem<\/td>\n<td>nop<\/td>\n<td>WC2MB(OEM CP)<\/td>\n<\/tr>\n<tr>\n<th>CF_UNICODETEXT<\/th>\n<td>MB2WC(ANSI CP)<\/td>\n<td>MB2WC(OEM CP)<\/td>\n<td>nop<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In the above table, &#8220;WC2MB&#8221; and &#8220;MB2WC&#8221; are shorthand for <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> and <code>Multi\u00adByte\u00adTo\u00adWide\u00adChar<\/code>, respectively, called with the indicated code page. &#8220;ANSI CP&#8221; means &#8220;the code page reported by calling <code>Get\u00adLocale\u00adInfo<\/code> with the LCID in the <code>CF_<wbr \/>LOCALE<\/code> clipboard format, and the <code>LOCALE_<wbr \/>IDEFAULT\u00adANSI\u00adCODE\u00adPAGE<\/code> locale attribute&#8221;. &#8220;OEM CP&#8221; is obtained the same way, using <code>LOCALE_<wbr \/>IDEFAULT\u00adCODE\u00adPAGE<\/code> instead of <code>LOCALE_<wbr \/>IDEFAULT\u00adANSI\u00adCODE\u00adPAGE<\/code>.<\/p>\n<p>That&#8217;s great, we have all the answers in a table. But that table raises more questions!<\/p>\n<p>We&#8217;ll start answering questions next time.<\/p>\n<p>\u00b9 This <code>CF_<wbr \/>LOCALE<\/code> clipboard format existed in 16-bit Windows as well, but it wasn&#8217;t really used for anything. 
The people who added Unicode support to the clipboard realized, &#8220;Hey, the thing we need is already here! We just have to start using it.&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let&#8217;s ask the locale.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-111854","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Let&#8217;s ask the locale.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111854","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=111854"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111854\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=111854"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=111854"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=111854"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}