{"id":108854,"date":"2023-10-05T07:00:00","date_gmt":"2023-10-05T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=108854"},"modified":"2023-10-05T12:45:19","modified_gmt":"2023-10-05T19:45:19","slug":"20231005-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20231005-00\/?p=108854","title":{"rendered":"How can I get WideCharToMultiByte to convert strings encoded in UTF-16BE?"},"content":{"rendered":"<p>A customer had a Windows program that receives data in UTF-16BE format, and they want to convert it to Shift JIS format. According to the customer liaison:<\/p>\n<blockquote class=\"q\"><p>They convert the characters from UTF-16LE to Shift JIS by calling <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code>, and it works fine. However, trying to convert the characters from UTF-16BE to Shift JIS via <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> produces garbage. How can we tell <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> that the string is UTF-16BE? Is there any documentation that explains this?<\/p><\/blockquote>\n<p>In Windows, if a string is described as being in Unicode or UTF-16 format, the documentation means UTF-16LE format by default. Similarly, if a sequence of bytes is described as encoding a multi-byte integer, the documentation means little-endian twos-complement format by default.\u00b9<\/p>\n<p>The bias toward little-endian format in Windows is so strong that big-endian format is sometimes called &#8220;reverse byte order&#8221;, such as in the values returned by the <code>Is\u00adText\u00adUnicode<\/code> function.<\/p>\n<p>In this case, it&#8217;s not clear how the customer is using the <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> function to convert UTF-16BE to Shift JIS. 
The <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> function does not have any flag to specify the source encoding, so the system assumes the default, which is UTF-16LE. I&#8217;m guessing that they are just passing UTF-16BE data directly to the <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> function and hoping that the function somehow employs psychic powers to realize &#8220;Oh, this time, the data should be treated as UTF-16BE.&#8221;<\/p>\n<p>The <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> function does not have psychic powers. It converts from UTF-16LE.<\/p>\n<p>The customer must convert their source data from UTF-16BE to UTF-16LE, and then pass the UTF-16LE data to the <code>Wide\u00adChar\u00adTo\u00adMulti\u00adByte<\/code> function. Fortunately, converting UTF-16BE to UTF-16LE is extremely straightforward: swap the two bytes of each 16-bit code unit.<\/p>\n<p>\u00b9 One example of how the default might not apply is when talking about data encoded in &#8220;network byte order&#8221;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You first have to get it into a format that WideCharToMultiByte accepts.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-108854","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>You first have to get it into a format that WideCharToMultiByte 
accepts.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108854","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=108854"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108854\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=108854"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=108854"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=108854"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}