{"id":45401,"date":"2015-06-11T07:00:00","date_gmt":"2015-06-11T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20150611-00\/?p=45401\/"},"modified":"2019-03-13T12:16:15","modified_gmt":"2019-03-13T19:16:15","slug":"20150611-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150611-00\/?p=45401","title":{"rendered":"Keep your eye on the code page: Is this string CP_ACP or UTF-8?"},"content":{"rendered":"<p>A customer had a problem with strings and code pages. <\/p>\n<blockquote CLASS=\"q\">\n<p>The customer has a password like <code>\"M&uuml;llwagen\"<\/code> for a particular user. Note the umlaut over the <i>u<\/i>. That character is encoded as the two bytes <code>C3 BC<\/code> according to UTF-8. When the customer passes this password to the <code>Logon&shy;User<\/code> function in order to authenticate the user, the call fails, claiming that the password is invalid. <\/p>\n<p>If we encode the <i>&uuml;<\/i> as the single byte <code>FC<\/code>, then the call to <code>Logon&shy;User<\/code> succeeds. <\/p>\n<p>Therefore, if the string is in UTF-8 form, it needs to be converted, and to do this we use the <code>Multi&shy;Byte&shy;To&shy;Wide&shy;Char<\/code> function. Once converted, the logon is successful. <\/p>\n<p>The problem is that we are not sure if the password being given to the application will encode the <i>&uuml;<\/i> as <code>C3 BC<\/code> or as <code>FC<\/code>. If it arrives as <code>FC<\/code>, and we try to convert it with the <code>Multi&shy;Byte&shy;To&shy;Wide&shy;Char<\/code> function, the <i>&uuml;<\/i> <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2012\/05\/04\/10300670.aspx\">is converted to <code>U+FFFD<\/code><\/a>. 
<\/p>\n<p>If I take the <code>FC<\/code>-encoded string and convert it with the <code>Multi&shy;Byte&shy;To&shy;Wide&shy;Char<\/code> function, passing <code>CP_ACP<\/code> as the first parameter, then it converts successfully (no <code>U+FFFD<\/code>), and the call to <code>Logon&shy;User<\/code> is successful. <\/p>\n<p>For the application, the customer does not want to distinguish the two cases or implement any retry logic or anything like that. Can you help us understand the issue, what we are doing wrong, and how we can fix it? <\/p>\n<\/blockquote>\n<p>As the problem is stated, you are screwed. <\/p>\n<p>You have a bunch of bytes, and you don&#8217;t know what encoding they are in. The byte sequence <code>C3 BC<\/code> might be a UTF-8 encoding of <i>&uuml;<\/i>, or it could be a <code>CP_ACP<\/code> encoding of <i>&Atilde;&frac14;<\/i> (on a system whose ANSI code page is 1252). You are stuck with guessing. But for something as important as passwords, you shouldn&#8217;t guess. You need to know for sure, because an incorrect guess will generate audit entries, and may cause the user to become locked out of the account due to too many incorrect passwords. <\/p>\n<p>This means that you need to make sure that whoever is passing you the string also tells you what encoding it is using. <\/p>\n<p>The customer liaison replied, <\/p>\n<blockquote CLASS=\"q\">\n<p>Thanks. I went back and talked to the customer, and it turns out that the password is always in UTF-8 form, so the problem is solved. We will always pass <code>CP_UTF8<\/code> when converting the string. <\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>You don&#8217;t know. 
Somebody has to tell you.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-45401","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>You don&#8217;t know. Somebody has to tell you.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/45401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=45401"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/45401\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=45401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=45401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=45401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}