{"id":7983,"date":"2012-03-28T07:00:00","date_gmt":"2012-03-28T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2012\/03\/28\/converting-to-unicode-usually-involves-you-know-some-sort-of-conversion\/"},"modified":"2012-03-28T07:00:00","modified_gmt":"2012-03-28T07:00:00","slug":"converting-to-unicode-usually-involves-you-know-some-sort-of-conversion","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20120328-00\/?p=7983","title":{"rendered":"Converting to Unicode usually involves, you know, some sort of conversion"},"content":{"rendered":"<p>\nA colleague was investigating a problem with a third party\napplication and found an unusual window class name:\nL&#8221;&#x6574;&#x7473;&#x6574;&#x7473;&#8221;.\nHe remarked,\n&#8220;This looks quite odd and could be some problem with the application.&#8221;\n<\/p>\n<p>\nThe string is nonsense in Chinese,\nbut I immediately recognized what was up.\n<\/p>\n<p>\nHere&#8217;s a hint:\nRewrite the string as\n<\/p>\n<blockquote CLASS=\"m\"><p>\nL&#8221;\\x6574&#8243; L&#8221;\\x7473&#8243; L&#8221;\\x6574&#8243; L&#8221;\\x7473&#8243;\n<\/p><\/blockquote>\n<p>\nStill don&#8217;t see it?\nHow about looking at the byte sequence,\nremembering that Windows uses UTF-16LE.\n<\/p>\n<blockquote CLASS=\"m\"><p>\n0x74 0x65 0x73 0x74 0x74 0x65 0x73 0x74\n<\/p><\/blockquote>\n<p>\nOkay, maybe you don&#8217;t have your ASCII table memorized.\n<\/p>\n<blockquote CLASS=\"m\">\n<table>\n<tr>\n<td>0x74<\/td>\n<td>0x65<\/td>\n<td>0x73<\/td>\n<td>0x74<\/td>\n<td>0x74<\/td>\n<td>0x65<\/td>\n<td>0x73<\/td>\n<td>0x74<\/td>\n<\/tr>\n<tr>\n<td>t<\/td>\n<td>e<\/td>\n<td>s<\/td>\n<td>t<\/td>\n<td>t<\/td>\n<td>e<\/td>\n<td>s<\/td>\n<td>t<\/td>\n<\/tr>\n<\/table>\n<\/blockquote>\n<p>\nThat&#8217;s right, the application took the ASCII string\n&#8220;testtest&#8221; and just treated it as a Unicode string\nwithout actually converting it to Unicode.\nWhen the compiler complained &#8220;Cannot 
convert char * to wchar_t *&#8221;\nthey just stuck a cast to make the compiler shut up.\n<\/p>\n<pre>\n<i>\/\/ Code in italics is wrong\nWNDCLASSW wc;\nwc.lpszClassName = (LPWSTR)\"testtest\";<\/i>\n<\/pre>\n<p>\nThey were lucky that the compiler happened to put\n<i>two<\/i> null bytes at the end of the &#8220;testtest&#8221; string.\n<\/p>\n<p>\n<b>Bonus psychic powers<\/b>: Actually, I have a theory\nas to how this happened that doesn&#8217;t involve maliciousness.\n(This is generally a good mindset to maintain,\nsince most of the time, when people cause a problem,\nit&#8217;s not willful; it&#8217;s accidental.)\nConsider a library with the following interface header file:\n<\/p>\n<pre>\n\/\/ mylib.h\n#ifdef __cplusplus\nextern \"C\" {\n#endif\nBOOL RegisterWindowClass(LPCTSTR pszClassName);\n#ifdef __cplusplus\n} \/\/ extern \"C\"\n#endif\n<\/pre>\n<p>\nSomebody uses this header file like this:\n<\/p>\n<pre>\n#include &lt;mylib.h&gt;\nBOOL Initialize()\n{\n    return RegisterWindowClass(TEXT(\"testtest\"));\n}\n<\/pre>\n<p>\nSo far so good.\n<\/p>\n<p>\nMeanwhile, the library implementation goes like this:\n<\/p>\n<pre>\n#define UNICODE\n#define _UNICODE\n#include &lt;mylib.h&gt;\nLRESULT CALLBACK StandardWndProc(HWND, UINT, WPARAM, LPARAM);\nBOOL RegisterWindowClass(LPCTSTR pszClassName)\n{\n    WNDCLASS wc = { 0, StandardWndProc, 0, 0, g_hInstance,\n                    LoadIcon(NULL, IDI_APPLICATION),\n                    LoadCursor(NULL, IDC_ARROW),\n                    (HBRUSH)(COLOR_WINDOW + 1),\n                    NULL, pszClassName };\n    return RegisterClass(&amp;wc);\n}\n<\/pre>\n<p>\nThe two files both compile successfully, and they even link together.\nUnfortunately, one of them was compiled with Unicode disabled,\nand the other was compiled with Unicode enabled.\nSince the header file uses <code>LPCTSTR<\/code>,\nthe actual declaration of <code>RegisterWindowClass<\/code>\n<i>changes<\/i> depending on whether the code that includes\nthe header file is 
compiled as Unicode or ANSI.\n<\/p>\n<p>\nResult: If one file is compiled as ANSI and the other is\ncompiled as Unicode, then one will pass an ANSI string,\nwhich the other will receive and treat as Unicode.\n<\/p>\n<p>\nThis is why functions in Windows which are dependent on\nwhether the caller is compiled as ANSI or Unicode\nare really two functions, one with the A suffix (for ANSI)\nand another with the W suffix (for Wnicode?), and the\ngeneric name is really a macro that forwards to one or the\nother.\nIt prevents <code>TCHAR<\/code>s from sneaking past the compiler\nand ending up being interpreted differently by the two sides.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A colleague was investigating a problem with a third party application and found an unusual window class name: L&#8221;&#x6574;&#x7473;&#x6574;&#x7473;&#8221;. He remarked, &#8220;This looks quite odd and could be some problem with the application.&#8221; The string is nonsense in Chinese, but I immediately recognized what was up. Here&#8217;s a hint: Rewrite the string as L&#8221;\\x6574&#8243; L&#8221;\\x7473&#8243; [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-7983","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>A colleague was investigating a problem with a third party application and found an unusual window class name: L&#8221;&#x6574;&#x7473;&#x6574;&#x7473;&#8221;. He remarked, &#8220;This looks quite odd and could be some problem with the application.&#8221; The string is nonsense in Chinese, but I immediately recognized what was up. 
Here&#8217;s a hint: Rewrite the string as L&#8221;\\x6574&#8243; L&#8221;\\x7473&#8243; [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/7983","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=7983"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/7983\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=7983"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=7983"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=7983"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}