{"id":8273,"date":"2012-02-20T07:00:00","date_gmt":"2012-02-20T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2012\/02\/20\/whats-the-difference-between-text-document-text-document-ms-dos-format-and-unicode-text-document\/"},"modified":"2012-02-20T07:00:00","modified_gmt":"2012-02-20T07:00:00","slug":"whats-the-difference-between-text-document-text-document-ms-dos-format-and-unicode-text-document","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20120220-00\/?p=8273","title":{"rendered":"What&#039;s the difference between Text Document, Text Document &#8211; MS-DOS Format, and Unicode Text Document?"},"content":{"rendered":"<p>Alasdair King asks why Wordpad has three formats, <i>Text Document<\/i>, <i>Text Document&nbsp;&#8211; MS-DOS Format<\/i>, and <i>Unicode Text Document<\/i>. &#8220;<a href=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2010\/07\/20\/10040074.aspx#10042552\">Isn&#8217;t at least one redundant?<\/a>&#8221;\n Recall that in Windows, three code pages have special status.<\/p>\n<ol>\n<li>Unicode (more specifically, UTF-16LE) <\/li>\n<li>     <code>CP_ACP<\/code>,     commonly known as the ANSI code page,     although     <a href=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2004\/05\/31\/144893.aspx\">     that is a misnomer<\/a> <\/li>\n<li><code>CP_OEM<\/code>,     commonly known as the OEM code page,     although     <a href=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2005\/08\/29\/457483.aspx\">     that too is a misnomer<\/a>. <\/li>\n<\/ol>\n<p> Three text file formats. Three encodings. Hm&#8230; I wonder&#8230;\n As you might have guessed by now, the three text file formats correspond to the three special code pages. Now it&#8217;s just a matter of deciding which one matches with which. The easiest one is the Unicode one; it seems clear that <i>Unicode Text Document<\/i> matches with Unicode. Okay, we now have to figure out how <i>Text Document<\/i> and <i>Text Document&nbsp;&#8211; MS-DOS Format<\/i> map to <code>CP_ACP<\/code> and <code>CP_OEM<\/code>. But another piece of the puzzle is pretty clear, because <a href=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2005\/08\/29\/457483.aspx\"> MS-DOS used the so-called OEM code page<\/a>. Therefore, by process of elimination, <i>Text Document<\/i> corresponds to <code>CP_ACP<\/code>.\n Now that we have puzzled out what the three text formats correspond to, we can address the question &#8220;Isn&#8217;t at least one redundant?&#8221;\n <a href=\"http:\/\/blogs.msdn.com\/b\/michkap\/\"> Michael Kaplan<\/a> explained that <a href=\"http:\/\/blogs.msdn.com\/b\/michkap\/archive\/2005\/02\/08\/369197.aspx\"> ACP and OEM are (usually) different<\/a>. And neither is the same as Unicode. So in fact all three are (usually) different.<\/p>\n<p> In the United States, the so-called ANSI code page is <a href=\"http:\/\/msdn.microsoft.com\/goglobal\/cc305145.aspx\"> code page 1252<\/a>, the so-called OEM code page is <a href=\"http:\/\/msdn.microsoft.com\/goglobal\/cc305156.aspx\"> code page 437<\/a>, and Unicode is code page 1200. Here&#8217;s the string <tt>r&eacute;sum&eacute;<\/tt> expressed in each of the three encodings. <\/p>\n<table border=\"1\" style=\"border-collapse: collapse\" cellpadding=\"3\">\n<tr>\n<th valign=\"baseline\">Description<\/th>\n<th valign=\"baseline\">Encoding<\/th>\n<th valign=\"baseline\">Code page<br \/>(en-us)<\/th>\n<th valign=\"baseline\">Bytes<\/th>\n<\/tr>\n<tr>\n<td valign=\"baseline\">Text Document<\/td>\n<td valign=\"baseline\">CP_ACP<\/td>\n<td valign=\"baseline\" align=\"right\">1252<\/td>\n<td valign=\"baseline\"><tt>72 E9 73 75 6D E9<\/tt><\/td>\n<\/tr>\n<tr>\n<td valign=\"baseline\">Text Document&nbsp;&#8211; MS-DOS Format<\/td>\n<td valign=\"baseline\">CP_OEM<\/td>\n<td valign=\"baseline\" align=\"right\">437<\/td>\n<td valign=\"baseline\"><tt>72 82 73 75 6D 82<\/tt><\/td>\n<\/tr>\n<tr>\n<td valign=\"baseline\">Unicode Text Document<\/td>\n<td valign=\"baseline\">UTF-16LE<\/td>\n<td valign=\"baseline\" align=\"right\">1200<\/td>\n<td valign=\"baseline\"><tt>FF FE 72 00 E9 00 73 00<br \/>                             75 00 6D 00 E9 00<\/tt><\/td>\n<\/tr>\n<\/table>\n<p> Three encodings, three different files. No redundancy. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Alasdair King asks why Wordpad has three formats, Text Document, Text Document&nbsp;&#8211; MS-DOS Format, and Unicode Text Document. &#8220;Isn&#8217;t at least one redundant?&#8221; Recall that in Windows, three code pages have special status. Unicode (more specifically, UTF-16LE) CP_ACP, commonly known as the ANSI code page, although that is a misnomer CP_OEM, commonly known as the [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-8273","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Alasdair King asks why Wordpad has three formats, Text Document, Text Document&nbsp;&#8211; MS-DOS Format, and Unicode Text Document. &#8220;Isn&#8217;t at least one redundant?&#8221; Recall that in Windows, three code pages have special status. Unicode (more specifically, UTF-16LE) CP_ACP, commonly known as the ANSI code page, although that is a misnomer CP_OEM, commonly known as the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/8273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=8273"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/8273\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=8273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=8273"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=8273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}