{"id":21283,"date":"2008-08-11T10:00:01","date_gmt":"2008-08-11T10:00:01","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2008\/08\/11\/psychic-debugging-why-cant-streamreader-read-apostrophes-from-a-text-file\/"},"modified":"2008-08-11T10:00:01","modified_gmt":"2008-08-11T10:00:01","slug":"psychic-debugging-why-cant-streamreader-read-apostrophes-from-a-text-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20080811-01\/?p=21283","title":{"rendered":"Psychic debugging: Why can&#8217;t StreamReader read apostrophes from a text file?"},"content":{"rendered":"<p><P>\nAs is customary, the first day of CLR Week is a warm-up.\nActually, today&#8217;s question is a BCL question, not a CLR question,\nbut only the nitpickers will bother to notice.\n<\/P>\n<BLOCKQUOTE CLASS=\"q\">\n<P>\nCan somebody explain why StreamReader can&rsquo;t read apostrophes?\nI have a text file, and I read from it the way you would expect:\n<\/P>\n<PRE>\nStreamReader sr = new StreamReader(\"myfile.txt\");\nConsole.WriteLine(sr.ReadToEnd());\nsr.Close();\n<\/PRE>\n<P>\nI expect this to print the contents of the file\nto the console, and it does&mdash;almost.\nEverything looks great except that all the apostrophes are gone!\n<\/P>\n<\/BLOCKQUOTE>\n<P>\nYou don&#8217;t have to have very strong psychic powers to figure this one out.\n<\/P>\n<P>\nHere&#8217;s a hint: In some versions of this question,\nthe problem is with accented letters.\n<\/P>\n<P>\nYour first psychic conclusion is that the text file is probably an ANSI\ntext file.\nBut\n<A HREF=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/system.io.streamreader.aspx\">\nStreamReader defaults to UTF-8<\/A>, not ANSI.\nOne version of this question actually came right out and asked,\n&#8220;Why can&#8217;t StreamReader read apostrophes from my ANSI text file?&#8221;\nThat version of the\nquestion already contains a false hidden assumption:\nStreamReader can&#8217;t 
read apostrophes from an ANSI text file\nbecause StreamReader (by default) doesn&#8217;t read ANSI text files at all!\n<\/P>\n<P>\nBut that shouldn&#8217;t be a factor, since the apostrophe is encoded\nthe same in ANSI and UTF-8, right?\n<\/P>\n<P>\nThat&#8217;s your second clue.\nOnly the apostrophe is affected.\nWhat&#8217;s so special about the apostrophe?\n(The bonus hint should tip you off:\nWhat&#8217;s so special about accented letters?\nWhat property do they share with the apostrophe?)\n<\/P>\n<P>\nThere are apostrophes and there are apostrophes,\nand it&#8217;s those &#8220;weird&#8221; apostrophes that are the issue here.\nCode points\nU+2018&nbsp;(&lsquo;) and\nU+2019&nbsp;(&rsquo;)\noccupy positions 0x91 and 0x92, respectively,\nin code page 1252,\nand neither of those bytes is a legal lead byte in\nUTF-8 encoding.\nAccented letters have the same problem: &eacute;, for example, is 0xE9 in code page 1252,\nbut in UTF-8 the byte 0xE9 can only begin a three-byte sequence,\nand the ordinary ASCII byte that usually follows it cannot continue one.\nAnd the default behavior for the\n<A HREF=\"http:\/\/msdn2.microsoft.com\/en-us\/library\/system.text.utf8encoding.aspx\">\nSystem.Text.UTF8Encoding<\/A>\nclass is to ignore invalid byte sequences.\nNote that StreamReader does not raise an exception when incorrectly encoded\ntext is encountered.\nIt just ignores the bad byte and continues as best it can,\nfollowing\n<A HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2007\/07\/03\/3665338.aspx#3672697\">\nBurak&#8217;s advice<\/A>.\n<\/P>\n<P>\nResult:\nStreamReader appears to ignore apostrophes and accented letters.\n<\/P>\n<P>\nThere are therefore three issues here.\nFirst,\nyou may want to look at why\nyour ANSI text file is using those weird apostrophes.\nMaybe it&#8217;s intentional, but I suspect it isn&#8217;t.\nSecond, if you&#8217;re going to be reading ANSI text,\nyou can&#8217;t use a default StreamReader, since a default StreamReader\ndoesn&#8217;t read ANSI text.\nYou need to set the encoding to <CODE>System.Text.Encoding.Default<\/CODE>\nif you want to read ANSI text.\nAnd third, why are you using ANSI text in the first place?\nANSI text files 
are not universally portable, since the ANSI code\npage changes from system to system.\nShouldn&#8217;t you be using UTF-8 text files instead?\n<\/P>\n<P>\nAt any rate, the solution is to decide on an encoding and to specify\nthat encoding when creating the StreamReader.\n<\/P>\n<P>\nThis exercise is just another variation on\n<A HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2005\/03\/08\/389527.aspx\">\nKeep your eye on the code page<\/A>.\n<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As is customary, the first day of CLR Week is a warm-up. Actually, today&#8217;s question is a BCL question, not a CLR question, but only the nitpickers will bother to notice. Can somebody explain why StreamReader can&rsquo;t read apostrophes? I have a text file, and I read from it the way you would expect: StreamReader [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-21283","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>As is customary, the first day of CLR Week is a warm-up. Actually, today&#8217;s question is a BCL question, not a CLR question, but only the nitpickers will bother to notice. Can somebody explain why StreamReader can&rsquo;t read apostrophes? 
I have a text file, and I read from it the way you would expect: StreamReader [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/21283","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=21283"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/21283\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=21283"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=21283"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=21283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}