{"id":3985,"date":"2018-11-15T17:23:59","date_gmt":"2018-11-16T01:23:59","guid":{"rendered":"http:\/\/blogs.msdn.microsoft.com\/commandline\/?p=3985"},"modified":"2019-02-25T16:59:47","modified_gmt":"2019-02-26T00:59:47","slug":"windows-command-line-unicode-and-utf-8-output-text-buffer","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/commandline\/windows-command-line-unicode-and-utf-8-output-text-buffer\/","title":{"rendered":"Windows Command-Line: Unicode and UTF-8 Output Text Buffer"},"content":{"rendered":"<p>In this post, we&#8217;ll discuss the improvements we&#8217;ve been making to the Windows Console&#8217;s internal text buffer, enabling it to better store and handle Unicode and UTF-8 text.<\/p>\n<p><!--more--><\/p>\n<h2 id=\"posts-in-the-windows-command-line-series\">Posts in the Windows Command-Line series:<\/h2>\n<p>This list will be updated as more posts are published:<\/p>\n<ol>\n<li><a target=\"_blank\" href=\"https:\/\/blogs.msdn.microsoft.com\/commandline\/2018\/06\/20\/windows-command-line-backgrounder\/\" rel=\"noopener\">Command-Line Backgrounder<\/a><\/li>\n<li><a target=\"_blank\" href=\"https:\/\/blogs.msdn.microsoft.com\/commandline\/2018\/06\/27\/windows-command-line-the-evolution-of-the-windows-command-line\/\" rel=\"noopener\">The Evolution of the Windows Command-Line<\/a><\/li>\n<li><a target=\"_blank\" href=\"https:\/\/blogs.msdn.microsoft.com\/commandline\/2018\/07\/20\/windows-command-line-inside-the-windows-console\/\" rel=\"noopener\">Inside the Windows Console<\/a><\/li>\n<li><a target=\"_blank\" href=\"http:\/\/devblogs.microsoft.com\/commandline\/windows-command-line-introducing-the-windows-pseudo-console-conpty\/\" rel=\"noopener\">Introducing the Windows Pseudo Console (ConPTY) API<\/a><\/li>\n<li><strong>Unicode and UTF-8 Output Text Buffer<\/strong> <em>[this post]<\/em><\/li>\n<\/ol>\n<hr \/>\n<p><a 
href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-Bart.png\"><img decoding=\"async\" width=\"600\" height=\"321\" class=\"size-medium wp-image-4005\" alt=\"Study your Unicode Text Encodings!\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-Bart-600x321.png\" \/><\/a><\/p>\n<p><a href=\"https:\/\/www.perl.com\/article\/building-a-utf-8-encoder-in-perl\/\">[Source: David Farrell\u2019s \u201cBuilding a UTF-8 encoder in Perl\u201d]<\/a><\/p>\n<p>The most visible aspect of a Command-Line Terminal is that it displays the text emitted from your shell and\/or Command-Line tools and apps in a grid of mono-spaced cells \u2013 one cell per character\/symbol\/glyph. Great, that\u2019s simple. How hard can it be, right \u2013 it\u2019s just letters? Noooo! Read on!<\/p>\n<h2>Representing Text<\/h2>\n<p>Text is text is text. Or is it?<\/p>\n<p>If you\u2019re someone who speaks a language that originated in Western Europe (e.g. English, French, German, Spanish, etc.), chances are that your written alphabet is pretty homogeneous \u2013 10 digits plus 26 letters in both upper and lower case = 62 symbols in total. Now add around 30 symbols for punctuation and you\u2019ll need around 95 symbols in total. But if you\u2019re from East Asia (e.g. Chinese, Japanese, Korean, Vietnamese, etc.) 
you\u2019ll likely read and write text with a few more symbols \u2026 more than 7000 in total!<\/p>\n<p>Given this complexity, how do computers represent, define, store, exchange\/transmit, and render these various forms of text in an efficient and standardized\/commonly-understood manner?<\/p>\n<h2>In the beginning was ASCII<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Computer#Digital_computers\">The dawn of modern digital computing<\/a> was centered on the UK and the US, and thus English was the predominant language and alphabet used.<\/p>\n<p>As we saw above, the ~95 characters of the English alphabet (and necessary punctuation) can be individually represented using 7-bit values (0-127), with room left over for additional non-visible control codes.<\/p>\n<p>In 1963, the American National Standards Institute (ANSI) published the X3.4-1963 standard for the American Standard Code for Information Interchange (ASCII) \u2013 this became the basis of what we now know as the ASCII standard.<\/p>\n<blockquote>\n<p>\u2026 and Microsoft gets a bad rap for naming things \ud83d\ude09<\/p>\n<\/blockquote>\n<p>The initial X3.4-1963 standard left 28 values undefined and reserved for future use. Seizing the opportunity, the International Telegraph and Telephone Consultative Committee (CCITT, from French: Comit\u00e9 Consultatif International T\u00e9l\u00e9phonique et T\u00e9l\u00e9graphique) proposed a change to the ANSI layout which caused the lower-case characters to differ in bit pattern from the upper-case characters by just a single bit. 
This simplified character case detection\/matching and the construction of keyboards and printers.<\/p>\n<p>Over time, additional changes were made to some of the characters and control codes, until we ended up with the now well-established ASCII table of characters which is supported by practically every computing device in use today.<\/p>\n<p><figure id=\"attachment_3205\" aria-labelledby=\"figcaption_attachment_3205\" class=\"wp-caption alignnone\" ><img decoding=\"async\" width=\"600\" height=\"393\" class=\"wp-image-3205 size-medium\" alt=\"7-bit ASCII Table\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/command-line-backgrounder-ascii-600x393.png\" \/><figcaption id=\"figcaption_attachment_3205\" class=\"wp-caption-text\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/command-line-backgrounder-ascii.png\">7-bit ASCII Table<\/a><\/figcaption><\/figure><\/p>\n<p>\u00a0<\/p>\n<p>The rapid adoption of computers in Europe presented a new challenge though: how to represent text in languages other than English. 
For example, how should letters with accents, umlauts, and additional symbols be represented?<\/p>\n<p>To accomplish this, the ASCII table was extended with the addition of an extra bit, making characters 8 bits long and adding 128 \u201cextended characters\u201d:<\/p>\n<p><figure id=\"attachment_4035\" aria-labelledby=\"figcaption_attachment_4035\" class=\"wp-caption alignnone\" ><img decoding=\"async\" width=\"573\" height=\"335\" class=\"wp-image-4035 size-full\" alt=\"Extended 8-bit ASCII characters\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-extended-ascii-codepage.png\" \/><figcaption id=\"figcaption_attachment_4035\" class=\"wp-caption-text\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-extended-ascii-codepage.png\">Extended 8-bit ASCII characters<\/a><\/figcaption><\/figure><\/p>\n<p>\u00a0<\/p>\n<p>But that still didn\u2019t provide enough room to represent all the characters, glyphs and symbols required by computer users across the globe.<\/p>\n<p>So, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Code_page\">Code pages<\/a> were introduced.<\/p>\n<h2>Code Pages \u2013 a partial solution<\/h2>\n<p>Code pages define sets of characters for the \u201cextended characters\u201d from 0x80 \u2013 0xFF (and, in some cases, a few of the non-displaying control characters between 0x00 and 0x1F). 
By selecting a different Code page, a Terminal can display additional glyphs for European languages and some block-symbols (see above), <a href=\"https:\/\/en.wikipedia.org\/wiki\/CJK\">CJK<\/a> text, Vietnamese text, etc.<\/p>\n<p>However, writing code to handle\/swap Code pages, and the lack of any standardization for Code pages in general, made text processing and rendering difficult and error-prone, and presented major interop &amp; user-experience challenges.<\/p>\n<p>Worse still, 128 additional glyphs don\u2019t even come close to providing enough characters to represent some languages: For example, high-school level Chinese uses 2200 ideograms, with several hundred more in everyday use, and in excess of 7000 ideograms in total.<\/p>\n<p>Clearly, code pages \u2013 additional sets of 128 chars \u2013 are not a scalable solution to this problem.<\/p>\n<p>One approach to solving this problem was to add more bits \u2013 an extra 8 bits, in fact!<\/p>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/DBCS\">Double Byte Character Set (DBCS)<\/a> code-page approach uses two bytes to represent a single character. This gives a theoretical addressable space of 2^16 == 65,536 characters. However, despite attempts to standardize the Japanese <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shift_JIS\">Shift JIS<\/a> encoding, and the variable-length ASCII-compatible <a href=\"https:\/\/en.wikipedia.org\/wiki\/Extended_Unix_Code#EUC-JP\">EUC-JP<\/a> encoding, DBCS code-page encodings were often <a href=\"https:\/\/en.wikipedia.org\/wiki\/DBCS#Controversy\">riddled with issues<\/a> and did not deliver a universal solution to the challenge of encoding text.<\/p>\n<p>What we really needed was a Universal Code for text data.<\/p>\n<h2>Enter, Unicode<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Unicode\">Unicode<\/a> is a set of standards that defines how text is represented and encoded.<\/p>\n<p>The design of Unicode was started in 1987 by engineers at Xerox and Apple. 
The initial Unicode-88 spec was published in February 1988, and has been continually refined and updated ever since, adding new character representations, additional language support, and even emoji \ud83d\ude0a<\/p>\n<blockquote>\n<p>For a great history of Unicode, <a href=\"https:\/\/www.unicode.org\/history\/earlyyears.html\">read this<\/a>!<\/p>\n<\/blockquote>\n<p>Today, Unicode supports up to 1,112,064 valid \u201ccodepoints\u201d, each representing a single character \/ symbol \/ glyph \/ ideogram \/ etc. This should provide plenty of addressable codepoints for the future, especially considering that \u201cUnicode 11 currently defines 137,439 characters covering 146 modern and historic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Script_(Unicode)\">scripts<\/a>, as well as multiple symbol sets and emoji\u201d [source: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Unicode\">Wikipedia<\/a>, Oct 2018]<\/p>\n<blockquote>\n<p>\u201c1.2 million codepoints should be enough for anyone\u201d \u2013 source: Rich Turner, Oct 2018<\/p>\n<\/blockquote>\n<p>Unicode text data can be encoded in many ways, each with its own strengths and weaknesses:<\/p>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td valign=\"top\">\n        <b>Encoding<\/b>\n      <\/td>\n<td valign=\"top\">\n        <b>Notes<\/b>\n      <\/td>\n<td valign=\"top\">\n        <b># Bytes per codepoint<\/b>\n      <\/td>\n<td valign=\"top\">\n        <b>Pros<\/b>\n      <\/td>\n<td valign=\"top\">\n        <b>Cons<\/b>\n      <\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n        <b><a href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-32\">UTF-32<\/a><\/b>\n      <\/td>\n<td valign=\"top\">\n        Each valid 32-bit value is a direct index to an individual Unicode codepoint\n      <\/td>\n<td valign=\"top\">\n        4\n      <\/td>\n<td valign=\"top\">\n        No decoding required\n      <\/td>\n<td valign=\"top\">\n        Consumes a lot of space\n      <\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n        <b><a 
href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-16\">UTF-16<\/a><\/b>\n      <\/td>\n<td valign=\"top\">\n        Variable-length encoding, requiring either one or two 16-bit values to represent each codepoint\n      <\/td>\n<td valign=\"top\">\n        2\/4\n      <\/td>\n<td valign=\"top\">\n        Simple decoding\n      <\/td>\n<td valign=\"top\">\n        Consumes 2 bytes even for ASCII text. Requires 4 bytes for codepoints outside the Basic Multilingual Plane\n      <\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n        <b>UCS-2<\/b>\n      <\/td>\n<td valign=\"top\">\n        Precursor to UTF-16. Fixed-length 16-bit encoding used internally by Windows, Java, and JavaScript\n      <\/td>\n<td valign=\"top\">\n        2\n      <\/td>\n<td valign=\"top\">\n        Simple decoding\n      <\/td>\n<td valign=\"top\">\n        Consumes 2 bytes even for ASCII text. Unable to represent codepoints above U+FFFF\n      <\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n        <b><a href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-8\">UTF-8<\/a><\/b>\n      <\/td>\n<td valign=\"top\">\n        Variable-length encoding. 
Requires between one and four bytes to represent each Unicode codepoint\n      <\/td>\n<td valign=\"top\">\n        1-4\n      <\/td>\n<td valign=\"top\">\n        Efficient, granular storage requirements\n      <\/td>\n<td valign=\"top\">\n        Moderate decoding cost\n      <\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n        <b>Others<\/b>\n      <\/td>\n<td valign=\"top\">\n        <a href=\"https:\/\/en.wikipedia.org\/wiki\/Comparison_of_Unicode_encodings\">Other encodings<\/a> exist, but are not in widespread use\n      <\/td>\n<td valign=\"top\">\n        N\/A\n      <\/td>\n<td valign=\"top\">\n        N\/A\n      <\/td>\n<td valign=\"top\">\n        N\/A\n      <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Due largely to its flexibility and storage\/transmission efficiency, UTF-8 has become the predominant text encoding mechanism on the Web: As of October 2018, <a href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-8\">92.4% of all Web Pages are encoded in UTF-8<\/a>!<\/p>\n<p><figure id=\"attachment_3275\" aria-labelledby=\"figcaption_attachment_3275\" class=\"wp-caption alignnone\" ><img decoding=\"async\" width=\"600\" height=\"353\" class=\"size-medium wp-image-3275\" alt=\"UTF-8 encoding popularity for web pages (source: Wikipedia)\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/command-line-backgrounder-unicode-encodings-600x353.png\" \/><figcaption id=\"figcaption_attachment_3275\" class=\"wp-caption-text\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/command-line-backgrounder-unicode-encodings.png\">UTF-8 encoding popularity for web pages (source: Wikipedia)<\/a><\/figcaption><\/figure><\/p>\n<p>\u00a0<\/p>\n<p>It\u2019s clear, therefore, that anything that processes text should <em>at least<\/em> be able to support UTF-8 text.<\/p>\n<blockquote>\n<p>To learn more about text encoding and Unicode, read Joel Spolsky\u2019s great writeup here: <a 
href=\"https:\/\/www.joelonsoftware.com\/2003\/10\/08\/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses\/\">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)<\/a><\/p>\n<\/blockquote>\n<h2>Console \u2013 built in a pre-Unicode dawn<\/h2>\n<p>Alas, the Windows Console is not (currently) able to support UTF-8 text!<\/p>\n<p>Windows Console was created way back in the early days of Windows, back before Unicode itself existed! Back then, a decision was made to represent each text character as a fixed-length 16-bit value (UCS-2). Thus, the Console\u2019s text buffer contains one 2-byte <code>wchar_t<\/code> value per grid cell, <code>x<\/code> columns by <code>y<\/code> rows in size.<\/p>\n<p>While this design has supported the Console for more than 25 years, the rapid adoption of UTF-8 has started to cause problems:<\/p>\n<p>One problem, for example, is that because UCS-2 is a fixed-width 16-bit encoding, it is unable to represent <a href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-16#\/media\/File:Unifont_Full_Map.png\">all Unicode codepoints<\/a>.<\/p>\n<p>Another related but separate problem with the Windows Console is that because GDI is used to render Console\u2019s text, and GDI does not support font-fallback, Console is unable to display glyphs for codepoints that don\u2019t exist in the currently selected font!<\/p>\n<blockquote>\n<p>Font-fallback is the ability to dynamically look up and load a font that is similar to the currently selected font, but which contains a glyph that\u2019s missing from the currently selected font.<\/p>\n<\/blockquote>\n<p>These combined issues are why Windows Console cannot (currently) display many complex Chinese ideograms and cannot display emoji.<\/p>\n<p>Emoji? SRSLY? 
This might at first sound trivial but is an issue since some tools now emit emoji to, for example, indicate test results, and some programming languages\u2019 source code supports\/requires Unicode, including emoji!<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-emoji-code.png\"><img decoding=\"async\" width=\"556\" height=\"327\" class=\"size-full wp-image-4015\" alt=\"Emoji Code\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-emoji-code.png\" \/><\/a><\/p>\n<p>[Source: <a target=\"_blank\" href=\"http:\/\/www.globalnerdy.com\/2014\/06\/03\/swift-fun-fact-1-you-can-use-emoji-characters-in-variable-constant-function-and-class-names\/\" rel=\"noopener\">You can use emoji characters in Swift variable, constant, function, and class names<\/a>]<\/p>\n<p>But I digress \u2026<\/p>\n<h2>Text Attributes<\/h2>\n<p>In addition to storing the text itself, a Console\/Terminal must store the foreground and background color, and any other per-cell information required.<\/p>\n<p>These attributes must be stored efficiently and quickly \u2013 there\u2019s no need to store background and foreground information for each cell individually, especially since most Console apps\/tools output pretty uniformly colored text, but storing and retrieving attributes must not unnecessarily hinder rendering performance.<\/p>\n<p>Let\u2019s dig in and find out how the Console handles all this! 
\ud83d\ude0a<\/p>\n<h2>Modernizing the Console\u2019s text buffer<\/h2>\n<p>As discussed in <a href=\"https:\/\/blogs.msdn.microsoft.com\/commandline\/2018\/08\/02\/windows-command-line-introducing-the-windows-pseudo-console-conpty\/\">the previous post in this series<\/a>, the Console team have been busy overhauling the Windows Console\u2019s internals for the last several Win10 releases, carefully modernizing, modularizing, simplifying, and improving the Console\u2019s code &amp; features \u2026 while not noticeably sacrificing performance, and not changing current behaviors.<\/p>\n<p>For each major change, we evaluate and prototype several approaches, and measure the Console\u2019s performance, memory footprint, power consumption, etc. to figure out the best real-world solution. We took the same approach for the buffer improvement work, which started before 1803 shipped and continues beyond 1809.<\/p>\n<p>The key issue to solve was that the Console previously stored each cell\u2019s text data as a single fixed-length 2-byte UCS-2 <code>wchar_t<\/code> value.<\/p>\n<p>To fully support all Unicode characters we needed a more flexible approach that added no noticeable processing or memory overhead for the general case, but was able to dynamically handle additional bytes of text data for cells that contain multi-byte Unicode characters.<\/p>\n<p>We examined several approaches, and prototyped &amp; measured a few, which helped us disqualify some potential approaches that turned out to be ineffective in real-world use.<\/p>\n<h2>Adding Unicode Support<\/h2>\n<p>Ultimately, we arrived at the following architecture:<\/p>\n<p><figure id=\"attachment_4085\" aria-labelledby=\"figcaption_attachment_4085\" class=\"wp-caption alignnone\" ><img decoding=\"async\" width=\"618\" height=\"600\" class=\"wp-image-4085 size-large\" alt=\"Console Text Buffer Architecture\" 
src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-architecture-618x600.png\" \/><figcaption id=\"figcaption_attachment_4085\" class=\"wp-caption-text\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/33\/2019\/02\/5-Buffers-architecture.png\">Console Text Buffer Architecture<\/a><\/figcaption><\/figure><\/p>\n<p>\u00a0<\/p>\n<p>From the top (original buffer&#8217;s blue boxes):<\/p>\n<ul>\n<li><strong>ScreenInfo<\/strong> \u2013 maintains information about the viewport, etc., and contains a TextBuffer \n<ul>\n<li><strong>TextBuffer<\/strong> \u2013 represents the Console\u2019s text area as a collection of rows \n<ul>\n<li><strong>Row<\/strong> \u2013 uniquely represents each CharRow in the console and the formatting attributes applied to each row \n<ul>\n<li><strong>CharRow<\/strong> \u2013 contains a collection of CharRowCells, and the logic and state to handle row wrapping &amp; navigation \n<ul>\n<li><strong>CharRowCell<\/strong> \u2013 contains the actual cell\u2019s text, and a DbcsAttribute byte containing cell-specific flags<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Several key changes were made to the Console\u2019s buffer implementation (indicated in orange in the diagram above), including:<\/p>\n<ol>\n<li>The Console&#8217;s <code>CharRowCell::DbcsAttribute<\/code> stores formatting information about how wide the text data is for DBCS chars. Not all bits were in use, so an <strong>additional flag was added<\/strong> to indicate if the text data for a cell exceeds one 16-bit <code>wchar_t<\/code> in length. If this flag is set, the Console will fetch the text data from the <code>UnicodeStorage<\/code>.<\/li>\n<li><strong><code>UnicodeStorage<\/code><\/strong> was added, which contains a map of <code>{row:column}<\/code> coordinates to a collection of 16-bit <code>wchar_t<\/code> values. 
This allows the buffer to store an arbitrary number of <code>wchar_t<\/code> values for each individual cell in the Console that needs to store additional Unicode text data, ensuring that the Console can accommodate future expansion in the scope and range of Unicode text data. And because <code>UnicodeStorage<\/code> is a map, the lookup cost of the overflow text is constant and fast!<\/li>\n<\/ol>\n<p>So, imagine a cell needs to display the Unicode grinning face emoji: \ud83d\ude00 (U+1F600). This emoji is encoded in UTF-8 as the four bytes <code>0xF0 0x9F 0x98 0x80<\/code>, or in UTF-16 as the surrogate pair <code>0xD83D 0xDE00<\/code>. Clearly, this emoji won\u2019t fit into a single 2-byte <code>wchar_t<\/code>. So, in the new buffer, the CharRowCell\u2019s DbcsAttribute\u2019s &#8220;overrun&#8221; flag will be set, indicating that the Console should look up the UTF-16 encoded data for that <code>{row:col}<\/code> stored in the UnicodeStorage\u2019s map container.<\/p>\n<p>A key point to make about this approach is that we don\u2019t need any additional storage if a character can be represented as a single 8\/16-bit value: We only need to store additional \u201coverrun\u201d text when needed. 
This ensures that for the most common case \u2013 storing ASCII and simpler Unicode glyphs, we don\u2019t increase the amount of data we consume, and don\u2019t negatively impact performance.<\/p>\n<h2>Great, so when do I get to try this out?<\/h2>\n<p>If you\u2019re running Windows 10 October 2018 Update (build 1809), you\u2019re already running this new buffer!<\/p>\n<p>We tested the new buffer prior to including it quietly in Insider builds in the months leading-up to 1809 and made some key improvements before 1809 was shipped.<\/p>\n<h2>Are we there yet?<\/h2>\n<p>Not quite!<\/p>\n<p>We\u2019re also working to further improve the buffer implementation in subsequent OS updates (and via the Insider builds that precede each OS release).<\/p>\n<p>The changes above only allow for the storage of a single codepoint per <code>CharRowCell<\/code>. More complex glyphs that require multiple codepoints are not yet supported, but we\u2019re working on adding this capability in a future OS release.<\/p>\n<p>The current changes also don\u2019t cover what is required for our \u201cprocessed input mode\u201d that presents an editable input line for applications like CMD.exe. We are planning and actively updating the code for popup windows, command aliases, command history, and the editable input line itself to support full true Unicode as well.<\/p>\n<p>And don\u2019t go trying to display emoji just yet \u2013 that requires a new rendering engine that supports font-fallback \u2013 the ability to dynamically find, load, and render glyphs from fonts other than the currently selected font. 
And that\u2019s the subject of a whole \u2018nother post for another time \ud83d\ude09.<\/p>\n<p>Stay tuned for more posts soon!<\/p>\n<p>We look forward to hearing your thoughts &#8211; feel free to sound-off below, or ping <a href=\"https:\/\/twitter.com\/richturn_ms\">Rich on Twitter<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we&#8217;ll discuss the improvements we&#8217;ve been making to the Windows Console&#8217;s internal text buffer, enabling it to better store and handle Unicode and UTF-8 text.<\/p>\n","protected":false},"author":910,"featured_media":4226,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,6],"tags":[23,29,31,37,64,65],"class_list":["post-3985","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-command-line","category-windows-console","tag-buffers","tag-command-line","tag-console","tag-encoding","tag-unicode","tag-utf-8"],"acf":[],"blog_post_summary":"<p>In this post, we&#8217;ll discuss the improvements we&#8217;ve been making to the Windows Console&#8217;s internal text buffer, enabling it to better store and handle Unicode and UTF-8 
text.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/posts\/3985","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/users\/910"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/comments?post=3985"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/posts\/3985\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/media\/4226"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/media?parent=3985"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/categories?post=3985"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/commandline\/wp-json\/wp\/v2\/tags?post=3985"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}