Windows Command-Line: Inside the Windows Console
Welcome to the third post in the Windows Command-Line series. In this post, we’ll start to dig into the internals of the Windows Console and Command-Line, what it is, what it does … and what it doesn’t do.
Posts in the “Windows Command-Line” series
Note: This chapter list will be updated as more posts are published:
- The Evolution of the Windows Command-Line
- Inside the Windows Console [This Post]
- Introducing the Windows Pseudo Console (ConPTY)
- Unicode and UTF-8 Output Text Buffer
[Updated 2018-07-20 to improve readability and clarify some Unicode/UTF-x details]
During the initial development of Windows NT, circa 1989, there was no GUI, there was no desktop, there was ONLY a full-screen command-line, that visually resembled MS-DOS more than it did the future. When the Windows GUI’s implementation started to arrive, the team needed a Console GUI app and thus, the Windows Console was born. Windows Console is one of the first Windows NT GUI apps, and is certainly one of the oldest Windows apps still in general use.
The Windows Console code-base is currently (July 2018) almost 30 years old … older, in fact, than the developers who now work on it! 😄
What does the Console do?
As we learned in our previous posts, a Terminal’s job is relatively simple:
- Handle User Input:
- Accept input from devices including keyboard, mouse, touch, pen, etc.
- Translate input into relevant characters and/or ANSI/VT sequences
- Send characters to the connected app/tool/shell
- Handle App Output:
- Accept text output from a connected Command-Line app/tool
- Update the display as required, based on the received app output (e.g. output text, move the cursor, set text color, etc.)
- Handle System Interactions:
- Launch when requested
- Manage resources
- Resize/maximize/minimize, etc.
- Terminate when required, or when the communications channel is closed/terminated
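To make the "translate input into characters and/or ANSI/VT sequences" step a little more concrete, here is a minimal Python sketch. The key names ("Up", "Enter", etc.) and the function names are made up for illustration; the escape sequences themselves are the standard ANSI/VT cursor-key sequences (normal cursor mode). A real terminal handles far more keys, modifiers, and modes than this.

```python
ESC = "\x1b"

# Hypothetical key names mapped to the characters a terminal would send.
KEY_TO_VT = {
    "Up": ESC + "[A",
    "Down": ESC + "[B",
    "Right": ESC + "[C",
    "Left": ESC + "[D",
    "Enter": "\r",
}


def translate_key(key: str) -> str:
    """Return the characters sent to the connected app for one key press."""
    if key in KEY_TO_VT:
        return KEY_TO_VT[key]
    if len(key) == 1:
        # Printable keys are passed through unchanged.
        return key
    raise ValueError(f"unmapped key: {key!r}")


def translate_keys(keys) -> str:
    """Translate a sequence of key presses into one outgoing character stream."""
    return "".join(translate_key(k) for k in keys)


if __name__ == "__main__":
    # Typing "ls", pressing Enter, then pressing the Up arrow.
    print(repr(translate_keys(["l", "s", "Enter", "Up"])))
```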
However, the Windows Console does things a little differently:
Inside the Windows Console
Windows Console is a traditional Win32 executable and, though it was originally written in ‘C’, much of the code is being migrated to modern C++ as the team modernizes and modularizes Console’s codebase.
For those who care about such things: Many have asked whether Windows is written in C or C++. The answer is that – despite NT’s Object-Based design – like most OSes, Windows is almost entirely written in ‘C’. Why? C++ introduces a cost in terms of memory footprint, and code execution overhead. Even today, the hidden costs of code written in C++ can be surprising, but back in the late 1980’s, when memory cost ~$60/MB (yes … $60 per MEGABYTE!), the hidden memory cost of vtables etc. was significant. In addition, the cost of virtual-method call indirection and object-dereferencing could result in very significant performance & scale penalties for C++ code at that time. While one still needs to be careful, the performance overhead of modern C++ on modern computers is much less of a concern, and is often an acceptable trade-off considering its security, readability, and maintainability benefits … which is why we’re steadily upgrading the Console’s code to modern C++.
So, what’s inside the Windows Console?
Before Windows 7, Windows Console instances were hosted in the crucial Client Server Runtime Subsystem (CSRSS). In Windows 7, however, Console was extracted from CSRSS for security and reliability reasons, and given a new home in the following binaries:
- conhost.exe – the user-mode Windows Console UX & command-line plumbing
- condrv.sys – a Windows kernel driver providing communication infrastructure between conhost and one or more Command-Line shells/tools/apps
A high-level view of Console’s current internal architecture looks like this:
The core components of the Console consist of the following (from the bottom-up):
- ConDrv.sys – Kernel-Mode driver
- Provides a high-performance communications channel between Console and any connected Command-Line apps
- Ferries IO Control (IOCTL) messages back and forth between Command-Line apps and the Console they’re “attached” to
- Console IOCTL messages contain
- Data representing requests to execute API calls against the Console instance
- Text sent from the Console to the Command-Line app
- ConHost.exe – Win32 GUI app:
- ConHost Core – the Console’s internals and plumbing
- API Server: Converts IOCTL messages received from Command-Line app(s) into API calls, and sends text records from Console to Command-Line app
- API: Implements the Win32 Console API & logic behind all the operations that the Console can be asked to perform
- Input Buffer: Stores keyboard and mouse event records generated by user input
- VT Parser: If enabled, parses VT sequences from text, extracts any found from text, and generates equivalent API calls instead
- Output Buffer: Stores the text displayed on the Console’s display. Essentially a 2D array of CHAR_INFO structs which contain each cell’s character data & attributes (more on the buffer below)
- Other: Components not shown in the diagram above include the settings infrastructure, which stores/retrieves values from the registry and/or shortcut files, etc.
- Console UX App Services – the Console UX & UI layer
- Manages the layout, size, position, etc. of the Console window on-screen
- Displays and handles settings UI, etc.
- Pumps the Windows message queue, handles Windows messages, and translates user input into key and mouse event records, storing them in the Input Buffer
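As a rough illustration of how the VT Parser and Output Buffer described above fit together, here is a small Python sketch. Everything in it is an assumption made for illustration: the call names (`set_cursor`, `set_attr`, `write_text`), the `Cell` type (a stand-in for the CHAR_INFO structs mentioned above), and the tiny subset of sequences handled. It is not how conhost is actually implemented.

```python
import re
from dataclasses import dataclass

# Matches a CSI sequence: ESC [ <numeric params> <final letter>.
CSI = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")


@dataclass
class Cell:
    char: str = " "
    attr: int = 0  # hypothetical attribute value (here: the last SGR code seen)


def parse_vt(text):
    """Split text into (call_name, *args) tuples, extracting CSI sequences."""
    calls, pos = [], 0
    for m in CSI.finditer(text):
        if m.start() > pos:
            calls.append(("write_text", text[pos:m.start()]))
        params = [int(p) if p else 0 for p in m.group(1).split(";")]
        if m.group(2) == "m":
            # SGR (select graphic rendition); simplified to the last parameter.
            calls.append(("set_attr", params[-1]))
        elif m.group(2) == "H":
            # CUP (cursor position); VT coordinates are 1-based.
            row = params[0] or 1
            col = (params[1] if len(params) > 1 else 1) or 1
            calls.append(("set_cursor", row - 1, col - 1))
        # Any other sequence is silently dropped in this sketch.
        pos = m.end()
    if pos < len(text):
        calls.append(("write_text", text[pos:]))
    return calls


def apply_calls(calls, rows=3, cols=10):
    """Replay the calls against a 2D grid of cells and return the grid."""
    buffer = [[Cell() for _ in range(cols)] for _ in range(rows)]
    row = col = attr = 0
    for name, *args in calls:
        if name == "set_attr":
            attr = args[0]
        elif name == "set_cursor":
            row, col = args
        elif name == "write_text":
            for ch in args[0]:
                if row < rows and col < cols:  # ignore writes outside the grid
                    buffer[row][col] = Cell(ch, attr)
                col += 1
    return buffer


if __name__ == "__main__":
    calls = parse_vt("\x1b[2;3H\x1b[31mHi\x1b[0m!")
    for call in calls:
        print(call)
    for line in apply_calls(calls):
        print("".join(cell.char for cell in line))
```

The point to notice is that the text stream never reaches the buffer directly: embedded sequences are pulled out and turned into calls, and only the remaining text is written into cells.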
The Windows Console API
As can be seen in the Console architecture above, unlike *NIX terminals, the Console sends/receives API calls and/or data serialized into IO Control (IOCTL) messages, not serialized text. Even ANSI/VT sequences embedded in text received from (primarily Linux) Command-Line apps are extracted, parsed and converted into API calls. This difference exposes the key philosophical difference between *NIX and Windows: In *NIX, “everything is a file”, whereas, in Windows, “everything is an object”.
There are pros and cons to both approaches, which we’ll outline, but avoid debating at length here. Just remember that this key difference in philosophy is fundamental to many of the differences between Windows and *NIX!
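To get a feel for the difference between "serialized text" and "API calls serialized into messages", here is a hedged Python sketch that expresses the same request ("move the cursor, then write some text") both ways. The message layout and function IDs below are invented purely for illustration; they are not the real format of the IOCTL messages exchanged through the driver.

```python
import struct

# Hypothetical function IDs and message layout, invented for this sketch.
FN_SET_CURSOR = 1
FN_WRITE_TEXT = 2


def pack_set_cursor(row: int, col: int) -> bytes:
    """Header: function id + payload length (two uint16), then row and col."""
    payload = struct.pack("<HH", row, col)
    return struct.pack("<HH", FN_SET_CURSOR, len(payload)) + payload


def pack_write_text(text: str) -> bytes:
    """Header as above, followed by the text as 2-byte code units."""
    payload = text.encode("utf-16-le")
    return struct.pack("<HH", FN_WRITE_TEXT, len(payload)) + payload


def unpack(message: bytes):
    """Decode one message back into a (call_name, *args) tuple."""
    fn, length = struct.unpack_from("<HH", message, 0)
    payload = message[4:4 + length]
    if fn == FN_SET_CURSOR:
        return ("set_cursor",) + struct.unpack("<HH", payload)
    if fn == FN_WRITE_TEXT:
        return ("write_text", payload.decode("utf-16-le"))
    raise ValueError(f"unknown function id {fn}")


def as_vt_text(row: int, col: int, text: str) -> str:
    """The same request as a character stream (VT cursor positions are 1-based)."""
    return f"\x1b[{row + 1};{col + 1}H{text}"


if __name__ == "__main__":
    print(unpack(pack_set_cursor(1, 2)), unpack(pack_write_text("Hi")))
    print(repr(as_vt_text(1, 2, "Hi")))
```

The structured form is easy to validate and dispatch, but both ends must agree on the message format; the text form can flow over anything that carries characters.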
In *NIX, Everything is a File
When Unix was first implemented in the late 1960’s and early 1970’s, one of the core tenets was that (wherever possible) everything should be abstracted as a file stream. One of the key goals was to simplify the code required to access devices and peripherals: If all devices presented themselves to the OS as file-systems, then existing code could access those devices more easily. This philosophy runs deep: One can even navigate and interrogate a great deal of a *NIX-based OS & machine configuration by navigating pseudo/virtual file-systems which expose what appear to be “files” and folders, but actually represent machine configuration, and hardware. For example, in Linux, one can explore a machine’s processors’ properties by examining the contents of the
The simplicity and consistency of this model can, however, come at a cost: Extracting/interrogating specific information from text in pseudo files, and returned from executing commands often requires tools, e.g. sed, awk, perl, python, etc. These tools are used to write commands and scripts to parse the text content, looking for specific patterns, fields, and values. Some of these scripts can get quite complex, are often difficult to maintain, and can be fragile – if the structure, layout, and/or format of the text changes, many scripts will likely need to be updated.
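Here is a small Python sketch of the kind of text-scraping described above. The sample text is made up, written in the "key : value" style that many Linux pseudo-files use, and the function name is hypothetical. Note how the parser quietly depends on the layout: the colon separator and the blank line between records.

```python
# Made-up sample text in a "key : value" layout; not read from a real machine.
SAMPLE = """\
processor\t: 0
model name\t: Example CPU @ 2.00GHz
cpu MHz\t\t: 2000.000

processor\t: 1
model name\t: Example CPU @ 2.00GHz
cpu MHz\t\t: 1999.500
"""


def parse_records(text: str) -> list:
    """Parse blank-line-separated blocks of 'key : value' lines into dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that contain no colon at all
                record[key.strip()] = value.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["processor"], record["cpu MHz"])
```

If a future format renamed a key or changed the separator, this code would not fail loudly; it would simply return different dictionaries, which is exactly the fragility described above.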
In Windows, Everything is an Object
When Windows NT was being designed & built, “Objects” were seen as the future of software design: “Object Oriented” languages were emerging faster than rabbits from a burrow – Simula and Smalltalk were already established, and C++ was becoming popular. Other Object-Oriented languages like Python, Eiffel, Objective-C, ObjectPascal/Delphi, Java, C#, and many others followed in rapid succession.
Inevitably, having been forged during those heady, Object-Oriented days (circa 1989), Windows NT was designed with a philosophy that “everything is an object”. In fact, one of the most important parts of the NT Kernel is the “Object Manager”!
Developers use Windows’ Win32 API to access and manipulate objects and structures that provide access to similar information provided by *NIX pseudo files and tools. And because parsers, compilers, and analyzers understand the structure of objects, many coding errors can often be caught earlier, helping verify that the programmer’s intent is syntactically and logically correct. This can also result in less breakage, volatility, and “churn” over time.
So, coming back to our central discussion about Windows Console: The NT team decided to build a “Console” which differentiated itself from a traditional *NIX terminal in a couple of key areas:
- Console API: Rather than relying on programmers’ ability to generate “difficult to verify” ANSI/VT-sequences, Windows Console can be manipulated and controlled via a rich Console API
- Common services: To avoid having every Command-Line shell re-implement the same services time and again (e.g. Command History, Command Aliasing), the Console itself provides some of these services, accessible via the Console API
Problems with the Windows Console
While the Console’s API has proven very popular in the world of Windows Command-Line tools and services, the API-centric model presents some challenges for Command-Line scenarios:
Windows’ Command-Line & cross-platform interop
Many Windows Command-Line tools and apps make extensive use of the Console API.
The problem? These APIs only work on Windows. Thus, combined with other differentiating factors (e.g. process lifecycle differences, etc.), Windows Command-Line apps are not always easily-portable to *NIX, and vice-versa.
Because of this, the Windows ecosystem has developed its own, often similar, but usually different Command-Line tools and apps. This means that users have to learn one set of Command-Line apps and tools, shells, scripting languages, etc. when using Windows, and another when using *NIX.
There is no simple quick-fix for this issue: The Windows Console and Command-Line cannot simply be thrown away and replaced by bash and iTerm2 because there are hundreds of millions of apps, scripts, and tools that depend upon the Windows Console and Cmd/PowerShell shells, many of which are launched billions of times a day on Windows PC’s and Servers around the globe.
So, what’s the solution here? How do developers run command-line tools, compilers, platforms, etc. originally built primarily on/for *NIX based platforms?
3rd party tools like MinGW/MSYS and Cygwin do a great job of porting many of the core GNU tools and compatibility libraries to Windows, but they are not able to run un-ported, unmodified Linux binaries. This turns out to be an essential requirement, because many Ruby, Python, Node, etc. packages and modules depend upon Linux behaviors and/or “wrap” Linux binaries.
These reasons led Microsoft to enable genuine, unmodified Linux binaries and tools to run natively on Windows via the Windows Subsystem for Linux (WSL).
Using WSL, users can now download and install one or more genuine Linux distros side-by-side on the same machine, and use each distro’s or tool’s package manager (e.g. apt, zypper, npm, gem, etc.) to install and run the vast majority of Linux Command-Line tools, packages, and modules alongside their favorite Windows apps and tools. To learn more about WSL, visit the WSL Learning Page, or the official WSL documentation.
Also, there are still some things that Console offers that haven’t been adopted by non-Microsoft terminals: Specifically, the Windows Console provides command-history and command-alias services, which aim to eliminate the need for every command-line shell (in particular) to re-re-re-implement the same functionality. We’ll return to this subject in the future.
Remoting Windows’ Command-Line is difficult
As we discussed in the Command-Line Backgrounder post, Terminals were originally separate from the computer to which they were attached. Fast-forward to today, this design remains: Most modern terminals and Command-Line apps/shells/etc. are separated by processes and/or machine boundaries.
On *NIX-based platforms, the notion that terminals and command-line applications are separate and simply exchange characters has resulted in *NIX Command-Lines being easy to access and operate from a remote computer/device: As long as a terminal and a Command-Line application can exchange streams of characters via some type of ordered serial communications infrastructure (TTY/PTY/etc.), it is pretty trivial to remotely operate a *NIX machine’s Command-Line.
On Windows however, many Command-Line applications depend on calling Console APIs, and assume that they’re running on the same machine as the Console itself. This makes it difficult to remotely operate Windows Command-Line shells/tools/etc.: How does a Command-Line application running on a remote machine call APIs on the user’s local machine’s Console? And worse, how does the remote Command-Line app call Console APIs if it’s being accessed via a terminal on a Mac or Linux box?!
Sorry to tease, but we’ll return to this subject in much more detail in a future post!
Launching the Console … or not!
Generally, on *NIX based systems, when a user wants to launch a Command-Line tool, they first launch a Terminal. The Terminal then starts a default shell, or can be configured to launch a specific app/tool. The Terminal and Command-Line app communicate by exchanging streams of characters via a Pseudo TTY (PTY) until one or both are terminated.
On Windows, however, things work differently: Windows users never launch the Console (conhost.exe) itself: Users launch Command-Line shells and apps, not the Console itself!
Yes, in Windows, users launch the Command-Line app, NOT the Console itself. If a user launches a Command-Line app from an existing Command-Line shell, Windows will (usually) attach the newly launched Command-Line .exe to the current Console. Otherwise, Windows will spin up a new Console instance and attach it to the newly launched app.
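The attach-or-create behavior can be seen from code that launches a child process. The following Python sketch is only an illustration, with made-up helper names: by default a child Command-Line app shares its parent's Console, while on Windows the `subprocess.CREATE_NEW_CONSOLE` creation flag asks Windows to give the child a Console of its own. That constant only exists in Python's `subprocess` module on Windows, so the sketch falls back to default behavior elsewhere.

```python
import subprocess
import sys


def console_creation_flags(new_console: bool) -> int:
    """Return the process creation flags to pass to subprocess.Popen."""
    if new_console and sys.platform == "win32":
        # Only defined on Windows: request a new Console for the child.
        return subprocess.CREATE_NEW_CONSOLE
    return 0


def launch(args, new_console: bool = False) -> subprocess.Popen:
    """Start a Command-Line app, sharing the parent's Console by default."""
    flags = console_creation_flags(new_console)
    if flags:
        return subprocess.Popen(args, creationflags=flags)
    return subprocess.Popen(args)


if __name__ == "__main__":
    # The child runs attached to whatever Console/terminal this script is using.
    child = launch([sys.executable, "-c", "print('hello from the child')"])
    child.wait()
```

Either way, the code launches the Command-Line app; it never launches the Console directly.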
Because users run Cmd.exe or PowerShell.exe and see a Console window appear, they labor under the common misunderstanding that Cmd and PowerShell are, themselves, “Consoles” … they’re not! Cmd.exe and PowerShell.exe are “headless” Command-Line applications that need to be attached to a Console (conhost.exe) instance from which they receive user input and to which they emit text output to be displayed to the user.
Also, many people say “Command-Line apps run in the Console”. This is misleading and contributes additional confusion about how Consoles and Command-Line apps actually work!
Please help correct this misconception if you hear it by pointing out that “Command-Line tools/apps run connected to a Console” (or similar). Thanks! 😃
Okay, so, Windows Command-Line apps run in their own processes, connected to a Console instance running in a separate process. This is just like in *NIX where Command-Line applications run connected to Terminal apps. Sounds good, right? Well … no; there are some problems here because Console does things a little differently:
- Console and Command-Line app communicate via IOCTL messages through the driver, not via text streams (as in *NIX)
- Windows mandates that ConHost.exe is the Console app which is connected to Command-Line apps
- Windows controls the creation of the communication “pipes” via which the Console and Command-Line app communicate
These are significant limitations, especially the last point. Why? What if you wanted to create an alternate Console app for Windows? How would you send keyboard/mouse/pen/etc. user actions to the Command-Line app if you couldn’t access the communications “pipes” connecting your new Console to the Command-Line app?
Alas, the story here is not a good one: There ARE some great 3rd party Consoles (and server apps) for Windows (e.g. ConEmu/Cmder, Console2/ConsoleZ, Hyper, Visual Studio Code, OpenSSH, etc.), but they have to jump through extraordinary hoops to act like a normal Console would.
For example, 3rd party Consoles have to launch a Command-Line app off-screen at, for example, (-32000,-32000). They then have to send keystrokes to the off-screen Console, and screen-scrape the off-screen Console’s text contents and re-draw them on their own UI! I know, crazy, right?! It’s a testament to the ingenuity and determination of the creators of these apps that they even work at all.
This is clearly a situation we are keen to remedy. Stay tuned for more info on this part of the story too – there’s some good news on the way.
Windows Console & VT
As discussed above, Windows Console provides a rich API. Using the Console API, Command-Line apps and tools write text, change text colors, move the cursor, etc. And, because of the Console API, Windows Console had little need to support ANSI/VT sequences that provide very similar functionality on other platforms. In fact, until Windows 10, Windows Console only implemented the bare minimum support for ANSI/VT sequences.
This all started to change in 2014, when Microsoft formed a new Windows Console team dedicated to untangling and improving the Console & Windows’ Command-Line infrastructure.
One of the new Console team’s highest priorities was to implement comprehensive support for ANSI/VT sequences in order to render the output of *NIX applications running on Windows Subsystem for Linux (WSL), and on remote *NIX machines. You can read a little more about this story in the previous post in this series.
The Console team added comprehensive support for ANSI/VT sequences to Windows 10’s Console, enabling users to use and enjoy a huge array of Windows and Linux Command-Line tools and apps. The team continues to improve and refine Console’s VT support with each OS release, and are grateful for any issues you file on our GitHub issues tracker 😉
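For readers who haven't emitted VT sequences by hand, here is a short Python sketch. The helper names (`sgr`, `colorize`, `enable_vt_on_windows`) are made up for this example. The sequences are standard SGR ("Select Graphic Rendition") codes, and the Windows-only part uses the Win32 `GetStdHandle`, `GetConsoleMode`, and `SetConsoleMode` functions with the `ENABLE_VIRTUAL_TERMINAL_PROCESSING` output mode flag. Treat it as a sketch: it assumes standard output is attached to a Console, and it does very little error handling.

```python
import sys

ESC = "\x1b"


def sgr(*codes: int) -> str:
    """Build a Select Graphic Rendition sequence, e.g. sgr(31) for red text."""
    return f"{ESC}[{';'.join(str(c) for c in codes)}m"


def colorize(text: str, color_code: int) -> str:
    """Wrap text in a color sequence followed by a reset (SGR 0)."""
    return f"{sgr(color_code)}{text}{sgr(0)}"


def enable_vt_on_windows() -> bool:
    """Ask the attached Windows Console to process VT sequences in output.

    Returns False on other platforms, or if the mode could not be changed.
    """
    if sys.platform != "win32":
        return False
    import ctypes
    from ctypes import wintypes

    ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004
    STD_OUTPUT_HANDLE = -11

    kernel32 = ctypes.WinDLL("kernel32")
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # e.g. output is redirected to a file or pipe
    return bool(kernel32.SetConsoleMode(
        handle, mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING))


if __name__ == "__main__":
    enable_vt_on_windows()
    print(colorize("red", 31), colorize("green", 32), colorize("bold blue", 34))
```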
A quick Unicode refresher: Unicode or ISO/IEC 10646 is an international standard defining every character/glyph used in almost every writing system on Earth, plus many non-script symbols and character-sized images (e.g. emoji) in use today. At present (July 2018), Unicode 11 defines 137439 characters, across 146 modern and historic scripts! Unicode also defines several character encodings, including UTF-8, UTF-16, and UTF-32:
- UTF-8: 1-byte for the first 128 code points (maintaining compatibility with ASCII), and an additional 1-3 bytes (up to 4 bytes total) for other characters
- UTF-16/UCS-2: 2-bytes for each character. UCS-2 (used internally by Windows) supports encoding the first 65536 code points (known as the Basic Multilingual Plane – BMP). UTF-16 extends UCS-2 by incorporating a 4-byte encoding (a “surrogate pair”) for the 16 additional planes of characters
- UTF-32: 4-bytes per character
The most popular encoding today, thanks to its efficient storage requirements, and widespread use in HTML pages, is UTF-8. UTF-16/UCS-2 are both common, though decreasingly so in stored documents (e.g. web pages, code, etc.). UTF-32 is rarely used due to its inefficient and considerable storage requirements. Great, so we have effective and efficient ways to represent and store Unicode characters!
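The storage trade-offs between these encodings are easy to check for yourself. This little Python sketch (the function name is made up) counts the bytes needed to encode a single character in each encoding:

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes needed to encode one character."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" codec variants omit the byte-order mark, so only the
        # character itself is counted.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Latin capital A, e-acute, euro sign, grinning face emoji.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

ASCII text costs 1 byte per character in UTF-8 but 4 in UTF-32, while a character outside the BMP (like the emoji) costs 4 bytes in all three.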
Alas, the Windows Console and its API were created before Unicode was created. The Windows Console stores text (that is subsequently drawn on the screen) as UCS-2 characters requiring 2-bytes per cell. Command-Line apps write text to the Console using the Console API. Many Console APIs come in two flavors – functions with an A suffix handle single-byte/character strings, and functions with a W suffix handle 2-byte (wchar)/character strings: For example, the WriteConsoleOutputCharacter() function compiles down to WriteConsoleOutputCharacterA() for ASCII projects, or WriteConsoleOutputCharacterW() for Unicode projects. Code can specifically call ...W suffixed functions directly if specific handling is required.
However, while all W APIs support UCS-2, and some were updated to also support UTF-16, not all W APIs fully support UTF-16.
Also, Console doesn’t support some newer Unicode features including Zero Width Joiners (ZWJ) which are used to combine otherwise separate characters in, for example, Arabic and Indic scripts, and are even used to combine several emoji characters into one visual glyph like the “people” emoji, and ninjacats.
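To see why a buffer of 2-byte cells struggles here, consider this Python sketch (helper names are made up). It shows that an emoji outside the BMP needs two UTF-16 code units (a surrogate pair), and that a ZWJ sequence is several code points that are meant to be drawn as a single glyph:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def fits_in_ucs2(text: str) -> bool:
    """True if every code point lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


grinning = "\U0001F600"                             # one code point, outside the BMP
technologist = "\U0001F469" + ZWJ + "\U0001F4BB"    # woman + ZWJ + laptop

if __name__ == "__main__":
    print([hex(u) for u in utf16_units(grinning)])
    print(len(technologist), "code points,", len(utf16_units(technologist)), "UTF-16 code units")
    print(fits_in_ucs2("Hello"), fits_in_ucs2(grinning))
```

A single visible glyph like the technologist emoji is three code points and five UTF-16 code units, so it cannot be represented by one 2-byte cell, and splitting it across cells loses the information that the pieces belong together.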
Worse still, the Console’s current text renderer can’t even draw these complex glyphs, even if the buffer could store them: Console currently uses GDI for text rendering, but GDI doesn’t adequately support font-fallback – a mechanism to dynamically find and load an alternative font that contains a glyph missing from the current font. Font-fallback is well supported by more modern text rendering engines like DirectWrite.
So what happens if you wanted to write complex and conjoined glyphs onto the Console? Sadly, you can’t … yet, but this too is a post for another time.
So, where are we?
Once again, dear reader, if you’ve read everything above, thank you, and congratulations – you now know more about the Windows Console than most of your friends, and likely more than even you wanted to! Lucky you 😛
We’ve covered a lot of ground in this post:
- The major building-blocks of the Windows Console:
- Condrv.sys – the Console communication driver
- ConHost.exe – the Console UX, internals, and plumbing:
- API Server – serializes API calls and text data via IOCTL messages sent to/from the driver
- API – the functionality of the Console
- Buffers – Input buffer storing user input, output buffer storing output/display text
- VT Parser – converts ANSI/VT sequences embedded in the text stream into API calls
- Console UX – the Console’s UI state, settings, features
- Other – Misc lifetime, security, etc.
- What the Console does
- Sends user input to the connected Command-Line app
- Receives and displays output from the connected Command-Line app
- How Console differs from *NIX terminals
- *NIX: “Everything is a file/text-stream”
- Windows: “Everything is an object, accessible via an API”
- Console Problems
- Console and Command-Line apps communicate via API call requests and text serialized into IOCTL messages
- Only Windows command-line apps call the Console API
- More work to port Command-Line apps to/from Windows
- Apps call Windows API to interact with Console
- Makes remoting Windows Command-Line apps/tools difficult
- Dependence on IOCTLs breaks the “exchange of characters” terminal design
- Makes it difficult to operate remote Windows Command-Line tools from non-Windows machines
- Launching Windows Command-Line apps is “unusual”
- Only ConHost.exe can be attached to Command-Line apps
- 3rd party terminals forced to create off-screen Console and send-keys/screen-scrape to/from it
- Windows historically doesn’t understand ANSI/VT sequences
- Mostly remedied in Windows 10 😃
- Console has limited support for Unicode & currently struggles to deal with storing and rendering modern UTF-8 and characters requiring Zero Width Joiners
In the next few posts in this series, we’ll delve further into the Console, and discuss how we’re addressing these issues … and more! As always, stay tuned 😉 [Many thanks to my colleagues on the Console team for helping keep this post accurate and balanced – Michael, Mike, Dustin and Austin – y’all rock! 😃]