Once upon a time, the IBM PC was released.
In the IBM PC BIOS, you could enter characters that weren’t present on the keyboard by holding the Alt key and typing the decimal value on the numeric keypad. For example, you could enter ñ by holding Alt and typing Numpad1 Numpad6 Numpad4, then releasing the Alt key.
For expository simplicity, I will henceforth use the notation Alt+164 to indicate that you press the Alt key, then type the specified digits in sequence on the numeric keypad, then release the Alt key.
Okay, so in the IBM PC BIOS, when you typed Alt+…, the code numbers were treated as decimal byte values, and the result on the screen came from your video card’s character generator. In the United States, the character generator’s ROM showed what we today call Code Page 437.
When it was introduced, Windows in the United States used Code Page 1252 as its 8-bit character set, which it called the “ANSI character set”; the old BIOS character set was retroactively named the OEM character set. To preserve compatibility with MS-DOS, if you used the Alt key in conjunction with the numeric keypad, the number you typed was still looked up in OEM character set, so that your muscle-memory code numbers still worked. You could still type Alt+164 to get your ñ, even though the code number for ñ in Code Page 1252 is 241, not 164.
If you wanted to type a character that had no OEM equivalent, you could prefix a numeric keypad 0 to indicate that you wanted the value looked up in the ANSI code page. Therefore, you could type Alt+0169 to get a ©, which did not exist in the OEM code page. You could also type Alt+0241 to get your precious ñ, using the ANSI code point number rather than the OEM code point number.
If you entered a number larger than 255, both Windows and the IBM PC BIOS took your value mod 256, so typing Alt+259 was the same as typing Alt+3. Both gave you OEM code point 3, which for Code Page 437 is a heart ♥.
If you ask the Internet how to type some of these non-ASCII characters on Windows, you may see people (and large language models) that tell you to type, say, Alt+9731 to get a Unicode snowman ☃. Unfortunately, from what we’ve learned above, this doesn’t work. You instead get the OEM character whose value is 9731 mod 256 = 3, or the aforementioned heart ♥.
A customer reported that a recent Windows update broke their ability to type a snowman by using Alt+9731. We explained that the update was not at fault; rather, Alt+9731 was never supposed to produce a snowman at all! But the customer insisted that it used to work.
A closer investigation of the issue revealed the reason.
You see, while it’s true that the Alt+… decimal value is taken mod 256, that is just the default behavior of the Windows input system. But some controls (most notably the RichEdit control) override the default handling of the Alt+… sequence and parse out the decimal value mod 65536 rather than mod 256.
This means that whether the Alt+… value is taken mod 256 depends on what kind of control you are typing into.
By default, the value is taken mod 256, and Alt+9731 gives you a heart.
But if you happen to be using a RichEdit control, then the Alt+… value is taken mod 65536, and Alt+9731 gives you a snowman.
(I don’t know of anybody who takes the value mod 2097151, to support direct entry of code points outside the Basic Multilingual Plane.)
seems like it’s time to unify and just remove the modulo part of the logic. After all entering a larger value that relies on the wrap-around might occasionally be part of somebody’s muscle memory, but it’s probably rare. And having access to the entire codepage/unicode seems like a win.
given the discussion of alternate, more mnemonic methods, it probably won’t matter much either way. I can’t recall the last time I used alt+ to enter a character, at least in windows. But it was probably to make a non-printing character alternative to the space (‘ ‘), because MSDOS. That’s alt+255. Good thing it was modulo 256 and not 255!
Great article. I never knew that the value rolled over like that. I used to memorize the keys that I wanted to use but didn’t really know the history or how it all worked.
Why was support for taking the entered value mod _anything_ added? Why not treat it as an error (I’m thinking beeping the PC speaker, lol) if larger than the maximum supported number?
The IBM PC BIOS didn’t add code to take the value mod anything; rather, it didn’t add code to detect that there was an overflow from the single byte it kept. The relevant chunk of BIOS code is:
This is the code that adds the typed value (in DI) to the byte stored in ALT_INPUT. There’s two places for an overflow of the byte to occur – the “MUL AH” which multiplies the existing value by 10 (and can thus turn 26 into 260), and the ADD AX, DI which adds in the value you just typed (and can thus turn 250 into 256, 257, 258 or 259). You’d have to add extra code to this routine to detect either of those overflow cases and treat it as an error; otherwise, when you do “MOV ALT_INPUT, AL” at the end, the value in AX is taken mod 256 automatically, by the definition of the AL register.
And this is for a constrained system; the BIOS on the original IBM PC is in an 8 KiB ROM chip, in the days when ROM wasn’t cheap. Adding extra instructions to check for error and beep instead of taking the input wouldn’t be seen as a good use of ROM space for an advanced user feature – since it’s likely to either push something else out of the BIOS (and if so, what), or increase the price of the BIOS ROM in a “cheap” system.
One neat feature in Ubuntu and Linux Mint (not sure exactly which component is responsible for it) is the ability to use Ctrl+Shift+U to enter a hex Unicode code point.
Comes in handy for inserting TAB characters in the text editor when it’s configured to use spaces instead of tabs (Makefiles, for instance, require TAB characters). Especially since it turns out that a single 9 without leading zeros is sufficient!
Most IDEs and advanced text editors can set the indentation style depending on file type. The EditorConfig standard allows to define these settings in an editor-independent way. EditorConfig is supported a wide range of IDEs and text editors on different platforms. Here is a minimal example for your Makefile use case (put it in an .editorconfig file):
There is simply no need to solve the indentation style problem on the level of the keyboard layout. Still, it never hurts to know how to enter arbitrary Unicode code points. 🙂
Most likely it came from an IME engine, either ibus or fcitx5 (both supports the ctrl+shift+u shortcut). Considering you’re using Ubuntu and Mint, which by default ships gnome[-related] desktop environments, it’s probably ibus. (gnome has builtin support for it).
If you enable it in the registry, Windows does allow you to press Alt + and then the codepoint hex sequence, and the + is a key you have to press.
I always wonder why this isn’t enabled by default.
I was recently helping my daughter with her Spanish homework in Word. She asked me how to type an accented “e” and I did what always used to work with Alt-num sequences only to find out it no longer works. Not only do a lot of modern laptops not provide NumLock anymore, but Word interprets Alt+0 and other numbers as its own shortcuts. I guess the modern world has no use for quicker ways of doing things with the keyboard. I can barely get my kids to learn to type:).
You can use WinCompose to easily type almost any character :
https://github.com/samhocevar/wincompose/releases/
I haven’t tried that one before, thanks. Additionally, Microsoft has “Quick Accent” as part of their PowerToys.
It would be nice if Windows had an option like smartphone keyboards do, where you could just hold down a key and then use the arrow keys to select which accented version you wanted. Currently, if you hold down a key, it just keeps typing that character repeatedly until you release it. Does anyone actually need to do that on a regular basis?
Luckily, Word 2003 (as well as the rest of Office products) is still a thing and it just works, and doesn’t have the stupid Gibbon interface. Those brand new ZIP-files get converted with MS-provided converter quite well, and if some document doesn’t, that’s the other party’s problem: any decent document should not use stuff that causes the converter to fail.
for me they unfortunately explode as soon as they display the open or save as dialog
thankfully I can use Libre and it handles scaling much better than old Office did but the speed, the speed! I’m pretty sure Word 2003 takes less time to launch than modern win11 notepad does
Actually, modern tools (be they laptops or keyboards or Windows) de-emphasized the utterly non-mnemonic, primitive, and clumsy alt sequences for a darn good reason. And replaced them with something far easier and better. Because the need for everyone everywhere to enter international characters is only growing as the world gets smaller and more multilingual.
If you are a) an American, and b) need to type characters not commonly found on a US hardware keyboard, the correct answer is to enable the English United States International keyboard. It’s in Settings under Keyboards. Once enabled, you can toggle back and forth between the domestic and international versions of the keyboard with a single chord: [Windows]+[Spacebar].
With the keyboard in international mode, typing ñ is the far _more_ mnemonic ~n. The various other diacritics are equally simple. Type a : for umlaut, a `, ^ or ‘ for the three kinds of accent, then the letter to be modified and suddenly Roberto es tu tío. I have no doubt your daughter will find ~n far more memorable and easy than she will find alt+194.
I wish keyboards had an “International Lock” key to toggle between standard English US and English US International keyboards.
I need to use International occasionally, but if I leave it on all the time, it gets in the way. English does on a few rare occasions use umlauts to indicate that two consecutive vowels make two separate sounds (e.g., “naïve”), but most of these words have just dropped the umlaut altogether.
Doesn’t ctrl+shift do it for you? That should cycle through different layouts for the same language. I use that to shift between UK and UKX on my system all the time.
Then again, on UK Extended, most of the dead keys are enabled using altgr, where the back tick is the only one that is a dead key by default.
I’m sure you can find similar layouts for other languages.
There is the UK Extended layout as another example of this.
My Linux desktop does something similar… right-alt, ‘n’, ‘~’ will give ‘ñ’. I prefer that version myself — letter before accent — but that’s no doubt as much because I’m used to it.
Doesn’t it accept them either way around? (Not sure whether that applies to all Compose key combinations though.)
Huh, so it does. Never noticed that.
Office has also supported similar mnemonic shortcuts “forever” without switching keyboards. “Modern Notepad” also supports them.
https://support.microsoft.com/en-us/office/keyboard-shortcuts-to-add-language-accent-marks-in-word-and-outlook-3801b103-6a8d-42a5-b8ba-fdc3774cfc76
(Though it may be left shift / control only?)
That’s a super handy link – thanks!