July 2nd, 2024

The history of Alt+number sequences, and why Alt+9731 sometimes gives you a heart and sometimes a snowman

Once upon a time, the IBM PC was released.

In the IBM PC BIOS, you could enter characters that weren’t present on the keyboard by holding the Alt key and typing the decimal value on the numeric keypad. For example, you could enter ñ by holding Alt and typing Numpad1 Numpad6 Numpad4, then releasing the Alt key.

For expository simplicity, I will henceforth use the notation Alt+164 to indicate that you press the Alt key, then type the specified digits in sequence on the numeric keypad, then release the Alt key.

Okay, so in the IBM PC BIOS, when you typed Alt+…, the code numbers were treated as decimal byte values, and the result on the screen came from your video card’s character generator. In the United States, the character generator’s ROM showed what we today call Code Page 437.

When it was introduced, Windows in the United States used Code Page 1252 as its 8-bit character set, which it called the “ANSI character set”; the old BIOS character set was retroactively named the OEM character set. To preserve compatibility with MS-DOS, if you used the Alt key in conjunction with the numeric keypad, the number you typed was still looked up in the OEM character set, so that your muscle-memory code numbers still worked. You could still type Alt+164 to get your ñ, even though the code number for ñ in Code Page 1252 is 241, not 164.

If you wanted to type a character that had no OEM equivalent, you could prefix a numeric keypad 0 to indicate that you wanted the value looked up in the ANSI code page. Therefore, you could type Alt+0169 to get a ©, which did not exist in the OEM code page. You could also type Alt+0241 to get your precious ñ, using the ANSI code point number rather than the OEM code point number.
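The two lookups can be checked with Python's built-in `cp437` and `cp1252` codecs. This is only a sketch of the code page tables, not of how Windows processes the keystrokes; the variable names are made up for illustration.

```python
# Look up the byte values from the text in the two US code pages.
OEM = "cp437"    # the "OEM" code page
ANSI = "cp1252"  # the Windows "ANSI" code page

oem_164 = bytes([164]).decode(OEM)     # what Alt+164 selects
ansi_241 = bytes([241]).decode(ANSI)   # what Alt+0241 selects
ansi_169 = bytes([169]).decode(ANSI)   # what Alt+0169 selects

print(oem_164, ansi_241, ansi_169)

# The copyright sign (U+00A9) has no slot in Code Page 437,
# so trying to encode it with that codec fails.
try:
    "\u00a9".encode(OEM)
    copyright_in_oem = True
except UnicodeEncodeError:
    copyright_in_oem = False

print("copyright sign in OEM code page:", copyright_in_oem)
```

Both ñ lookups land on the same character from different byte values, which is why the leading zero matters.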

If you entered a number larger than 255, both Windows and the IBM PC BIOS took your value mod 256, so typing Alt+259 was the same as typing Alt+3. Both gave you OEM code point 3, which for Code Page 437 is a heart ♥.

If you ask the Internet how to type some of these non-ASCII characters on Windows, you may see people (and large language models) who tell you to type, say, Alt+9731 to get a Unicode snowman ☃. Unfortunately, from what we’ve learned above, this doesn’t work. You instead get the OEM character whose value is 9731 mod 256 = 3, or the aforementioned heart ♥.
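The wrap-around arithmetic is easy to verify. In this minimal sketch, `wrap_to_byte` and `CP437_LOW_GLYPHS` are hypothetical names. The heart is hard-coded because Python's `cp437` codec maps byte 3 to the control character U+0003 rather than to the glyph the character generator draws.

```python
# Glyph drawn for OEM code point 3 in Code Page 437 (hard-coded assumption:
# Python's cp437 codec returns the control character U+0003 instead).
CP437_LOW_GLYPHS = {3: "\u2665"}  # black heart suit


def wrap_to_byte(typed: int) -> int:
    """Reduce the typed decimal value to a single byte, as the default handling does."""
    return typed % 256


for typed in (259, 9731):
    code = wrap_to_byte(typed)
    print(f"Alt+{typed} -> OEM code point {code} -> {CP437_LOW_GLYPHS[code]}")

# 9731 is 0x2603, the Unicode snowman; only its low byte (0x03) survives the wrap.
quotient, remainder = divmod(9731, 256)
print(quotient, remainder, hex(9731), chr(9731))
```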

A customer reported that a recent Windows update broke their ability to type a snowman by using Alt+9731. We explained that the update was not at fault; rather, Alt+9731 was never supposed to produce a snowman at all! But the customer insisted that it used to work.

A closer investigation of the issue revealed the reason.

You see, while it’s true that the Alt+… decimal value is taken mod 256, that is just the default behavior of the Windows input system. But some controls (most notably the RichEdit control) override the default handling of the Alt+… sequence and parse out the decimal value mod 65536 rather than mod 256.

This means that whether the Alt+… value is taken mod 256 depends on what kind of control you are typing into.

By default, the value is taken mod 256, and Alt+9731 gives you a heart.

But if you happen to be using a RichEdit control, then the Alt+… value is taken mod 65536, and Alt+9731 gives you a snowman.
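The control-dependent behavior can be modeled in a few lines. This is an illustrative model of the rule described above, not the actual Windows or RichEdit code; `resolve_alt_sequence` is a hypothetical name, and the leading-zero (ANSI) case is deliberately left out.

```python
from typing import Tuple


def resolve_alt_sequence(typed: int, rich_edit: bool) -> Tuple[str, int]:
    """Return (character set, code) for a typed Alt+... decimal value.

    Models only the two behaviors described in the text:
    - default handling: value mod 256, looked up in the OEM code page
    - RichEdit control: value mod 65536, treated as a Unicode code point
    """
    if rich_edit:
        return ("unicode", typed % 65536)
    return ("oem", typed % 256)


default_result = resolve_alt_sequence(9731, rich_edit=False)
rich_result = resolve_alt_sequence(9731, rich_edit=True)

print(default_result)                    # OEM code point 3, the heart in CP437
print(rich_result, chr(rich_result[1]))  # Unicode code point 9731, the snowman
```

Under this model, the same keystrokes produce different characters purely because of which control has focus.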

(I don’t know of anybody who takes the value mod 2097151, to support direct entry of code points outside the Basic Multilingual Plane.)

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

22 comments

  • alan robinson · Edited

    Seems like it's time to unify and just remove the modulo part of the logic. After all, entering a larger value that relies on the wrap-around might occasionally be part of somebody's muscle memory, but it's probably rare. And having access to the entire code page/Unicode seems like a win.

    Given the discussion of alternate, more mnemonic methods, it probably won't matter much either way. I can't recall the last time I used Alt+ to enter a character, at least in Windows. But it was probably to make a non-printing alternative to the space character (' '), because MS-DOS. That's Alt+255....

  • George Wilson

    Great article. I never knew that the value rolled over like that. I used to memorize the keys that I wanted to use but didn’t really know the history or how it all worked.

  • Bwmat

    Why was support for taking the entered value mod _anything_ added? Why not treat it as an error (I’m thinking beeping the PC speaker, lol) if larger than the maximum supported number?

    • Simon Farnsworth

      The IBM PC BIOS didn't add code to take the value mod anything; rather, it didn't add code to detect that there was an overflow from the single byte it kept. The relevant chunk of BIOS code is:

      <code>

      This is the code that adds the typed value (in DI) to the byte stored in ALT_INPUT. There are two places for an overflow of the byte to occur - the "MUL AH" which multiplies the existing value by 10 (and can thus turn 26 into 260), and the "ADD AX, DI" which adds in the value you just typed (and can thus turn...

  • Falcon

    One neat feature in Ubuntu and Linux Mint (not sure exactly which component is responsible for it) is the ability to use Ctrl+Shift+U to enter a hex Unicode code point.

    Comes in handy for inserting TAB characters in the text editor when it’s configured to use spaces instead of tabs (Makefiles, for instance, require TAB characters). Especially since it turns out that a single 9 without leading zeros is sufficient!

    • Holger Stenger

      Most IDEs and advanced text editors can set the indentation style depending on file type. The EditorConfig standard allows you to define these settings in an editor-independent way. EditorConfig is supported by a wide range of IDEs and text editors on different platforms. Here is a minimal example for your Makefile use case (put it in an .editorconfig file):

      <code>

      There is simply no need to solve the indentation style problem on the level of the keyboard layout. Still, it never hurts to know how to enter arbitrary Unicode code points. :)

    • Tianyi Guan

      Most likely it came from an IME engine, either ibus or fcitx5 (both support the Ctrl+Shift+U shortcut). Considering you’re using Ubuntu and Mint, which by default ship GNOME[-related] desktop environments, it’s probably ibus. (GNOME has built-in support for it.)

    • Me Gusta

      If you enable it in the registry, Windows does allow you to hold Alt, press the + key, and then type the code point as a hex sequence. The + is a key you actually have to press.
      I always wonder why this isn’t enabled by default.