Windows 11 Notepad

Murray Sargent

The new Windows 11 Notepad uses RichEdit and runs on up-to-date Windows 11 installations. In addition to a Windows 11 look with rounded corners and a dark-theme option, the new Notepad includes several standard RichEdit editing enhancements, such as Alt+x for entering Unicode characters, Ctrl+} for toggling between matching brackets/parentheses, multilevel undo, drag & drop, color emoji, and autoURL detection. You might guess that using a RichEdit plain-text control in Notepad would be a slam dunk. RichEdit has had plain-text controls ever since Office 97 (last century!) and they’ve been used myriad times. But those plain-text controls have been small and typically exist in dialog boxes. Notepad is often used to view large files, so high performance is important, and lines can be crazy long. And classic Notepad has been improved in various ways, such as better performance, line-ending detection (CR, LF, CRLF), and a “Show Unicode control characters” context-menu option. Accordingly, it’s taken significant effort to use RichEdit as the new Notepad’s editing engine. This post describes some additions and implementation details.

Additions to RichEdit

The classic Notepad has two handy features that weren’t implemented in RichEdit: line-ending detection (CR, LF, CRLF) and the “Show Unicode control characters” mode (discussed next). For years Notepad didn’t break Unix-convention lines that terminated with a LF (U+000A) instead of a CRLF (U+000D U+000A). I used to open the Unicode Character Data files, which contain LF-terminated lines, with WordPad and save them to convert the LF’s to CRLF’s so that Notepad would display them correctly. To fix this problem, Notepad went one better: it checked to see which line ending came first and then made that line ending the default for the file. So, a file with LF-terminated lines remains LF terminated and is displayed correctly. Internally, RichEdit follows the lead of Word and the Mac in terminating paragraphs with a CR and converting LF’s and CRLF’s to CR when reading in a file or storing text via an API like WM_SETTEXT or ITextRange2::SetText2. This is still the case, but you can tell RichEdit to recognize the kind of line termination in a file and use that choice for saving/copying the file by sending the EM_SETENDOFLINE message with wparam = EC_ENDOFLINE_DETECTFROMCONTENT.
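To make the detect-first-ending idea concrete, here is a minimal Python sketch. It is not RichEdit or Notepad source code; the helper names (detect_line_ending, to_internal, to_external) are hypothetical, and the CRLF fallback for text with no line ending at all is an assumption made for the example. It shows the three steps described above: find which line ending comes first, store the text with a lone CR internally, and write the text back out with the detected ending.

```python
import re

# Hypothetical helpers for illustration; not RichEdit's implementation.
_LINE_END = re.compile(r"\r\n|\r|\n")  # CRLF is tried before a lone CR or LF


def detect_line_ending(text, default="\r\n"):
    """Return the first line ending found in text, or default if none occurs."""
    match = _LINE_END.search(text)
    return match.group(0) if match else default


def to_internal(text):
    """Convert CRLF and LF to a lone CR, the internal paragraph terminator."""
    return _LINE_END.sub("\r", text)


def to_external(internal_text, line_ending):
    """Convert internal CRs back to the line ending chosen for the file."""
    return internal_text.replace("\r", line_ending)


unix_text = "alpha\nbeta\ngamma\n"
ending = detect_line_ending(unix_text)   # the first ending in the file wins
stored = to_internal(unix_text)          # CR-only text, as held in memory
saved = to_external(stored, ending)      # written back with the detected ending
```

With this scheme an LF-terminated file round-trips unchanged, while a file that mixes endings is saved with whichever ending appeared first.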

Show Unicode control characters mode and emoji

Notepad has had a “Show Unicode control characters” option in its context menu for many years. This mode displays Bidi zero-width control characters using distinctive “zero-width” glyphs. This is very valuable, for example, in revealing the Bidi RLO (U+202E) and LRO (U+202D) codes that override the usual character directionalities and are sometimes used to spoof files for nefarious purposes. It also displays the zero-width joiner (ZWJ, U+200D) with a “zero-width” vertical line topped by an x. But inside emoji ZWJ sequences, such as family emojis, the classic mode doesn’t break the sequence apart at the ZWJ’s and doesn’t reveal the ZWJ’s with the zero-width ZWJ glyph. Also, classic Notepad doesn’t display ZWJ sequences, or emoji in general, in color.

In the new Notepad “Show Unicode control characters” mode, ZWJ sequences are broken apart at the ZWJ’s and the ZWJ’s are displayed by the ZWJ zero-width glyph. You can navigate inside the ZWJ sequence using the ← and → keys and type Alt+x to see the codes of the characters comprising the ZWJ sequence. This lets you figure out how a ZWJ sequence is constructed. For example, the new mode displays the family emoji ZWJ sequence 👨‍❤️‍👩 given by the codes U+1F468 ZWJ U+2764 U+FE0F ZWJ U+1F469 as

[Image: FamilyEmoji]
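If you want to check the construction of a ZWJ sequence outside Notepad, the following Python sketch lists the code points of the sequence above and splits it at the ZWJ’s. It only illustrates the idea of breaking a sequence apart; it says nothing about how Notepad or RichEdit renders the pieces. The function names are hypothetical, and the character names come from Python’s standard unicodedata module.

```python
import unicodedata

ZWJ = "\u200D"  # zero-width joiner


def code_points(text):
    """Return the 'U+XXXX' code of every character in text, in order."""
    return [f"U+{ord(ch):04X}" for ch in text]


def describe(text):
    """Return 'U+XXXX NAME' for every character, using the Unicode name table."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


def split_at_zwj(text):
    """Break a ZWJ sequence into the pieces that the ZWJ's join together."""
    return text.split(ZWJ)


# The sequence from the text: U+1F468 ZWJ U+2764 U+FE0F ZWJ U+1F469
family = "\U0001F468\u200D\u2764\uFE0F\u200D\U0001F469"

codes = code_points(family)
pieces = split_at_zwj(family)

for line in describe(family):
    print(line)
```

Here codes holds six entries, one per code point, and pieces holds the three emoji that the two ZWJ’s glue together.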

Find/Replace dialog drop down

Visual Studio Code has a nifty Find/Replace dialog that drops down into the upper right of the text area. In case the dialog overlaps the starting text, the user can drag the text down just under the bottom of the dialog. The new Notepad mimics this behavior. It was a bit tricky to get RichEdit to provide the associated functionality. In rich-text formatting, the paragraph space-before and space-after properties are used to add spacing between paragraphs. Since RichEdit is a rich-text editor, it supports these properties, and it was natural to implement the drop-down space as “document space before”. The space-before value is included in the ascent of the first line in the document. The tricks came in dealing with deleting or replacing the first line and in scrolling the display correctly with a nonzero document-space-before value.

Plain-text UI improvements

We decided to match the Visual Studio UI for selecting and not selecting the EOP character at the end of a line. This differs from Word’s UI, which tends to auto-select the EOP character if you navigate next to it. Specifically, in plain-text controls, we don’t let the mouse extend the selection to include the EOP on a line or let Shift+End select the EOP. This corresponds to what gets deleted if you hit the Delete key after selecting the text. You can still select the EOP character by using Shift+→ and by extending the selection to the next line. Also, if word wrap is turned off, the insertion-point caret now follows any spaces you enter instead of ignoring the spaces.

Some implementation details

The Windows 11 Notepad uses a window for its editing canvas, and windows generally use GDI for displaying text and images. GDI doesn’t have functions to display color fonts in color, whereas DirectWrite does. To be able to use DirectWrite for color emoji and other enhancements, the new Notepad therefore creates a RichEditD2DPT window, which uses DirectWrite for text and GDI for OLE objects (Notepad doesn’t insert OLE objects).

The RichEdit build used in Notepad comes from the same sources as the RichEdit that’s loaded with Microsoft 365 applications like Word, PowerPoint, Excel, and OneNote. It’s not the Windows RichEdit in msftedit.dll. Consequently, Notepad has the latest RichEdit improvements.

We’ve fixed bugs that had gone unnoticed in RichEdit plain-text controls over the years, partly because, before Notepad, the plain-text instances were small.

Notepad uses RichEdit classic font binding instead of the IProvideFontInfo font binding used in XAML text controls and in RichEdit controls appearing in Microsoft 365 applications. Notepad doesn’t want to load the mso libraries used in the latter since these libraries are quite large. The classic font binding has been improved but needs to add support for more scripts.

We improved RichEdit’s performance for large ASCII files such as those for core dumps. One feature that can slow down reading in a large file is autoURL detection. While reading in plain text, LF and CRLF are translated to CR for internal use and in the process the text is checked for the combination “:/”. If that combination isn’t found and only AURL_ENABLEURL is enabled, autoURL detection is bypassed.
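The bypass can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RichEdit’s code: load_plain_text and should_run_autourl are hypothetical names, and the two flag constants are defined locally for the example (the values are meant to mirror the AURL_ flags in richedit.h, but treat them as illustrative). The point is that the “:/” check rides along with the line-ending translation that has to happen anyway, so it costs almost nothing.

```python
# Flag values defined locally for illustration only.
AURL_ENABLEURL = 0x1
AURL_ENABLEEMAILADDR = 0x2


def load_plain_text(raw):
    """Translate LF and CRLF to CR in one pass and note whether ':/' occurs.

    Returns (internal_text, may_contain_url).
    """
    out = []
    may_contain_url = False
    prev = ""
    i, n = 0, len(raw)
    while i < n:
        ch = raw[i]
        if ch == "\r" and i + 1 < n and raw[i + 1] == "\n":
            out.append("\r")      # CRLF collapses to a single CR
            prev = "\r"
            i += 2
            continue
        if ch == "\n":
            ch = "\r"             # a lone LF becomes CR
        if ch == "/" and prev == ":":
            may_contain_url = True
        out.append(ch)
        prev = ch
        i += 1
    return "".join(out), may_contain_url


def should_run_autourl(flags, may_contain_url):
    """Skip detection when only plain URLs are enabled and ':/' never appeared."""
    if flags == 0:
        return False
    if flags == AURL_ENABLEURL and not may_contain_url:
        return False
    return True


dump_text, dump_has_url = load_plain_text("0000: de ad be ef\r\n0010: ca fe\n")
note_text, note_has_url = load_plain_text("see https://example.org\n")
```

Other detection modes, such as email addresses, can match text that contains no “:/”, which is why the sketch bypasses detection only when AURL_ENABLEURL is the sole flag set.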

Future

Imagine things that can be added given the power of RichEdit. RichEdit plain-text controls have only one paragraph format, but they can have considerable character formatting. The latter is needed because 1) Unicode has over 144,000 characters and a single font is limited to 65,535 glyphs, and 2) input method editors (IMEs) and spell checkers require underlines and/or text coloring. So, although the user interface only exposes one character format, the TOM object model gives access to many more properties (see ITextFont2). Accordingly, it would be possible to offer the program-code syntax highlighting used, for example, in Visual Studio and Visual Studio Code. But that should probably be the domain of integrated development environments. Another option could be to display HTML, XML, JSON, and RTF files with indentation and toggle between XML/HTML start and end tags like Ctrl+} does for bracketed expressions, e.g., in JSON and RTF files. Large-file performance needs more improvement. Please feel free to comment about bugs and wishes!