You can get and set text from/into RichEdit in a variety of formats including RTF, HTML, MathML, OMML, UnicodeMath, Nemeth Braille, and speech. This post documents RichEdit options for a general way to access text using ITextRange2::SetText2(options, bstr) and ITextRange2::GetText2(options, pbstr). As such, this post is for programmers. All options work in the current Microsoft Office RichEdit (riched20.dll in an Office subdirectory) and many work in the Windows RichEdit (msftedit.dll). The options are defined in the following table in which s/g stands for SetText2/GetText2, respectively.
Option | Value | s/g | Meaning |
tomUnicodeBiDi | 0x00000001 | s | Use Unicode BiDi algorithm for inserted text |
tomAdjustCRLF | 0x00000001 | g | If range start is inside multicode unit like CRLF, surrogate pair, etc., move to start of unit |
tomUseCRLF | 0x00000002 | g | Paragraph ends use CRLF (U+000D U+000A) |
tomTextize | 0x00000004 | g | Embedded objects export alt text; else U+FFFC |
tomAllowFinalEOP | 0x00000008 | g | If range includes final EOP, export it; else don’t |
tomUnlink | 0x00000008 | s | Disables link attributes if present |
tomUnhide | 0x00000010 | s | Disables hidden attribute if present |
tomFoldMathAlpha | 0x00000010 | g | Replace math alphanumerics with ASCII/Greek |
tomIncludeNumbering | 0x00000040 | g | Lists include bullets/numbering |
tomCheckTextLimit | 0x00000020 | s | Only insert up to text limit |
tomDontSelectText | 0x00000040 | s | After insertion, call Collapse(tomEnd) |
tomTranslateTableCell | 0x00000080 | g | Export spaces for table delimiters |
tomNoMathZoneBrackets | 0x00000100 | g | Used with tomConvertUnicodeMath and tomConvertTeX. Set discards math zone brackets |
tomLanguageTag | 0x00001000 | s/g | Sets BCP-47 language tag for range; gets tag |
tomConvertRTF | 0x00002000 | s/g | Set or get RTF |
tomGetTextForSpell | 0x00008000 | g | Export spaces for hidden/math text, table delims |
tomConvertMathML | 0x00010000 | s/g | Set or get MathML |
tomGetUtf16 | 0x00020000 | g | Causes tomConvertRTF, etc. to get UTF-16. SetText2 accepts 8-bit or 16-bit RTF |
tomConvertLinearFormat | 0x00040000 | s/g | Alias for tomConvertUnicodeMath |
tomConvertUnicodeMath | 0x00040000 | s/g | UnicodeMath |
tomConvertOMML | 0x00080000 | s/g | Office MathML |
tomConvertMask | 0x00F00000 | s/g | Mask for mutually exclusive modes |
tomConvertRuby | 0x00100000 | s | See section below on Entering Ruby Text |
tomConvertTeX | 0x00200000 | s/g | See LaTeX Math in Office |
tomConvertMathSpeech | 0x00300000 | g | Math speech (English only here) |
tomConvertSpeechTokens | 0x00400000 | g | Simple Unicode and speech tokens |
tomConvertNemeth | 0x00500000 | s/g | Nemeth math braille in U+2800 block |
tomConvertNemethAscii | 0x00600000 | g | Corresponding ASCII braille |
tomConvertNemethNoItalic | 0x00700000 | g | Nemeth braille in U+2800 block w/o math italic |
tomConvertNemethDefinition | 0x00800000 | g | Fine-grained speech in braille |
tomConvertHtml | 0x00900000 | s/g | Convert HTML |
tomConvertEnclose | 0x00A00000 | s | See section below on Entering Enclosed Text |
tomConvertCRtoLF | 0x01000000 | g | Plain-text paragraphs end with LF, not CRLF |
tomLaTeXDelim | 0x02000000 | g | Use LaTeX math-zone delimiters \(…\) inline, \[…\] display; else $…$, $$…$$. Set handles all |
tomGhostText | 0x04000000 | s | Set ghost text (used for text prediction) |
tomNoGhostText | 0x04000000 | g | Get text without ghost text |
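Several rows in the table share a value, because SetText2 and GetText2 interpret the same bit differently. The following is a minimal sketch of one way to keep the two directions apart in calling code. The values are copied from the table; the namespaces `set_opt` and `get_opt` and the constant `plainTextGet` are names made up for this illustration, and a real program would use whichever of these constants its tom.h provides.

```cpp
#include <cstdint>

// Values copied from the table. The namespaces are hypothetical and exist
// only to separate SetText2 flags from GetText2 flags.
namespace set_opt {
constexpr std::uint32_t tomUnlink         = 0x00000008;
constexpr std::uint32_t tomUnhide         = 0x00000010;
constexpr std::uint32_t tomDontSelectText = 0x00000040;
}  // namespace set_opt

namespace get_opt {
constexpr std::uint32_t tomUseCRLF          = 0x00000002;
constexpr std::uint32_t tomTextize          = 0x00000004;
constexpr std::uint32_t tomAllowFinalEOP    = 0x00000008;
constexpr std::uint32_t tomFoldMathAlpha    = 0x00000010;
constexpr std::uint32_t tomIncludeNumbering = 0x00000040;
}  // namespace get_opt

// The same bit means different things depending on the call direction.
static_assert(set_opt::tomUnlink == get_opt::tomAllowFinalEOP, "bit 0x08 is shared");
static_assert(set_opt::tomUnhide == get_opt::tomFoldMathAlpha, "bit 0x10 is shared");
static_assert(set_opt::tomDontSelectText == get_opt::tomIncludeNumbering, "bit 0x40 is shared");

// Example option word for a plain-text export: CRLF paragraph ends,
// alt text for embedded objects, and the final EOP if it is in the range.
constexpr std::uint32_t plainTextGet =
    get_opt::tomUseCRLF | get_opt::tomTextize | get_opt::tomAllowFinalEOP;
```

On Windows, a value like this would then be passed as the first argument, as in range->GetText2(plainTextGet, &bstr).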
Mutually exclusive options
Nonzero values within the mask defined by tomConvertMask (0x00F00000) are mutually exclusive; that is, they cannot be combined (OR’d) with one another. The options UnicodeMath (tomConvertUnicodeMath), [La]TeX (tomConvertTeX), and Nemeth math braille (tomConvertNemeth) are also mutually exclusive with one another, so you can set only one of them at a time. Other options can be OR’d in if desired.
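The reason values inside the mask cannot be OR’d together is that they are numbered modes rather than independent bits. Here is a small sketch, assuming only the values in the table; the helper names `conversionMode` and `withConversionMode` are hypothetical and not part of the RichEdit API.

```cpp
#include <cstdint>

// Values copied from the options table.
constexpr std::uint32_t tomConvertMask       = 0x00F00000;
constexpr std::uint32_t tomConvertRuby       = 0x00100000;
constexpr std::uint32_t tomConvertTeX        = 0x00200000;
constexpr std::uint32_t tomConvertMathSpeech = 0x00300000;
constexpr std::uint32_t tomConvertNemeth     = 0x00500000;
constexpr std::uint32_t tomLaTeXDelim        = 0x02000000;

// Hypothetical helper: returns the mode stored in the masked field.
constexpr std::uint32_t conversionMode(std::uint32_t options) {
    return options & tomConvertMask;
}

// Hypothetical helper: replaces the mode and keeps all flags outside the mask.
constexpr std::uint32_t withConversionMode(std::uint32_t options, std::uint32_t mode) {
    return (options & ~tomConvertMask) | (mode & tomConvertMask);
}

// OR-ing two modes does not request both; it silently selects a third mode.
static_assert((tomConvertRuby | tomConvertTeX) == tomConvertMathSpeech,
              "0x00100000 | 0x00200000 is the math speech mode");

// Flags outside the mask combine freely with a single mode.
static_assert(conversionMode(tomConvertTeX | tomLaTeXDelim) == tomConvertTeX,
              "tomLaTeXDelim lies outside the mode field");
```

Switching modes with a helper like `withConversionMode` rather than with a plain OR avoids accidentally producing a different mode while preserving flags such as tomLaTeXDelim.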
Nemeth math braille options
A string of Nemeth math braille codes in the Unicode range U+2800..U+283F can be inserted and built up by calling ITextRange2::SetText2(tomConvertNemeth, bstr). If the string is valid, you can get it back in any of the math formats including Nemeth math braille. For example, if you insert the string
⠹⠂⠌⠆⠨⠏⠼⠮⠰⠴⠘⠆⠨⠏⠐⠹⠨⠈⠈⠙⠨⠹⠌⠁⠬⠃⠀⠎⠊⠝⠀⠨⠹⠼⠀⠨⠅⠀⠹⠂⠌⠜⠁⠘⠆⠐⠤⠃⠘⠆⠐⠻⠼
you see the corresponding built-up equation.
You can also input braille with a standard keyboard by typing a control word \braille assigned to the Unicode character U+24B7 (Ⓑ). (See LaTeX Math in Office for how to add commands to math autocorrect). The \braille command causes math input to accept braille input via a regular keyboard using the braille ASCII codes sometimes referred to as North American Braille Computer Codes. The character ~ (U+007E) disables this input mode. These braille codes are described in the post Nemeth Braille—the first math linear format and can be input using refreshable braille displays. Alternatively, such input can be automated by calling ITextSelection::TypeText(bstr). Just as in entering UnicodeMath, the equations build up on screen as soon as the math braille input becomes unambiguous. The implementation includes the math braille UI that cues the user where the insertion point is for unambiguous editing of math zones using braille. Note that as of this posting, the math braille facility isn’t hooked up to Narrator or other screen readers.
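To make the relation between braille ASCII and the U+2800 block concrete, here is a minimal sketch of a converter in both directions. It assumes the standard 64-character North American braille ASCII ordering, in which the character at index i stands for the cell U+2800 + i. The function names are hypothetical, and this is not how RichEdit itself implements the conversion.

```cpp
#include <cctype>
#include <string>

// Braille ASCII characters ordered by dot pattern:
// kBrailleAscii[i] corresponds to the Unicode cell U+2800 + i.
static const char kBrailleAscii[] =
    " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)=";
static_assert(sizeof(kBrailleAscii) == 65, "64 cells plus the terminating NUL");

// Converts braille ASCII to cells in U+2800..U+283F. Lowercase letters are
// treated like uppercase; characters that are not in the table pass through.
std::u32string asciiToUnicodeBraille(const std::string& ascii) {
    const std::string table(kBrailleAscii);
    std::u32string out;
    for (unsigned char ch : ascii) {
        const char upper = static_cast<char>(std::toupper(ch));
        const std::string::size_type i = table.find(upper);
        out.push_back(i == std::string::npos
                          ? static_cast<char32_t>(ch)
                          : static_cast<char32_t>(0x2800 + i));
    }
    return out;
}

// Converts cells in U+2800..U+283F to braille ASCII. Other ASCII characters
// pass through; anything else is dropped in this sketch.
std::string unicodeBrailleToAscii(const std::u32string& cells) {
    std::string out;
    for (char32_t c : cells) {
        if (c >= 0x2800 && c <= 0x283F) {
            out.push_back(kBrailleAscii[c - 0x2800]);
        } else if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}
```

Under this mapping, the first seven cells of the example string above (⠹⠂⠌⠆⠨⠏⠼) correspond to the braille ASCII characters ?1/2.P#.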
Getting (and setting) math speech
The tomConvertMathSpeech option currently gets math speech only in English. Microsoft Office apps like Word, PowerPoint and OneNote deliver math speech in over 18 languages to the assistive technology (AT) program Narrator via the UIA ITextRangeProvider::GetText() function. Other ATs could also get math speech this way, although they usually get MathML and generate speech from that. Dictating (setting) math speech would be nice for both blind and sighted folks. Imagine: you can say 𝑎² + 𝑏² = 𝑐² faster than you can type it or write it! SetText2(tomConvertMathSpeech, bstr) is ready to handle such input, but the feature is not available yet.
Entering ruby text
In a nonmath context, the option tomConvertRuby (0x00100000) can be used to convert strings like “{…|…}” to ruby inline objects, where the first ellipsis represents the ruby text and the second ellipsis the base text. The ASCII curly braces and vertical bar are translated to the internal ruby-object structure characters U+FDD1, U+FDEF, and U+FDEE, respectively. Alternatively, the string can contain those structure characters directly. If a digit follows the start delimiter (‘{’ or U+FDD1), the digit defines the ruby options:
rubyAlign val | Meaning |
center (0) | Center <ruby> with respect to <base> |
distributeLetter (1) | Distribute difference in space between longer and shorter text in the latter, evenly between each character |
distributeSpace (2) | Distribute difference in space between longer and shorter text in the latter using a ratio of 1:2:1 which corresponds to lead : inter-character : end |
left (3) | Align <ruby> with the left of <base> |
right (4) | Align <ruby> with the right of <base> |
If you add 5 to these values, the ruby object will display the ruby text below the base text instead of above it. For example, calling ITextRange2::SetText2(tomConvertRuby, bstr) with bstr containing the string “{1にほんご|日本語}” inserts the base text 日本語 with the ruby text にほんご above it, using distributeLetter alignment.
The string can contain text in addition to ruby objects, and the ruby objects can be nested to create compound ruby objects.
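As an illustration of the “{…|…}” syntax and the option digit, the following sketch builds such a string from its parts. The enum `RubyAlign` and the function `makeRubyString` are hypothetical names introduced here. The sketch always writes the digit, and it assumes the ruby and base text contain no braces or vertical bars, since the post does not describe any escaping.

```cpp
#include <string>

// Alignment values from the rubyAlign table.
enum class RubyAlign {
    Center = 0,
    DistributeLetter = 1,
    DistributeSpace = 2,
    Left = 3,
    Right = 4
};

// Builds "{<digit><ruby text>|<base text>}".
// Adding 5 to the alignment value places the ruby text below the base text.
std::u16string makeRubyString(const std::u16string& rubyText,
                              const std::u16string& baseText,
                              RubyAlign align = RubyAlign::Center,
                              bool below = false) {
    const int option = static_cast<int>(align) + (below ? 5 : 0);  // 0..9
    std::u16string s = u"{";
    s.push_back(static_cast<char16_t>(u'0' + option));
    s += rubyText;
    s.push_back(u'|');
    s += baseText;
    s.push_back(u'}');
    return s;
}
```

For example, makeRubyString(u"にほんご", u"日本語", RubyAlign::DistributeLetter) yields the string “{1にほんご|日本語}” used above, which could then be converted to a BSTR and passed to SetText2 with tomConvertRuby.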
Entering enclosed text
The post Rounded Rectangles and Ellipses describes ways to enclose text in possibly rounded rectangles and ellipses. The SetText2(tomConvertEnclose, bstr) option is similar to the tomConvertRuby option. It converts strings like “{…}” to a tomEnclose object.
Other ways to get/set text
In addition to ITextRange2::SetText2() and GetText2(), the messages WM_SETTEXT, EM_SETTEXTEX, WM_GETTEXT, and EM_GETTEXTEX are useful. The set-text messages work with plain text or RTF in rich-text controls. EM_SETTEXTEX accepts both 16-bit and 8-bit RTF, while WM_SETTEXT doesn’t handle 16-bit RTF.