The post Math Accessibility Trees appeared first on Math in Office.

More than one kind of tree is possible, and this post compares two trees for the equation

1/2π ∫_0^2π ⅆ𝜃/(𝑎+𝑏 sin 𝜃) = 1/√(𝑎²−𝑏²)

Each tree node is labeled with its math text in UnicodeMath along with the type of node. UnicodeMath lends itself to being spoken especially if processed a bit to speak things like 𝑎² as “a squared” in the current natural language as described in Speaking of math…. The first kind of tree corresponds to the traditional math layout used in documents, while the second kind corresponds to the mathematical semantics. Accordingly we call the first kind a *display tree* and the second a *semantic tree*.

More specifically, the display tree corresponds to the way TeX and OfficeMath display mathematical text and approximates the way Presentation MathML represents mathematical text. Mathematical layout entities such as fractions, integrals, roots, subscripts and superscripts are represented by nodes in trees. Binary and relational operators that don’t require special typography other than appropriate spacing are included in text nodes. The display tree for the equation above is

    └─Math zone
      └─ “1/2π ∫_0^2π ⅆ𝜃/(𝑎+𝑏 sin 𝜃) = 1/√(𝑎²−𝑏²)”
         ├─ “1/2π” fraction
         │  ├─ “1” numerator
         │  └─ “2π” denominator
         ├─ “∫_0^2π ⅆ𝜃/(𝑎+𝑏 sin 𝜃)” integral
         │  ├─ “0” lower limit
         │  ├─ “2π” upper limit
         │  └─ “ⅆ𝜃/(𝑎+𝑏 sin 𝜃)” integrand
         │     └─ “ⅆ𝜃/(𝑎+𝑏 sin 𝜃)” fraction
         │        ├─ “ⅆ𝜃” numerator
         │        └─ “𝑎+𝑏 sin 𝜃” denominator
         │           ├─ “𝑎+𝑏” text
         │           └─ “sin 𝜃” function apply
         │              ├─ “sin” function name
         │              └─ “𝜃” argument
         ├─ “=” text
         └─ “1/√(𝑎²−𝑏²)” fraction
            ├─ “1” numerator
            └─ “√(𝑎²−𝑏²)” denominator
               └─ “√(𝑎²−𝑏²)” radical
                  ├─ “⬚” degree
                  └─ “𝑎²−𝑏²” radicand
                     ├─ “𝑎²” superscript
                     │  ├─ “𝑎” base
                     │  └─ “2” script
                     ├─ “−” text
                     └─ “𝑏²” superscript
                        ├─ “𝑏” base
                        └─ “2” script

Note that the invisible times implicit between the leading fraction and the integral isn’t displayed and the expression 𝑎 + 𝑏 sin*θ* is displayed as a text node 𝑎 + 𝑏 followed by a function-apply node sin*θ*, without explicit nodes for the + and an implied invisible times.

To navigate through the 𝑎 + 𝑏 and into the fractions and integral, one can use the usual text left and right arrow keys or their braille equivalents. In OfficeMath, one can navigate through the whole equation with these arrow keys, but it’s helpful also to have coarser grained navigation keys to go between sibling nodes and up to parent nodes. For the sake of discussion, let’s suppose the tree navigation hot keys are those defined in the table

| Key | Action |
| --- | --- |
| Ctrl+→ | Go to next sibling |
| Ctrl+← | Go to previous sibling |
| Home | Go to parent ahead of current child |
| End | Go to parent after current child |

For example, starting at the beginning of the equation, Ctrl+→ moves past the leading fraction to the integral, whereas → moves to the start of the numerator of the leading fraction. Starting at the beginning of the upper limit, Home goes to the insertion point between the leading fraction and the integral, while End goes to the insertion point in front of the equal sign. Ctrl+→ and Ctrl+← allow a user to scan an equation rapidly at any level in the hierarchy. After one of these hot keys is pressed, the linear format for the object at the new position can be spoken in a fashion quite similar to ClearSpeak. When the user finds a position of interest, they can use the usual input methods to delete and/or insert new math text.
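The sibling and parent navigation described above can be sketched in portable C++. This is only an illustration, not how OfficeMath or RichEdit implements it: the `Node` and `InsertionPoint` types and all function names are hypothetical, the example tree collapses the two outer nodes into a single root, and Home/End are interpreted from the worked example (an insertion point just before or just after the object that contains the current argument).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical display-tree node: linear-format text plus a node type.
struct Node {
    std::string text;                             // UnicodeMath for this node
    std::string type;                             // e.g. "fraction", "upper limit"
    Node* parent = nullptr;                       // nullptr for the root
    std::vector<std::unique_ptr<Node>> children;

    // Appends a child and returns a non-owning pointer to it.
    Node* add(std::string childText, std::string childType) {
        children.push_back(std::make_unique<Node>());
        Node* child = children.back().get();
        child->text = std::move(childText);
        child->type = std::move(childType);
        child->parent = this;
        return child;
    }
};

// An insertion point in the gap just before container->children[index].
struct InsertionPoint {
    Node* container;
    std::size_t index;
};

// Position of n in its parent's child list; n must not be the root.
std::size_t indexInParent(const Node* n) {
    assert(n != nullptr && n->parent != nullptr);
    const auto& siblings = n->parent->children;
    std::size_t i = 0;
    while (i < siblings.size() && siblings[i].get() != n) ++i;
    assert(i < siblings.size());
    return i;
}

// Ctrl+→: next sibling, or nullptr if n is the last child or the root.
Node* nextSibling(const Node* n) {
    if (!n || !n->parent) return nullptr;
    const std::size_t i = indexInParent(n);
    const auto& siblings = n->parent->children;
    return i + 1 < siblings.size() ? siblings[i + 1].get() : nullptr;
}

// Ctrl+←: previous sibling, or nullptr if n is the first child or the root.
Node* prevSibling(const Node* n) {
    if (!n || !n->parent) return nullptr;
    const std::size_t i = indexInParent(n);
    return i > 0 ? n->parent->children[i - 1].get() : nullptr;
}

// Home: insertion point just ahead of the object that contains n.
// Returns {nullptr, 0} when n has no enclosing object below the root.
InsertionPoint homeKey(const Node* n) {
    if (!n || !n->parent || !n->parent->parent) return {nullptr, 0};
    return {n->parent->parent, indexInParent(n->parent)};
}

// End: insertion point just after the object that contains n.
InsertionPoint endKey(const Node* n) {
    if (!n || !n->parent || !n->parent->parent) return {nullptr, 0};
    return {n->parent->parent, indexInParent(n->parent) + 1};
}

// Builds the top levels of the display tree for the example equation.
std::unique_ptr<Node> buildExampleTree() {
    auto zone = std::make_unique<Node>();
    zone->type = "math zone";
    zone->add("1/2π", "fraction");
    Node* integral = zone->add("∫_0^2π ⅆ𝜃/(𝑎+𝑏 sin 𝜃)", "integral");
    integral->add("0", "lower limit");
    integral->add("2π", "upper limit");
    integral->add("ⅆ𝜃/(𝑎+𝑏 sin 𝜃)", "integrand");
    zone->add("=", "text");
    zone->add("1/√(𝑎²−𝑏²)", "fraction");
    return zone;
}
```

With this model, Home from the upper limit yields the gap before the integral (index 1 in the root) and End yields the gap before the equal sign (index 2), matching the example in the text.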

Now consider the semantic tree, which allocates nodes to all binary and relational operators as well as to fractions, integrals, etc.

    └─Math zone
      └─ “1/2𝜋 ∫_0^2𝜋 ⅆ𝜃/(𝑎+𝑏 sin𝜃)=1/√(𝑎²− 𝑏²)”
         └─ “=” text
            ├─ “⊠” implied times
            │  ├─ “1/2𝜋” fraction
            │  │  ├─ “1” numerator
            │  │  └─ “2π” denominator
            │  └─ “∫_0^2𝜋 ⅆ𝜃/(𝑎+𝑏 sin𝜃)” integral
            │     ├─ “0” lower limit
            │     ├─ “2π” upper limit
            │     └─ “ⅆ𝜃/(𝑎+𝑏 sin𝜃)” integrand
            │        └─ “ⅆ𝜃/(𝑎+𝑏 sin𝜃)” fraction
            │           ├─ “ⅆ𝜃” numerator
            │           │  └─ “⊠” implied times
            │           │     ├─ “ⅆ” text
            │           │     └─ “𝜃” text
            │           └─ “𝑎+𝑏 sin𝜃” denominator
            │              └─ “+” text
            │                 ├─ “𝑎” text
            │                 └─ “𝑏 sin𝜃” function apply
            │                    └─ “⊠” implied times
            │                       ├─ “𝑏” text
            │                       └─ “sin𝜃” function
            │                          └─ “” function apply
            │                             ├─ “sin” function name
            │                             └─ “𝜃” argument
            └─ “1/√(𝑎²− 𝑏²)” fraction
               ├─ “1” numerator
               └─ “√(𝑎²− 𝑏²)” denominator
                  └─ “√(𝑎²− 𝑏²)” radical
                     ├─ “⬚” degree
                     └─ “𝑎²− 𝑏²” radicand
                        └─ “−” text
                           ├─ “𝑎²” superscript
                           │  ├─ “𝑎” base
                           │  └─ “2” script
                           └─ “𝑏²” superscript
                              ├─ “𝑏” base
                              └─ “2” script

The semantic tree corresponds to Content MathML. It has drawbacks: 1) it’s bigger and requires more key strokes to navigate, 2) it doesn’t correspond to speech order, and 3) it requires a Polish-prefix mentality. Some people have developed such a mentality, perhaps having used HP calculators, and prefer it. But it’s definitely an acquired taste and it doesn’t correspond to the way that mathematics is conventionally displayed, edited, and spoken. Accordingly the first kind of tree seems significantly better for speech and editing, at least for the math encountered in grades K-12.

The choice for higher-level math is complicated by the fact that the usual meanings for superscripts, vertical bars, and other notation may be incorrect. For example, exponents are usually powers and it’s appropriate to speak 𝑎² as “a squared”. But in tensor analysis, exponents can be indices and saying them as powers is incorrect. One way around this is to say 𝑎² as “a superscript 2” or “a sup 2”, but it would be better to know the author’s intent and generate more descriptive speech. Another example is |𝑥|. In math up through calculus, this is the absolute value of 𝑥. However, in higher-level math it could mean the cardinality of the set 𝑥, or something else. In these cases and many others in advanced math, the semantic tree might reveal the author’s intent better than the display tree.

The MathML working group is studying ways to make Presentation MathML support accurate speech for ambiguous mathematical notations.

Both kinds of trees include nodes defined by the OMML entities listed in the following table along with the corresponding MathML entities

| Built-up Office Math Object | OMML tag | MathML |
| --- | --- | --- |
| Accent | acc | mover/munder |
| Bar | bar | mover/munder |
| Box | box | menclose (approx) |
| BoxedFormula | borderBox | menclose |
| Delimiters | d | mfenced or mrow with mo’s |
| EquationArray | eqArr | mtable (with alignment groups) |
| Fraction | f | mfrac |
| FunctionApply | func | mrow with &FunctionApply; |
| LeftSubSup | sPre | mmultiscripts (special case of) |
| LowerLimit | limLow | munder |
| Matrix | m | mtable |
| Nary | nary | mrow followed by msubsup w n-ary mo |
| Phantom | phant | mphantom and/or mpadded |
| Radical | rad | msqrt/mroot |
| GroupChar | groupChr | mover/munder |
| Subscript | sSub | msub |
| SubSup | sSubSup | msubsup |
| Superscript | sSup | msup |
| UpperLimit | limUpp | mover |
| Ordinary text | r | mtext |

MathML has additional nodes, some of which involve infix parsing to recognize, e.g., integrals. The OMML entities were defined for typographic reasons since they require special display handling. Interestingly the OMML entities also include useful semantics, such as identifying integrals and trigonometric functions without special parsing.

In summary, math zones can be made accessible using display trees for which the node contents are spoken in the localized linear format and navigation is accomplished using simple arrow keys, Ctrl arrow keys, and the Home and End keys, or their Braille equivalents. Arriving at any particular insertion point, the user can hear or feel the math text and can edit the text in standard ways.


The post Some UnicodeMath Enhancements appeared first on Math in Office.

With all three formats, the *n*-aryand, e.g., integrand or summand, may not be identified by surrounding delimiters. But OfficeMath and MathType have *n*-aryand arguments as described in the post Integrands, Summands, and Math Function Arguments. UnicodeMath has the binary operator U+2592 (▒) to treat the expression that follows the ▒ as the *n*-aryand (see Section 3.4 of UnicodeMath 3.1). In generalizing the conversion code for LaTeX and braille, it became clear that a space alone is adequate for starting *n*-aryands and we don’t need the ▒, which doesn’t look like mathematics. So, the converter now makes the first expression that follows the *n*-ary operator and limits into the *n*-aryand. For example, the integral

can be given by the UnicodeMath 1/2π ∫_0^2π ⅆθ/(a+b sin θ)=1/√(a^2-b^2) since the first expression that follows the ∫_0^2π is the fraction ⅆθ/(a+b sin θ). This works for many integrands. More complicated integrands are usually enclosed in brackets, braces, or parentheses.

A “bare” matrix, that is, one with no enclosing brackets can be entered by typing the TeX control word \matrix. In addition, there are five matrix constructs with enclosing brackets that can be entered as summarized in the following table in which … stands for the matrix contents.

| LaTeX | Char | Code | Form |
| --- | --- | --- | --- |
| \matrix | ■ | U+25A0 | … |
| \bmatrix | ⓢ | U+24E2 | […] |
| \pmatrix | ⒨ | U+24A8 | (…) |
| \vmatrix | ⒱ | U+24B1 | \|…\| |
| \Bmatrix | Ⓢ | U+24C8 | {…} |
| \Vmatrix | ⒩ | U+24A9 | ‖…‖ |

The UnicodeMath syntax for a parenthesized 2×2 matrix is \pmatrix(a&b@c&d), which builds up as

Sometimes you just want to enter a sample matrix quickly. If any of the six matrix control words is followed by a digit *d*, it inserts a *d* × *d* identity matrix. For example, typing \pmatrix 3 enters

This is easier to type than \pmatrix(1&0&0@0&1&0@0&0&1), which displays the same identity matrix. Some of the matrix control words are missing in the default math autocorrect file. You can add them as described in the last section of this post.
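To make the equivalence concrete, here is a small sketch that generates the long-hand UnicodeMath for a *d* × *d* identity matrix. The function name `identityMatrix` is hypothetical, and the restriction to the digits 1 through 9 is an assumption (the text says only that a digit follows the control word); this is not the converter's actual code.

```cpp
#include <cassert>
#include <string>

// Returns the UnicodeMath linear format for a d×d identity matrix wrapped
// in the given matrix control word, e.g. "\\pmatrix".
std::string identityMatrix(const std::string& controlWord, int d) {
    assert(d >= 1 && d <= 9);  // assumption: the shortcut takes a single nonzero digit
    std::string s = controlWord + "(";
    for (int row = 0; row < d; ++row) {
        if (row > 0) s += '@';           // '@' separates rows
        for (int col = 0; col < d; ++col) {
            if (col > 0) s += '&';       // '&' separates columns
            s += (row == col) ? '1' : '0';
        }
    }
    return s + ")";
}
```

Calling `identityMatrix("\\pmatrix", 3)` produces the same string as the long form quoted above.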

This trigonometric expression is ambiguous: is it sin(𝑥²) or (sin 𝑥)²? Without the parentheses, the UnicodeMath for the former is “sin x^2” and for the latter is “sin x ^2”. In the latter, the space following the x builds up the sin x into a math function object and then the ^2 squares the object. But the results are very different formulas. The converter avoids the ambiguity by building up “sin x ^2” to be the same math function object as “sin^2 x”, that is, sin² 𝑥.

You can enter the common LaTeX expressions \frac{a}{b} and \binom{n}{m} in UnicodeMath input mode provided you have added math autocorrect entries to convert \frac to ⍁ (U+2341) and \binom to ⒝ (U+249D). To add math autocorrect entries, click on the lower-right box in the Equations/Conversions ribbon option to display the dialog box

Then click on the Math AutoCorrect… button to see and add math autocorrect entries. For example, to add \frac with U+2341, type as in the dialog box

And then enter Alt+x to convert the 2341 to ⍁. Probably when you type LaTeX in UnicodeMath input mode, a dialog ought to appear asking you if you’d like to switch to LaTeX input mode.


The post RichEdit Emoticon Shortcuts appeared first on Math in Office.

The built-in emoticon shortcuts are defined in the table

| Type | Get | Unicode |
| --- | --- | --- |
| `%)` | 😕 | U+1F615 |
| `0:)` | 😇 | U+1F607 |
| `:'(` | 😢 | U+1F622 |
| `:')` | 😂 | U+1F602 |
| `:'-(` | 😢 | U+1F622 |
| `:'-)` | 😂 | U+1F602 |
| `:(` | ☹ | U+2639 |
| `:)` | ☺ | U+263A |
| `:+1:` | 👍 | U+1F44D |
| `:-(` | ☹ | U+2639 |
| `:-)` | 😊 | U+1F60A |
| `:-D` | 😃 | U+1F603 |
| `:-o` | 😲 | U+1F632 |
| `:-p` | 😝 | U+1F61D |
| `:-\|` | 😐 | U+1F610 |
| `:D` | 😃 | U+1F603 |
| `:fire:` | 🔥 | U+1F525 |
| `:grin:` | 😁 | U+1F601 |
| `:o` | 😲 | U+1F632 |
| `:p` | 😝 | U+1F61D |
| `:smile:` | 😄 | U+1F604 |
| `:yum:` | 😋 | U+1F60B |
| `:\|` | 😐 | U+1F610 |
| `;)` | 😉 | U+1F609 |
| `;-)` | 😉 | U+1F609 |
| `</3` | 💔 | U+1F494 |
| `<3` | ❤ | U+2764 |
| `>:)` | 😈 | U+1F608 |
| `B-)` | 😎 | U+1F60E |

The emoticon shortcut facility is incorporated into the RichEdit autocorrect facility. To enable the autocorrect facility, send the message EM_SETAUTOCORRECTPROC with wparam = an AutoCorrectProc callback pointer. If you don’t want to implement an autocorrect callback, set wparam = 1. This activates the built-in math autocorrect facility in math zones. It also activates emoticon shortcuts if they’re enabled. To enable the emoticon shortcuts, get the current language-option flags by sending EM_GETLANGOPTIONS, OR in IMF_EMOTICONSHORTCUTS (0x8000), and send EM_SETLANGOPTIONS with lparam equal to the result. The emoticon-shortcut option is disabled by default. Have fun
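The get, OR, set sequence for the language options might look roughly like the following. To keep the sketch portable it does not include the Windows headers: the `Msg` enum and the `SendFn` callback are hypothetical stand-ins for the EM_GETLANGOPTIONS and EM_SETLANGOPTIONS messages and for a `SendMessage` call on the control, and only the flag value 0x8000 comes from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Flag value quoted in the text for IMF_EMOTICONSHORTCUTS.
constexpr std::uint32_t kEmoticonShortcuts = 0x8000;

// Hypothetical stand-ins for EM_GETLANGOPTIONS / EM_SETLANGOPTIONS.
enum class Msg { GetLangOptions, SetLangOptions };

// Hypothetical stand-in for SendMessage(hwnd, msg, 0, lparam).
using SendFn = std::function<std::uint32_t(Msg msg, std::uint32_t lparam)>;

// Read-modify-write: keeps every existing language option and adds the
// emoticon-shortcut flag. Returns the option set that was written back.
std::uint32_t enableEmoticonShortcuts(const SendFn& send) {
    assert(send && "a message sender is required");
    std::uint32_t options = send(Msg::GetLangOptions, 0);
    options |= kEmoticonShortcuts;
    send(Msg::SetLangOptions, options);
    return options;
}
```

Reading the current flags first matters because EM_SETLANGOPTIONS replaces the whole option set; writing only 0x8000 would switch off whatever options were already enabled.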


The post RichEdit Hot Keys appeared first on Math in Office.

This post summarizes the hot keys built into RichEdit. A previous post published a summary of all RichEdit hot keys as of 2013, but that post got truncated, is missing some hot keys that were added recently, and has hyperlinks that need updating. Note that RichEdit clients, e.g., OneNote, often handle all hot keys with RichEdit never seeing the corresponding keyboard messages. Since the client receives the keyboard input, it can do whatever it wants to with that input. This flexibility is valuable particularly for localizing hot keys. RichEdit is “globalized”, but not localized. A number of the hot keys described in this post are English-centric and should be localized by the client. Other hot keys are global by nature and can be used in any locale.

The post Entering Unicode Characters explains several ways to enter arbitrary Unicode characters into applications. My favorite general-purpose way is via Alt+x, which works in Word, Outlook, OneNote, and RichEdit-based programs like WordPad. It ought to work in *all* editors! (Sadly, it doesn’t work in PowerPoint, Excel or Visual Studio, although it’d be easy for these programs to implement it.) It works by entering the Unicode hex code for the character followed by Alt+x. So, entering 2260 Alt+x enters ≠. Entering 1d44e Alt+x enters 𝑎, which is math-italic a. I use this hot key almost as often as I use Ctrl+c (copy) and Ctrl+v (paste). When I’m writing code in Visual Studio, I keep a program running RichEdit handy for entering Unicode symbols. Programs are easier to read with real Unicode characters instead of workarounds using the \xXXXX notation. You can also copy the symbols from appropriate web pages such as Mathematical operators and symbols in Unicode, which has most math symbols. But if you know the Unicode code point, the Alt+x hot key is faster. It also lets you find out a character’s Unicode hex value from the character since Alt+x is a toggle: convert hex to character; convert character to hex. Try it, you will like it!
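The two directions of the Alt+x toggle can be sketched as a pair of conversions. This is a simplified illustration, not RichEdit's implementation: the function names, the `kInvalid` sentinel, the 6-digit limit, and the rejection of surrogate values are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Sentinel returned when the text is not a usable hex code (hypothetical).
constexpr char32_t kInvalid = 0xFFFFFFFF;

// Hex to character: parses 1 to 6 hex digits into a Unicode code point.
char32_t hexToCodePoint(const std::string& hex) {
    if (hex.empty() || hex.size() > 6) return kInvalid;
    char32_t cp = 0;
    for (char c : hex) {
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return kInvalid;
        cp = cp * 16 + static_cast<char32_t>(digit);
    }
    // Assumption: reject values past U+10FFFF and lone surrogates.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return kInvalid;
    return cp;
}

// Encodes a code point as UTF-16, the in-memory encoding Office uses.
std::u16string toUtf16(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;  // split the remaining 20 bits into a surrogate pair
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}

// Character to hex: the reverse direction of the toggle (uppercase digits).
std::string codePointToHex(char32_t cp) {
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    do {
        hex.insert(hex.begin(), digits[cp & 0xF]);
        cp >>= 4;
    } while (cp != 0);
    return hex;
}
```

For the examples in the text, `hexToCodePoint("2260")` gives U+2260 (≠), and `hexToCodePoint("1d44e")` gives U+1D44E, which `toUtf16` encodes as the surrogate pair D835 DC4E.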

A pair of globalized hot keys set the BiDi directionality of a paragraph. These hot keys depend on knowing the difference between right and left. The WM_KEYDOWN message passes information in the message lparam that specifies right Shift or left Shift. Specifically, byte 2 of the lparam gives the key’s scan code and the value 0x36 is the scan code for the right-shift key ever since IBM shipped its first PC in August, 1981. This information lets RichEdit handle the Ctrl+RightShift hot key to switch the BiDi paragraph directionality to RTL (right to left). Similarly, Ctrl+LeftShift switches to LTR (left to right). RichEdit tracks which Alt, Shift, and Ctrl keys are depressed at any given time. This enables it to differentiate between left Alt for menus and right Alt (AltGr) for keyboard commands. But the most important use is for the Ctrl+RightShift and Ctrl+LeftShift hot keys. Lots of other Word hot keys are implemented in RichEdit.
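A minimal sketch of the scan-code test follows, written without the Windows headers so it stays portable. The right-Shift scan code 0x36 and the "byte 2 of lparam" layout come from the text; the left-Shift scan code 0x2A is the standard PC value and is not stated in the post, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class ParaDir { Unchanged, LeftToRight, RightToLeft };

constexpr std::uint8_t kScanRightShift = 0x36;  // from the text
constexpr std::uint8_t kScanLeftShift = 0x2A;   // assumption: standard PC scan code

// Byte 2 (bits 16-23) of the WM_KEYDOWN lparam holds the key's scan code.
constexpr std::uint8_t scanCodeFromLParam(std::uint32_t lparam) {
    return static_cast<std::uint8_t>((lparam >> 16) & 0xFF);
}

// Decides the paragraph direction for a Shift key-down while Ctrl is held.
ParaDir directionForCtrlShift(std::uint32_t lparam, bool ctrlDown) {
    if (!ctrlDown) return ParaDir::Unchanged;
    switch (scanCodeFromLParam(lparam)) {
        case kScanRightShift: return ParaDir::RightToLeft;  // Ctrl+RightShift
        case kScanLeftShift:  return ParaDir::LeftToRight;  // Ctrl+LeftShift
        default:              return ParaDir::Unchanged;
    }
}

// Usage example: lparam values with a repeat count of 1 in the low word.
void demoCtrlShift() {
    const std::uint32_t rightShiftDown = (0x36u << 16) | 1u;
    const std::uint32_t leftShiftDown = (0x2Au << 16) | 1u;
    assert(directionForCtrlShift(rightShiftDown, true) == ParaDir::RightToLeft);
    assert(directionForCtrlShift(leftShiftDown, true) == ParaDir::LeftToRight);
}
```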

RichEdit supports Word’s standard subscript and superscript hot keys: Ctrl+= and Ctrl+Shift+=, respectively. These hot keys toggle their respective states. For example, if you type some text, Ctrl+=, and some more text, the latter will be subscripted up until you type Ctrl+= again to go back on the base line. If you type one of these hot keys while some text is selected, that text’s script character will be toggled accordingly. In UnicodeMath, subscripts and superscripts are usually entered with the _ and ^ operators as in [La]TeX, or via the ribbon. But the standard hot keys can be handy provided the scripts are not nested. These hot keys have different meanings in a math zone: Ctrl+= builds up LaTeX or UnicodeMath into OfficeMath and Ctrl+Shift+= builds OfficeMath down into LaTeX or UnicodeMath.

The Ctrl+} hot key is copied from the Visual Studio program editor. Ctrl+} moves the insertion point from one end of a bracketed expression (…), […], {…} to the other end. This is very handy for examining text with nested parentheses or curly braces (RTF, LaTeX, computer programs, JSON, etc.).

Arrow, PgUp/PgDn, and Home/End key behavior is summarized in the following table for ordinary text (behavior in math zones may differ). A depressed state of the Shift, Ctrl, and Alt keys is given by ✓; else the key isn’t depressed.

| Key | Shift | Ctrl | Action |
| --- | --- | --- | --- |
| ← | | | Move left char |
| ← | | ✓ | Move left word |
| ← | ✓ | | Select left char |
| ← | ✓ | ✓ | Select left word |
| ↑ | | | Move up line |
| ↑ | | ✓ | Move to start of paragraph |
| ↑ | ✓ | | Select up line |
| ↑ | ✓ | ✓ | Select to start of paragraph |
| → | | | Move right char |
| → | | ✓ | Move right word |
| → | ✓ | | Select right char |
| → | ✓ | ✓ | Select right word |
| ↓ | | | Move down line |
| ↓ | | ✓ | Move to end of paragraph |
| ↓ | ✓ | | Select down line |
| ↓ | ✓ | ✓ | Select to end of paragraph |
| PgUp | | | Move up one screen |
| PgUp | | ✓ | Move to start of screen |
| PgUp | ✓ | | Select up one screen |
| PgUp | ✓ | ✓ | Select to start of screen |
| PgDn | | | Move down one screen |
| PgDn | | ✓ | Move to end of screen |
| PgDn | ✓ | | Select down one screen |
| PgDn | ✓ | ✓ | Select to end of screen |
| Home | | | Move to start of line |
| Home | | ✓ | Move to start of story |
| Home | ✓ | | Select to start of line |
| Home | ✓ | ✓ | Select to start of story |
| End | | | Move to end of line |
| End | | ✓ | Move to end of story |
| End | ✓ | | Select to end of line |
| End | ✓ | ✓ | Select to end of story |

Arrow-key behavior in vertical text corresponds to the different direction. For example, ↓ goes to the next character instead of going to the next line. See Math Selection for a discussion of how the navigation keys work in a math zone. An important point is that if you select a math structure character (start of object, end of object, or end of argument), the whole object is automatically selected.

Typically typing the Tab key inserts a Tab character (U+0009). But depending on context, the Tab key may turn into a navigation key. For example, in a table cell, the Tab key goes to the next cell and Shift+Tab goes to the previous cell (if any). If the selection is in the last cell of a table, the Tab key inserts a new row after the last row with the insertion point in the first cell of the new row.

In math zones, the Tab key goes to the next argument of the current math function and the Shift+Tab key goes to the previous argument. This behavior was originally scheduled for Word as well, but got postponed.

In dialog window controls, Tab characters are ignored. This allows dialogs to use the Tab character to move from control to control.

The Enter key usually inserts an end-of-paragraph character (U+000D—carriage return) and the Shift+Enter key inserts an end-of-line character (U+000B—VT). See Paragraphs and Paragraph Formatting for a discussion of the differences between these kinds of insertions. At the end of a table row, the Enter key inserts a new row after the current row. Inside a math object argument, an Enter key inserts an equation array. This is handy for the lower limit of n-ary objects like summations, which may have more than one subscript range. In a display math zone, Shift+Enter starts a new equation (see The Math Paragraph for details).

If the current selection is nondegenerate (selects one or more characters), the Delete key deletes the selected characters. If the current selection is degenerate, i.e., an insertion point (IP), the Delete key usually deletes the character immediately following the IP. If the character is followed by one or more combining marks, the Delete key deletes the whole combining-mark sequence. Similarly, if the character is followed by a variation selector, the Delete key deletes the whole variation-selector sequence. If the Ctrl key is pressed for an insertion point, the Delete key deletes the word following the IP. See Math Selection for a discussion of how the Delete and Backspace keys work in math zones. In particular, the math object is selected if you type Delete at the start of the object or Backspace at the end of the object. A second Delete or Backspace then deletes the object. This behavior exists so that you don’t delete things by mistake. If you do so anyway, you can always undo your deletion by typing Ctrl+Z.

The Backspace key is similar to the Delete key but has some differences in addition to operating on the character preceding the insertion point. If the current selection is nondegenerate, the Backspace key acts the same as the Delete key and deletes the selected characters. If the current selection is degenerate, i.e., an insertion point, the Backspace key usually deletes the character immediately preceding the insertion point. If that character is a combining mark, the Backspace key deletes that combining mark alone. This differs from the Delete key at the start of a combining-mark sequence, which deletes the whole combining-mark sequence. If the preceding character is a variation selector, the Backspace key deletes the whole variation-selector sequence. If the Ctrl key is pressed for an insertion point, the Backspace key deletes the word preceding the IP. See Math Selection for a discussion of how the Backspace key works in math zones. In particular, the math object is selected if you type Backspace at the end of the object. A second Backspace then deletes the object. Alt+Backspace is an alias for Ctrl+Z (undo).
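The asymmetry between Delete and Backspace at an insertion point in ordinary text can be sketched as two range computations over a sequence of code points. This is an illustration under simplifying assumptions, not RichEdit's code: the names are hypothetical, combining marks are limited to U+0300 through U+036F, variation selectors to U+FE00 through U+FE0F, and Ctrl, selections, and math zones are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplifying assumption: only the Combining Diacritical Marks block.
bool isCombiningMark(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Simplifying assumption: only VS1 through VS16.
bool isVariationSelector(char32_t c) { return c >= 0xFE00 && c <= 0xFE0F; }

// Half-open range [start, end) of code points to remove.
struct Range {
    std::size_t start;
    std::size_t end;
};

// Delete at an insertion point: removes the following character together
// with any combining marks or variation selector attached to it.
Range deleteRange(const std::u32string& text, std::size_t ip) {
    assert(ip <= text.size());
    if (ip == text.size()) return {ip, ip};  // nothing follows the IP
    std::size_t end = ip + 1;
    while (end < text.size() &&
           (isCombiningMark(text[end]) || isVariationSelector(text[end]))) {
        ++end;
    }
    return {ip, end};
}

// Backspace at an insertion point: removes only the preceding code point,
// so a trailing combining mark is removed alone. A preceding variation
// selector is removed together with its base character.
Range backspaceRange(const std::u32string& text, std::size_t ip) {
    assert(ip <= text.size());
    if (ip == 0) return {0, 0};  // nothing precedes the IP
    std::size_t start = ip - 1;
    if (isVariationSelector(text[start]) && start > 0) --start;
    return {start, ip};
}
```

For the text "é" stored as e followed by U+0301, Delete at offset 0 covers both code points, while Backspace at offset 2 removes only the accent and leaves the plain e.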

The following table lists additional hot keys handled by RichEdit

Key |
Shift |
Ctrl |
Alt |
Action |

= | ✓ | Toggle subscript mode (not in math zone) Build up selected math text (in math zone) | ||

= | ✓ | ✓ | Toggle superscript mode (not in math zone) Build down selected math text (in math zone) | |

= | ✓ | Insert math zone | ||

= | ✓ | ✓ | Build down selected math text | |

= | ✓ | ✓ | ✓ | Build up selected math text (doesn’t have to be in math zone) |

– | ✓ | Insert soft hyphen (U+00AD) | ||

– | ✓ | ✓ | Insert nonbreaking hyphen (U+2011) | |

, | ✓ | Cedilla accent dead key (English only) | ||

‘ | ✓ | Acute accent dead key (English only) | ||

“ | ✓ | ✓ | Smart quotes (English only) | |

~ | ✓ | ✓ | Tilde accent dead key (English only) | |

; | ✓ | Dieresis accent dead key (English only) | ||

` | ✓ | Grave accent dead key (English only) | ||

> | ✓ | ✓ | Make font bigger | |

< | ✓ | ✓ | Make font smaller | |

! | ✓ | ✓ | ✓ | Insert ¡ (inverted !, English only) |

? | ✓ | ✓ | ✓ | Insert ¿ (English only) |

} | ✓ | Move to other end of bracketed expression (…), […], {…} | ||

1 | ✓ | Single spacing | ||

2 | ✓ | Double spacing | ||

5 | ✓ | 1.5 spacing | ||

6 | ✓ | Caret accent dead key (English only) | ||

A | ✓ | Select All | ||

A | ✓ | ✓ | Toggle all caps | |

B | ✓ | Toggle bold | ||

C | ✓ | Copy selection (Ctrl+Insert is an alias) | ||

E | ✓ | Center selected paragraph(s) | ||

E | ✓ | ✓ | Insert € (except for languages noted below) | |

I | ✓ | Toggle italic | ||

J | ✓ | Justify selected paragraph(s) | ||

L | ✓ | Left align selected paragraph(s) | ||

L | ✓ | ✓ | Cycle through bullet/numbering types | |

Q | ✓ | Alias for Alt+= | ||

R | ✓ | Right align selected paragraph(s) | ||

U | ✓ | Toggle underline | ||

V | ✓ | Paste (Shift+Insert is an alias) | ||

X | ✓ | Copy selection and delete it | ||

X | ✓ | Convert from hex to Unicode and vice versa | ||

Y | ✓ | Redo | ||

Z | ✓ | Undo | ||

F3 | ✓ | If first selected letter is lower-case, change to title case; else change to lower case | ||

F8 | ✓ | ✓ | ✓ | Turn on table autofit |

F12 | ✓ | ✓ | ✓ | Same as Alt+X |

The Euro (€) isn’t inserted by Ctrl+Alt+E for the following languages: UK English, Eire English, Polish, Portuguese, Hungarian, Vietnamese, New Tai Lue, Ogham, Hawaiian, Gaelic, Sesotho, Tswana, Kyrgyz, Igbo, Latvian, Georgian, Hebrew, Pashto, Latin, Maltese, Cherokee, Myanmar, Sinhalese, Syriac, Inuktitut, Khmer, Tibetan, and Hindi.


The post MathML and OMML User Selection Attributes appeared first on Math in Office.

A user selection can be degenerate, that is, an insertion point, or nondegenerate in which case it selects one or more ranges of characters. Multiple disjoint selections (multiple ranges) can be made by using the Ctrl key and clicking appropriately. For math editing, multiple selections aren’t generally very useful, and this post doesn’t treat them. A nondegenerate selection has an *active* end, the end that moves when you enter Shift + an arrow key, and an *anchor* end. The two ends coincide for an insertion point.

To specify the locations of the selection ends in any MathML/OMML content, we define the attribute names selActiveEnd, selAnchorEnd, and selIP (insertion point). The values for these attributes are given in the table

| Value | Meaning |
| --- | --- |
| "before" | Before math zone |
| "after" | After math zone |
| "*n*" | At offset *n* into an element |

The most common attribute is selIP with the value "0", i.e., an insertion point at the start of the element, such as the MathML <mi selIP="0">a</mi> or the OMML <m:t selIP="0">𝑎</m:t>.

With elements containing more than one character like the MathML <mi>sin</mi>, the insertion point might be in between the ‘s’ and the ‘i’, in which case one has the MathML <mi selIP="1">sin</mi> and the OMML <m:t selIP="1">sin</m:t>. If the user then enters Shift+→ to select the ‘i’, the MathML is <mi selAnchorEnd="1" selActiveEnd="2">sin</mi> and the OMML is <m:t selAnchorEnd="1" selActiveEnd="2">sin</m:t>.

Another case is for an IP at the end of an object argument. For example, in the MathML fraction <mfrac><mn>1</mn><mn>2</mn></mfrac>, if the IP follows the ‘2’ in the denominator, the selIP attribute appears in MathML as <mn selIP="1">2</mn>. This IP is at the end of the denominator, not at the end of the fraction, and entering a character puts the character in the denominator following the ‘2’. The corresponding OMML is <m:t selIP="1">2</m:t>.

The offset *n* is given in code units of the Unicode encoding in use. Microsoft Office apps use UTF-16 for which most math alphanumerics are surrogate pairs, that is, 2 code units. If a fraction denominator is 𝑥, an IP following the 𝑥 is specified for a UTF-16 implementation by the MathML as <mi selIP="2">x</mi> even though MathML uses the single-unit ASCII letter x to represent the surrogate-pair math-alphabetic 𝑥 (U+1D465). This choice is synchronized with the selection in memory. In OMML, math alphanumerics aren’t translated to ASCII, so this size difference doesn’t occur.
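The code-unit arithmetic behind the offset can be sketched as follows. The function names are hypothetical, and the sketch assumes the in-memory text is available as a sequence of code points; it only illustrates why an insertion point after 𝑥 gets the value 2 in a UTF-16 implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-16 code units needed for one code point: 2 for characters above
// U+FFFF (surrogate pairs), otherwise 1.
constexpr std::size_t utf16Units(char32_t cp) {
    return cp >= 0x10000 ? 2 : 1;
}

// UTF-16 offset of an insertion point that follows the first `count`
// code points of `text`.
std::size_t utf16Offset(const std::u32string& text, std::size_t count) {
    assert(count <= text.size());
    std::size_t units = 0;
    for (std::size_t i = 0; i < count; ++i) units += utf16Units(text[i]);
    return units;
}

// Formats the attribute as it appears on the element, e.g. selIP="2".
std::string selIPAttribute(std::size_t offset) {
    return "selIP=\"" + std::to_string(offset) + "\"";
}
```

For a denominator holding 𝑥 (U+1D465), `utf16Offset(U"\U0001D465", 1)` is 2, whereas an insertion point after the ‘s’ of “sin” is at offset 1.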

An IP or selection end can follow the last element of a parent element such as being after the parenthesized expression in (𝑎 + 𝑏)² but still in the base of the superscript object. For an IP, this is given in MathML by an empty <mrow selIP="0"/>:

    <msup>
      <mrow>
        <mrow><mo>(</mo><mi>a</mi><mo>+</mo><mi>b</mi><mo>)</mo></mrow>
        <mrow selIP="0"/>
      </mrow>
      <mn>2</mn>
    </msup>

In OMML, it’s given by an empty run <m:r selIP="0"/>. The empty <mrow> or empty <m:r> is also used for an IP at the end of the math zone, but still in the math zone.

If an IP is at the start of a math object, such as a fraction, but before the first argument, the selection attribute goes in the math-object element. For example, for the fraction “1 over 2”, an IP at the start of the fraction is indicated by <mfrac selIP="0"> in MathML and by <m:f selIP="0"> in OMML.

A selection may start before a math zone and select part or all of the math zone. Similarly, it can start inside the math zone and extend beyond it. The attribute values “before” and “after” are used for such cases, respectively. For example, the math in the statement (selection is highlighted) “the Pythagorean Theorem is 𝑎² + 𝑏² = 𝑐²” is represented by the MathML

    <math selAnchorEnd="before">
      <msup><mi>a</mi><mn>2</mn></msup>
      <mo selActiveEnd="0">+</mo>
      <msup><mi>b</mi><mn>2</mn></msup>
      <mo>=</mo>
      <msup><mi>c</mi><mn>2</mn></msup>
    </math>

where the active end follows the 𝑎². If the whole math zone is embedded in a selection with the active end at the selection end, <math selAnchorEnd="before" selActiveEnd="after"> is used. A selected empty math-zone place holder can be represented in MathML by <math selIP="0"/>.

Although it would be possible to allow selection attributes on almost any element, processing is simpler, and the result is still general enough, if the elements that take selection attributes are restricted to

- Character elements: MathML <mi>, <mn>, <mo>, <mtext> and OMML <m:t>
- MathML <mrow> and OMML <m:r>
- Math object elements like MathML <mfrac> and OMML <m:f>
- Math zone element: MathML <math> and OMML <oMath>


The post Displaying Enlarged Images in Popup Window appeared first on Math in Office.

Enable the EN_IMAGE notification by sending an EM_SETEVENTMASK message with lParam equal to an event mask that includes the ENM_IMAGE flag defined by

#define ENM_IMAGE 0x00000400 // Event mask for mouse over image

In RichEdit window controls, the notification is sent to the parent window packaged in a WM_NOTIFY message with lParam being a pointer to an ENIMAGE struct defined by

    typedef struct _enimage {
        NMHDR     nmhdr;     // Notification header
        UINT      msg;       // Message causing notification, e.g. WM_LBUTTONDOWN
        WPARAM    wParam;    // Msg wParam
        LPARAM    lParam;    // Msg lParam
        IMAGEDATA ImageData; // Image Data
    } ENIMAGE;

where nmhdr.code = EN_IMAGE defined by

#define EN_IMAGE 0x0721 // Notification when mouse is over an image

IMAGEDATA is defined by

    typedef struct _imagedata {
        LONG      cp;     // cp of image in RichEdit control
        IMAGETYPE Type;   // Image type
        LONG      Width;  // Image width in HIMETRIC units
        LONG      Height; // Image height in HIMETRIC units
    } IMAGEDATA;

and IMAGETYPE is defined by

    typedef enum _IMAGETYPE {
        IT_NONE,
        IT_BMP,
        IT_GIF,
        IT_JPEG,
        IT_PNG,
        IT_ICO,
        IT_TIFF,
        IT_WMP,
        IT_UNKNOWN // User installed WIC codec
    } IMAGETYPE;

In windowless RichEdit controls, EN_IMAGE is passed to the host via an ITextHost::TxNotify() call. If the image is singly selected, RichEdit doesn’t send EN_IMAGE notifications so that users can use the mouse to resize the image.

Clients can display an enlarged image whenever desired by sending the EM_DISPLAYIMAGE message defined by

#define EM_DISPLAYIMAGE (WM_USER + 386)

The message wParam is a pointer to an IMAGEDATA structure defined above. The message lParam is an ID2D1RenderTarget* for the target window. The client should supply the desired new IMAGEDATA::Width and Height in HIMETRIC units. For example, on receipt of an EN_IMAGE notification, the client can use the data in the IMAGEDATA struct included in the ENIMAGE notification struct. The Width and Height values determine the image aspect ratio, which should be maintained in the enlarged image.
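One way a client might compute the enlarged Width and Height while keeping the aspect ratio is sketched below. The `ImageSize` struct and `fitPreservingAspect` function are hypothetical helpers, not part of the RichEdit API; the sketch assumes the client wants the largest size that fits in a given box, with all values in HIMETRIC units (0.01 mm).

```cpp
#include <cassert>

// Width and height in HIMETRIC units (0.01 mm), as in IMAGEDATA.
struct ImageSize {
    long width;
    long height;
};

// Returns the largest size that fits inside maxWidth × maxHeight and has
// the same aspect ratio as the original (up to integer rounding).
ImageSize fitPreservingAspect(ImageSize original, long maxWidth, long maxHeight) {
    assert(original.width > 0 && original.height > 0);
    assert(maxWidth > 0 && maxHeight > 0);
    // Compare maxWidth/width with maxHeight/height by cross-multiplying,
    // which avoids floating-point arithmetic.
    const long long widthLimited = static_cast<long long>(maxWidth) * original.height;
    const long long heightLimited = static_cast<long long>(maxHeight) * original.width;
    if (widthLimited <= heightLimited) {
        // The box width is the limiting dimension: scale = maxWidth / width.
        return {maxWidth, static_cast<long>(widthLimited / original.width)};
    }
    // The box height is the limiting dimension: scale = maxHeight / height.
    return {static_cast<long>(heightLimited / original.height), maxHeight};
}
```

A client could feed the Width and Height received in the EN_IMAGE notification into such a helper and store the result back into the IMAGEDATA it passes with EM_DISPLAYIMAGE.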

Here is an example with an image of the Matterhorn in the edit control (upper image) and an enlarged image below it


The post RichEditD2D Window Controls appeared first on Math in Office.

In January 2020, the Microsoft 365 RichEdit introduced a D2D/DirectWrite RichEdit window control with the new window class “RichEditD2D”. It uses D2D/DirectWrite for text and images and the window’s HDC for rendering embedded objects. It doesn’t support printing yet, since printing requires GDI. Fortunately, many applications don’t need to print anymore. On my laptop, the Microsoft 365 RichEdit is housed in C:\Program Files\Microsoft Office\root\vfs\ProgramFilesCommonX64\Microsoft Shared\OFFICE16\riched20.dll.

The RichEditD2D window class is implemented using the ID2D1Factory::CreateDCRenderTarget() method to create a highly functional ID2D1RenderTarget for an HDC. Image display doesn’t need an HDC and is rendered correctly on the D2D/DirectWrite path. OLE objects need an HDC and are queued up for rendering after the D2D/DirectWrite rendering finishes. It’s important to support OLE objects partly because the desktop Outlook To and Cc resolved email addresses are OLE objects.

The RichEditD2D window class works well with the Win32 Outlook To, Cc, and subject lines, rendering emoji in color on the subject line. Released versions of Outlook don’t use the RichEditD2D window class, so you’ll still see black-and-white emoji on Outlook’s Subject line. Also, the RichEditD2D implementation hasn’t been ported to the Windows msftedit.dll used by WordPad and other non-Office programs.

RichEdit *windowless* controls have supported the D2D/DirectWrite code path for years now. For such controls, the client has to implement the ITextHost or ITextHost2 interface, which is more complicated than simply calling CreateWindowEx(). Windowless controls are also more flexible, so many programs in Office and in Windows use them. For example, the XAML TextBox and RichTextBox controls use windowless controls running in the D2D/DirectWrite mode and automatically enable color emoji.

For windowless controls, color fonts aren’t enabled by default. To enable them, send the message EM_SETTYPOGRAPHYOPTIONS with wparam = lparam = TO_DEFAULTCOLOREMOJI | TO_DISPLAYFONTCOLOR, where TO_DEFAULTCOLOREMOJI = 0x1000 and TO_DISPLAYFONTCOLOR = 0x2000. In a windowless control, you can do this by calling

ITextServices::TxSendMessage(EM_SETTYPOGRAPHYOPTIONS, TO_DEFAULTCOLOREMOJI | TO_DISPLAYFONTCOLOR, TO_DEFAULTCOLOREMOJI | TO_DISPLAYFONTCOLOR, nullptr);

The post MathML mfenced element deprecated on web appeared first on Math in Office.

The MathML <mfenced> element is handy for representing a variety of delimited expressions, such as parenthesized, braced, and bracketed expressions. The expressions can contain separators. Examples are (𝑎 + 𝑏), (𝑎 + 𝑏], and the quantum mechanical expectation value ⟨𝜓|ℋ|𝜓⟩, in which the ‘|’ is a separator. The <mfenced> element corresponds quite closely to the OMML delimiters element <d> used in Office app files, which is why the OfficeMath MathML writers use it.

To show how <mfenced> can be emulated by an <mrow>, consider (𝑎 + 𝑏]. Using <mfenced>, it is represented by

`<mfenced close="]">`

`<mi>a</mi><mo>+</mo><mi>b</mi>`

`</mfenced>`

Since left parenthesis is the default start delimiter, the <mfenced> doesn’t need the attribute open="(", although it could have it. The equivalent <mrow> representation is

`<mrow>`

`<mo>(</mo>`

`<mi>a</mi><mo>+</mo><mi>b</mi>`

`<mo>]</mo>`

`</mrow>`

Here the <mo> fence="true" attribute isn’t needed since the MathML operator dictionary assigns fence="true" to parentheses, brackets, braces and other Unicode characters that are fences by default. You need the attribute fence="false" or stretchy="false" if you don’t want the delimiters to grow to fit their content.

Comparing these representations, we see that <mfenced> is more compact. On the other hand, the <mrow> emulation is more general in that you can include attributes like different math colors on the individual delimiters and you can embellish the delimiters with accents. If you want a delimited expression with just the open delimiter, e.g., {𝑎 + 𝑏, you omit the <mo> for the close delimiter. Similarly, a delimited expression with no open delimiter, e.g., 𝑎 + 𝑏}, omits the open delimiter. For more discussion of <mfenced> and <mrow>, see the MathML 3.0 spec.

The <mfenced> element is an example of Polish prefix notation: you know up front what kind of math object is involved. In contrast, you must parse an <mrow> emulation of <mfenced> to figure out what it represents. The parsing is a little tricky, but it’s not that hard since the delimiter roles are implied by the order in which the delimiters appear inside the <mrow>.

The basic principle is that the start and end delimiters are fences, and any delimiters in between are separators. The main OfficeMath MathML reader uses a SAX parser, which cannot look ahead. But the reader can store information for looking behind. The algorithm is: the first delimiter of an <mrow> is a start delimiter <mo> and other delimiters are marked as separators. When the parser comes to the end of the delimiter expression (</mrow>), it re-marks the last delimiter as an end delimiter. If there are only two delimiters, there are no separators. If there’s only one delimiter, it’s a start delimiter unless it comes at the end. This algorithm converts <mrow> delimiter elements into the OfficeMath <d> equivalent. It will be used soon in Office apps since Firefox removed support for <mfenced> (OneNote counted on it in Firefox!) and the Chromium code base won’t support it either. Chromium will, however, support “core” Presentation MathML. Many browsers are based on Chromium, e.g., Chrome and Edge.
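The role assignment just described can be sketched in C++ as follows. This is only an illustration under simplifying assumptions, not the OfficeMath reader's code: the children of one <mrow> are given as plain strings, IsDelimiter is a tiny stand-in for the MathML operator dictionary, and the names Role and ClassifyDelimiters are hypothetical. The single forward pass followed by a fix-up mimics a parser that can look behind but not ahead.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Role { None, Start, Separator, End };

// Assumption: a small stand-in for the operator dictionary's delimiters.
bool IsDelimiter(const std::string& text) {
    static const std::set<std::string> kDelimiters = {
        "(", ")", "[", "]", "{", "}", "|"};
    return kDelimiters.count(text) != 0;
}

// Assigns a role to each child of a single <mrow>, in document order.
std::vector<Role> ClassifyDelimiters(const std::vector<std::string>& children) {
    std::vector<Role> roles(children.size(), Role::None);
    std::size_t count = 0;  // delimiters seen so far
    std::size_t last = 0;   // index of the most recent delimiter

    // Forward pass: the first delimiter is a start delimiter and every
    // later one is provisionally a separator.
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (!IsDelimiter(children[i]))
            continue;
        roles[i] = (count == 0) ? Role::Start : Role::Separator;
        last = i;
        ++count;
    }

    // At </mrow>: re-mark the last delimiter as the end delimiter. A lone
    // delimiter stays a start delimiter unless it comes at the end.
    if (count >= 2) {
        roles[last] = Role::End;
    } else if (count == 1 && last != 0 && last + 1 == children.size()) {
        roles[last] = Role::End;
    }
    return roles;
}
```

With the children of (𝑎 + 𝑏], the first child is classified as Start and the last as End. With [𝑥|𝑦|𝑧], the two inner ‘|’ characters remain separators.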

Some MathML elements are “inferred mrow’s” in that they treat multiple children as a single argument and the algorithm works with them as well. Such elements include <math>, <msqrt>, <menclose>, <mphantom>, <mpadded> and <mtd>.

Best practice <mrow> delimiter emulation restricts the contents of the <mrow> to the contents of the delimited expression. But what if there are other things inside an <mrow> such as in (note: <math> is an inferred <mrow>)

<math> <mo>(</mo> <mi>a</mi> <mo>+</mo> <mi>b</mi> <mo>)</mo> <mo>+</mo> <mo>|</mo> <mi>a</mi> <mo>+</mo> <mi>b</mi> <mo>|</mo> </math>

Two tricks are useful: with no form-disambiguating attribute like form="prefix" on the delimiter <mo>’s (as in this example), use the default form value given in the MathML operator dictionary. This works for all default delimiter pairs, but not for ‘|’, which can be used as a separator (infix), open delimiter (prefix), or close delimiter (postfix). For ‘|’ use the algorithm above with a small twist: when there is an active ‘|’ start delimiter, treat a ‘|’ as an end delimiter. When finished processing any delimiter expression, reset the state to “no delimiters”. As such, ‘|’ is alternately a start delimiter and an end delimiter. This algorithm cannot produce nested absolute-value expressions. To nest an absolute value, use appropriate form attributes, or, best practice, put the absolute value in its own <mrow>.
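Here is a minimal C++ sketch of these two tricks applied to a flat run of tokens such as the <math> example above. It is one possible reading of the description, not the actual reader: the hard-coded bracket characters stand in for the operator dictionary's default forms, separators are ignored, and the names FenceRole and ClassifyFlatFences are hypothetical.

```cpp
#include <string>
#include <vector>

enum class FenceRole { None, Open, Close };

// Classifies each token of a flat (un-nested) child list.
std::vector<FenceRole> ClassifyFlatFences(const std::vector<std::string>& tokens) {
    std::vector<FenceRole> roles;
    roles.reserve(tokens.size());
    bool barActive = false;  // true while a '|' start delimiter is pending

    for (const std::string& t : tokens) {
        if (t == "(" || t == "[" || t == "{") {
            roles.push_back(FenceRole::Open);   // default form: prefix
        } else if (t == ")" || t == "]" || t == "}") {
            roles.push_back(FenceRole::Close);  // default form: postfix
        } else if (t == "|") {
            // Ambiguous form: alternate between start and end delimiter.
            roles.push_back(barActive ? FenceRole::Close : FenceRole::Open);
            barActive = !barActive;
        } else {
            roles.push_back(FenceRole::None);
        }
    }
    return roles;
}
```

Because the ‘|’ state simply alternates, an input like | | 𝑎 | + 𝑏 | comes out as two adjacent absolute values rather than a nested one, which is the limitation noted above.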

The post How I got into technical word processing appeared first on Math in Office.

When I finished my PhD in 1967, I went to Bell Labs to continue working on laser physics and after a year got seduced by the idea of labeling graphs with real built-up, i.e., 2D, mathematical expressions. To this end, I created the SCROLL language (**s**tring and **c**haracter **r**ecording **o**riented **l**ogogrammatic **l**anguage), which was the first language capable of “typesetting” mathematical equations on a computer. I published it in AFIPS Conf. Proc. **35**:525-536, AFIPS Press, Montvale, N.J. (1970). Admittedly SCROLL’s typography was limited. For example, the user had the responsibility of spacing the math, in contrast with TeX, Word 2007, and other sophisticated systems. But it was the first program capable of displaying built-up math, and it was fine for that time to be able to show nicely labeled results at various conferences.

After my two-year stint at Bell Labs, one of my fellow graduate students at Yale, Marlan Scully, suggested coming to the Optical Sciences Center at The University of Arizona to work on lasers and things and in particular to write *Laser Physics*, a book we had talked about writing some day with Willis. Well, for a Northeasterner, Tucson, Arizona was a most fabulous and interesting place and certainly one way to start seeing the rest of the world. So instead of going to Bell Labs in Murray Hill to work with the great computer science group there (and maybe later on eqn/troff, a TeX competitor), I went to Tucson. Marlan, Willis, and I (well mostly me, with two excellent consultants!) wrote the book and I personally typed over two-thirds of it using a superb new kind of typewriter called the IBM Selectric. It had handy type balls that you could exchange, so you could have italic, symbols, script, and other typefaces. What a huge improvement over the swapping out of keys, which we had to do with the older IBM typewriters. I had to type so much of the book because even with the Selectric our secretaries couldn’t type math very well, especially the subscripted superscripts, integrals, and the like common in the laser theories we were writing about.

*Laser Physics* was typeset in South Korea and the drafts confused *α* and *a*, *ν* and *v* (nu and vee, since Times New Roman also confuses them), and other symbols. It took me over a month to straighten things out, even though the original manuscript was correct. Such problems piqued my interest in preparing technical documents on computers. Publishing in physics journals was much easier, but you still had to spend significant time proofreading galley proofs.

Around 1978 I got a Diablo daisy-wheel printer to go with my IMSAI Z80 microcomputer. Not only was it much faster than the Selectric, it had many daisies some of which were proportionally spaced, and it was designed to work as a computer printer. I had gotten into microcomputing thinking that by computerizing my house I’d learn something about experimental physics, since Willis taught me that a real physicist needs to know something about both experiment and theory. To handle the proportional spacing, I wrote a printer driver. My colleague Rick Shoemaker, another microcomputer addict, and I decided to write a book called *Interfacing Microcomputers to the Real World*, and we “typeset” it using my printer driver and a daisy-wheel printer. Addison-Wesley published the book, just as it had published *Laser Physics*, but this time using our nice proportionally spaced camera-ready proofs.

Well clearly, we needed to be able to typeset math, so I generalized the printer driver to do so using algorithms like those for the SCROLL language. Another physicist, Mike Aronson, who had written the PMATE editor I was using, suggested that the input format should resemble real linearized math as in the C language rather than the Polish prefix format used in SCROLL. So I wrote a translator to accept a simplified linear format, the forerunner of UnicodeMath which we use in Office apps today. The translator was coded so tightly in Z80 assembly language that it along with the rest of the formatter fitted into 16KB of ROM for a controller some friends of mine created for Diablo daisy-wheel printers. When used with a tractor feed, it could print the whole document with one daisy, roll the document back, print with the next daisy, etc. It was positively wild watching the printer type the symbols in place after printing the main text.

As a laser physicist, I was naturally symbiotically attached to the idea of laser printers, so when HP came out with their early laser printers, I converted the program to 8086 code for use on IBM PCs and HP LaserJets. The editor and formatter ran just fine in MS-DOS in the PC’s incredibly roomy 640KB. Rick and I updated our microcomputer book to *The IBM-PC from the Inside Out*, once again published by Addison-Wesley from our camera-ready copy. I called the program the PS Technical Word Processor, and my users and I wrote many papers and books using it. Well many by a typical professor’s standards, i.e., not by Knuth’s (!), and essentially none by Microsoft’s standards. I really wanted to distribute the approach more widely. With myriad improvements, e.g., LineServices, we now have OfficeMath. And yet there’s still much to do!

The post Unicode Math Calligraphic Alphabets appeared first on Math in Office.

Note: the January 2021 meeting of the Unicode Technical Committee accepted 52 variation sequences for the upper-case script letters: 26 for roundhand and 26 for chancery. See L2/20-275R for the latest proposal.

1) Here’s an example of chancery and roundhand F’s being used in the same document:

2) Here are examples featuring P’s and C’s in which script letters denote infinity categories

3) Still another paper has the following

4) Both script styles are in the OMS encoding for LaTeX as illustrated by

\documentclass{article}
\usepackage{calrsfs}
\DeclareMathAlphabet{\pazocal}{OMS}{zplm}{m}{n}
\newcommand{\La}{\mathcal{L}}
\newcommand{\Lb}{\pazocal{L}}
\begin{document}
$\La\Lb$
\end{document}

This LaTeX snippet displays a roundhand L followed by a chancery L

Accordingly, the need for both chancery and roundhand alphabets is attested.

Complicating the addition of new alphabets is the fact that the current math-script alphabets may be chancery in one font and roundhand in another. Cambria Math, the first widely used Unicode math font, has chancery letters at the math-script code points, while the Unicode Standard has roundhand letters at those code points. For example, here’s the upper-case math-script H (U+210B) in Cambria Math followed by the one in the Unicode Standard:

The STIX math fonts have also had roundhand letters at the math-script code points, but in the STIX Two Math font, they have been changed to chancery. This removes the worst conflict in defining the new alphabets, although other math fonts might have roundhand letters at the current math-script code points.

We discuss two unambiguous ways to allow math-chancery and math-roundhand symbols to appear in the same plain-text document:

- Follow a character in the current math-script alphabets with one of two variation selectors, much as we use variation selectors (U+FE0E, U+FE0F) for emoji to force text and emoji glyphs, respectively. Specifically, to ensure use of the math-chancery alphabet, follow the current math-script letter with U+FE00. To ensure use of the math-roundhand alphabet, follow the current math-script letter with U+FE01.
- Add the missing bold and regular script alphabets

The variation selector approach has the advantages

- Contemporary software supports variation selectors for East Asia and emoji, so adding new variation selector usage shouldn’t be much of a burden
- The variation selector U+FE00 is already used with a number of math operators
- No new code points need to be allocated
- Typical documents can continue to do what they have been doing: ignore the distinction
- If a math font doesn’t support the variation selectors, it falls back naturally to the current script letters instead of displaying the missing-glyph box (but the style difference is lost)

Adding two variation selectors for the math script letters may make people ask why we didn’t use variation selectors for the math alphabets in the first place, but we all know the arguments in favor of what we did (see the blog post on Math Font Binding). Adding two variation selectors seems to solve the script quandary quite well, although variation selectors are generally a poor choice for situations where symbol shapes need to be used in a contrastive manner. This case should therefore not serve as a general precedent, but should be seen as an exception, tailored to fit this specific case. One way to implement the variation-selector combinations is to use the OpenType feature tags ‘cv01’ and ‘cv02’.
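To make the variation-selector approach concrete, here is a minimal C++ sketch that builds the two-code-point sequence for an upper-case script letter. The function and type names (MathScriptCapital, StyledScriptCapital, ScriptStyle) are hypothetical. The code-point mapping reflects the existing math-script capitals, eight of which live in the Letterlike Symbols block rather than in the Mathematical Alphanumeric Symbols block.

```cpp
#include <string>

enum class ScriptStyle { Chancery, Roundhand };

// Maps 'A'..'Z' to the existing math-script capital code point.
// Any other character is passed through unchanged.
char32_t MathScriptCapital(char ascii) {
    switch (ascii) {
        // Letters encoded in the Letterlike Symbols block.
        case 'B': return 0x212C;
        case 'E': return 0x2130;
        case 'F': return 0x2131;
        case 'H': return 0x210B;
        case 'I': return 0x2110;
        case 'L': return 0x2112;
        case 'M': return 0x2133;
        case 'R': return 0x211B;
        default: break;
    }
    if (ascii >= 'A' && ascii <= 'Z')
        return 0x1D49C + static_cast<char32_t>(ascii - 'A');
    return static_cast<char32_t>(static_cast<unsigned char>(ascii));
}

// Returns the script capital followed by U+FE00 (chancery) or
// U+FE01 (roundhand). Non-capitals get no selector appended.
std::u32string StyledScriptCapital(char ascii, ScriptStyle style) {
    const char32_t selector = (style == ScriptStyle::Chancery) ? 0xFE00 : 0xFE01;
    std::u32string result;
    result.push_back(MathScriptCapital(ascii));
    if (ascii >= 'A' && ascii <= 'Z')
        result.push_back(selector);
    return result;
}
```

A font that doesn’t support the sequences would ignore the selector and show its default script glyph, matching the fallback behavior listed among the advantages above.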

The second approach adds the missing normal and bold script alphabets. These two new alphabets could go in the 1D380…1D3FF block which is reserved for math alphabets. Programs continue to display what they currently display by default.

It might be worthwhile for programs like Microsoft Word to have a math document-level property that specifies which script alphabet to use for the whole document. Then a user who wants the fancy script glyphs could get them without making any changes except for choosing the desired document property setting. A similar setting could be used for choosing sans-serif alphabets as the default. Such alphabets are often used in chemical formulas.

The choice of chancery glyphs for the math script letters in Cambria Math is partly my fault. I had expected to see roundhand letters in Cambria Math as in the Unicode code charts. In my physics career I used math-script letters a lot, starting with my PhD thesis on Zeeman laser theory (1967) and followed by many papers published in the Physical Review and elsewhere and in my three books on lasers and quantum optics. Occasionally in a review article, chancery letters were substituted for roundhand letters because the publishers didn’t have the latter. And in the early days, the IBM Selectric Script ball and the script daisy wheels only had chancery letters. So I kind of got used to this substitution. Cambria Math was designed partly to look really good on screens, which didn’t have the resolution to display the narrow stem widths of Times New Roman and roundhand letters well. ClearType rendering certainly helped, but it seemed like a good idea to use less resolution demanding chancery letters. (Later Word 2013 disabled ClearType for various reasons and many readers of this blog have complained passionately ever since! With high resolution screens as on my Samsung laptop or the Surface Book, even Times New Roman looks crisp and nice with only gray-scale antialiasing, so hopefully this problem will diminish in time.)

LaTeX has the \mathsf{} and \mathsfit{} control words for math sans-serif upright and italic characters, respectively, and they work with Greek letters. Unlike the chancery/roundhand distinction, which is seldom used contrastively, upright and italic are usually used contrastively in mathematics. The Unicode Standard has upright and italic sans-serif math alphabets corresponding to the ASCII letters, but not for the Greek letters. Accordingly, these two math Greek alphabets should probably be added. The STIX Two Math font has them in the Private Use Area for the time being since users requested them.

Thanks to Asmus Freytag, John Hudson, Rick McGowan and Ken Whistler for enlightening discussions that substantially improved this post.
