The post Math Speech Strings and Localization appeared first on Math in Office.

To localize the speech for Unicode characters, perform a binary search by character code in the math symbol speech table for the desired language. The table for English follows. Notably missing are speech strings for U+002D ‘-’, U+2329 ‘〈’, and U+232A ‘〉’. Input routines replace these nonmathematical characters with the corresponding math symbols U+2212 ‘−’, U+27E8 ‘⟨’, and U+27E9 ‘⟩’, respectively. The character names in the table don’t always agree with the names in the Unicode Standard. The latter names cannot be changed for stability reasons even if the names are suboptimal or incorrect for math.

! ( ) + / < = > @ [ ] { | } |
, (space: comma gives a small pause)
factorial open paren close paren plus over less than equals greater than , next row, open bracket close bracket open brace vertical bar close brace |
! ( ) + / < = > @ [ ] { | } |

00A6
00AC 00AF 00B0 00B1 00B7 00F7 0131 0237 |
, atop,
not overbar degrees plus or minus dot divided by dotless i dotless j |
¦
¬ ¯ ° ± · ÷ ı ȷ |

0300
0301 0302 0303 0305 0307 0308 |
grave
acute hat tilde bar dot double dot |
̀
́ ̂ ̃ ̅ ̇ ̈ |

03B1
03B2 03B3 03B4 03B5 03B6 03B7 03B8 03B9 03BA 03BB 03BC 03BD 03BE 03BF 03C0 03C1 03C2 03C3 03C4 03C5 03C6 03C7 03C8 03C9 03D1 03D5 03DC 03DD 03F5 |
alpha
beta gamma delta script epsilon zeta eta theta iota kappa lambda mu nu xi omicron pi rho final sigma sigma tau upsilon script phi chi psi omega script theta phi cap digamma digamma epsilon |
α
β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϑ ϕ Ϝ ϝ ϵ |

200B
2016 2026 2032 2044 2045 2046 2146 |
,
double vertical line dot dot dot prime slash , equation , differential d |
‖ … ′ ⁄ ⁅ ⁆ ⅆ |

2190
2191 2192 2193 2194 21D2 21D4 |
left arrow
up arrow goes to down arrow left right arrow implies if and only if |
←
↑ → ↓ ↔ ⇒ ⇔ |

2200
2201 2202 2203 2204 2205 2206 2207 2208 2209 220A 220B 220C 220D 220E 220F |
for all
complement partial there exists there doesn’t exist empty set increment dell element of not element of small element of contains as member doesn’t contain as member small contains as member q e d product |
∀
∁ ∂ ∃ ∄ ∅ ∆ ∇ ∈ ∉ ∊ ∋ ∌ ∍ ∎ ∏ |

2210
2211 2212 2213 2214 2215 2216 2217 2218 2219 221A 221B 221C 221D 221E 221F |
coproduct
sum minus minus or plus dot plus linear divide set minus asterisk operator ring operator bullet square root cube root fourth root proportional to infinity right angle |
∐
∑ − ∓ ∔ ∕ ∖ ∗ ∘ ∙ √ ∛ ∜ ∝ ∞ ∟ |

2220
2221 2222 2223 2224 2225 2226 2227 2228 2229 222A 222B 222C 222D 222E 222F |
angle
measured angle spherical angle divides doesn’t divide parallel to not parallel to logical andd logical or intersection union integral double integral triple integral contour integral surface integral |
∠
∡ ∢ ∣ ∤ ∥ ∦ ∧ ∨ ∩ ∪ ∫ ∬ ∭ ∮ ∯ |

2230
2231 2232 2233 2234 2235 2236 2237 2238 2239 223A 223B 223C 223D 223E 223F |
volume integral
clockwise integral clockwise contour integral anticlockwise contour integral therefore because ratio proportion dot minus excess geometric proportion homothetic tilde operator reverse tilde operator inverted lazy s sine wave |
∰
∱ ∲ ∳ ∴ ∵ ∶ ∷ ∸ ∹ ∺ ∻ ∼ ∽ ∾ ∿ |

2240
2241 2242 2243 2244 2245 2246 2247 2248 2249 224A 224B 224C 224D 224E 224F |
wreath product
not tilde minus tilde asymptotically equal to not asymptotically equal to approximately equal to approximately but not equal to neither approximately nor equal to almost equal to not almost equal to almost equal or equal to triple tilde all equal to equivalent to geometrically equivalent to difference between |
≀
≁ ≂ ≃ ≄ ≅ ≆ ≇ ≈ ≉ ≊ ≋ ≌ ≍ ≎ ≏ |

2250
2251 2252 2253 2254 2255 2256 2257 2258 2259 225A 225B 225C 225D 225E 225F |
approaches the limit
geometrically equal to nearly equals image of or approximately equal to colon equals equals colon, ring in equal to ring equal to corresponds to estimates equiangular to star equals delta equals equals by definition measured by questioned equals |
≐
≑ ≒ ≓ ≔ ≕ ≖ ≗ ≘ ≙ ≚ ≛ ≜ ≝ ≞ ≟ |

2260
2261 2262 2263 2264 2265 2266 2267 2268 2269 226A 226B 226C 226D 226E 226F |
not equal
identical to not identical to strictly equivalent to less than or equal to greater than or equal to less than over equal to greater than over equal to less than but not equal to greater than but not equal to much less than much greater than between not equivalent to not less than not greater than |
≠
≡ ≢ ≣ ≤ ≥ ≦ ≧ ≨ ≩ ≪ ≫ ≬ ≭ ≮ ≯ |

2270
2271 2272 2273 2274 2275 2276 2277 2278 2279 227A 227B 227C 227D 227E 227F |
not less than or equal
not greater than or equal less than or equivalent greater than or equivalent to neither less than nor equivalent to neither greater than nor equivalent to less than or greater than greater than or less than neither less than nor greater than neither greater than nor less than precedes succeeds precedes or equals succeeds or equals precedes or is equivalent to succeeds or is equivalent to |
≰
≱ ≲ ≳ ≴ ≵ ≶ ≷ ≸ ≹ ≺ ≻ ≼ ≽ ≾ ≿ |

2280
2281 2282 2283 2284 2285 2286 2287 2288 2289 228A 228B 228C 228D 228E 228F |
doesn’t precede
doesn’t succeed subset of superset of not subset of not superset of subset or equals superset or equals neither a subset nor equal to neither a superset nor equal to subset of with not equal to superset of with not equal to multiset multiset times multiset union square image of |
⊀
⊁ ⊂ ⊃ ⊄ ⊅ ⊆ ⊇ ⊈ ⊉ ⊊ ⊋ ⊌ ⊍ ⊎ ⊏ |

2290
2291 2292 2293 2294 2295 2296 2297 2298 2299 229A 229B 229C 229D 229E 229F |
square original of
square image of or equal to square original of or equal to square cap square cup circled plus circled minus circled times circled divide circled dot circled ring circled asterisk circled equals circled dash squared plus squared minus |
⊐
⊑ ⊒ ⊓ ⊔ ⊕ ⊖ ⊗ ⊘ ⊙ ⊚ ⊛ ⊜ ⊝ ⊞ ⊟ |

22A0
22A1 22A2 22A3 22A4 22A5 22A6 22A7 22A8 22A9 22AA 22AB 22AC 22AD 22AE 22AF |
squared times
squared dot right tack left tack down tack up tack reduces to models results in forces triple vertical bar right turnstile double vertical bar double right turnstile does not prove doesn’t result in doesn’t force negated double vertical bar double right turnstile |
⊠
⊡ ⊢ ⊣ ⊤ ⊥ ⊦ ⊧ ⊨ ⊩ ⊪ ⊫ ⊬ ⊭ ⊮ ⊯ |

22B0
22B1 22B2 22B3 22B4 22B5 22B6 22B7 22B8 22B9 22BA 22BB 22BC 22BD 22BE 22BF |
precedes under relation
succeeds under relation is a normal subgroup of contains as normal subgroup is a normal subgroup of – or equals contains as normal subgroup of or equals original of image of multimap hermeetian conjugate matrix intercalate xor nand nor right angle with arc right triangle |
⊰
⊱ ⊲ ⊳ ⊴ ⊵ ⊶ ⊷ ⊸ ⊹ ⊺ ⊻ ⊼ ⊽ ⊾ ⊿ |

22C0
22C1 22C2 22C3 22C4 22C5 22C6 22C7 22C8 22C9 22CA 22CB 22CC 22CD 22CE 22CF |
n-ary logical andd
n-ary logical or n-ary intersection n-ary union diamond dot star division times bowtie left normal factor semidirect product right normal factor semidirect product left semidirect product right semidirect product reverse tilde equals curly logical or curly logical andd |
⋀
⋁ ⋂ ⋃ ⋄ ⋅ ⋆ ⋇ ⋈ ⋉ ⋊ ⋋ ⋌ ⋍ ⋎ ⋏ |

22D0
22D1 22D2 22D3 22D4 22D5 22D6 22D7 22D8 22D9 22DA 22DB 22DC 22DD 22DE 22DF |
double subset
double superset double intersection double union pitchfork equal and parallel to dotted less than dotted greater than very much less than very much greater than less than equals or greater than greater than equals or less than equals or less than equals or greater than equals or precedes equals or succeeds |
⋐
⋑ ⋒ ⋓ ⋔ ⋕ ⋖ ⋗ ⋘ ⋙ ⋚ ⋛ ⋜ ⋝ ⋞ ⋟ |

22E0
22E1 22E2 22E3 22E4 22E5 22E6 22E7 22E8 22E9 22EA 22EB 22EC 22ED 22EE 22EF |
doesn’t precede or equal
doesn’t succeed or equal not square image of or equal to not square original of or equal to square image of or not equal to square original of or not equal to less than but not equivalent to greater than but not equivalent to precedes but not equivalent to succeeds but not equivalent to not normal subgroup of does not contain as normal subgroup not normal subgroup of or equal to does not contain as normal subgroup or = vertical ellipsis midline horizontal ellipsis |
⋠
⋡ ⋢ ⋣ ⋤ ⋥ ⋦ ⋧ ⋨ ⋩ ⋪ ⋫ ⋬ ⋭ ⋮ ⋯ |

22F0
22F1 22F2 22F3 22F4 22F5 22F6 22F7 22F8 22F9 22FA 22FB 22FC 22FD 22FE 22FF |
up right diagonal ellipsis
down right diagonal ellipsis element of with long horizontal stroke element of w vertical bar at end of stroke small element of w vertical bar at end of stroke dotted element of overbar element of small overbar element of underbar element of double stroke element of long-stroke contains contains w vertical bar at end of stroke small contains w vertical bar at end of stroke overbar contains small overbar contains z notation bag membership |
⋰
⋱ ⋲ ⋳ ⋴ ⋵ ⋶ ⋷ ⋸ ⋹ ⋺ ⋻ ⋼ ⋽ ⋾ ⋿ |

2308
2309 230A 230B 2329 232A 23B4 23B5 23DC 23DD 23DE 23DF 23E0 23E1 |
open ceiling
close ceiling open floor close floor open angle bracket close angle bracket over bracket under bracket over paren under paren over brace under brace over shell under shell |
⌈
⌉ ⌊ ⌋ 〈 〉 ⎴ ⎵ ⏜ ⏝ ⏞ ⏟ ⏠ ⏡ |

24AD
2502 252C 2534 2581 2588 2592 25A0 25AD 27E6 27E7 27E8 27E9 3016 3017 |
root (UnicodeMath)
vertical bar lower limit upper limit underbar equation array (UnicodeMath) of (UnicodeMath) matrix (UnicodeMath) boxed formula (UnicodeMath) open white square bracket close white square bracket open angle bracket close angle bracket , (UnicodeMath “begin”) , (UnicodeMath “end”) |
⒭
│ ┬ ┴ ▁ █ ▒ ■ ▭ ⟦ ⟧ ⟨ ⟩ 〖 〗 |

The index of a string in the following table can be used as a language token. These speech tokens can be used in a localized table to retrieve the corresponding strings in the localized language. OfficeMath supports about 18 languages this way. This approach doesn’t support speaking math in a different order than in English. The aim is to provide understandable speech even if it isn’t the most elegant.

Speech string | Meaning |

accent, box, boxed formula, brackets, brackets with separators, equation array, fraction, function, left sub superscript, lower limit, matrix, n ary expression, (null string place holder: no speech), overbar, phantom, root, slashed fraction, stack, stretch stack, subscript, sub superscript, superscript, underbar, upper limit | First 24 entries are names of LineServices math objects and must be in the order of tomAccent..tomUpperLimit |

half, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, halves, thirds, fourths, fifths, sixths, sevenths, eighths, ninths, tenths | Ordinals. Used for simple fractions and nested parens |

cosine, cotangent, cosecant, secant, sine, tangent, arccosine, arccotangent, arccosecant, arcsecant, arcsine, arctangent | Trigonometric functions |

open paren, open second paren, open third paren, open fourth paren, open fifth paren, open sixth paren, open seventh paren, open eighth paren, open ninth paren, open tenth paren | Open parentheses |

close paren, close second paren, close third paren, close fourth paren, close fifth paren, close sixth paren, close seventh paren, close eighth paren, close ninth paren, close tenth paren | Close parentheses |

absolute value | Lead in for absolute value object |
argument | Second arg of function object |
a. smash | Kind of phantom object |
base | First arg of sub/superscript object |
base text | Second argument of ruby object |
bold | Math alphanumeric qualifier |
bold fraktur | Math alphanumeric qualifier |
bold italic | Math alphanumeric qualifier |
bold script | Math alphanumeric qualifier |
by | <row> “by” <column> matrix |
cap | Upper-case modifier |
column | Part of “end column” |
cross | x if followed by math bold |
cubed | Simple exponent |

degree | Radical degree |
denominator | Lead in for compound denom |
determinant | Determinant object |
double-struck | Math alphanumeric qualifier |
d smash | Kind of phantom object |
element | Matrix element |
empty | Empty argument |
empty equation | Empty equation (math zone) |
end | Verb to end object or argument |
end absolute value | End compound absolute value |
end equation | End equation (math zone) |
equation | E.g., in “end equation” |
fraktur | Math alphanumeric qualifier |
from | Lower limit lead in if upper limit |
function name | First argument of function object |

h smash | Kind of phantom object |
horizontal phantom | Kind of phantom object |
hyperbolic | Trig modifier |
integral | Integral n-ary object |
integrand | Third argument for integral |
inverse | Trig modifier |
limit as | Lead words for lim function |
lower element | Name of lower element in stack |
equation | Lead in for equation (math zone) |
monospace | Math alphanumeric qualifier |
numbered equation | Numbered equation |
numerator | Start of compound numerator |
over | Lower limit lead in; no upper limit |
, | Pause |
phantom smash | Kind of phantom object |
product | Product n-ary object |
quantity | End absolute value to a power |
radicand | Radical radicand |
ruby | Object with phonetic annotation |
ruby text | Phonetic annotation text |

sans-serif | Math alphanumeric qualifier |
sans-serif bold | Math alphanumeric qualifier |
sans-serif italic | Math alphanumeric qualifier |
sans-serif bold italic | Math alphanumeric qualifier |
squared | Simple exponent |
script | Math alphanumeric qualifier |
start | Verb to start object or argument |
sub | Subscript lead in |
summand | Third argument of summation |
summation | Summation n-ary object |
to | Lead in to n-ary upper limit |
to the | Lead in to general superscript |
upper element | Name of upper element in stack |
upright | Math alphanumeric qualifier |
vertical phantom | Kind of phantom object |
with | As in “base” … “with” “upper limit” |
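
To make the token idea concrete, here is a small sketch of how a string's index can act as a language token. The enum names, the table layout, and the fall-back to English are assumptions made for illustration; the second table is a pseudo-locale with bracketed strings rather than a real translation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A speech token is the index of a string in the English table.
// These enumerator names are hypothetical.
enum SpeechToken : std::size_t {
    kTokFraction,
    kTokNumerator,
    kTokDenominator,
    kTokEnd,
    kTokCount
};

static const char* const kEnglishTokens[kTokCount] = {
    "fraction", "numerator", "denominator", "end"};

// Pseudo-locale for illustration only; nullptr marks a missing entry.
static const char* const kPseudoTokens[kTokCount] = {
    "[fraction]", "[numerator]", nullptr, "[end]"};

// Retrieve the localized string for a token, falling back to English.
const char* Localize(const char* const* table, SpeechToken tok) {
    const char* s = table[tok];
    return s ? s : kEnglishTokens[tok];
}

// Speak tokens in the order given. The order is the English one for every
// language, which is the limitation noted in the text.
std::string Speak(const char* const* table, const std::vector<SpeechToken>& tokens) {
    std::string out;
    for (SpeechToken tok : tokens) {
        if (!out.empty())
            out += ' ';
        out += Localize(table, tok);
    }
    return out;
}
```

Because every language table is indexed by the same tokens, adding a language means adding one more array, with no change to the code that generates the token sequence.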


The post Using RichEdit for Text Processing appeared first on Math in Office.

The first step is to load the RichEdit dll. You can use the system \windows\system32\msftedit.dll unless you need features that have been added more recently to the Office riched20.dll. One such dll is located on my laptop in C:\Program Files\Microsoft Office\root\vfs\ProgramFilesCommonX64\Microsoft Shared\OFFICE16\riched20.dll. Another is a recent Office RichEdit that the Windows 11 Notepad uses. On my laptop, its path is C:\Program Files\WindowsApps\Microsoft.WindowsNotepad_11.2210.5.0_x64__8wekyb3d8bbwe\riched20.dll. You can find the installation location via PowerShell:

**Get-AppxPackage | ?{ $_.Name.Contains("Notepad") } | %{ $_.InstallLocation }**

To load the dll, execute

HINSTANCE hRE = LoadLibrary(L"riched20.dll");

You may need to give the full path to the desired RichEdit dll. Note that the \windows\system32\riched20.dll is old and exists for backward compatibility with very old programs: it’s Version 3 from Windows 2000 and has only been updated for security fixes. It has no features added in this century, so it’s missing a lot of functionality!

A RichEdit control is an ITextServices2 object. To create it, query the RichEdit dll for the exported CreateTextServices() function:

PCreateTextServices pfnCreateTextServices =
    (PCreateTextServices)GetProcAddress(hRE, "CreateTextServices");
CNoShowHost noShowHost;     // ITextHost object described below
IUnknown *pUnk;
HRESULT hr = pfnCreateTextServices(nullptr, &noShowHost, &pUnk);
if (hr != NOERROR)
    return hr;

// Get the ITextServices2 interface, then release the temporary IUnknown
ITextServices2* pserv;
hr = pUnk->QueryInterface(IID_ITextServices2, (void **)&pserv);
pUnk->Release();
if (hr != NOERROR)
    return hr;

// Get the top-level TOM2 interface
ITextDocument2* pdoc;
hr = pserv->QueryInterface(IID_ITextDocument2, (void **)&pdoc);
if (hr != NOERROR)
    return hr;

ITextServices2 inherits from ITextServices. The method ITextServices::TxSendMessage() lets you send messages to the RichEdit control, and the pdoc pointer is the top TOM2 interface that lets you use TOM methods for processing text. For example,

LRESULT lres; pserv->TxSendMessage(WM_SETTEXT, 0, reinterpret_cast<LPARAM>(szTextRtf), &lres);

inserts the RTF string szTextRtf into the control. You can also insert a Unicode string as the LPARAM.

An interesting case illustrating the TOM interfaces is finding a math formula in a document. Read in the RTF or HTML document and then search the document using the ITextRange2::Find() method. First, to get the user selection ITextSelection2* from the pdoc pointer, execute

ITextSelection2* psel; pdoc->GetSelection2(&psel);

The math expression you want to find in the document should be in a separate RichEdit control with an ITextDocument2* pdocExpr pointer. Get an ITextRange2* from pdocExpr by executing

ITextRange2* pRangeExpr;
pdocExpr->Range2(0, tomForward, &pRangeExpr);
pRangeExpr->MoveEnd(tomCharacter, -1, nullptr);

Here the MoveEnd() call unselects the final CR (all rich-text controls end with a final CR (U+000D)). Then to find the math expression selected by pRangeExpr and starting in the document where psel is pointing, call

LONG Delta; psel->Find(pRangeExpr, tomForward, Flags, &Delta);

Here the values for Flags are defined in ITextRange::FindText(), and Delta gets set to the count of characters found (if found). The search is “fuzzy”, that is, it ignores character format attributes like foreground/background color, font face name, and font height. But it distinguishes Unicode characters, so the following kinds of letters are all distinct 𝐻𝑯𝐇ℌℋ. And math objects like fractions, subscripts, superscripts, integrals, matrices, equation arrays, etc., are all distinct. This approach is more general than searching for MathML, LaTeX, RTF, HTML, OMML, UnicodeMath, etc., since it first converts these formats to OfficeMath and then does a fuzzy search on the resulting OfficeMath. For a more detailed description of the algorithm, see Math Find/Replace and Rich Text Searches.

A RichEdit control needs a host given by an implementation of ITextHost or ITextHost2. If no display is needed, ITextHost is adequate. Here’s a skeleton version of such a host. You can use it as is, or enhance it as desired.

class CNoShowHost : public ITextHost
{
public:
    // IUnknown methods
    STDMETHODIMP QueryInterface(REFIID, void **) noexcept {return E_NOTIMPL;}
    STDMETHODIMP_(ULONG) STDMETHODCALLTYPE AddRef(void) noexcept {return 1;}
    STDMETHODIMP_(ULONG) STDMETHODCALLTYPE Release(void) noexcept {return 1;}

    // ITextHost Methods
    HDC TxGetDC() {return nullptr;}
    INT TxReleaseDC(HDC) {return 0;}
    BOOL TxShowScrollBar(INT, BOOL) {return false;}
    BOOL TxEnableScrollBar(INT, INT) {return false;}
    BOOL TxSetScrollRange(INT, LONG, INT, BOOL) {return false;}
    BOOL TxSetScrollPos(INT, INT, BOOL) {return false;}
    void TxInvalidateRect(LPCRECT, BOOL) {}
    void TxViewChange(BOOL) {}
    BOOL TxCreateCaret(HBITMAP, INT, INT) {return false;}
    BOOL TxShowCaret(BOOL) {return false;}
    BOOL TxSetCaretPos(INT, INT) {return false;}
    BOOL TxSetTimer(UINT, UINT) {return false;}
    void TxKillTimer(UINT) {}
    void TxScrollWindowEx(INT, INT, LPCRECT, LPCRECT, HRGN, LPRECT, UINT) {}
    void TxSetCapture(BOOL) {}
    void TxSetFocus() {}
    void TxSetCursor(HCURSOR, BOOL) {}
    BOOL TxScreenToClient(LPPOINT) {return false;}
    BOOL TxClientToScreen(LPPOINT) {return false;}
    HRESULT TxActivate(LONG *) {return E_UNEXPECTED;}
    HRESULT TxDeactivate(LONG) {return E_UNEXPECTED;}
    HRESULT TxGetClientRect(LPRECT) {return E_UNEXPECTED;}
    HRESULT TxGetViewInset(LPRECT) {return E_UNEXPECTED;}
    HRESULT TxGetCharFormat(const CHARFORMATW **) {return E_UNEXPECTED;}
    HRESULT TxGetParaFormat(const PARAFORMAT **) {return E_UNEXPECTED;}
    COLORREF TxGetSysColor(int) {return 0;}
    HRESULT TxGetBackStyle(TXTBACKSTYLE *) {return E_UNEXPECTED;}
    HRESULT TxGetMaxLength(DWORD *) {return S_OK;}
    HRESULT TxGetScrollBars(DWORD *) {return E_UNEXPECTED;}
    HRESULT TxGetPasswordChar(_Out_ WCHAR *) {return E_UNEXPECTED;}
    HRESULT TxGetAcceleratorPos(LONG *) {return E_UNEXPECTED;}
    HRESULT TxGetExtent(LPSIZEL) {return E_UNEXPECTED;}
    HRESULT OnTxCharFormatChange(const CHARFORMATW *) {return E_UNEXPECTED;}
    HRESULT OnTxParaFormatChange(const PARAFORMAT *) {return E_UNEXPECTED;}
    // Report a multiline rich-text control
    HRESULT TxGetPropertyBits(DWORD /* dwMask */, DWORD *pdwBits)
        {*pdwBits = TXTBIT_RICHTEXT | TXTBIT_MULTILINE; return S_OK;}
    HRESULT TxNotify(DWORD, void *) {return E_UNEXPECTED;}
    HIMC TxImmGetContext() {return nullptr;}
    void TxImmReleaseContext(HIMC) {}
    HRESULT TxGetSelectionBarWidth(LONG *) {return E_UNEXPECTED;}
};


The post Default Math Properties appeared first on Math in Office.

You can change the default settings to suit your tastes or a publisher’s conventions. In the math ribbon (type Alt+= to insert a math zone and select the math ribbon tab), click on the Tools button over toward the left side of the ribbon. A dialog will be displayed that shows a variety of math properties along with buttons to access the math autocorrect and recognized-function dialogs.

The document default math properties in this dialog are described in a somewhat technical way in the math section of the RTF specification. The properties belong to the RTF {\mmathPr…} group. They are also children of the <mathPr> OMML element. In this post, I describe the properties in a less technical way. For easy reference to the RTF specification, the relevant RTF control word is listed in parentheses along with the corresponding ITextDocument property values. The dialog also has some options that are not document default math properties, such as “Copy MathML to the clipboard as plain text” instead of “Copy Linear Format to the clipboard as plain text.” Such options do not affect the layout of a document and hence are stored in the system registry rather than in the document.
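
As a rough illustration of how these defaults end up in RTF, the sketch below writes a {\mmathPr…} group from a small struct. The struct, its field names, and the default values are assumptions made for this example; the control words are the ones named in this post, and the numeric encodings (for instance, a wrap indent expressed in twips, with 1440 twips to the inch) should be checked against the RTF specification before relying on them.

```cpp
#include <string>

// Hypothetical container for some document-default math properties.
// Each field holds the numeric argument N of the matching control word.
struct MathDefaults {
    int brkBin     = 0;     // \mbrkBinN: break relative to binary operator
    int brkBinSub  = 0;     // \mbrkBinSubN: treatment of minus at a break
    int smallFrac  = 0;     // \msmallFracN: small nested fractions off/on
    int dispDef    = 1;     // \mdispDefN: use default math paragraph settings
    int lMargin    = 0;     // \mlMarginN: left margin for math
    int rMargin    = 0;     // \mrMarginN: right margin for math
    int wrapIndent = 1440;  // \mwrapIndentN: assumed to be twips (1 inch)
    int intLim     = 0;     // \mintLimN: integral limit placement
    int naryLim    = 1;     // \mnaryLimN: n-ary limit placement
};

// Serialize the properties as an RTF {\mmathPr ...} group.
std::string BuildMathPr(const MathDefaults& d) {
    std::string rtf = "{\\mmathPr";
    auto add = [&rtf](const char* word, int n) {
        rtf += '\\';
        rtf += word;
        rtf += std::to_string(n);
    };
    add("mbrkBin", d.brkBin);
    add("mbrkBinSub", d.brkBinSub);
    add("msmallFrac", d.smallFrac);
    add("mdispDef", d.dispDef);
    add("mlMargin", d.lMargin);
    add("mrMargin", d.rMargin);
    add("mwrapIndent", d.wrapIndent);
    add("mintLim", d.intLim);
    add("mnaryLim", d.naryLim);
    rtf += '}';
    return rtf;
}
```

The math font (\mmathFont*N*) is left out because its argument is an index into the document's font table, which this sketch doesn't model.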

The Word Equation/Conversions dialog gives a drop-down list of math fonts that can be used as the default math font for the document. Cambria Math and STIX Two Math MS are examples of math fonts. (RTF: \mmathFont*N*)

Specifies that nested fractions should be displayed such that the numerator and denominator are written in a script or scriptscript size instead of regular-text size. Specifically, characters in the outermost fraction’s numerator and denominator are displayed using the full text size, characters in a nested fraction are displayed in the script size (about 70% as large as the text size), and fractions nested inside a nested fraction are displayed in scriptscript size (about 60% as large as the text size). TeX uses this “small fraction” choice by default, but Word 2007 does not, basically because in all the physics books I’ve read I don’t remember seeing reduced sizes used in display math. But if you prefer them, you can change the setting. For in-line math expressions, small fractions are used. (RTF: \msmallFrac*N*; SetMathProperties—tomMathDispFracTeX)

By default, a line break occurs before the binary operator. That is, the binary operator is the first character on the wrapped line. But you can change it so that a line break occurs after the operator (tomMathBrkBinAfter), or so that the operator is duplicated (tomMathBrkBinDup), that is, it appears at the end of the first line and at the start of the second. (RTF: \mbrkBin*N*; SetMathProperties)

If the minus operator (U+2212) coincides with a line break, by default the minus appears after the line break. But it can appear before and after the break (tomMathBrkBinSubMM), or a plus can appear before the break and a minus after it (tomMathBrkBinSubPM), or vice versa (tomMathBrkBinSubMP). (RTF: \mbrkBinSub*N*; SetMathProperties)
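
The two line-break settings can be modeled with a small function that says which operator text ends the first line and which starts the next. The enum and struct names are hypothetical, and the sketch assumes the minus substitutions only come into play when the operator is duplicated; that assumption isn't stated in the text.

```cpp
#include <string>

enum class BreakBin { Before, After, Duplicate };               // hypothetical
enum class BreakBinSub { MinusMinus, PlusMinus, MinusPlus };    // hypothetical

static const std::string kMinus = "\xE2\x88\x92";  // U+2212 in UTF-8

struct BrokenOperator {
    std::string endOfFirstLine;   // operator shown before the break
    std::string startOfNextLine;  // operator shown after the break
};

// Decide where a binary operator appears when a line breaks at it.
BrokenOperator PlaceOperator(const std::string& op, BreakBin mode, BreakBinSub sub) {
    switch (mode) {
        case BreakBin::Before:
            return {"", op};               // break before: operator starts next line
        case BreakBin::After:
            return {op, ""};               // break after: operator ends first line
        case BreakBin::Duplicate:
            if (op == kMinus && sub == BreakBinSub::PlusMinus)
                return {"+", kMinus};      // plus before the break, minus after
            if (op == kMinus && sub == BreakBinSub::MinusPlus)
                return {kMinus, "+"};      // minus before the break, plus after
            return {op, op};               // same operator on both lines
    }
    return {"", op};
}
```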

Limits of integrals in display-mode equations can be either centered above and below the integral or positioned just to the right of the integral. The default setting is to position the limits to the right of the operator (subscript/superscript). (RTF: \mintLim*N*; SetMathProperties—tomMathDispIntUnderOver)

Limits of summations, products, and other *n*-ary operators can be either centered above and below the *n*-ary operator, or positioned just to the right of the operator. The default setting is above and below the operator. (RTF: \mnaryLim*N*; SetMathProperties—tomMathDispNarySubSup)

In the United States, the differential d (ⅆ – U+2146) is almost always displayed as a math italic d, but in Europe, an upright d is standard. The latter choice emphasizes that the differential d is different from regular mathematical variables. Similarly, the Napierian logarithm base ⅇ (U+2147) and the imaginary unit ⅈ (square root of -1, U+2148) are displayed as math italic in the United States and upright in Europe. (RTF: \mdispDef*N*; SetMathProperties—tomMathDocDiffDefault, tomMathDocDiffUpright, tomMathDocDiffItalic, tomMathDocDiffOpenItalic).

Math object arguments may be optional, e.g., integral limits, or essential, e.g., numerator or denominator. To control what is displayed for empty arguments there are three possibilities: tomMathDocEmptyArgAuto, tomMathDocEmptyArgAlways (display ⬚), and tomMathDocEmptyArgNever. The first of these doesn’t display anything for optional arguments unless you use the arrow keys to move into the argument, at which point the ⬚ is displayed and you can enter math text. This setting is controlled by SetMathProperties() and isn’t persisted since it’s a UI property, not a document property. Note that if tomMathDocEmptyArgAlways is active, you can still display nothing by entering a zero-width space (U+200B).

Document properties to use the default math paragraph settings for equations, i.e., use values given by \mlMargin*N*, \mrMargin*N*, \mdefJc*N*, \mwrapIndent*N*, \mwrapRight*N*, etc., defined below. The default is to use the default math settings described below, but you can change it to use the text paragraph settings. (RTF: \mdispDef*N*; SetMathProperties—tomMathDispDef)

Document property for the left margin for math. Math margins are added to the paragraph settings for margins. (RTF: \mlMargin*N*; SetProperty—tomMathLMargin)

Right margin for math. (RTF: \mrMargin*N*; SetProperty—tomMathRMargin)

Document property for the default justification of displayed math zones. Individual equations can overrule the default setting. Displayed math zones can be left justified, right justified, centered, or centered as a group. When a displayed math zone is centered as a group, the equation(s) are ordinarily left aligned within a block, and the entire block is centered with respect to column margins. The user can use a context menu to align equations in more general ways, e.g., on the equal signs. (RTF: \mdefJc*N*; SetMathProperties—tomMathParaAlignCenterGroup, tomMathParaAlignCenter, tomMathParaAlignLeft, tomMathParaAlignRight)

Indent of wrapped line of an equation. The line or lines of a wrapped equation after the line break can either be indented by a specified amount from the left margin, or right-aligned. The default indent is 1”. (RTF: \mwrapIndent*N*; SetProperty—tomMathWrapIndent)

If enabled, right justify wrapped lines of an equation. If disabled, the line or lines of a wrapped equation after the line break are indented by \mwrapIndent*N* from the left margin. (RTF: \mwrapRight*N*)

(RTF: \mpreSp*N*; SetProperty—tomMathPreSpace).

(RTF: \mpostSp*N*; SetProperty—tomMathPostSpace).

(RTF: \mintraSp*N*; SetProperty—tomMathIntraSpace).

(RTF: \minterSp*N*; SetProperty—tomMathInterSpace).

The property tomMathZoneSurround (RTF: \mzSurround) inserts the spacing given by Value before and after math zones.

The Value argument for the property tomDocMathBuild can consist of any combination of the flags tomMathAutoCorrect, tomTeX, and tomMathAlphabetics. The flag tomMathAutoCorrect autocorrects using entries in the built-in math autocorrection list, which includes most standard TeX control words. The flag tomTeX uses [La]TeX build-up/down rules instead of UnicodeMath rules. The flag tomMathAlphabetics converts ASCII and lower-case Greek letters to math italic, math bold, and math bold-italic characters according to the UI bold and italic font settings. These values are used by math UI but are not persisted in file formats.

If a line break occurs at the invisible times (U+2062), ordinarily one would use the \times (× U+00D7) for a visible times character, but a raised dot is another possibility. This document property isn’t currently implemented.

MathML 4 introduces a new attribute called “intent”. This attribute allows you to disambiguate math notation. For example, it lets you specify whether a superscript is a power or an index. To reduce the number of places the intent attribute is needed, there is a set of document default intent values. With them, you only need the intent attribute when the instance differs from the default values.

Currently, MathML doesn’t formalize document defaults for math, but MathML math zones can inherit them from the host container, e.g., from HTML5. Such defaults are compatible with MathML and need to be stored outside the individual MathML <math> elements. In principle, MathML could have a new element <mdefaults> that contains the default properties. The container document would need to recognize this element and use the properties it contains. Making this a MathML element seems desirable since MathML is responsible for the needs of mathematics.


The post RichEdit Hyperlinks appeared first on Math in Office.

The first autoURLs appeared in RichEdit 2.0, which shipped with Office 97, and have the usual web form, such as http://www.msn.com. The permitted URL schemes were http:, file:, mailto:, ftp:, https:, gopher:, nntp:, prospero:, telnet:, news:, wais:, and outlook:. To include spaces in the URL, the whole URL had to be enclosed in an angle bracket pair as in <http://www.xxx.com/fun computing>. RichEdit 3.0, which shipped with Windows 2000 up through Windows 7, added the capability to recognize URLs of the form www.msn.com and ftp.unicode.org. RichEdit 4.1, which shipped with Windows XP up through Windows 7, added friendly name hyperlinks as well as autoURLs of the form \\word\richedit2\murrays. RichEdit 7, which shipped with Office 2010, added recognition for spaces in URLs without needing enclosure in <>. It also added recognition of telephone numbers, drive-letter paths, email addresses, and URLs enclosed in ASCII double quotes. It made all of these recognitions optional, since you might not want to recognize, for example, phone numbers, or you might want to recognize telephone numbers exclusively.

The recognition is dynamic, fast, and displayed by default with underline and a blue text color. The autoURL notifications can be sent to the client application by user actions such as typing the Enter key or clicking the left mouse button.

To enable or disable recognition of URLs and file paths in a RichEdit control, send the control the message EM_AUTOURLDETECT with lparam = 0 and wparam = 1 or 0, respectively. When autoURL recognition and link notifications are enabled, mouse movement over a link or clicking on a link sends an EN_LINK notification with the URL start and end character positions to the client.

More generally, wparam can have any combination of the following flags:

Flag | Value | Meaning |
AURL_ENABLEURL | 1 | Recognize standard web URLs and file paths |
AURL_ENABLEEMAILADDR | 2 | Recognize email addresses |
AURL_ENABLETELNO | 4 | Recognize telephone numbers |
AURL_ENABLEEAURLS | 8 | Recognize East Asian URLs |
AURL_ENABLEDRIVELETTERS | 16 | Recognize file paths that start with a drive letter |
AURL_DISABLEMIXEDLGC | 32 | Disable mixed Latin Greek Cyrillic IDNs |
AURL_DISABLEAUTOFORMAT | 64 | Disable auto URL formatting |
AURL_URLRTFHTMLSTRICT | 128 | Only encode URLs defined in RTF/HTML source |
AURL_NOINITIALSCAN | 256 | Don’t scan doc when enabling autoURL reco |
AURL_ENABLEGETURL | 512 | Make ITextRange2::GetURL() return autoURLs |

AURL_ENABLEEAURLS is the preferred way to enable East Asian URL recognition. For compatibility with older software, lparam = 1 also enables East Asian URL recognition. Alternatively, lparam can point to a client null-terminated string specifying URL scheme protocols. The string consists of URI scheme names each terminated by a ‘:’. See https://www.ietf.org/rfc/rfc2396.txt for validation criteria. The default string is “:callto:file:ftp:gopher:http:https:mailto:news:nntp:notes:onenote:outlook:prospero:read:tel:telnet:wais:webcal:”. The message EM_GETAUTOURLDETECT (WM_USER + 92) gets the flags, but not the scheme string.
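
The sketch below shows one way to assemble the two EM_AUTOURLDETECT arguments. It is portable C++ so it can be read without the Windows headers: the k-prefixed flag names are hypothetical stand-ins whose values are copied from the table above (real code would use the AURL_* definitions from richedit.h), and the helper functions are assumptions. The scheme check follows the RFC 2396 grammar, and the leading ‘:’ mirrors the shape of the default string quoted above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for some AURL_* flags, with the values listed in the table above.
enum AutoUrlFlags : std::uint32_t {
    kEnableUrl          = 1,
    kEnableEmailAddr    = 2,
    kEnableTelNo        = 4,
    kEnableEAUrls       = 8,
    kEnableDriveLetters = 16,
};

static bool IsAsciiAlpha(wchar_t c) {
    return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z');
}
static bool IsAsciiDigit(wchar_t c) { return c >= L'0' && c <= L'9'; }

// RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." )
bool IsValidScheme(const std::wstring& s) {
    if (s.empty() || !IsAsciiAlpha(s[0]))
        return false;
    for (wchar_t c : s) {
        if (!(IsAsciiAlpha(c) || IsAsciiDigit(c) || c == L'+' || c == L'-' || c == L'.'))
            return false;
    }
    return true;
}

// Build a scheme string such as ":http:https:", skipping invalid names.
std::wstring BuildSchemeString(const std::vector<std::wstring>& schemes) {
    std::wstring out = L":";
    for (const std::wstring& s : schemes) {
        if (IsValidScheme(s)) {
            out += s;
            out += L':';
        }
    }
    return out;
}

// On Windows the results would be passed along these lines:
//   SendMessage(hwndRichEdit, EM_AUTOURLDETECT, flags, (LPARAM)schemes.c_str());
```

For example, `kEnableUrl | kEnableEmailAddr` recognizes web URLs and email addresses but leaves telephone numbers alone.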

In memory, autoURLs are identified by the CFE_LINK character formatting attribute. You can retrieve this attribute using the EM_GETCHARFORMAT message or ITextFont2::GetEffects(). Alternatively, you can use the tomLink unit in the TOM (Text Object Model) ITextRange::StartOf(), EndOf(), Expand(), Move(), MoveEnd(), and MoveStart() methods to navigate and select autoURLs and friendly-name links as well. ITextFont2::SetEffects(Value, Mask) with Value = 0 and Mask = CFM_LINK turns off autoURL detection for the range associated with the ITextFont2 (sets the link type to tomNoAutoLink) provided AURL_URLRTFHTMLSTRICT is active.

A friendly-name hyperlink has a name, which is displayed, and a hidden instruction part that contains the URL. Such hyperlinks are commonly used when an author wants to display an informative name for a link rather than the URL itself. It can be hard to read URLs these days, what with all the protection built into them. So, friendly-name URLs are much nicer.

A friendly-name hyperlink is essentially a field with two parts: an instruction part containing the URL and a result part containing the name. In fact, that’s the way it appears in RTF, which has the syntax {\field{\*\fldinst {HYPERLINK "…"}}{\fldresult{…}}}, and in HTML, which has <a href="*url*">*name*</a>.

In RichEdit, a hyperlink is represented by character formatting effects rather than by delimiters like those used for math and other in-line objects. As such, hyperlinks cannot be nested, although friendly-name hyperlinks can be located next to one another. In contrast, autoURLs need to be separated by at least one character. The whole friendly-name hyperlink has the character formatting effects CFE_LINK and CFE_LINKPROTECTED, whereas autoURLs only have the CFE_LINK attribute. CFE_LINKPROTECTED is included so that the autoURL scanner skips over friendly-name links. The instruction part, i.e., the URL, has the CFE_HIDDEN attribute as well, since it’s not supposed to be displayed. The URL itself is enclosed in ASCII double quotes and preceded by the string "HYPERLINK " (including the trailing space). Since CFE_HIDDEN plays an integral role in friendly-name hyperlinks, it cannot be used in the name.

For example, in WordPad, which uses RichEdit, a hyperlink with the name MSN would have the plain text

HYPERLINK "http://www.msn.com"MSN

The whole link would have CFE_LINK and CFE_LINKPROTECTED character formatting attributes and all but the “MSN” would have the CFE_HIDDEN attribute.

You can insert a friendly-name hyperlink by reading in the corresponding RTF or by sending the RTF in a WM_SETTEXT or EM_SETTEXTEX message. For the example above, the RTF could be

{\rtf1{\field{\*\fldinst{ HYPERLINK "http://www.msn.com"}}{\fldresult{MSN}}}}

Note that if you encode a path name in the fldinst part, each backslash has to be doubled. In a C++ string, this means each backslash has to be quadrupled.
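The backslash doubling is easy to get wrong by hand, so here is a minimal sketch of a string builder for the RTF above. The helper names `escapeRtf` and `makeHyperlinkRtf` are hypothetical, and the sketch assumes ASCII-only URLs and names (non-ASCII text would need RTF \u escapes, which are not shown).

```cpp
#include <string>

// Hypothetical helper: prefix the RTF special characters \ { } with a
// backslash, which doubles every backslash in a path name.
std::string escapeRtf(const std::string& text) {
    std::string out;
    for (char ch : text) {
        if (ch == '\\' || ch == '{' || ch == '}')
            out += '\\';
        out += ch;
    }
    return out;
}

// Hypothetical helper: build the RTF for a friendly-name hyperlink in the
// form shown above. The URL is enclosed in ASCII double quotes.
std::string makeHyperlinkRtf(const std::string& url, const std::string& name) {
    return "{\\rtf1{\\field{\\*\\fldinst{ HYPERLINK \"" + escapeRtf(url) +
           "\"}}{\\fldresult{" + escapeRtf(name) + "}}}}";
}
```

With such a helper, a path can be written once in a raw string literal, e.g., `makeHyperlinkRtf(R"(C:\docs\a.txt)", "a")`, and the doubling required by RTF happens in `escapeRtf` instead of in the C++ source.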

If the friendly name is the same as the URL, the link is converted to an autoURL unless the autoURL recognizer fails to recognize the URL completely. The reason for this conversion is so that the user can edit the URL and have it be the same as what gets launched when the user clicks on the URL. In Word, you can change the friendly name without updating the URL, which can be misleading in this case. The problem is mitigated in Word since Word has an edit-link dialog that shows both the URL and the friendly name. RichEdit is a component and doesn’t have dialogs, so it’s more secure to convert such links to autoURLs.

Using RTF to insert links works well, but there are also programmatic approaches. The ITextRange2::SetURL(BSTR bstr) method applies the URL in the bstr to the range of text selected by the ITextRange2. The text in the bstr needs to start and end with ASCII double quotes. The SetURL() method inserts the word “HYPERLINK” in front of the URL. You can remove the link status from a friendly-name hyperlink by calling SetURL() with a NULL bstr or one that has only the start and end quotes, signifying an empty string.

To retrieve the URL, select it and then call ITextRange2::GetURL(&bstr). That way the bstr doesn’t have HYPERLINK and quotes. To get the friendly name, use ITextRange2::GetText2(tomNoHidden, &bstr). Then the client can insert whatever surrounding characters it desires. If you call GetURL() for a URL like www.msn.com, it returns http://www.msn.com (assuming you have included the AURL_ENABLEGETURL flag in your EM_AUTOURLDETECT message).

As for autoURLs, the RichEdit client enables hyperlink notifications (EN_LINK) by sending RichEdit the ENM_LINK flag in the mask included with the EM_SETEVENTMASK message. The client can enable tooltips displaying the URLs by sending the EM_SETEDITSTYLE message with the SES_HYPERLINKTOOLTIPS (8) flag.

To find out what kind of link a range is in or selects, get an ITextFont2 from the range (call ITextRange2::GetFont2(ppFont)) and then call ITextFont2::GetLinkType(). The value returned has the following semantics:

tomNoLink | 0 | Not any kind of link |

tomClientLink | 1 | Client link |

tomFriendlyLinkName | 2 | Friendly name of a friendly-name link |

tomFriendlyLinkAddress | 3 | Address of a friendly-name link |

tomAutoLinkURL | 4 | Auto URL |

tomAutoLinkEmail | 5 | Email address |

tomAutoLinkPhone | 6 | Phone number |

tomAutoLinkPath | 7 | File path |

tomNoAutoLink | 15 | Auto link recognition suppressed |
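As a rough sketch of how a client might act on the returned value, the following standalone C++ mirrors the table in an enum and classifies a link type. The enum is declared locally so the example compiles without the Windows headers (a real client would use the constants from tom.h), and the helper names `isAutoLink`, `isFriendlyLink`, and `describeLinkType` are hypothetical.

```cpp
#include <string>

// Values copied from the table above; a real client gets them from tom.h.
enum LinkType {
    tomNoLink = 0,
    tomClientLink = 1,
    tomFriendlyLinkName = 2,
    tomFriendlyLinkAddress = 3,
    tomAutoLinkURL = 4,
    tomAutoLinkEmail = 5,
    tomAutoLinkPhone = 6,
    tomAutoLinkPath = 7,
    tomNoAutoLink = 15
};

// Hypothetical helper: true for the four automatically recognized link kinds.
constexpr bool isAutoLink(int type) {
    return type >= tomAutoLinkURL && type <= tomAutoLinkPath;
}

// Hypothetical helper: true for either part of a friendly-name link.
constexpr bool isFriendlyLink(int type) {
    return type == tomFriendlyLinkName || type == tomFriendlyLinkAddress;
}

// Hypothetical helper: the table's description for a link type value.
std::string describeLinkType(int type) {
    switch (type) {
        case tomNoLink:              return "Not any kind of link";
        case tomClientLink:          return "Client link";
        case tomFriendlyLinkName:    return "Friendly name of a friendly-name link";
        case tomFriendlyLinkAddress: return "Address of a friendly-name link";
        case tomAutoLinkURL:         return "Auto URL";
        case tomAutoLinkEmail:       return "Email address";
        case tomAutoLinkPhone:       return "Phone number";
        case tomAutoLinkPath:        return "File path";
        case tomNoAutoLink:          return "Auto link recognition suppressed";
        default:                     return "Unknown link type";
    }
}
```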

If you only select part of the friendly name, the URL isn’t included. If you select the whole friendly name, the URL is included. If you select from inside the friendly name up through a character outside the friendly name, the whole link is selected along with whatever is selected outside the link.

Consider the friendly hyperlink for the text “Hello” pointing to the URL “http://www.hello.com”. It has the following dispatch behavior:

- If the cursor is in the middle of the word “Hello” (say, between the “H” and “e”, the “e” and “l”, the two “l”s, or the “l” and “o”) and you hit Enter, an EN_LINK notification is sent and the client launches the link.
- However, if the cursor precedes the “H” or follows the “o”, no EN_LINK notification is sent and an end of paragraph is inserted.

Word has the same behavior for the Enter key. The problem is that the Enter key is also used for inserting a paragraph break. So in these edge cases, a choice had to be made and the usual meaning of Enter prevailed. If there’s pointer access to a link, e.g., mouse or touch, the hyperlink can be launched easily that way.

The post RichEdit Hyperlinks appeared first on Math in Office.

The post Setting and Getting Text in Various Formats appeared first on Math in Office.

Option | Value | s/g | Meaning |

tomUnicodeBiDi | 0x00000001 | s | Use Unicode BiDi algorithm for inserted text |

tomAdjustCRLF | 0x00000001 | g | If range start is inside multicode unit like CRLF, surrogate pair, etc., move to start of unit |

tomUseCRLF | 0x00000002 | g | Paragraph ends use CRLF (U+000D U+000A) |

tomTextize | 0x00000004 | g | Embedded objects export alt text; else U+FFFC |

tomAllowFinalEOP | 0x00000008 | g | If range includes final EOP, export it; else don’t |

tomUnlink | 0x00000008 | s | Disables link attributes if present |

tomUnhide | 0x00000010 | s | Disables hidden attribute if present |

tomFoldMathAlpha | 0x00000010 | g | Replace math alphanumerics with ASCII/Greek |

tomIncludeNumbering | 0x00000040 | g | Lists include bullets/numbering |

tomCheckTextLimit | 0x00000020 | s | Only insert up to text limit |

tomDontSelectText | 0x00000040 | s | After insertion, call Collapse(tomEnd) |

tomTranslateTableCell | 0x00000080 | g | Export spaces for table delimiters |

tomNoMathZoneBrackets | 0x00000100 | g | Used with tomConvertUnicodeMath and tomConvertTeX. Set discards math zone brackets |

tomLanguageTag | 0x00001000 | s/g | Sets BCP-47 language tag for range; gets tag |

tomConvertRTF | 0x00002000 | s/g | Set or get RTF |

tomGetTextForSpell | 0x00008000 | g | Export spaces for hidden/math text, table delims |

tomConvertMathML | 0x00010000 | s/g | Set or get MathML |

tomGetUtf16 | 0x00020000 | g | Causes tomConvertRTF, etc. to get UTF-16. SetText2 accepts 8-bit or 16-bit RTF |

tomConvertLinearFormat | 0x00040000 | s/g | Alias for tomConvertUnicodeMath |

tomConvertUnicodeMath | 0x00040000 | s/g | UnicodeMath |

tomConvertOMML | 0x00080000 | s/g | Office MathML |

tomConvertMask | 0x00F00000 | s/g | Mask for mutually exclusive modes |

tomConvertRuby | 0x00100000 | s | See section below on Entering Ruby Text |

tomConvertTeX | 0x00200000 | s/g | See LaTeX Math in Office |

tomConvertMathSpeech | 0x00300000 | g | Math speech (English only here) |

tomConvertSpeechTokens | 0x00400000 | g | Simple Unicode and speech tokens |

tomConvertNemeth | 0x00500000 | s/g | Nemeth math braille in U+2800 block |

tomConvertNemethAscii | 0x00600000 | g | Corresponding ASCII braille |

tomConvertNemethNoItalic | 0x00700000 | g | Nemeth braille in U+2800 block w/o math italic |

tomConvertNemethDefinition | 0x00800000 | g | Fine-grained speech in braille |

tomConvertHtml | 0x00900000 | s/g | Convert HTML |

tomConvertEnclose | 0x00A00000 | s | See section below on Entering Enclosed Text |

tomConvertCRtoLF | 0x01000000 | g | Plain-text paragraphs end with LF, not CRLF |

tomLaTeXDelim | 0x02000000 | g | Use LaTeX math-zone delimiters \(…\) inline, \[…\] display; else $…$, $$…$$. Set handles all |

tomGhostText | 0x04000000 | s | Set ghost text (used for text prediction) |

tomNoGhostText | 0x04000000 | g | Get text without ghost text |

Nonzero values within the mask defined by tomConvertMask (0x00F00000) are mutually exclusive, that is, they cannot be combined (OR’d) with one another. The options UnicodeMath, [La]TeX (tomConvertTeX), and Nemeth math braille (tomConvertNemeth) are also mutually exclusive. You can set only one at a time. But other options can be OR’d in if desired.
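To see why values inside the mask cannot be OR’d together, note that they are consecutive numbers in a 4-bit field rather than independent bits. The following minimal sketch uses values from the table above; the helper names `conversionMode` and `isConsistentMathFormat` are hypothetical, and the constants are declared locally so the example stands alone (a real client takes them from tom.h).

```cpp
#include <cstdint>

// Values copied from the table above.
constexpr std::uint32_t tomUseCRLF            = 0x00000002;
constexpr std::uint32_t tomConvertUnicodeMath = 0x00040000;
constexpr std::uint32_t tomConvertMask        = 0x00F00000;
constexpr std::uint32_t tomConvertRuby        = 0x00100000;
constexpr std::uint32_t tomConvertTeX         = 0x00200000;
constexpr std::uint32_t tomConvertMathSpeech  = 0x00300000;
constexpr std::uint32_t tomConvertNemeth      = 0x00500000;
constexpr std::uint32_t tomLaTeXDelim         = 0x02000000;

// Hypothetical helper: the mutually exclusive mode selected by the flags
// (0 if none of the modes inside tomConvertMask is selected).
constexpr std::uint32_t conversionMode(std::uint32_t flags) {
    return flags & tomConvertMask;
}

// Hypothetical helper: false if UnicodeMath, which lies outside the mask,
// is requested together with TeX or Nemeth math braille.
inline bool isConsistentMathFormat(std::uint32_t flags) {
    const bool unicodeMath = (flags & tomConvertUnicodeMath) != 0;
    const std::uint32_t mode = conversionMode(flags);
    const bool texOrNemeth = mode == tomConvertTeX || mode == tomConvertNemeth;
    return !(unicodeMath && texOrNemeth);
}
```

For example, `tomConvertRuby | tomConvertTeX` equals 0x00300000, which is tomConvertMathSpeech, so OR-ing two modes silently selects a third one. Options outside the mask, such as tomLaTeXDelim, combine freely with a single mode.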

A string of Nemeth math braille codes in the Unicode range U+2800..U+283F can be inserted and built up by calling ITextRange2::SetText2(tomConvertNemeth, bstr). If the string is valid, you can get it back in any of the math formats including Nemeth math braille. For example, if you insert the string

⠹⠂⠌⠆⠨⠏⠼⠮⠰⠴⠘⠆⠨⠏⠐⠹⠨⠈⠈⠙⠨⠹⠌⠁⠬⠃⠀⠎⠊⠝⠀⠨⠹⠼⠀⠨⠅⠀⠹⠂⠌⠜⠁⠘⠆⠐⠤⠃⠘⠆⠐⠻⠼

you see the built-up equation 1/2π ∫_0^2π ⅆθ/(a + b sin θ) = 1/√(a² − b²).

You can also input braille with a standard keyboard by typing a control word \braille assigned to the Unicode character U+24B7 (Ⓑ). (See LaTeX Math in Office for how to add commands to math autocorrect). The \braille command causes math input to accept braille input via a regular keyboard using the braille ASCII codes sometimes referred to as North American Braille Computer Codes. The character ~ (U+007E) disables this input mode. These braille codes are described in the post Nemeth Braille—the first math linear format and can be input using refreshable braille displays. Alternatively, such input can be automated by calling ITextSelection::TypeText(bstr). Just as in entering UnicodeMath, the equations build up on screen as soon as the math braille input becomes unambiguous. The implementation includes the math braille UI that cues the user where the insertion point is for unambiguous editing of math zones using braille. Note that as of this posting, the math braille facility isn’t hooked up to Narrator or other screen readers.

The tomConvertMathSpeech option currently gets math speech only in English. Microsoft Office apps like Word, PowerPoint, and OneNote deliver math speech in over 18 languages to the assistive technology (AT) program Narrator via the UIA ITextRangeProvider::GetText() function. Other ATs could also get math speech this way, although they usually get MathML and generate speech from that. Dictating (setting) math speech would be nice for both blind and sighted folks. Imagine, you can say 𝑎² + 𝑏² = 𝑐² faster than you can type it or write it! SetText2(tomConvertMathSpeech, bstr) is ready to handle such input, but the feature is not available yet.

In a nonmath context, the option tomConvertRuby (0x00100000) can be used to convert strings like “{…|…}” to ruby inline objects, where the first ellipsis represents the ruby text and the second ellipsis the base text. The ASCII curly braces and vertical bar are translated to the internal ruby-object structure characters U+FDD1, U+FDEF, and U+FDEE, respectively. Alternatively, the string can contain those structure characters directly. If a digit follows the start delimiter (‘{’ or U+FDD1), the digit defines the ruby options:

rubyAlign val | Meaning |

center (0) | Center <ruby> with respect to <base> |

distributeLetter (1) | Distribute difference in space between longer and shorter text in the latter, evenly between each character |

distributeSpace (2) | Distribute difference in space between longer and shorter text in the latter using a ratio of 1:2:1 which corresponds to lead : inter-character : end |

left (3) | Align <ruby> with the left of <base> |

right (4) | Align <ruby> with the right of <base> |

If you add 5 to these values, the ruby object displays the ruby text below the base text instead of above it. For example, calling ITextRange2::SetText2(tomConvertRuby, bstr) with bstr containing the string “{1にほんご|日本語}” inserts a ruby object that displays にほんご above 日本語 with distributeLetter alignment.

The string can contain text in addition to ruby objects, and the ruby objects can be nested to create compound ruby objects.
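The ruby string format described above can be assembled with a small helper. This is only a sketch under stated assumptions: the name `makeRubyString` and the `RubyAlign` enum are hypothetical, the strings are UTF-8, and the result would still need to be converted to a BSTR before being passed to SetText2(tomConvertRuby, bstr).

```cpp
#include <string>

// Alignment values from the rubyAlign table above.
enum class RubyAlign {
    Center = 0,
    DistributeLetter = 1,
    DistributeSpace = 2,
    Left = 3,
    Right = 4
};

// Hypothetical helper: build "{<digit><ruby text>|<base text>}".
// Adding 5 to the alignment value places the ruby text below the base text.
std::string makeRubyString(const std::string& rubyText,
                           const std::string& baseText,
                           RubyAlign align,
                           bool below = false) {
    const int option = static_cast<int>(align) + (below ? 5 : 0);  // 0..9
    return "{" + std::to_string(option) + rubyText + "|" + baseText + "}";
}
```

For the example in the text, `makeRubyString("にほんご", "日本語", RubyAlign::DistributeLetter)` produces the string {1にほんご|日本語}. The sketch always emits the option digit, although the digit is optional in the format.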

The post Rounded Rectangles and Ellipses – Math in Office (microsoft.com) describes ways to enclose text in possibly rounded rectangles and ellipses. The SetText2(tomConvertEnclose, bstr) option is similar to the tomConvertRuby option. It converts strings like “{…}” to a tomEnclose object.

In addition to ITextRange2::SetText2() and GetText2(), the messages WM_SETTEXT, EM_SETTEXTEX, WM_GETTEXT, and EM_GETTEXTEX are useful. The set-text messages work with plain text or RTF in rich-text controls. EM_SETTEXTEX accepts both 16-bit and 8-bit RTF, while WM_SETTEXT doesn’t handle 16-bit RTF.

Analog and button pushing

The post Computers I have known appeared first on Math in Office.

The first computer I ever used was an Electronics Associates analog computer at the Perkin Elmer Corporation where I worked as an intern in the summers of 1961–1963. I wired up the machines to simulate aspects of the response of control systems that guided the balloon-borne Stratoscope II telescope. The telescope could be pointed to an accuracy of 0.1 arc seconds, which is the angle subtended by a dime at two miles. During my third summer, I saw an LGP-30 digital drum computer. One weekend, during a long run, I wanted to see what was going on, so I pushed a button that printed out the progress. That action apparently wrecked the whole weekend run, much to the frustration of Bob Bernard, who ran the machines and called me a button pusher. The LGP-30 was hardly more powerful than the analog computers, although it was suited to solving different problems. The drum had considerable latency, so programs had to be written carefully to catch the magnetized bits optimally. Unlike other computers of the day, which used octal, the LGP-30 used hexadecimal with f g j k q w representing the decimal values 10 through 15 instead of A B C D E F.

The following year I was a physics graduate student at Yale and learned how to program Fortran II on an IBM 709 computer. That computer used vacuum tubes and had 32768 36-bit words, each of which could hold 6 characters in the BCD character set (0-9 A-Z +-.,()$*). Input was on IBM computer cards and output was on a printer or on a pen-and-ink plotter. We prepared our card decks with IBM 026 and later 029 keypunches. With the 029, you could insert a character by holding the source card and typing a character. In 1964, the Yale Computer Center upgraded to Fortran IV and an IBM 7094 computer, which was made with discrete transistors, had a 2-microsecond machine cycle, and had the same memory architecture as the IBM 709. Since compilations took appreciable time, I used to make simple changes in the binary cards using a keypunch. You had to kill the card check sum before resubmitting the card deck. You could punch more holes or fill up holes with chads that had been punched out. Amazingly the filled-in holes passed through the card reader without falling out. I learned enough assembly language to understand the machine-language 1’s and 0’s. I used the computer for calculating graphs in my PhD dissertation and papers on Zeeman laser theory. The Yale Computer Center also had an IBM 1401 computer with tape drives and an IBM 1620 computer, which was a decimal machine. I didn’t use either except to read/write magnetic tapes.

One need for magnetic tape was to collect data for a Stromberg Carlson SC4020 Microfilm Printer & Plotter located at Bell Labs in Murray Hill, NJ. Marlan Scully, Willis Lamb, and I made what was likely the first computer movie (*Build up of laser radiation from spontaneous emission*) in 1965. You can see the movie by thumbing through the corners of Applied Optics circa 1970.

After finishing my PhD in June 1967, I went to Bell Labs in Holmdel, NJ to work as a post doc on laser theory. Bell Labs had an IBM 360/65, which used 8-bit bytes, EBCDIC character codes, and zipped along at 563 kips. The 7-bit ASCII character encoding came out in 1963, but I didn’t get to use it until 1973 on a DEC 10. Both character code standards have lower case, although Fortran IV was all upper case. The card decks required some JCL (job control language), which was sort of awkward and not used on later computers. After a year, I developed the SCROLL math display language and implemented it in Fortran IV. SCROLL was the first facility that formatted and displayed built-up equations on a computer. The notation was Polish prefix.

At the end of my two-year post doc, I was torn between joining the computer science department at Bell Labs in Murray Hill, NJ, and becoming an Assistant Professor of Optical Sciences at the University of Arizona. I went to the latter partly because Marlan, Willis, and I wanted to write a book on Laser Physics. The U of A had a CDC 6400 with 60-bit words, an 18-bit address space, 1 mips, and magically no JCL!

I found out about a special-projects program on computers hosted by the U of A Electrical Engineering department and volunteered to teach a course on comparative programming languages. After a year or so, I concluded that it would be good to formalize the program into a department of its own. I called Ralph Griswold, a Bell Labs colleague, and asked him if he’d be interested in such an endeavor. It was perfect timing. He had been interested in a change and creating a computer science department in an exotic location was compelling. See 50 Years of Computer Science 1971–2021.

Soon we had a Digital Equipment DEC 10 time-shared computer! You could dial in with a 110-baud teletype terminal, or better yet with a 300-baud CRT terminal. I never could abide 110 baud, but I used 300-baud connections for a while. Then I got access to a Tektronix 4010 graphics terminal which sped along at 9600 baud. That could fill up a 24-row screen in a mere second! And you could graph formulas on it. The DEC 10 had 36-bit words and an 18-bit address space. It also had an extended addressing capability consisting of multiple segments of 18-bit address spaces. A similar segment-offset architecture was used later in the Intel 286 microprocessor.

I spent 1975—1976 on sabbatical at the University of Stuttgart and the Max Planck Institute for Solid State Research and learned many things, one of which was that something called a microprocessor was being used in fledgling computers. On December 24, 1976, I bought for $2500 and assembled an IMSAI 8080 microcomputer kit. It had a whopping 48 KB thanks to a dynamic RAM card that one of my physics colleagues said would never be reliable. “Stick with the robust 8 KB static memory cards!” he urged. The IMSAI was like the Altair 8800 that Bill Gates wrote his famous 4K and 8K Basic interpreters on. The 4-MHz Zilog Z80 microprocessor was considerably more powerful than the 2-MHz Intel 8080, so I installed a Z80 processor card in the microcomputer’s S-100 bus. The IMSAI-8080 front panel has 22 switches and many LEDs. I rewired the front panel so that the 8 status LEDs could be controlled by software and set them up to display the contents of the memory byte pointed to by the address switches. I custom wire-wrapped most of the cards in the computer. There was a ROM with a 2 KB monitor program that let you examine and change memory. That program was the start for what evolved into my SST debugger. I added a CRT terminal, a modem, a floppy-disk drive, and a board with programmable relays to control the house lights and the front-door keypad. A friend who worked at a garage-door opener company down in Nogales, Sonora, gave me some garage-door openers that we used to control the house lights and the front door. A far cry from today’s smart phones! The whole system was hard wired since WiFi didn’t exist back in the 1970’s. One advantage of that was that it couldn’t be hacked (until I opened it up to remote control via modem). There were manual overrides for all functionality since I didn’t really trust computers! All programming was in tight Z80 assembly language.

64 KB sounds minuscule by today’s standards with our gigabytes and terabytes and subnanosecond machine cycles. But it was impressive how much we could do with so little. In addition to writing and printing books and papers, we could control experiments. A nifty example was Rick’s measurements of photon echo. For that, you subject a medium to two pulses of light separated by a time interval of *τ* and then watch for a light echo a time *τ* afterwards. But the experimental apparatus was very noisy, so the echo was drowned out if you only measured it once. If you measure it many times and add the results, the noise averages out to an overall flat background and the echo appears on top. But who wants to measure something thousands of times? Enter a microcomputer, which was happy to sit there and do so!

I got a Diablo daisy-wheel printer and wrote a program to send the printer proportionally spaced text. I used this approach to create the camera-ready pages of Rick Shoemaker’s and my first microcomputer book Interfacing Microcomputers to the Real World. That book describes the state of microcomputing at the time in detail. I enhanced the print program to handle mathematical text in multiple fonts using algorithms like those for the SCROLL language. Another physicist, Mike Aronson, who had written the PMATE editor I was using, suggested that the input format should resemble real linearized math as in the C language rather than the Polish prefix format used in SCROLL. So I wrote a translator to accept a simplified linear format, the forerunner of UnicodeMath which we use in Microsoft Office apps today. The translator was coded so tightly in Z80 assembly language that it along with the rest of the formatter fit into 16KB of ROM for a controller some friends of mine created for Diablo daisy-wheel printers. Those friends had also made the Z80 processor card in my IMSAI. When the printer was used with a tractor feed, it could print the whole document with one daisy, roll the document back, print with the next daisy, etc. It was positively wild watching the printer type the symbols into place after printing the main text.

In August 1981, IBM released a cool microcomputer that really surprised Rick and me. We figured that IBM wouldn’t get into microcomputing because it wouldn’t understand the market. IBM was into big machines, wrote its own software, supported a fancy sales force, had proprietary hardware, and didn’t collaborate with other companies. But the IBM PC was developed by a small independent group under Don Estridge in IBM Boca Raton, FL, that espoused open architecture and non-proprietary components and software. It used a 16-bit Intel microprocessor, the 8088, which is an 8086 with an 8-bit data bus, a 20-bit address space instead of the microcomputer industry’s 16-bit address space, and an optional 8087 floating-point processor. IBM had invented the floppy disk, but the PC used Tandon disk drives, and the PCs were sold in major outlets like ComputerLand and Sears Roebuck. The operating system was Microsoft’s MSDOS 1.0, which was an upgrade from the popular CP/M-80 microcomputer OS. It had Bill Gates’s 8K Basic interpreter stored in ROM in high memory. IBM documented the PC thoroughly as well. If you want lots of details, you can read Rick’s and my second microcomputer book *The IBM Personal Computer from the Inside Out*, also “typeset” on my Diablo daisy-wheel printer. With the PC, IBM was considerably ahead of the competition from the TRS-80, Apple II, and other microcomputers. Thanks to IBM’s excellent documentation, competitors emerged. One that I liked a lot was the Victor 9000. Its floppy disks held 1.2 MB compared to 360 KB on the IBM PC at the time. It had a cute cousin, the Apricot.

One of the many cool third-party IBM PC add-ons was the Hercules Graphics Card, which converted the 80 column by 25 row monochrome display card with 9×14 character cells into a 720×350 monochrome graphics card. Rick and Chris Koliopoulos copied the ROM Basic down into the upper 32K of the video space and modified it to support graphics.

IBM extended its PC lead in August 1984 with the IBM PC/AT, which used an Intel 80286 fully 16-bit processor with the ability to access up to 16 MB of memory in “protected mode”, considerably larger than the 8088’s 1 MB address space. It took a full year for the competition to create personal computers as powerful. My IBM AT had a 10 MB hard drive which was a great upgrade from the floppy disks and more than a third as large as the Model 1 1301 disk drive used with some IBM 7094 computers. I also got an HP laser printer, which HP released in April 1984. Being a laser physicist, I naturally enhanced my PS Technical Word Processor to work with it. So much easier than using multiple passes with a Diablo daisy-wheel printer!

On August 2, 1985, Estridge and his wife died in a plane crash caused by a strong thunderstorm near Dallas, Texas. That tragedy was a turning point for IBM’s microcomputer successes. Subsequent PC releases didn’t keep up with the competition from Compaq and other companies, possibly due to worries that other IBM computer systems might not survive the competition. Steve Jobs was never afraid to cannibalize his products. “If you don’t cannibalize yourself, others will” was his philosophy. The IBM PS/2 released in April 1987 was no match for the competition.

The software industry standardized on Compaq 386’s for a while. I used a Compaq 386 desktop computer and a Toshiba T5100 laptop to enhance my SST debugger to run in protected mode and access all of memory via the selector/offset memory model. In that way 80286 PC’s, which were more prevalent than 386 PC’s at the time, could access all their memory. That capability was key to getting Windows 3.0 to access all of memory and fend off OS/2. Rick and I updated our PC book in 1995 using Microsoft Word and renamed it to The Personal Computer from the Inside Out. Since then we’ve resisted the temptation to write more about the incredible evolution of microcomputers. Windows 95, an updated version of Windows 3.1, could run in as little as 4 MB of memory, although 8 MB was recommended. Nowadays 8 GB or more is recommended for a Windows laptop! Of course, today’s laptops can do so much more than the microcomputers running Windows 95. I almost never used a mainframe after getting into microcomputers. The Data General Eclipse minicomputer was the biggest machine I used after 1975 and then only for a few years. The PC’s had all the power I needed.

The post Microsoft 365 Modern Comments appeared first on Math in Office.

The most powerful commenting experience to date was in desktop Word, which used Word text boxes for comments. This allowed users to use most Word features in Word comments. Meanwhile, RichEdit lacks many advanced Word features. So, some Word power users have found their workflows hampered or even broken. Initially, RichEdit lacked quite a few popular features, such as HTML interoperability, web image access, built-in autocorrect and autoformatting, considerable Word user interface (UI) functionality in multilevel lists, and so on. We have been working diligently on remedying the most grievous limitations.

On the other hand, RichEdit instances are much smaller and faster than Word text boxes which improves performance for documents that have many comments. Other apps don’t have all of Word’s editing power and it’s desirable to have a uniform experience across the apps. Another difference is that only the author sees comments until posted, as in chats. This behavior is hopefully less distracting in collaborative scenarios.

We have been enhancing RichEdit in the most requested areas, e.g., HTML conversion fidelity, image support, proofing (built-in autocorrect and most autoformatting options), multilevel-list UI, and more compatible hyperlink handling. PowerPoint has used RichEdit’s multilevel lists for many years, but PowerPoint has its own UI, and RichEdit’s multilevel list UI and accessibility have needed improvements. Together with the improvements needed for Notepad, RichEdit has been getting better and better.

More work remains, such as grammar checking, text prediction, tables, and math typography. RichEdit’s math support is as good as or better than that of the Office apps, but math isn’t currently enabled in Modern Comments. Math might be easy to support. RichEdit plain text represents math zones in UnicodeMath starting with a ‘⁅’ (U+2045) and ending with a ‘⁆’ (U+2046). Inserting the plain text into a RichEdit control and calling ITextRange2::BuildUpMath() builds up the UnicodeMath to OfficeMath. RichEdit has been able to roundtrip OfficeMath in plain text this way ever since Office 2007. For example, copy a document with math zones into Notepad, copy it back to RichEdit, and call ITextRange2::BuildUpMath(). The math zones are restored. The “⁅<UnicodeMath>⁆” plain-text syntax is similar to the way LaTeX starts math zones with “\[” and ends them with “\]”. In addition, the math ribbon would need to be integrated, although math keyboard entry is faster if you know UnicodeMath or LaTeX.
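To make the plain-text math-zone syntax concrete, here is a minimal sketch that pulls the UnicodeMath out of the ⁅…⁆ zones in a string. It is not RichEdit code: the function name `extractMathZones` is hypothetical, the text is held as UTF-32 for simplicity (RichEdit itself uses UTF-16), and the sketch assumes that math zones are not nested.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr char32_t kMathZoneStart = U'\u2045';  // ⁅
constexpr char32_t kMathZoneEnd   = U'\u2046';  // ⁆

// Hypothetical helper: return the UnicodeMath text of each math zone,
// without the delimiters. An unterminated zone at the end is ignored.
std::vector<std::u32string> extractMathZones(const std::u32string& text) {
    std::vector<std::u32string> zones;
    std::size_t pos = 0;
    while ((pos = text.find(kMathZoneStart, pos)) != std::u32string::npos) {
        const std::size_t end = text.find(kMathZoneEnd, pos + 1);
        if (end == std::u32string::npos)
            break;
        zones.push_back(text.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return zones;
}
```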

The post RichEdit Autoformatting appeared first on Math in Office.

Autoformatting includes converting a`--`b to a—b, smart quotes, e.g., `"word"` to “word”, and automatic bulleted/numbered lists. Such autoformatting is available in recent builds of RichEdit since it’s needed for the Office Modern Comments facility. Word and RichEdit have also had autoURL and auto table conversion for many years. The post RichEdit Emoticon Shortcuts – Math in Office (microsoft.com) describes the RichEdit option to convert common emoticon shortcuts to emoji.

Word also supports converting _…_ to italicize the text in the “…” and *…* to bold the text in the “…”. These and some of the other autoformatting notations are reminiscent of Markdown, which was invented to enter popular HTML constructs easily in a readable notation. That’s the same rationale that’s behind UnicodeMath for entering math equations and expressions. So, maybe RichEdit should have a Markdown editor option that includes UnicodeMath.

RichEdit’s conversion of simple numeric fractions to Unicode fractions is described in the post Function to get Unicode Fractions – Math in Office (microsoft.com). It supports the whole Unicode fraction repertoire, ↉ ½ ⅓ ¼ ⅕ ⅙ ⅐ ⅛ ⅑ ⅒ ⅔ ⅖ ¾ ⅗ ⅜ ⅘ ⅚ ⅞, whereas Word only handles ½ ⅓ ¼ and ¾. You can create arbitrary numeric fractions using the Unicode superscript and subscript digits along with the U+2044 fraction slash. For example, ⁴⁵⁶⁄₇₈₉₀. Admittedly it’s not great typography, but it works. To enter such characters, it’s handy to have the Alt+x hot key. You can also enter ⅟₁₆, ⅟₃₂, ⅟₆₄, ⅟₁₂₈, and ⅟₂₅₆, where I use ⅟ (U+215F) for the numerator and fraction slash.
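The arbitrary-fraction construction described above is mechanical enough to sketch in code. This is not RichEdit’s implementation (see the post cited above for that); the function names are hypothetical, and the sketch always builds the fraction from superscript digits, U+2044, and subscript digits without substituting precomposed characters such as ½.

```cpp
#include <string>

// Hypothetical helper: decimal digits of n as Unicode superscript digits.
// The superscripts 1, 2, and 3 live in Latin-1; the rest are U+2070..U+2079.
std::u32string superscriptDigits(unsigned n) {
    static const char32_t sup[10] = {0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
                                     0x2075, 0x2076, 0x2077, 0x2078, 0x2079};
    std::u32string out;
    do {
        out.insert(out.begin(), sup[n % 10]);
        n /= 10;
    } while (n != 0);
    return out;
}

// Hypothetical helper: decimal digits of n as subscript digits U+2080..U+2089.
std::u32string subscriptDigits(unsigned n) {
    std::u32string out;
    do {
        out.insert(out.begin(), static_cast<char32_t>(0x2080 + n % 10));
        n /= 10;
    } while (n != 0);
    return out;
}

// numerator in superscripts + FRACTION SLASH (U+2044) + denominator in subscripts
std::u32string makeFraction(unsigned numerator, unsigned denominator) {
    return superscriptDigits(numerator) + U'\u2044' + subscriptDigits(denominator);
}

// FRACTION NUMERATOR ONE (U+215F) followed by the denominator in subscripts
std::u32string makeUnitFraction(unsigned denominator) {
    return std::u32string(1, U'\u215F') + subscriptDigits(denominator);
}
```

Here `makeFraction(456, 7890)` yields the code points of ⁴⁵⁶⁄₇₈₉₀ and `makeUnitFraction(16)` yields those of ⅟₁₆.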

To enable RichEdit autoformatting options, send the message EM_SETAUTOFORMAT with wparam = acflgAutoFormat and lparam = 0. The acflgAutoFormat flag can be combined with other flags to disable autoformatting options. The flags and messages are defined by

acflgAutoFormat = 1
acflgNoSmartQuotes = 2
acflgNoSuperscript = 128
acflgNoEmDash = 256
acflgNoFraction = 1024
acflgNoBulletedLists = 8192
acflgNoNumberedLists = 16384
EM_GETAUTOFORMAT = WM_USER + 392
EM_SETAUTOFORMAT = WM_USER + 393

For the time being, also send EM_SETMSOAUTOCORRECT (WM_USER + 391) with wparam = 1 and lparam = 0. This message enables the RichEdit integration of the shared MSO autocorrect facility. As such, the current RichEdit autoformatting code requires MSO to be loaded, even though the autoformatting code doesn’t use it.
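As a rough sketch of how the message parameters fit together, the following Python computes the message numbers and the wparam value from the definitions above. It assumes the standard Win32 value WM_USER = 0x0400; the helper names `autoformat_wparam` and `messages_to_send` are hypothetical and only assemble the numbers, they don't talk to a RichEdit control.

```python
WM_USER = 0x0400  # standard Win32 value

EM_SETMSOAUTOCORRECT = WM_USER + 391
EM_GETAUTOFORMAT = WM_USER + 392
EM_SETAUTOFORMAT = WM_USER + 393

acflgAutoFormat = 1
acflgNoSmartQuotes = 2
acflgNoSuperscript = 128
acflgNoEmDash = 256
acflgNoFraction = 1024
acflgNoBulletedLists = 8192
acflgNoNumberedLists = 16384


def autoformat_wparam(*disabled_flags: int) -> int:
    """Enable autoformatting, then OR in the acflgNo... flags to switch off."""
    wparam = acflgAutoFormat
    for flag in disabled_flags:
        wparam |= flag
    return wparam


def messages_to_send(*disabled_flags: int):
    """Return (message, wparam, lparam) tuples in the order described above."""
    return [
        (EM_SETAUTOFORMAT, autoformat_wparam(*disabled_flags), 0),
        (EM_SETMSOAUTOCORRECT, 1, 0),
    ]
```

On Windows, each tuple could be passed to SendMessageW along with the window handle of the RichEdit control. For instance, `messages_to_send(acflgNoSmartQuotes, acflgNoEmDash)` keeps autoformatting on but leaves straight quotes and double hyphens alone.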


]]>The “\rect(a+b)” is the UnicodeMath representation of this boxed formula.

The post Rounded Rectangles and Ellipses appeared first on Math in Office.

]]>Mobile versions of Excel needed custom rounded boxes with background color for cell tokens in the formula bar. Accordingly, the RichEdit boxed-formula implementation was enhanced to offer that feature along with the ellipse. The Direct2D line styles are supported, and the line and background colors can be specified by COLORREFs. This post describes the user interfaces (UI) and APIs for creating such objects in a RichEdit control. Examples include

In Excel, it turned out that having a math zone in the formula bar complicated font binding. So RichEdit offers the same functionality in an Enclose object, which works in ordinary text.

Let’s start with the UI entry in a math zone. To insert an empty rectangle object, you can use UnicodeMath. For this, note that each math object is associated with its own Unicode character. The character for the square rectangle is U+25AD, ▭. In a math zone, the text \rect autocorrects to ▭. The characters for the rounded box and ellipse are U+25A2 (▢) and U+2B2D (⬭), respectively. In the RichEdit math autocorrect, you can type \rrect for ▢ and \ellipse for ⬭. You can also type the hex character code followed by Alt+x to insert the character. Then follow the character with parentheses, put the text you want inside the parentheses, and hit the Space bar to build it up into the object as illustrated above. The colors for the border and background are taken from the text and background colors in the character format for the opening character of the rectangle object. Currently the Enclose object can only be inserted via copy/paste or the ITextRange2::SetInlineObject() function described below.
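To make the character associations concrete, here is a small Python sketch that maps the three control words to their characters and assembles the linear-format text that would then be built up. The helper names are hypothetical; the sketch only produces strings and doesn't emulate RichEdit's autocorrect or build up.

```python
# Control words and the object characters they autocorrect to in a math zone.
ENCLOSURE_CHARS = {
    "\\rect": "\u25ad",     # ▭ square rectangle
    "\\rrect": "\u25a2",    # ▢ rounded rectangle
    "\\ellipse": "\u2b2d",  # ⬭ ellipse
}


def enclosure_linear_format(control_word: str, content: str) -> str:
    """Return the UnicodeMath text for an enclosure, e.g. the form of \\rect(a+b)."""
    return f"{ENCLOSURE_CHARS[control_word]}({content})"


def alt_x(hex_code: str) -> str:
    """Mimic the effect of typing a hex code followed by Alt+x."""
    return chr(int(hex_code, 16))
```

For example, `enclosure_linear_format("\\rrect", "a+b")` returns ▢(a+b), and `alt_x("2B2D")` returns ⬭.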

To understand the APIs for inserting and manipulating a math object, here’s a quick summary of how such objects are represented in the RichEdit backing store. In OfficeMath built-up format, as distinguished from a linear format like UnicodeMath or LaTeX, mathematical objects like fraction and boxed formula are represented by a start delimiter, the first argument, an argument separator if the object has more than one argument, the second argument, etc., with the final argument terminated by the end delimiter. For example, the fraction 𝑎 over 𝑏 is represented in built-up format by {_{frac} 𝑎|𝑏} where {_{frac} is the start delimiter, | is the argument separator, and } is the end delimiter. Similarly, the subscript object 𝑎_𝑏 is represented by {_{sub} 𝑎|𝑏}. Here the start delimiter is the same character for all math objects and is the Unicode character U+FDD0 in RichEdit (Word uses a different character). The kind of object is specified by a rich-text object-name property associated with the start delimiter. So in plain text, the built-up forms of the fraction and subscript are identical if the fraction arguments are the same as their subscript counterparts. In the example here, a plain-text search for {_{frac} 𝑎|𝑏} matches {_{sub} 𝑎|𝑏} as well as {_{frac} 𝑎|𝑏}. A rich-text search can distinguish between the two.

To insert an empty math object into a RichEdit control, call ITextRange2::SetInlineObject(Type, Align, Char, Char1, Char2, Count, TeXStyle, cCol). In particular, to insert one of the enclosures in a math zone, call

ITextRange2::SetInlineObject(tomBoxedFormula, Align, Char, Char1, 0, 0, 0, 0),

where Char = ▭, ▢, or ⬭ produces a square rectangle, a rounded rectangle, or an ellipse, respectively. To insert such an enclosure in ordinary text, call

ITextRange2::SetInlineObject(tomEnclose, Align, Char, Char1, 0, 0, 0, 0)

To change the border and/or background colors, select the object start character and apply the formatting via EM_SETCHARFORMAT, EM_SETRANGEFORMAT, or ITextFont::SetForeColor() and SetBackColor().

The various border styles are determined by the Align argument. For the rectangle, the Align value is formed by OR’ing together any combination of the bits given by

tomBoxHideTop | 1 |
tomBoxHideBottom | 2 |
tomBoxHideLeft | 4 |
tomBoxHideRight | 8 |
tomBoxStrikeH | 16 |
tomBoxStrikeV | 32 |
tomBoxStrikeTLBR | 64 |
tomBoxStrikeBLTR | 128 |

Here the low eight bits of Align control which sides are hidden (if any) along with whether four possible strike-throughs are drawn. Since all eight bits were defined for these purposes before, they are persisted in most existing file formats along with the Char code. The UnicodeMath notation that defines the Align value 𝑛 for a string 𝑥 is ▭ (𝑛&𝑥).
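The bit arithmetic for the rectangle can be sketched as follows. The constants are the ones listed above; the helper names are hypothetical, and the UnicodeMath string is assembled without the space after ▭, following the \rect(a+b) form used earlier in this post.

```python
BOX_FLAGS = {
    "tomBoxHideTop": 1,
    "tomBoxHideBottom": 2,
    "tomBoxHideLeft": 4,
    "tomBoxHideRight": 8,
    "tomBoxStrikeH": 16,
    "tomBoxStrikeV": 32,
    "tomBoxStrikeTLBR": 64,
    "tomBoxStrikeBLTR": 128,
}


def box_align(*flag_names: str) -> int:
    """OR the named flags together to get the Align value."""
    align = 0
    for name in flag_names:
        align |= BOX_FLAGS[name]
    return align


def describe_box_align(align: int) -> list:
    """List the flag names whose bits are set in the low eight bits of Align."""
    return [name for name, bit in BOX_FLAGS.items() if align & bit]


def boxed_unicodemath(align: int, text: str) -> str:
    """Assemble the linear form that carries the Align value n for the string x."""
    return f"\u25ad({align}&{text})"
```

For example, hiding the left and right sides gives `box_align("tomBoxHideLeft", "tomBoxHideRight")`, which is 12, so the linear form of a box around a+b with only top and bottom borders would be ▭(12&a+b).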

For the rounded rectangle, the Align values are defined by

tomRoundedBoxDashStyleMask | 0x07 | D2D1_DASH_STYLE |
tomRoundedBoxHideBorder | 0x08 | |
tomRoundedBoxCapStyleMask | 0x30 | D2D1_CAP_STYLE * 16 |
tomRoundedBoxNullRadius | 0x40 | |
tomRoundedBoxCompact | 0x80 | |

Here bits 0..2 of Align give the Direct2D D2D1_DASH_STYLE, and bits 4..5 give the D2D1_CAP_STYLE. For example, Align = D2D1_DASH_STYLE_CUSTOM (5) gives the “input 1” example above. The radius (in twips) of the rounded corners of a rounded rectangle is given by Char1. The Office MathML converters don’t support the rounded rectangle and ellipse yet, even though MathML does.
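Packing and unpacking the rounded-rectangle Align value might look like the sketch below. The masks come from the table above and the enumeration values are the Direct2D ones; the function names and the dictionary layout are assumptions made for illustration.

```python
from enum import IntEnum


class DashStyle(IntEnum):  # values of Direct2D D2D1_DASH_STYLE
    SOLID = 0
    DASH = 1
    DOT = 2
    DASH_DOT = 3
    DASH_DOT_DOT = 4
    CUSTOM = 5


class CapStyle(IntEnum):  # values of Direct2D D2D1_CAP_STYLE
    FLAT = 0
    SQUARE = 1
    ROUND = 2
    TRIANGLE = 3


tomRoundedBoxDashStyleMask = 0x07
tomRoundedBoxHideBorder = 0x08
tomRoundedBoxCapStyleMask = 0x30
tomRoundedBoxNullRadius = 0x40
tomRoundedBoxCompact = 0x80


def pack_rounded_align(dash, cap=CapStyle.FLAT, hide_border=False,
                       null_radius=False, compact=False) -> int:
    """Combine the fields into one Align value (bits 0..2 dash, bits 4..5 cap)."""
    align = int(dash) & tomRoundedBoxDashStyleMask
    align |= (int(cap) * 16) & tomRoundedBoxCapStyleMask
    if hide_border:
        align |= tomRoundedBoxHideBorder
    if null_radius:
        align |= tomRoundedBoxNullRadius
    if compact:
        align |= tomRoundedBoxCompact
    return align


def unpack_rounded_align(align: int) -> dict:
    """Split an Align value back into its fields."""
    return {
        "dash": DashStyle(align & tomRoundedBoxDashStyleMask),
        "cap": CapStyle((align & tomRoundedBoxCapStyleMask) // 16),
        "hide_border": bool(align & tomRoundedBoxHideBorder),
        "null_radius": bool(align & tomRoundedBoxNullRadius),
        "compact": bool(align & tomRoundedBoxCompact),
    }
```

With these definitions, `pack_rounded_align(DashStyle.CUSTOM)` is 5, matching the Align value quoted above.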


]]>The post OfficeMath appeared first on Math in Office.

]]>So, we call it *OfficeMath*. “Office” alludes to Microsoft Office but needn’t be exclusive. “Office” suggests a high-quality level (okay, maybe I’m biased). OfficeMath might suggest calculations rather than math text, but documentation can resolve that ambiguity, which also exists for the linear formats AsciiMath and UnicodeMath. The heart of OfficeMath is its in-memory model, named “Professional” in the OfficeMath UI. This model is mirrored in the OMML file format. It features *N*-ary structures such as integrals with limits *and* integrands, subscripts, superscripts and accents with well-defined bases, and math functions with function names and arguments. This level of detail is ordinarily reserved for content math formats such as Content MathML and OpenMath. OfficeMath incorporated these structures to support high-quality math typography, with the nice side effect of facilitating symbolic manipulations and graphing (OneNote Math Assistant). This post summarizes OfficeMath’s history, model, file format support, interoperability, math font, math formatting, user interfaces, and includes links to further information in OfficeMath-oriented posts in Math in Office.

Editing Math using Ribbon, Dialogs, Context Menus

A good place to learn about the origins of OfficeMath is the post LineServices, which tells how the LineServices line-layout component came to be and how it evolved to yield TeX-quality math typography. OfficeMath depends on other technologies as well, including the creation of the math-font OpenType standard described in High-Quality Editing and Display of Mathematical Text in Office 2007 and OpenType Math Tables. For older history, the post How I got into technical WP describes the first math display program (Scroll, 1970) and predecessors of UnicodeMath.

OfficeMath was based on Unicode from the start. Unicode 3.2 (March 2002) already had most of the current Unicode math character set. The Unicode Technical Committee is committed to including all attested math symbols in the Unicode Standard, so Unicode makes an ideal foundation on which to build math functionality. It also streamlines incorporation into Microsoft Office applications, since they are based on Unicode.

As with [La]TeX, MathML, MathType, and most other math presentation formats, OfficeMath puts math expressions and equations into *math zones*. Math-zone typography differs from the typography of ordinary text (see the section on Formatting below). The user creates a math zone with the Alt+= hot key or inserts one from the ribbon Insert tab.

In the OfficeMath in-memory format, mathematical objects like fraction and subscript are represented by a start delimiter, the first argument, an argument separator if the object has more than one argument, the second argument, etc., with the final argument terminated by an end delimiter. For example, the fraction 𝑎/2 is represented in built-up format by {_{frac} 𝑎|2} where {_{frac} is the start delimiter, | is the argument separator, and } is the end delimiter. Similarly, the subscript object 𝑎₂ is represented by {_{sub} 𝑎|2}. The start delimiter is the same character for all math objects as are the separator and end delimiters. In RichEdit, these delimiters are given by the Unicode characters U+FDD0, U+FDEE, and U+FDEF, respectively. In OMML, the start delimiter is represented by a container element, such as <f> for fraction and arguments appear within argument element containers, such as <num>…</num> for a numerator.

The type of object is specified by a character-format property associated with the start delimiter. In plain text, the built-up forms of the fraction and subscript are identical if the fraction arguments are the same as their subscript counterparts. In the example here, a plain-text search for {_{frac} 𝑎|2} matches {_{sub} 𝑎|2} as well as {_{frac} 𝑎|2}. Searching for OfficeMath equations involves plain-text searches like this together with comparison of the object types. The OfficeMath math objects are listed in the table in the next section along with their OMML and Presentation MathML representations. The objects are represented by prefix notation: the character formatting of the object start delimiter contains the object properties (see ITextRange2::GetInlineObject()). This differs from infix notation like a/b, which needs to be parsed. The OfficeMath in-memory format is a “built-up” format as distinguished from linear formats like UnicodeMath and LaTeX.
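The difference between a plain-text search and a search that also compares object types can be illustrated with a toy model of the backing store. The delimiter code points are the RichEdit ones given above; everything else (a store modeled as a string plus a dictionary of object types keyed by start-delimiter offset, and the function names) is an assumption made for this sketch and not how RichEdit stores character formatting.

```python
START, SEPARATOR, END = "\ufdd0", "\ufdee", "\ufdef"


def build_object(object_type: str, *arguments: str):
    """Return (plain_text, {offset_of_start_delimiter: object_type})."""
    return START + SEPARATOR.join(arguments) + END, {0: object_type}


def concat(*pieces):
    """Join several (text, types) pieces, shifting the type offsets."""
    text, types = "", {}
    for piece_text, piece_types in pieces:
        for offset, object_type in piece_types.items():
            types[len(text) + offset] = object_type
        text += piece_text
    return text, types


def plain_search(store, pattern) -> list:
    """Offsets where the pattern's plain text occurs, ignoring object types."""
    text, pattern_text = store[0], pattern[0]
    offsets, i = [], text.find(pattern_text)
    while i != -1:
        offsets.append(i)
        i = text.find(pattern_text, i + 1)
    return offsets


def rich_search(store, pattern) -> list:
    """Plain-text matches whose object types also agree with the pattern's."""
    store_types, pattern_types = store[1], pattern[1]
    return [i for i in plain_search(store, pattern)
            if all(store_types.get(i + k) == t for k, t in pattern_types.items())]
```

Usage: with `store = concat(build_object("frac", "𝑎", "2"), build_object("sub", "𝑎", "2"))` and `pattern = build_object("frac", "𝑎", "2")`, the plain search finds both objects while the rich search finds only the fraction.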

The OMML format is the XML format that encapsulates the OfficeMath in-memory “Professional” format. When OfficeMath was designed, Presentation MathML 3.0 was nearing publication. But Presentation MathML is missing two important elements which therefore require <mrow> emulations to represent OfficeMath. Specifically, Presentation MathML doesn’t have an explicit *N*-ary element, nor does it have an explicit math-function element. Furthermore, OfficeMath needs to embed client (Word, PowerPoint, Excel, …) XML easily into the math XML. The MathML <semantics> element can embed such information, but it’s awkward. Accordingly, OMML was created to describe the OfficeMath in-memory format naturally. With best practices, MathML without the <semantics> element can be used to round-trip OfficeMath equations apart from non-math formatting like revision markings and embedded objects.

Here is a listing from MathML and Ecma Math (OMML) of the OMML elements and exact or approximate MathML counterparts

Built-up Office Math Object | OMML tag | MathML |
Accent | acc | mover/munder |
Bar | bar | mover/munder |
Box | box | menclose (approx) |
Boxed Formula | borderBox | menclose |
Delimiters | d | mfenced or corresponding <mrow>… |
Equation Array | eqArr | mtable (with alignment groups) |
Fraction | f | mfrac |
Math Function | func | mrow with FunctionApply (U+2061) mo |
Left SubSup | sPre | mmultiscripts (special case of) |
Lower Limit | limLow | munder |
Matrix | m | mtable |
N-ary | nary | mrow msubsup/munderover with N-ary mo |
Phantom | phant | mphantom and/or mpadded |
Radical | rad | msqrt/mroot |
Group Char | groupChr | mover/munder |
Subscript | sSub | msub |
SubSup | sSubSup | msubsup |
Superscript | sSup | msup |
Upper Limit | limUpp | mover |
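To show how a few rows of this table translate into code, here is a minimal Python sketch that converts the simplest OMML objects (fraction, subscript, superscript, subsup) into Presentation MathML. It assumes the standard OMML namespace and argument element names (num, den, e, sub, sup), skips property elements, and ignores everything the table marks as approximate; it is not the converter that Office uses.

```python
import xml.etree.ElementTree as ET

OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# OMML element -> (MathML element, OMML argument elements in MathML child order)
SIMPLE_OBJECTS = {
    "f": ("mfrac", ["num", "den"]),
    "sSub": ("msub", ["e", "sub"]),
    "sSup": ("msup", ["e", "sup"]),
    "sSubSup": ("msubsup", ["e", "sub", "sup"]),
}


def local_name(element) -> str:
    """Strip the {namespace} prefix that ElementTree puts on tags."""
    return element.tag.rsplit("}", 1)[-1]


def convert(element):
    """Convert one OMML run or simple object into a MathML element."""
    name = local_name(element)
    if name == "r":  # a text run: digits become <mn>, anything else <mi>
        text = "".join(t.text or "" for t in element.findall(f"{{{OMML_NS}}}t"))
        out = ET.Element("mn" if text.isdigit() else "mi")
        out.text = text
        return out
    if name in SIMPLE_OBJECTS:
        mathml_name, argument_names = SIMPLE_OBJECTS[name]
        out = ET.Element(mathml_name)
        for argument_name in argument_names:
            argument = element.find(f"{{{OMML_NS}}}{argument_name}")
            out.append(convert_argument(argument))
        return out
    raise ValueError(f"object not covered by this sketch: {name}")


def convert_argument(argument):
    """Convert an argument container, wrapping several children in <mrow>."""
    children = [convert(child) for child in argument
                if not local_name(child).endswith("Pr")]  # skip property elements
    if len(children) == 1:
        return children[0]
    row = ET.Element("mrow")
    row.extend(children)
    return row
```

Feeding it the OMML for the fraction 𝑎/2 and serializing the result with `ET.tostring(..., encoding="unicode")` yields an mfrac with an mi and an mn child. The N-ary and math-function rows would need the mrow emulations described above and are left out.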

Other OMML references are Extracting OMML from Word 2003 Math Zone Images and OMML Specification, Version 2.

More MathML discussion is given in MathML 3.0, Improved MathML support in Word 2007, Rendering MathML in HTML5, and MathML on the Windows Clipboard.

Mathematical RTF is essentially OMML in RTF syntax. See also Office Math RTF and OMML Documentation and Updated RTF Specification.

Linear Format Notations for Mathematics include UnicodeMath and LaTeX Math in Office. See also Recognizing LaTeX Input in UnicodeMath Input Model.

Major interoperability is afforded via Presentation MathML and [La]TeX math. In addition, the Design Science MEE and MathType equations can be converted to OfficeMath as described in Converting Microsoft Equation Editor Objects to OfficeMath. MathType can convert OfficeMath to MathType equations. These equation facilities are compared in Equation-Editor Office-Math Feature Comparison and Other Office Math Editing Facilities. The latter also compares them to the Microsoft Word EQ Field.

With a bit of effort, equations can be imported into Office applications from Wikipedia. You can also create HTML documents with equations in them.

A basic part of OfficeMath is the Unicode OpenType math font. The first such font, Cambria Math, and the OpenType math tables were developed together with the Office 2007 math software, each influencing the other to obtain high quality results. Some history is given in the post High-Quality Editing and Display of Mathematical Text in Office 2007. The font contains extensive math tables, glyph variants and glyphs for most of the Unicode math character set. The tables were incorporated into the OpenType standard as noted in OpenType Math Tables. Posts elaborating on the math font are Special Capabilities of a Math Font and High Fonts and Math Fonts.

Cambria Math and Cambria are serifed fonts designed to look good on digital displays. As such, the stem widths never get skinny, in contrast to Times Roman fonts. If you prefer, the STIX math font is a Times Roman font that includes the OpenType math table support and works with OfficeMath. It might be good to add a variable-font weight axis to math fonts for this purpose.

This section discusses how OfficeMath handles math formatting involving math spacing, math styles, and alignments, and gives links to posts with further information. A math zone is defined by the math-zone character-format effect, an effect like bold or italic. As such, this is a non-nestable property, unlike math objects like fractions, which can be nested arbitrarily deeply. Adjacent math zones automatically merge into a single math zone.

An essential part of good math typography is math spacing. Within a math zone, OfficeMath follows the math spacing rules given in Appendix G of *The TeXbook* plus some enhancements that weren’t added to TeX for reasons of archivability. Section 3.16 of UnicodeMath summarizes the rules for the most common situations. Also see User Spaces in Math Zones for ways that OfficeMath autocorrects typical user input spacing errors. Two Math Typography Niceties shows how phantom objects can improve math spacing beyond the standard spacing rules.

Math bold and math italic define different math variables in math zones (𝐚 ≠ 𝑎 ≠ a ≠ 𝒂), while in ordinary text, bold and italic are used for emphasis. In math zones, math bold and math italic characters are different Unicode alphanumeric characters, while in ordinary text, bold and italic are character format attributes with no change in character codes. For example, 𝐚 is U+1D41A, 𝑎 is U+1D44E, a is U+0061, and 𝒂 is U+1D482. Even though the math and ordinary-text uses of bold/italic are unrelated semantically, the user can control these math styles using the usual bold and italic UI as described in Using Math Italic and Bold in Word 2007. There are other math styles that yield still different mathematical variables, such as open-face, script, Fraktur, and sans serif (see Section 2.2 of Unicode Technical Report #25). In general, character formatting is controlled in math zones as described in Restricted Math Zone Character Formatting. In informal documents, people may want to use sans-serif characters instead of serif characters for aesthetic reasons rather than for defining different variables. Currently OfficeMath doesn’t support this choice, but maybe it should.
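The mapping from ASCII letters to the math alphanumerics quoted above is mostly a fixed offset per style. The following Python sketch covers the bold, italic, and bold-italic Latin letters only; the function name is hypothetical, and the one special case handled is the italic h, which Unicode encodes as U+210E (Planck constant) rather than in the math alphanumerics block.

```python
# (code point of capital A, code point of small a) for each math style
STYLE_BASES = {
    "bold": (0x1D400, 0x1D41A),
    "italic": (0x1D434, 0x1D44E),
    "bold-italic": (0x1D468, 0x1D482),
}

# Letters encoded outside the math alphanumerics block.
EXCEPTIONS = {("italic", "h"): "\u210e"}


def math_alphabetic(ch: str, style: str) -> str:
    """Map an ASCII letter to its math alphanumeric; other characters pass through."""
    if (style, ch) in EXCEPTIONS:
        return EXCEPTIONS[(style, ch)]
    upper_base, lower_base = STYLE_BASES[style]
    if "A" <= ch <= "Z":
        return chr(upper_base + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(lower_base + ord(ch) - ord("a"))
    return ch  # digits, punctuation, and operators are left unchanged here
```

For example, `math_alphabetic("a", "italic")` returns 𝑎 (U+1D44E) and `math_alphabetic("a", "bold")` returns 𝐚 (U+1D41A).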

Occasionally one needs to embed ordinary text, such as words, into math zones. OfficeMath defines a character format attribute “ordinary text” for this purpose. Text with this attribute uses standard character formatting for italic, bold, etc. Unless the “ordinary text” attribute is active, the bold and italic settings only affect math alphanumerics; ASCII digits, punctuation, operators, and non-math characters are all rendered nonbold and upright.

In addition, OfficeMath has a “no-build-up” attribute to treat operator characters literally rather than use them in build-up translations. For example, if ‘/’ is marked with this attribute, build up in UnicodeMath mode leaves it as the character ‘/’ rather than converting it with the arguments around it into a built-up “stacked” fraction. This attribute can be entered in UnicodeMath by “quoting” the operator, namely preceding the operator by a backslash.

Since math zones are one level deep, you can embed ordinary text into a math zone, but you can’t nest a math zone within that ordinary text or elsewhere within the math zone. This hasn’t proven to be a limitation, although TeX can embed ordinary text inside math zones and nested math zones inside the ordinary text. It always seems to be possible to unwrap such nested math-zone scenarios into unnested math zones.

It’s useful to be able to define math properties for an entire document, rather than specify them for each math zone. This is described in Default Document Math Properties. A new property could be defined to use sans-serif math characters instead of serif characters.

There are two kinds of math zones: inline and display. For example, an inline math zone in TeX has the form $…$ and a display math zone has the form $$…$$. Inline math zones use reduced spacing and character sizes to make expressions fit better in line with normal text. In OfficeMath a display math zone starts at the start of a document or follows a hard or soft paragraph end (U+000D or U+000B, respectively) and ends with a hard or soft paragraph end. In some cases, it would be useful to apply display math-zone formatting to inline math zones, but this isn’t currently available.
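The display versus inline rule can be expressed as a short check on the characters around each math zone. The sketch below works on the RichEdit plain-text form, in which a math zone starts with ⁅ (U+2045) and ends with ⁆ (U+2046); the in-memory format uses a character-format effect instead. Requiring an explicit paragraph end after the zone, even at the end of the text, is an assumption of this sketch.

```python
import re

MATH_ZONE = re.compile("\u2045(.*?)\u2046", re.S)  # ⁅ ... ⁆
PARAGRAPH_ENDS = "\r\x0b"  # hard (U+000D) and soft (U+000B) paragraph ends


def classify_math_zones(text: str) -> list:
    """Return (zone_text, 'display' or 'inline') for each math zone in text."""
    result = []
    for match in MATH_ZONE.finditer(text):
        follows_break = (match.start() == 0
                         or text[match.start() - 1] in PARAGRAPH_ENDS)
        precedes_break = (match.end() < len(text)
                          and text[match.end()] in PARAGRAPH_ENDS)
        kind = "display" if follows_break and precedes_break else "inline"
        result.append((match.group(1), kind))
    return result
```

For the text `"⁅a=b⁆\rText ⁅c⁆ more\r"`, the first zone fills its paragraph and is classified as display, while the second is surrounded by ordinary text and is inline.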

Inter-equation alignment and line breaking involve multiple lines. To handle these cases and equation numbering, OfficeMath has the Math Paragraph, while MathML uses tables and MathType uses PILEs. A math paragraph is a sequence of one or more display math zones separated by soft paragraph ends (U+000B). Line breaking can be automatic or manual as described in Breaking Equations into Multiple Lines. Background on paragraph formatting is given in Paragraphs and Paragraph Formatting.

In a document with more than a few equations, it’s useful to number equations referred to from elsewhere in the document. The math paragraph has elegant equation-number support, but it hasn’t been exposed beyond prototyping. The earliest way to handle equation numbering is described in Cool Equation Number Macros for Word 2007. Later ideas are in More on Equation Numbering, and equation numbering using equation arrays is described in Equation Numbering in Office 2016. This last approach isn’t quite as convenient as the ideal math-paragraph equation numbering, but it can handle virtually all cases.

OfficeMath UI can be grouped into keyboard, menu/ribbon, ink, and accessibility categories. Let’s consider each of these in turn. The keyboard, menu/ribbon, and ink categories are discussed in Chapter 6 of the book *Creating Research and Scientific Documents with Microsoft Word*.

A succinct summary of entering and editing math with a keyboard is given in the original blog’s first post, Formula Autobuildup in Word 2007. Basically, type the hot key Alt+= to insert a math zone and then type math using TeX control words for symbols. For example, in UnicodeMath mode, typing a/b=c inserts the stacked fraction 𝑎 over 𝑏 followed by = 𝑐.

The UnicodeMath syntax resembles that used in programming languages except that it uses many Unicode operators. Naturally there’s much more to math than symbols and fractions, and the keyboard input methods are described in UnicodeMath for the Unicode input method and in LaTeX/TeX input method for the LaTeX/TeX input method.

In UnicodeMath mode, build up to the “Professional” format is automatic as described in When Formula Autobuildup Occurs. In Word’s LaTeX mode, you must request build up. Enter Ctrl+= to build up a math zone into “Professional” format and Shift+Ctrl+= to build the math zone down into the current linear format (UnicodeMath or LaTeX). Or you can click on the corresponding options of the math-zone acetate rectangle.

In addition to the LaTeX/TeX control words, there are operator shortcuts described in Math Keyboard Shortcuts, Negated Operators, Keyboard Operator Shortcuts, Entering Unicode Characters, and Klinke’s Streamlined Math Input Notation. For example, /= autocorrects to ≠ and <= to ≤. Subscripts and superscripts are entered using _ and ^, respectively, as discussed in Section 2.2 of UnicodeMath and in Keyboard Entry of Subscripts and Superscripts. Nice things to add include making the leading backslash optional and having an autocomplete drop-down menu of possible control words once you’ve entered the first few characters. For example, many control words start with \left and it would be nice to be able to select the desired one rather than type in the whole word like \leftrightarrow for ↔.
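The autocomplete idea could be prototyped along these lines. The table is a tiny illustrative subset of standard LaTeX control words, not the Office math autocorrect list, and the function name and its optional-backslash parameter are hypothetical.

```python
# Illustrative subset of control words (standard LaTeX names).
CONTROL_WORDS = {
    "\\leftarrow": "\u2190",        # ←
    "\\leftrightarrow": "\u2194",   # ↔
    "\\Leftarrow": "\u21d0",        # ⇐
    "\\leftharpoonup": "\u21bc",    # ↼
    "\\leftharpoondown": "\u21bd",  # ↽
    "\\le": "\u2264",               # ≤
    "\\ne": "\u2260",               # ≠
}


def complete(prefix: str, optional_backslash: bool = True) -> list:
    """Return sorted (control_word, character) candidates for a typed prefix."""
    if optional_backslash and not prefix.startswith("\\"):
        prefix = "\\" + prefix  # treat 'leftr' like '\leftr'
    return sorted((word, char) for word, char in CONTROL_WORDS.items()
                  if word.startswith(prefix))
```

Typing `\leftr` narrows the candidates to \leftrightarrow alone, and with the backslash optional, `complete("leftharp")` offers the two harpoons. A real drop-down would presumably rank candidates by frequency rather than alphabetically.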

In LaTeX mode, the subscript, superscript, numerator, and other math arguments are single entities. An entity can be a character or control word for a character like \alpha for α, or it can be an expression in curly braces like {a+b}. In UnicodeMath mode, the argument can be a sequence of alphanumeric characters. You can see such a difference by comparing what a^12 becomes: in LaTeX you get 𝑎¹2 and in UnicodeMath you get 𝑎¹². To get the latter in LaTeX input mode, enter a^{12}.
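The two argument rules can be compared with a simplified Python sketch that splits off the argument following ^ or _. It assumes the LaTeX rule is “one character, one control word, or one brace group” and the UnicodeMath rule is “a run of ASCII alphanumerics”; the real UnicodeMath operand rules are richer than this, and the function names are hypothetical.

```python
import re


def latex_argument(s: str):
    """Return (argument, rest) using the single-entity rule of LaTeX mode."""
    if s.startswith("{"):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return s[1:i], s[i + 1:]
        raise ValueError("unbalanced braces")
    control_word = re.match(r"\\[A-Za-z]+", s)
    if control_word:
        return control_word.group(0), s[control_word.end():]
    return s[:1], s[1:]


def unicodemath_argument(s: str):
    """Return (argument, rest) taking a whole run of alphanumerics (simplified)."""
    run = re.match(r"[A-Za-z0-9]+", s)
    if run:
        return run.group(0), s[run.end():]
    return s[:1], s[1:]
```

Applied to the text after the ^ in a^12, `latex_argument("12")` takes only the 1 and leaves the 2 on the baseline, `unicodemath_argument("12")` takes both digits, and `latex_argument("{12}")` matches the UnicodeMath result.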

Unicode has many math characters (see Section 2 of Unicode Technical Report #25, Unicode Support for Mathematics). The post Math Symbol Hierarchy divides the math operator symbols into basic, intermediate, and full Unicode math categories. Most technical papers use the symbols in the basic and intermediate categories. The remaining characters are very specialized, e.g., ⪑, so you’ll probably never need them.

Built-up math zones convert alphabetic characters to math alphabetic characters, e.g., ‘a’ becomes ‘𝑎’, which is given by the Unicode character U+1D44E. Conversion to math alphabetic is overruled for special situations like trigonometric function names and can be overruled for arbitrary text. Also, it doesn’t occur for Greek upper-case letters as noted in Math Greek Letters. Math spacing is important and User Spaces in Math Zones explains how UnicodeMath build up may remove a space that’s automatically inserted by math spacing rules. In LaTeX mode, spaces are ignored except to terminate control words.

You can navigate through a math zone Using Left/Right Arrow Keys in Mathematical Text or you can use a mouse. Math Selection is like selection of ordinary text, but if you select a math object start/end/separator delimiter, the whole object is selected. Up and down-arrow keys try to go to the logical target, e.g., up arrow in the denominator of a fraction goes to the numerator. In navigating and selecting text, it’s useful to understand the concept of the Text Insertion Point. The insertion point is *in between* characters, not on top of a character.

You can enter accented characters as discussed in Math Accents and in Representation of Math Accents. You can enter matrices as discussed in Entering Matrices. If you want to line up two or more equations just right, see Equation Arrays.

In OfficeMath, empty numerators, denominators, subscripts, superscripts, and other essential arguments, etc., display the place-holder character ⬚. If you want to hide the ⬚, insert a “zero-width space” given by the Unicode character U+200B as discussed in The Invisibles. In OneNote you can edit optional arguments. These arguments are normally not shown, but you can move inside them by using the left/right arrow keys. When the IP is inside an optional argument, the ⬚ is displayed and you can enter characters. For example, you can convert a square root into an *n*th root by navigating into the root’s index argument and typing n. To make such changes in Word or PowerPoint, you need to use a context-menu option.

If you become familiar with keyboard entry, you’ll probably find it the fastest way to enter math (see also the Ink section next). But admittedly, it’s not obvious how to enter many things. The math ribbon displays lots of math objects in readily clickable form. As such it provides easily discoverable ways to enter common mathematical expressions. For a comparison of keyboard and ribbon, see Math Ribbon Entry of Subscripts and Superscripts.

Math Context Menus provide context-sensitive ways to modify math objects, such as changing a stacked fraction into a slashed fraction, or aligning a set of equations at their equal signs. See also More on Math Context Menus. You can use the Office Insert Symbol Dialog to insert any Unicode character including all Unicode math symbols. The more common math symbols can be inserted using the symbol galleries on the math ribbon. You can also insert many math symbols using the Windows+. hot key.

Smart phones running OfficeMath don’t sport a math ribbon, but a math on-screen keyboard could let you enter lots of math entities easily. Think of exposing math symbols instead of emoji and using surround menus. Also, smart phones can work with ink…

You can enter equations with a pen as described in OneNote Math Assistant and the links therein. Microsoft’s math ink recognition first shipped in Windows 7 with the applet called the Math Input Panel. This applet lets you enter mathematical text using a pen or a mouse. It recognizes what you enter and displays the result using a private version of RichEdit. It also lets you copy the results to Word, Mathematica, or any other application that reads Presentation MathML.

Many people may find that writing equations by hand is the easiest and fastest way to enter them into a computer. Since I’ve made similar claims for UnicodeMath entry, a colleague of mine and I decided to have a race. I chose nine equations from theoretical physics, and we started entering. The colleague entering via handwriting beat me by a nose, but had two errors, whereas I had none. But really, we both won, since we demonstrated that we could enter equations into Word remarkably fast.

Math accessibility falls into two categories: speech and braille. Microsoft Office Math Speech shipped in over 18 languages in January 2017. As described in Speaking of math…, math speech has two granularities: coarse-grained for fluent speech and fine-grained for editing. Together with touch typing on a keyboard, this combination enables a blind, nondeaf person to consume and edit math, both elementary and advanced.

The OfficeMath speech capability could be extended in useful ways such as offering alternate speech as discussed in Speaking Subscripts, Superscripts, and Fractions. Also, the facility “spoon feeds” the math speech to UI Automation. Some Assistive Technologies (ATs) such as NVDA and JAWS would like to get MathML for math zones and generate the math speech (and braille) themselves. Ways to do this will be the subject of a future post. Interestingly, MathML can, in principle, be used both for generating math speech *and* for editing math as discussed in MathML and OMML User Selection Attributes and Editing Math using MathML for Speech.

Key infrastructure for math braille shipped in August 2017, namely the RichEdit build up/down machinery used by OfficeMath applications added support for entering and editing math using Nemeth Braille—the first math linear format. More work is needed for applications to expose math braille to end users. The main reason for using Nemeth math braille is given in Braille for Math Zones, which points out that the usual braille digit code ambiguities don’t exist in math zones, which is where the math is. Specifically, braille contractions aren’t used in math zones, so digits can be represented unambiguously using computer braille codes; no numerical indicator is needed for digits in Nemeth math zones (aside from an obscure case). Nemeth braille in math zones works with all languages (is globalized), whereas braille in ordinary text is localized to the language being used.

Other posts describing work on math braille include Unicode – Nemeth Character Mappings, which discusses extending the Nemeth specification to include many Unicode math symbols not in the current Nemeth specification and Nemeth Braille Alphanumerics and Unicode Math Alphanumerics, which relates how the Unicode math alphanumerics can be represented using Nemeth braille. The post Math Braille UI describes ways to reveal the math insertion point (IP) using a refreshable braille display. The braille IP location is complicated relative to that for ordinary text in that math structure characters described in OfficeMath aren’t always represented by a Nemeth code. For fractions, they are, but the start delimiter of a subscript object, for example, isn’t present in the Nemeth code.

Math dictation would be another math input method for blind and sighted users alike. Imagine, you can say 𝑎² + 𝑏² = 𝑐² faster than you can write or type it! Math dictation would work with all devices, computers, tablets, and phones. Hopefully someday…


]]>