April 5th, 2023

Math Dictation

Murray Sargent
Principal Software Engineer

You can dictate \(a^2+b^2=c^2\) faster than you can write or type it, so math dictation can be handy for anyone working with math, notably on mobile devices. It can also make math more accessible. Math speech is similar to UnicodeMath, which you can use to enter equations into Word, PowerPoint, and other apps. Accordingly, we translate English math speech as recorded via Office dictation into UnicodeMath and build it up into OfficeMath. Currently we can dictate equations in English from algebra, trigonometry, and calculus into OneNote and PowerPoint.

Here are examples of math dictation and the resulting OfficeMath:

Math Dictation Resulting OfficeMath
A squared plus b squared equals c squared

\(a^2+b^2=c^2\)

Integral from minus infinity to infinity of e to the minus x squared dx equals square root of pi

\(\displaystyle\int_{-\infty}^\infty e^{-x^2} dx=\sqrt\pi\)

limit as N goes to infinity of left paren 1 + 1 over n right paren to the N equals e

\(\displaystyle\lim_{n\rightarrow\infty}{\left(1+\frac{1}{n}\right)^n}=e\)

sine squared x plus cosine squared x equals one

\(\sin^2{x}+\cos^2{x}=1\)

derivative of f of x with respect to x = second derivative of f of x with respect to x = second partial derivative of f of x,y with respect to x = 0

\(\displaystyle\frac{df\left(x\right)}{dx}=\frac{d^2f\left(x\right)}{dx^2}=\frac{\partial^2f\left(x,y\right)}{\partial x^2}=0\)

One over two pi space integral from zero to 2π of D theta over begin a + b sine theta end equals one over square root of begin a squared – b squared end

\(\displaystyle\frac{1}{2\pi}\int_{0}^{2\pi}\frac{d\theta}{a+b\sin{\theta}}=\frac{1}{\sqrt{a^2-b^2}}\)

left paren a plus b right paren to the n equals sum from k = 0 to n of left paren n a top k right paren a to the k space b to the begin n – k end

\(\displaystyle\left(a+b\right)^n=\sum_{k=0}^{n}{\binom{n}{k}a^kb^{n-k}}\)

X equals begin minus B plus or minus square root of begin b ^2 – 4 A C end end over 2A

\(\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}\)

Absolute value of x equals cases if x greater than or equal to 0, ampersand x next if x less than 0, ampersand -x close

\(\displaystyle|x|=\begin{cases}\mathrm{if}\ x\ge0, & x\\ \mathrm{if}\ x<0, & -x\end{cases}\)

Del cross bold cap e equals minus partial derivative of bold cap b with respect to t

\(\displaystyle\nabla\times\mathbf{E}=-\frac{\partial\mathbf{B}}{\partial t}\)

i H bar space partial over partial T space cap sigh left paren X, t right paren equals [minus h bar squared over 2M space space partial squared over partial X ^2 plus cap V left paren X, t )] cap psi left paren X, t right paren

\(\displaystyle i\hbar\frac{\partial}{\partial t}\Psi\left(x,t\right)=\left[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}+V\left(x,t\right)\right]\Psi\left(x,t\right)\)

real part of e to the -i omega t equals cosine omega t

\(\displaystyle\mathrm{Re}{\left(e^{-i\omega t}\right)}=\cos{\omega t}\)

If you know a TeX control word, you can dictate it by saying “backslash <control word>”.
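A minimal sketch of how this cue might be handled, assuming the translator simply rewrites “backslash <word>” as the TeX-style control word `\word` and leaves its interpretation to the build-up engine. The function `expand_control_words` is hypothetical, not the Office translator's code.

```python
import re

# Matches the spoken cue "backslash" followed by one word, e.g. "backslash alpha".
_CONTROL_WORD = re.compile(r"\bbackslash\s+([A-Za-z]+)\b", re.IGNORECASE)


def expand_control_words(speech: str) -> str:
    """Replace 'backslash <word>' with '\\<word>' (hypothetical translator step)."""
    return _CONTROL_WORD.sub(lambda m: "\\" + m.group(1), speech)


print(expand_control_words("backslash alpha plus backslash beta"))
# prints: \alpha plus \beta
```

Text that doesn't contain the cue passes through unchanged, so this step can run before any other word translation.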

Math Speech Cues

Just as you may need to include words like “comma” and “question mark” in ordinary dictation, you may need to include words like “space” in math dictation to override the operator precedence of UnicodeMath. For example, the speech for the sixth example above starts with “One over two pi space…”. The word “space” is converted to a space character, which tells the UnicodeMath build-up engine to build up the 1/2𝜋 as a fraction. Otherwise, the integral that follows would end up in the denominator after the 2𝜋. Similarly, the examples use “begin” and “end”, or parentheses, to override operator precedence. These special words are defined and illustrated in the UnicodeMath specification.
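Written as LaTeX, the difference the “space” cue makes for the left-hand side of the sixth example is roughly the following (a worked illustration of the precedence rule described above, not output captured from the build-up engine):

```latex
% Without the "space" cue: the integral ends up in the denominator after 2\pi
\frac{1}{2\pi\int_{0}^{2\pi}\frac{d\theta}{a+b\sin{\theta}}}

% With the cue "One over two pi space ...": 1/2\pi is built up first
\frac{1}{2\pi}\int_{0}^{2\pi}\frac{d\theta}{a+b\sin{\theta}}
```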

In addition, there are words to choose math styles such as script, bold, bold italic, fraktur, and open-face (double-struck). Examples of such characters are ℋ, 𝐇, 𝑯, ℌ, and ℍ, which you get by saying “script cap h”, “bold cap h”, “bold italic cap h”, “fraktur cap h”, and “double struck cap h”, respectively.
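One way to map such phrases to characters, sketched here under the assumption that the speech follows the pattern “<style> [cap] <letter>”, is to look up Unicode character names in the Mathematical Alphanumeric Symbols block. A few letters, including script, fraktur, and double-struck H, were encoded earlier in the Letterlike Symbols block, so the math block has holes there and the sketch falls back to those names. The tables and the function `styled_letter` are hypothetical.

```python
import unicodedata

# Hypothetical table: spoken style words -> style part of the Unicode character
# names in the Mathematical Alphanumeric Symbols block.
STYLES = {
    "script": "SCRIPT",
    "bold": "BOLD",
    "bold italic": "BOLD ITALIC",
    "fraktur": "FRAKTUR",
    "double struck": "DOUBLE-STRUCK",
    "open face": "DOUBLE-STRUCK",
}

# Some letters (e.g., script, fraktur, and double-struck H) are holes in the
# math block because they were encoded earlier as Letterlike Symbols.
LETTERLIKE_PREFIX = {
    "SCRIPT": "SCRIPT",
    "FRAKTUR": "BLACK-LETTER",
    "DOUBLE-STRUCK": "DOUBLE-STRUCK",
}


def styled_letter(phrase: str) -> str:
    """Convert speech like 'bold italic cap h' into a single math-styled letter."""
    words = phrase.lower().split()
    letter = words[-1]
    capital = len(words) >= 2 and words[-2] == "cap"
    style_words = words[:-2] if capital else words[:-1]
    style = STYLES[" ".join(style_words)]
    name = ("CAPITAL " if capital else "SMALL ") + letter.upper()
    try:
        return unicodedata.lookup(f"MATHEMATICAL {style} {name}")
    except KeyError:
        # Fall back to the Letterlike Symbols name for the holes.
        return unicodedata.lookup(f"{LETTERLIKE_PREFIX[style]} {name}")


phrases = ["script cap h", "bold cap h", "bold italic cap h",
           "fraktur cap h", "double struck cap h"]
print("".join(styled_letter(p) for p in phrases))  # prints: ℋ𝐇𝑯ℌℍ
```

Looking characters up by name keeps the table small and handles the Letterlike holes without hard-coding code points.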

The speech can include common English idioms as in “a doesn’t equal b” or “a isn’t equal to b”, both of which result in “𝑎 ≠ 𝑏”.
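A minimal sketch of idiom handling, assuming a phrase table that maps multiword English idioms to UnicodeMath operators and is applied longest phrase first. The table entries beyond the two idioms quoted above, and the function `replace_idioms`, are hypothetical.

```python
import re

# Hypothetical phrase table; longer phrases come before shorter ones they contain.
IDIOMS = [
    ("is not equal to", "≠"),
    ("isn't equal to", "≠"),
    ("does not equal", "≠"),
    ("doesn't equal", "≠"),
    ("equals", "="),
]


def replace_idioms(speech: str) -> str:
    """Replace English idioms such as "doesn't equal" with math operators."""
    # Normalize curly apostrophes so "doesn’t" matches "doesn't".
    speech = speech.replace("\u2019", "'")
    for phrase, op in IDIOMS:
        speech = re.sub(rf"\b{re.escape(phrase)}\b", op, speech)
    return speech


print(replace_idioms("a doesn’t equal b"))   # prints: a ≠ b
print(replace_idioms("a isn't equal to b"))  # prints: a ≠ b
```

The letters stay plain here; the build-up engine is what renders them as math italic 𝑎 and 𝑏.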

Enabling math dictation input

To dictate math in OneNote, set the feature gate “Microsoft.Office.SharedText.OneNoteMathDictation” to true, enter a math zone with the Alt+= hotkey, and dictate. It’s important to enunciate clearly. To dictate math in PowerPoint, do the same after setting the feature gate “Microsoft.Office.Graphics.EnableMathDictation” to true.

Implementation

The translator converts incoming math speech into UnicodeMath and builds up the result into OfficeMath in the app’s backing store. Some additions have been made to the build-up engine, such as the new keyword \abs for converting “absolute value of …”, which is more natural than “vertical bar … vertical bar”. The vertical-bar notation is also ambiguous: it can mean absolute value, the cardinality of a set or group, or a determinant, whereas the speech “absolute value of …” is unambiguous.

When you dictate \(a^2+b^2=c^2\), the recognizer currently produces, after some fixups, “A ^2 + B ^2 = C ^2”, which is close to the UnicodeMath a^2+b^2=c^2. The translator deletes the spaces and converts the letters to lower case. It also converts the spelled-out form “a squared plus b squared equals c squared”. To enable math dictation, include the tomMathSpeech flag (0x10000000) in the call to MathBuildUp().
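The two translator steps described for \(a^2+b^2=c^2\) can be sketched as follows. This is a simplified illustration, not the Office translator: it assumes every letter should be lower case (a real translator must still honor “cap”), and the word table is hypothetical and covers only this example.

```python
import re

# Hypothetical word-to-UnicodeMath table covering only this example.
WORD_TO_UNICODEMATH = {
    "plus": "+",
    "minus": "-",
    "equals": "=",
    "squared": "^2",
    "cubed": "^3",
}


def fix_up_recognized(text: str) -> str:
    """Delete spaces and lowercase letters: 'A ^2 + B ^2 = C ^2' -> 'a^2+b^2=c^2'."""
    return re.sub(r"\s+", "", text).lower()


def words_to_unicodemath(speech: str) -> str:
    """Translate spelled-out speech word by word into UnicodeMath."""
    return "".join(WORD_TO_UNICODEMATH.get(w, w) for w in speech.lower().split())


print(fix_up_recognized("A ^2 + B ^2 = C ^2"))  # prints: a^2+b^2=c^2
print(words_to_unicodemath("a squared plus b squared equals c squared"))  # prints: a^2+b^2=c^2
```

Both paths produce the same UnicodeMath string, which the build-up engine then turns into OfficeMath.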

Dictation Problems

Three places to make improvements and fixes are:

  • Speech recognizer (e.g., it could benefit from a “math mode”)
  • Speech-to-UnicodeMath translator
  • UnicodeMath build-up engine

Math speech recognition works remarkably well considering that we don’t tell the recognizer that math is involved. Nevertheless, some results are wrong for math. For example, “n” is often replaced by “end” and “b” by “be”. You can get the correct letter by saying “letter” before it; the translator discards the word “letter”. “Sum” often ends up as “some”, and “begin” may end up as “beginning”. You can avoid the “sum” error by saying “summation”. So some editing may be needed before the speech is converted to math. The math translation engine autocorrects “be” to “b”, “some” to “sum”, “sigh” to “𝜓”, and “beginning” to “begin”, since these words aren’t generally used in math zones.
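A minimal sketch of this autocorrect pass, assuming it works word by word on the recognized text. The table contains only the corrections listed above; the function name `autocorrect_math_words` and the rule that the word after “letter” is kept verbatim are illustrative assumptions.

```python
# Hypothetical autocorrect table built from the misrecognitions listed above.
AUTOCORRECT = {
    "be": "b",
    "some": "sum",
    "sigh": "\u03c8",  # Greek small letter psi
    "beginning": "begin",
}


def autocorrect_math_words(speech: str) -> str:
    """Autocorrect misrecognized words; 'letter X' drops 'letter' and keeps X as spoken."""
    words = speech.split()
    result = []
    i = 0
    while i < len(words):
        word = words[i]
        if word.lower() == "letter" and i + 1 < len(words):
            result.append(words[i + 1])  # keep the following word without correction
            i += 2
            continue
        result.append(AUTOCORRECT.get(word.lower(), word))
        i += 1
    return " ".join(result)


print(autocorrect_math_words("some from k = 0 to letter n of be"))
# prints: sum from k = 0 to n of b
```

Because these words rarely occur in math zones, replacing them unconditionally is a reasonable trade-off there, though it would be wrong in ordinary text.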

Author


Yale BS, MS, PhD in theoretical physics. Worked 22 years in laser theory & applications first at Bell Labs and then Professor of Optical Sciences, University of Arizona. Worked on technical word processing, writing the first math display program (1969) and the technical word processor PS (1980s). Developed the SST debugger we used to get Windows 2.0 running in protected mode thereby eliminating the 640KB DOS barrier (1988). Have more than 100 refereed publications, 3 laser-physics books, 4 ...
