{"id":501,"date":"2022-09-29T18:01:34","date_gmt":"2022-09-30T01:01:34","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/math-in-office\/?p=501"},"modified":"2022-09-29T18:01:34","modified_gmt":"2022-09-30T01:01:34","slug":"setting-and-getting-text-in-various-formats","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/math-in-office\/setting-and-getting-text-in-various-formats\/","title":{"rendered":"Setting and Getting Text in Various Formats"},"content":{"rendered":"<p>You can get and set text from\/into RichEdit in a variety of formats including RTF, HTML, MathML, OMML, UnicodeMath, Nemeth Braille, and speech. This post documents RichEdit options for a general way to access text using <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/hh768660(v=vs.85).aspx\">ITextRange2::SetText2<\/a>(options, bstr) and <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/hh768646(v=vs.85).aspx\">ITextRange2::GetText2<\/a>(options, pbstr). As such, this post is for programmers. All options work in the current Microsoft Office RichEdit (riched20.dll in an Office subdirectory) and many work in the Windows RichEdit (msftedit.dll). 
The options are defined in the following table in which s\/g stands for SetText2\/GetText2, respectively.<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Option<\/strong><\/td>\n<td><strong>Value<\/strong><\/td>\n<td><strong>s\/g<\/strong><\/td>\n<td><strong>Meaning<\/strong><\/td>\n<\/tr>\n<tr>\n<td>tomUnicodeBiDi<\/td>\n<td>0x00000001<\/td>\n<td>s<\/td>\n<td>Use <a href=\"http:\/\/www.unicode.org\/reports\/tr9\/\">Unicode BiDi algorithm<\/a> for inserted text<\/td>\n<\/tr>\n<tr>\n<td>tomAdjustCRLF<\/td>\n<td>0x00000001<\/td>\n<td>g<\/td>\n<td>If range start is inside multicode unit like CRLF, surrogate pair, etc., move to start of unit<\/td>\n<\/tr>\n<tr>\n<td>tomUseCRLF<\/td>\n<td>0x00000002<\/td>\n<td>g<\/td>\n<td>Paragraph ends use CRLF (U+000D U+000A)<\/td>\n<\/tr>\n<tr>\n<td>tomTextize<\/td>\n<td>0x00000004<\/td>\n<td>g<\/td>\n<td>Embedded objects export alt text; else U+FFFC<\/td>\n<\/tr>\n<tr>\n<td>tomAllowFinalEOP<\/td>\n<td>0x00000008<\/td>\n<td>g<\/td>\n<td>If range includes final EOP, export it; else don\u2019t<\/td>\n<\/tr>\n<tr>\n<td>tomUnlink<\/td>\n<td>0x00000008<\/td>\n<td>s<\/td>\n<td>Disables <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2009\/09\/24\/richedit-friendly-name-hyperlinks\/\">link<\/a> attributes if present<\/td>\n<\/tr>\n<tr>\n<td>tomUnhide<\/td>\n<td>0x00000010<\/td>\n<td>s<\/td>\n<td>Disables <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/bb787883(v=vs.85).aspx\">hidden<\/a> attribute if present<\/td>\n<\/tr>\n<tr>\n<td>tomFoldMathAlpha<\/td>\n<td>0x00000010<\/td>\n<td>g<\/td>\n<td>Replace <a href=\"http:\/\/www.unicode.org\/reports\/tr25\/\">math alphanumerics<\/a> with ASCII\/Greek<\/td>\n<\/tr>\n<tr>\n<td>tomIncludeNumbering<\/td>\n<td>0x00000040<\/td>\n<td>g<\/td>\n<td>Lists include bullets\/numbering<\/td>\n<\/tr>\n<tr>\n<td>tomCheckTextLimit<\/td>\n<td>0x00000020<\/td>\n<td>s<\/td>\n<td>Only insert up to <a 
href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/bb761647(v=vs.85).aspx\">text limit<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomDontSelectText<\/td>\n<td>0x00000040<\/td>\n<td>s<\/td>\n<td>After insertion, call <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/bb787740(v=vs.85).aspx\">Collapse<\/a>(tomEnd)<\/td>\n<\/tr>\n<tr>\n<td>tomTranslateTableCell<\/td>\n<td>0x00000080<\/td>\n<td>g<\/td>\n<td>Export spaces for <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2008\/09\/15\/richedits-nested-table-facility\/\">table delimiters<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomNoMathZoneBrackets<\/td>\n<td>0x00000100<\/td>\n<td>g<\/td>\n<td>Used with tomConvertUnicodeMath and tomConvertTeX. Set discards math zone brackets<\/td>\n<\/tr>\n<tr>\n<td>tomLanguageTag<\/td>\n<td>0x00001000<\/td>\n<td>s\/g<\/td>\n<td>Sets <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2015\/10\/19\/richedit-language-tag-handling\/\">BCP-47 language tag<\/a> for range; gets tag<\/td>\n<\/tr>\n<tr>\n<td>tomConvertRTF<\/td>\n<td>0x00002000<\/td>\n<td>s\/g<\/td>\n<td><a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2015\/11\/22\/inserting-and-getting-math-text-in-richedit\/\">Set or get RTF<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomGetTextForSpell<\/td>\n<td>0x00008000<\/td>\n<td>g<\/td>\n<td>Export spaces for hidden\/math text, table delims<\/td>\n<\/tr>\n<tr>\n<td>tomConvertMathML<\/td>\n<td>0x00010000<\/td>\n<td>s\/g<\/td>\n<td>Set or get <a href=\"http:\/\/www.w3.org\/Math\/\">MathML<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomGetUtf16<\/td>\n<td>0x00020000<\/td>\n<td>g<\/td>\n<td>Causes tomConvertRTF, etc. to get UTF-16. 
SetText2 accepts 8-bit or 16-bit RTF<\/td>\n<\/tr>\n<tr>\n<td>tomConvertLinearFormat<\/td>\n<td>0x00040000<\/td>\n<td>s\/g<\/td>\n<td>Alias for tomConvertUnicodeMath<\/td>\n<\/tr>\n<tr>\n<td>tomConvertUnicodeMath<\/td>\n<td>0x00040000<\/td>\n<td>s\/g<\/td>\n<td><a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2016\/09\/07\/unicodemath\/\">UnicodeMath<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomConvertOMML<\/td>\n<td>0x00080000<\/td>\n<td>s\/g<\/td>\n<td><a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2009\/01\/16\/omml-specification-version-2\/\">Office MathML<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomConvertMask<\/td>\n<td>0x00F00000<\/td>\n<td>s\/g<\/td>\n<td>Mask for mutually exclusive modes<\/td>\n<\/tr>\n<tr>\n<td>tomConvertRuby<\/td>\n<td>0x00100000<\/td>\n<td>s<\/td>\n<td>See section below on Entering Ruby Text<\/td>\n<\/tr>\n<tr>\n<td>tomConvertTeX<\/td>\n<td>0x00200000<\/td>\n<td>s\/g<\/td>\n<td>See <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2017\/07\/30\/latex-math-in-office\/\">LaTeX Math in Office<\/a><\/td>\n<\/tr>\n<tr>\n<td>tomConvertMathSpeech<\/td>\n<td>0x00300000<\/td>\n<td>g<\/td>\n<td><a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2016\/06\/30\/speaking-of-math\/\">Math speech<\/a> (<a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2017\/02\/27\/microsoft-office-math-speech\/\">English only<\/a> here)<\/td>\n<\/tr>\n<tr>\n<td>tomConvertSpeechTokens<\/td>\n<td>0x00400000<\/td>\n<td>g<\/td>\n<td>Simple Unicode and speech tokens<\/td>\n<\/tr>\n<tr>\n<td>tomConvertNemeth<\/td>\n<td>0x00500000<\/td>\n<td>s\/g<\/td>\n<td><a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2016\/07\/31\/nemeth-braille-the-first-math-linear-format\/\">Nemeth math braille<\/a> in U+2800 block<\/td>\n<\/tr>\n<tr>\n<td>tomConvertNemethAscii<\/td>\n<td>0x00600000<\/td>\n<td>g<\/td>\n<td>Corresponding ASCII braille<\/td>\n<\/tr>\n<tr>\n<td>tomConvertNemethNoItalic<\/td>\n<td>0x00700000<\/td>\n<td>g<\/td>\n<td>Nemeth braille in U+2800 block w\/o math 
italic<\/td>\n<\/tr>\n<tr>\n<td>tomConvertNemethDefinition<\/td>\n<td>0x00800000<\/td>\n<td>g<\/td>\n<td>Fine-grained speech in braille<\/td>\n<\/tr>\n<tr>\n<td>tomConvertHtml<\/td>\n<td>0x00900000<\/td>\n<td>s\/g<\/td>\n<td>Convert HTML<\/td>\n<\/tr>\n<tr>\n<td>tomConvertEnclose<\/td>\n<td>0x00A00000<\/td>\n<td>s<\/td>\n<td>See section below on Entering Enclosed Text<\/td>\n<\/tr>\n<tr>\n<td>tomConvertCRtoLF<\/td>\n<td>0x01000000<\/td>\n<td>g<\/td>\n<td>Plain-text paragraphs end with LF, not CRLF<\/td>\n<\/tr>\n<tr>\n<td>tomLaTeXDelim<\/td>\n<td>0x02000000<\/td>\n<td>g<\/td>\n<td>Use LaTeX math-zone delimiters \\(&#8230;\\) inline, \\[&#8230;\\] display; else $&#8230;$, $$&#8230;$$. Set handles all<\/td>\n<\/tr>\n<tr>\n<td>tomGhostText<\/td>\n<td>0x04000000<\/td>\n<td>s<\/td>\n<td>Set ghost text (used for text prediction)<\/td>\n<\/tr>\n<tr>\n<td>tomNoGhostText<\/td>\n<td>0x04000000<\/td>\n<td>g<\/td>\n<td>Get text without ghost text<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Mutually exclusive options<\/h2>\n<p>Nonzero values within the mask defined by tomConvertMask (0x00F00000) are mutually exclusive, that is, they cannot be combined (OR\u2019d) with one another. The options UnicodeMath, [La]TeX (tomConvertTeX), and Nemeth math braille (tomConvertNemeth) are also mutually exclusive. You can set only one at a time. But other options can be OR\u2019d in if desired.<\/p>\n<h2>Nemeth math braille options<\/h2>\n<p>A string of Nemeth math braille codes in the Unicode range U+2800..U+283F can be inserted and built up by calling ITextRange2::SetText2(tomConvertNemeth, bstr). If the string is valid, you can get it back in any of the math formats including Nemeth math braille. 
For example, if you insert the string<\/p>\n<p>\u2839\u2802\u280c\u2806\u2828\u280f\u283c\u282e\u2830\u2834\u2818\u2806\u2828\u280f\u2810\u2839\u2828\u2808\u2808\u2819\u2828\u2839\u280c\u2801\u282c\u2803\u2800\u280e\u280a\u281d\u2800\u2828\u2839\u283c\u2800\u2828\u2805\u2800\u2839\u2802\u280c\u281c\u2801\u2818\u2806\u2810\u2824\u2803\u2818\u2806\u2810\u283b\u283c<\/p>\n<p>you see<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2021\/02\/integral-e1613425332545.jpg\"><img decoding=\"async\" class=\"alignnone size-medium wp-image-208\" src=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2021\/02\/integral-300x69.jpg\" alt=\"Image integral\" width=\"300\" height=\"69\" \/><\/a><\/p>\n<p>You can also input braille with a standard keyboard by typing a control word \\braille assigned to the Unicode character U+24B7 (\u24b7). (See <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2017\/07\/30\/latex-math-in-office\/\">LaTeX Math in Office<\/a> for how to add commands to math autocorrect). The \\braille command causes math input to accept braille input via a regular keyboard using the <a href=\"http:\/\/www.dotlessbraille.org\/asciibrltable.htm\">braille ASCII codes<\/a> sometimes referred to as North American Braille Computer Codes. The character ~ (U+007E) disables this input mode. These braille codes are described in the post <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2016\/07\/31\/nemeth-braille-the-first-math-linear-format\/\">Nemeth Braille\u2014the first math linear format<\/a> and can be input using <a href=\"https:\/\/en.wikipedia.org\/wiki\/Refreshable_braille_display\">refreshable braille displays<\/a>. Alternatively, such input can be automated by calling <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/bb787836(v=vs.85).aspx\">ITextSelection::TypeText<\/a>(bstr). 
Just as when entering UnicodeMath, the equations build up on screen as soon as the math braille input becomes unambiguous. The implementation includes the <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2017\/06\/21\/math-braille-ui\/\">math braille UI<\/a> that cues the user where the insertion point is for unambiguous editing of math zones using braille. Note that as of this posting, the math braille facility isn\u2019t hooked up to Narrator or other screen readers.<\/p>\n<h2>Getting (and setting) math speech<\/h2>\n<p>The tomConvertMathSpeech option currently gets math speech only in English. Microsoft Office apps like Word, PowerPoint and OneNote deliver <a href=\"https:\/\/blogs.msdn.microsoft.com\/murrays\/2017\/02\/27\/microsoft-office-math-speech\/\">math speech<\/a> in over 18 languages to the assistive technology (AT) program Narrator via the UIA <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/ee671389(v=vs.85).aspx\">ITextRangeProvider::GetText<\/a>() function. Other ATs could also get math speech this way, although they usually get MathML and generate speech from that. Dictating (setting) math speech would be nice for both blind and sighted folks. Imagine, you can say \ud835\udc4e\u00b2 + \ud835\udc4f\u00b2 = \ud835\udc50\u00b2 faster than you can type it or write it! SetText2(tomConvertMathSpeech, bstr) is ready to handle such input, but the feature is not available yet.<\/p>\n<h2>Entering ruby text<\/h2>\n<p>In a nonmath context, the tomConvertRuby option (0x00100000) can be used to convert strings like \u201c{\u2026|\u2026}\u201d to <a href=\"http:\/\/blogs.msdn.com\/b\/murrays\/archive\/2014\/12\/28\/ruby-text-objects.aspx\">ruby inline objects<\/a>, where the first ellipsis represents the ruby text and the second ellipsis the base text. The ASCII characters \u2018{\u2019, \u2018}\u2019, and \u2018|\u2019 are translated to the internal ruby-object structure characters U+FDD1, U+FDEF, and U+FDEE, respectively. 
Alternatively, the string can contain those structure characters directly. If a digit follows the start delimiter (\u2018{\u2019 or U+FDD1), the digit specifies the ruby options listed in the following table.<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>rubyAlign value<\/strong><\/td>\n<td width=\"422\"><strong>Meaning<\/strong><\/td>\n<\/tr>\n<tr>\n<td>center (0)<\/td>\n<td width=\"422\">Center &lt;ruby&gt; with respect to &lt;base&gt;<\/td>\n<\/tr>\n<tr>\n<td>distributeLetter (1)<\/td>\n<td width=\"422\">Distribute the difference in space between the longer and shorter text within the shorter text, evenly between characters<\/td>\n<\/tr>\n<tr>\n<td>distributeSpace (2)<\/td>\n<td width=\"422\">Distribute the difference in space between the longer and shorter text within the shorter text, using a ratio of 1:2:1, which corresponds to lead : inter-character : end<\/td>\n<\/tr>\n<tr>\n<td>left (3)<\/td>\n<td width=\"422\">Align &lt;ruby&gt; with the left of &lt;base&gt;<\/td>\n<\/tr>\n<tr>\n<td>right (4)<\/td>\n<td width=\"422\">Align &lt;ruby&gt; with the right of &lt;base&gt;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>If you add 5 to any of these values, the ruby object displays the ruby text below the base text instead of above it. 
For example, calling ITextRange2::SetText2(tomConvertRuby, bstr) with bstr containing the string \u201c{1\u306b\u307b\u3093\u3054|\u65e5\u672c\u8a9e}\u201d inserts<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/09\/ruby.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-505\" src=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/09\/ruby.png\" alt=\"Image ruby\" width=\"149\" height=\"88\" \/><\/a><\/p>\n<p>The string can contain text in addition to ruby objects and the ruby objects can be nested to create compound ruby objects such as<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/09\/rubyc-e1664499589171.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-507\" src=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/09\/rubyc-e1664499589171.png\" alt=\"Image rubyc\" width=\"150\" height=\"110\" \/><\/a><\/p>\n<h2>Entering enclosed text<\/h2>\n<p>The post <a href=\"https:\/\/devblogs.microsoft.com\/math-in-office\/rounded-rectangles-and-ellipses\/\">Rounded Rectangles and Ellipses &#8211; Math in Office (microsoft.com)<\/a> describes ways to enclose text in possibly rounded rectangles and ellipses. The SetText2(tomConvertEnclose, bstr) option is similar to the tomConvertRuby option. 
It converts strings like \u201c{\u2026}\u201d to a tomEnclose object.<\/p>\n<h2>Other ways to get\/set text<\/h2>\n<p>In addition to ITextRange2::SetText2() and GetText2(), the messages <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/winmsg\/wm-settext\">WM_SETTEXT<\/a>, <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/controls\/em-settextex\">EM_SETTEXTEX<\/a>, <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/winmsg\/wm-gettext\">WM_GETTEXT<\/a>, and <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/controls\/em-gettextex\">EM_GETTEXTEX<\/a> are useful. The set-text messages work with plain text or RTF in rich-text controls. EM_SETTEXTEX accepts both 16-bit and 8-bit RTF, while WM_SETTEXT doesn\u2019t handle 16-bit RTF.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You can get and set text from\/into RichEdit in a variety of formats including RTF, HTML, MathML, OMML, UnicodeMath, Nemeth Braille, and speech. This post documents RichEdit options for a general way to access text using ITextRange2::SetText2(options, bstr) and ITextRange2::GetText2(options, pbstr). As such, this post is for programmers. All options work in the current Microsoft [&hellip;]<\/p>\n","protected":false},"author":40611,"featured_media":55,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-501","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-math-in-office"],"acf":[],"blog_post_summary":"<p>You can get and set text from\/into RichEdit in a variety of formats including RTF, HTML, MathML, OMML, UnicodeMath, Nemeth Braille, and speech. This post documents RichEdit options for a general way to access text using ITextRange2::SetText2(options, bstr) and ITextRange2::GetText2(options, pbstr). As such, this post is for programmers. 
All options work in the current Microsoft [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/501","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/users\/40611"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/comments?post=501"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/501\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media\/55"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media?parent=501"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/categories?post=501"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/tags?post=501"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}