{"id":292,"date":"2021-05-30T17:09:10","date_gmt":"2021-05-31T00:09:10","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/math-in-office\/?p=292"},"modified":"2021-05-30T17:52:57","modified_gmt":"2021-05-31T00:52:57","slug":"richedit-html-support","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/math-in-office\/richedit-html-support\/","title":{"rendered":"RichEdit HTML Support"},"content":{"rendered":"<p>RichEdit has had limited HTML support for many years, but it wasn\u2019t general enough to document publicly. A recent RichEdit client (to be described in a future post) needs better support, so we have been improving it. For example, we have added HTML copy\/paste, images, and math (of course!) to the Microsoft Office riched20.dll. Ideally, RichEdit HTML should be able to represent any property that RichEdit RTF can represent. That still wouldn\u2019t make RichEdit a general HTML editor replete with forms and JavaScript functionality. But it would add good interoperability with Office apps, Teams, and the web, all of which use HTML as a lingua franca. This post describes the current RichEdit HTML capabilities, which are a subset of its RTF capabilities. The HTML converters are works in progress, and this post will be updated as more functionality is added. For example, RichEdit can write HTML tables, but not yet read them.<\/p>\n<p>Contents<\/p>\n<p><a href=\"#_Toc73284394\">RichEdit HTML Support<\/a><\/p>\n<p><a href=\"#_Toc73284395\">HTML copy\/paste format<\/a><\/p>\n<p><a href=\"#_Toc73284396\">Rich text<\/a><\/p>\n<p><a href=\"#_Toc73284397\">Images<\/a><\/p>\n<p><a href=\"#_Toc73284398\">Programming details<\/a><\/p>\n<p><a href=\"#_Toc73284399\">Messages<\/a><\/p>\n<p><a href=\"#_Toc73284400\">TOM methods<\/a><\/p>\n<p><a href=\"#_Toc73284401\">Math format options
<\/a><\/p>\n<p>&nbsp;<\/p>\n<h2><a name=\"_Toc73284395\"><\/a>HTML copy\/paste format<\/h2>\n<p>The \u201cHTML format\u201d clipboard format includes header and comment data in addition to the HTML to be copied (see <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/dataxchg\/html-clipboard-format#description\">https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/dataxchg\/html-clipboard-format#description<\/a>). This info needs to be added to copy HTML between RichEdit, Word, PowerPoint, OneNote, Teams, and other apps. Frankly, having to add this info seems like overkill, since RTF can be copied and pasted without such overhead. We illustrate the format as written by RichEdit with the HTML for Einstein&#8217;s energy equation \ud835\udc38 = \ud835\udc5a\ud835\udc50\u00b2. In the HTML, OMML is the math format used by default since that&#8217;s what Word and PowerPoint expect. Here\u2019s the HTML:<\/p>\n<pre>Version:1.0\r\nStartHTML:0000000105\r\nEndHTML:0000000844\r\nStartFragment:0000000417\r\nEndFragment:0000000811\r\n\u00a0\r\n&lt;html xml:lang=\"en\" lang=\"en\" xmlns=\"http:\/\/www.w3.org\/1999\/xhtml\"\r\nxmlns:m=\"http:\/\/schemas.microsoft.com\/office\/2004\/12\/omml\"&gt;\r\n&lt;head&gt;&lt;style&gt;body{font-family:Arial,sans-serif;font-size:10pt;}&lt;\/style&gt;\r\n&lt;style&gt;.cf0{font-style:italic;font-family:Cambria Math;font-size:24pt;}&lt;\/style&gt;&lt;\/head&gt;\r\n&lt;body&gt;&lt;!--StartFragment --&gt;&lt;p&gt;&lt;m:oMathPara&gt;&lt;m:oMath class=\"cf0\"&gt;\r\n&lt;span class=\"cf0\"&gt;&lt;m:r&gt;&lt;i&gt;&amp;#x1D438;&lt;\/i&gt;&lt;\/m:r&gt;&lt;\/span&gt;\r\n&lt;span class=\"cf0\"&gt;&lt;m:r&gt;&lt;i&gt;=&lt;\/i&gt;&lt;\/m:r&gt;&lt;\/span&gt;&lt;span class=\"cf0\"&gt;\r\n&lt;m:r&gt;&lt;i&gt;&amp;#x1D45A;&lt;\/i&gt;&lt;\/m:r&gt;&lt;\/span&gt;\r\n&lt;m:sSup&gt;&lt;m:sSupPr&gt;&lt;m:ctrlPr&gt;&lt;\/m:ctrlPr&gt;&lt;\/m:sSupPr&gt;&lt;m:e&gt;&lt;span 
class=\"cf0\"&gt;\r\n&lt;m:r&gt;&lt;i&gt;&amp;#x1D450;&lt;\/i&gt;&lt;\/m:r&gt;&lt;\/span&gt;&lt;\/m:e&gt;&lt;m:sup&gt;&lt;span class=\"cf0\"&gt;&lt;m:r&gt;&lt;i&gt;2&lt;\/i&gt;\r\n&lt;\/m:r&gt;&lt;\/span&gt;&lt;\/m:sup&gt;&lt;\/m:sSup&gt;&lt;\/m:oMath&gt;&lt;\/m:oMathPara&gt;&lt;\/p&gt;\r\n&lt;!--EndFragment --&gt;&lt;\/body&gt;&lt;\/html&gt;\r\n<\/pre>\n<p>Here the StartHTML entry in the header gives the byte offset, counted from the start of the clipboard data, of the beginning of the HTML, and EndHTML gives the offset of the end of the HTML. In this example, StartHTML is 105, which is the length of the header itself. StartFragment gives the offset of the start of the content that the user selected, and EndFragment gives the offset of the end of the selection. In this example, the equation \ud835\udc38 = \ud835\udc5a\ud835\udc50\u00b2 is selected and displayed on its own line (display mode rather than inline mode). The start of the displayed equation is given by the OMML &lt;m:oMathPara&gt; element. The corresponding MathML, including an mml: prefix, is<\/p>\n<pre>&lt;mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"block\"&gt;\r\n  &lt;mml:mi&gt;E&lt;\/mml:mi&gt;\r\n  &lt;mml:mo&gt;=&lt;\/mml:mo&gt;\r\n  &lt;mml:mi&gt;m&lt;\/mml:mi&gt;\r\n  &lt;mml:msup&gt;\r\n    &lt;mml:mi&gt;c&lt;\/mml:mi&gt;\r\n    &lt;mml:mn&gt;2&lt;\/mml:mn&gt;&lt;\/mml:msup&gt;&lt;\/mml:math&gt;\r\n<\/pre>\n<p>The Programming details section describes how to write HTML with OMML or with MathML, with or without the mml: prefix. The HTML5 standard includes MathML without a prefix. RichEdit can write and read HTML with all three math formats.<\/p>\n<h2><a name=\"_Toc73284396\"><\/a>Rich text<\/h2>\n<p>Character formatting includes font and family, height, text and background color, weight, spacing, bold, italic, underline, strikeout, subscript, superscript, small caps, all caps, and hyperlinks. 
Paragraph formatting includes numbered and bulleted lists; left, right, and centered alignment; and paragraph margins.<\/p>\n<h2><a name=\"_Toc73284397\"><\/a>Images<\/h2>\n<p>RichEdit can read and write the HTML &lt;img&gt; element whose src attribute is a data URI containing a base64 encoding of the binary image data. This technique is used widely in Microsoft Office for HTML copy\/paste. For example, the tag might begin with \u201c&lt;img src=&quot;data:image\/png;base64,\u201d.<\/p>\n<h2><a name=\"_Toc73284398\"><\/a>Programming details<\/h2>\n<p>HTML content can be read and written via messages, hot keys (Ctrl+C, Ctrl+V, Ctrl+X), and TOM methods.<\/p>\n<h3><a name=\"_Toc73284399\"><\/a>Messages<\/h3>\n<p>A client can get HTML content by sending the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/controls\/em-streamout\">EM_STREAMOUT<\/a> message with wParam = SF_HTML | SF_BINARY. The SF_BINARY flag (0x0008) is needed to write the data in the RichEdit binary format to temporary memory, and the SF_HTML flag (0x00100000) then writes that data out as HTML. To get HTML in the clipboard format, also OR the SF_CLIPBOARD flag (0x80000000) into wParam.<\/p>\n<p>A client can stream in HTML content by sending the EM_ISTREAMIN message (WM_USER + 252), which streams in using the <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/objidl\/nn-objidl-istream#:~:text=The%20IStream%20interface%20defines%20methods%20similar%20to%20the,IStream%20interface%20pointer%20rather%20than%20a%20file%20handle.\">IStream<\/a> interface pointed to by the lParam instead of using the usual <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/richedit\/ns-richedit-editstream\">EDITSTREAM<\/a> struct. This choice is due to the use of the Office HTML parser for input, which requires that mso.dll be loaded. Set wParam equal to 1, which signifies HTML. 
Currently, only HTML can be streamed in using the EM_ISTREAMIN message.<\/p>\n<p>Other messages that can be used are WM_COPY, WM_PASTE, WM_CUT, and EM_PASTESPECIAL, all of which are documented online.<\/p>\n<h3><a name=\"_Toc73284400\"><\/a>TOM methods<\/h3>\n<p>In addition to the ITextRange::Copy() and ITextRange::Paste() methods, you can input HTML content into a range by calling <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/tom\/nf-tom-itextrange2-settext2\">ITextRange2::SetText2<\/a>(tomConvertHtml, bstr), where tomConvertHtml has the value 0x00900000. Similarly, you can get the HTML content from a range by calling ITextRange2::GetText2(tomConvertHtml, pbstr).<\/p>\n<h3><a name=\"_Toc73284401\"><\/a>Math format options<\/h3>\n<p>By default, RichEdit writes equations in HTML in the OMML format since that format is what Office apps like Word and PowerPoint expect. But it can also write equations in MathML with or without an mml: prefix. To choose the math format, call <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/api\/tom\/nf-tom-itextdocument2-setmathproperties\">ITextDocument2::SetMathProperties<\/a>() with tomHtmlOMML, tomHtmlMathML, or tomHtmlMath:<\/p>\n<pre>  tomHtmlMathFormatMask = 0x00300000,   \/\/ Mask for math-format flags\r\n  tomHtmlOMML           = 0,            \/\/ m:\r\n  tomHtmlMathML         = 0x00100000,   \/\/ mml:\r\n  tomHtmlMath           = 0x00200000,   \/\/ No prefix MathML (HTML5)\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>RichEdit has had limited HTML support for many years, but it wasn\u2019t general enough to document 
publicly. A recent RichEdit client (to be described in a future post) needs better support, so we have been improving it. For example, we have added HTML copy\/paste, images, and math (of course!) to the Microsoft Office riched20.dll. Ideally [&hellip;]<\/p>\n","protected":false},"author":40611,"featured_media":55,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-292","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-math-in-office"],"acf":[],"blog_post_summary":"<p>RichEdit has had limited HTML support for many years, but it wasn\u2019t general enough to document publicly. A recent RichEdit client (to be described in a future post) needs better support, so we have been improving it. For example, we have added HTML copy\/paste, images, and math (of course!) to the Microsoft Office riched20.dll. Ideally [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/292","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/users\/40611"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/comments?post=292"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/292\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media\/55"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media?parent=292"}],"wp:term":[{"taxonomy":"catego
ry","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/categories?post=292"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/tags?post=292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}