{"id":537,"date":"2022-12-31T14:20:47","date_gmt":"2022-12-31T22:20:47","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/math-in-office\/?p=537"},"modified":"2022-12-31T17:49:00","modified_gmt":"2023-01-01T01:49:00","slug":"using-richedit-for-text-processing","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/math-in-office\/using-richedit-for-text-processing\/","title":{"rendered":"Using RichEdit for Text Processing"},"content":{"rendered":"<p>Suppose you\u2019re writing a program that needs to process rich text. You could write your own functions. Alternatively, you could have RichEdit do the processing. For example, you might want to search for mathematical expressions in an RTF or HTML file or convert text in one math format to another format. Or change the kind of list numbering. Or recognize URLs, telephone numbers, etc. These manipulations don\u2019t need a display. This post describes how to create a RichEdit instance to do the processing.<\/p>\n<h2>Load the RichEdit dll<\/h2>\n<p>The first thing is to load the RichEdit dll. You can use the system \\windows\\system32\\msftedit.dll unless you need features that have been added more recently to the Office riched20.dll. One such dll is located on my laptop in C:\\Program Files\\Microsoft Office\\root\\vfs\\ProgramFilesCommonX64\\Microsoft Shared\\OFFICE16\\riched20.dll. Another is a recent Office RichEdit that the Windows 11 Notepad uses. On my laptop, its path is C:\\Program Files\\WindowsApps\\Microsoft.WindowsNotepad_11.2210.5.0_x64__8wekyb3d8bbwe\\riched20.dll. You can find the installation location via PowerShell:<\/p>\n<p><strong>Get-AppxPackage | ?{ $_.Name.Contains(&#8220;Notepad&#8221;) } | %{ $_.InstallLocation }<\/strong><\/p>\n<p>To load the dll, execute<\/p>\n<pre style=\"padding-left: 40px;\">HINSTANCE hRE = LoadLibrary(L\"riched20.dll\");<\/pre>\n<p>You may need to give the full path to the desired RichEdit dll. 
Note that the \\windows\\system32\\riched20.dll is old and exists for backward compatibility with very old programs: it\u2019s Version 3 from Windows 2000 and has only been updated with security fixes. It has no features added in this century, so it\u2019s missing a lot of functionality!<\/p>\n<h2>Create a RichEdit control<\/h2>\n<p>A RichEdit control is an <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/textserv\/nl-textserv-itextservices2\">ITextServices2<\/a> object. To create it, query the RichEdit dll for the exported CreateTextServices() function:<\/p>\n<pre>PCreateTextServices pfnCreateTextServices =\r\n   (PCreateTextServices)GetProcAddress(hRE, \"CreateTextServices\");\r\nCNoShowHost noShowHost;   \/\/ ITextHost object described below\r\nIUnknown *pUnk;\r\nHRESULT hr = pfnCreateTextServices(nullptr, &amp;noShowHost, &amp;pUnk);\r\nif (hr != NOERROR)\r\n   return hr;\r\n\r\nITextServices2* pserv;\r\nhr = pUnk-&gt;QueryInterface(IID_ITextServices2, (void **)&amp;pserv);\r\npUnk-&gt;Release();\r\nif (hr != NOERROR)\r\n   return hr;\r\n\r\nITextDocument2* pdoc;\r\nhr = pserv-&gt;QueryInterface(IID_ITextDocument2, (void **)&amp;pdoc);\r\nif (hr != NOERROR)\r\n   return hr;<\/pre>\n<p>ITextServices2 inherits from ITextServices. The method <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/textserv\/nf-textserv-itextservices-txsendmessage\">ITextServices::TxSendMessage<\/a>() lets you send messages to the RichEdit control, and the pdoc pointer is the top <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/tom\/nn-tom-itextdocument2\">TOM2 interface<\/a> that lets you use TOM methods for processing text. For example,<\/p>\n<pre>LRESULT lres;\r\npserv-&gt;TxSendMessage(WM_SETTEXT, 0, reinterpret_cast&lt;LPARAM&gt;(szTextRtf),\r\n    &amp;lres);<\/pre>\n<p>inserts the RTF string szTextRtf into the control. 
You can also insert a Unicode string as the LPARAM.<\/p>\n<h2>Finding a math expression<\/h2>\n<p>An interesting case illustrating the TOM interfaces is finding a math formula in a document. Read in the RTF or HTML document and then search the document using the <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/tom\/nf-tom-itextrange2-find\">ITextRange2::Find<\/a>() method. First, to get the user selection ITextSelection2* from the pdoc pointer, execute<\/p>\n<pre>ITextSelection2* psel;\r\npdoc-&gt;GetSelection2(&amp;psel);<\/pre>\n<p>The math expression you want to find in the document should be in a separate RichEdit control with an ITextDocument2* pdocExpr pointer. Get an ITextRange2* from pdocExpr by executing<\/p>\n<pre>ITextRange2* pRangeExpr;\r\npdocExpr-&gt;Range2(0, tomForward, &amp;pRangeExpr);\r\npRangeExpr-&gt;MoveEnd(tomCharacter, -1, nullptr);<\/pre>\n<p>Here the MoveEnd() call excludes the final CR from the range (every rich-text control ends with a final CR, U+000D). Then to find the math expression selected by pRangeExpr, starting in the document where psel is pointing, call<\/p>\n<pre>LONG Delta;\r\npsel-&gt;Find(pRangeExpr, tomForward, Flags, &amp;Delta);<\/pre>\n<p>Here the values for Flags are defined in <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/tom\/nf-tom-itextrange-findtext\">ITextRange::FindText<\/a>(), and Delta gets set to the count of characters found (if found). The search is \u201cfuzzy\u201d, that is, it ignores character format attributes like foreground\/background color, font face name, and font height. But it distinguishes Unicode characters, so the following kinds of letters are all distinct: \ud835\udc3b\ud835\udc6f\ud835\udc07\u210c\u210b. And math objects like fractions, subscripts, superscripts, integrals, matrices, equation arrays, etc., are all distinct. 
This approach is more general than searching for MathML, LaTeX, RTF, HTML, OMML, UnicodeMath, etc., since it first converts these formats to OfficeMath and then does a fuzzy search on the resulting OfficeMath. For a more detailed description of the algorithm, see <a href=\"https:\/\/learn.microsoft.com\/en-us\/archive\/blogs\/murrays\/math-findreplace-and-rich-text-searches\">Math Find\/Replace and Rich Text Searches<\/a>.<\/p>\n<h2>No-display ITextHost<\/h2>\n<p>A RichEdit control needs a host given by an implementation of <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/textserv\/nl-textserv-itexthost\">ITextHost<\/a> or ITextHost2. If no display is needed, ITextHost is adequate. Here\u2019s a skeleton version of such a host. You can use it as is, or enhance it as desired.<\/p>\n<pre>class CNoShowHost : public ITextHost\r\n{\r\n   \/\/ IUnknown methods\r\n   STDMETHODIMP QueryInterface(REFIID, void **) noexcept {return E_NOTIMPL;}\r\n   STDMETHODIMP_(ULONG) STDMETHODCALLTYPE AddRef(void) noexcept {return 1;}\r\n   STDMETHODIMP_(ULONG) STDMETHODCALLTYPE Release(void) noexcept {return 1;}\r\n\r\n   \/\/ ITextHost Methods\r\n   HDC  TxGetDC() {return nullptr;}\r\n   INT\u00a0\u00a0TxReleaseDC(HDC) {return 0;}\r\n   BOOL TxShowScrollBar(INT, BOOL) {return false;}\r\n   BOOL TxEnableScrollBar(INT, INT) {return false;}\r\n   BOOL TxSetScrollRange(INT, LONG, INT, BOOL) {return false;}\r\n   BOOL TxSetScrollPos(INT, INT, BOOL) {return false;}\r\n   void TxInvalidateRect(LPCRECT, BOOL) {}\r\n   void TxViewChange(BOOL) {}\r\n   BOOL TxCreateCaret(HBITMAP, INT, INT) {return false;}\r\n   BOOL TxShowCaret(BOOL) {return false;}\r\n   BOOL TxSetCaretPos(INT, INT) {return false;}\r\n   BOOL TxSetTimer(UINT, UINT) {return false;}\r\n   void TxKillTimer(UINT) {}\r\n   void TxScrollWindowEx(INT, INT, LPCRECT, LPCRECT, HRGN, LPRECT, UINT) {}\r\n   void TxSetCapture(BOOL) {}\r\n   void TxSetFocus() {}\r\n   void TxSetCursor(HCURSOR, BOOL) {}\r\n   BOOL 
TxScreenToClient(LPPOINT) {return false;}\r\n   BOOL TxClientToScreen(LPPOINT) {return false;}\r\n   HRESULT\u00a0TxActivate(LONG *) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxDeactivate(LONG) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetClientRect(LPRECT) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetViewInset(LPRECT) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetCharFormat(const CHARFORMATW **) {return E_UNEXPECTED;}\r\n   HRESULT TxGetParaFormat(const PARAFORMAT **) {return E_UNEXPECTED;}\r\n   COLORREF TxGetSysColor(int) {return 0;}\r\n   HRESULT\u00a0TxGetBackStyle(TXTBACKSTYLE *) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetMaxLength(DWORD *) {return S_OK;}\r\n   HRESULT\u00a0TxGetScrollBars(DWORD *) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetPasswordChar(_Out_ WCHAR *) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetAcceleratorPos(LONG *) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0TxGetExtent(LPSIZEL) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0OnTxCharFormatChange(const CHARFORMATW *) {return E_UNEXPECTED;}\r\n   HRESULT\u00a0OnTxParaFormatChange(const PARAFORMAT *) {return E_UNEXPECTED;}\r\n   HRESULT TxGetPropertyBits(DWORD \/* dwMask *\/, DWORD *pdwBits)\u00a0{*pdwBits = TXTBIT_RICHTEXT | TXTBIT_MULTILINE; return S_OK;}\r\n   HRESULT\u00a0TxNotify(DWORD, void *) {return E_UNEXPECTED;}\r\n   HIMC TxImmGetContext() {return nullptr;}\r\n   void TxImmReleaseContext(HIMC) {}\r\n   HRESULT\u00a0TxGetSelectionBarWidth(LONG *) {return E_UNEXPECTED;}\r\n};<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Suppose you\u2019re writing a program that needs to process rich text. You could write your own functions. Alternatively, you could have RichEdit do the processing. For example, you might want to search for mathematical expressions in an RTF or HTML file or convert text in one math format to another format. 
Or change the kind [&hellip;]<\/p>\n","protected":false},"author":40611,"featured_media":55,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-537","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-math-in-office"],"acf":[],"blog_post_summary":"<p>Suppose you\u2019re writing a program that needs to process rich text. You could write your own functions. Alternatively, you could have RichEdit do the processing. For example, you might want to search for mathematical expressions in an RTF or HTML file or convert text in one math format to another format. Or change the kind [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/537","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/users\/40611"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/comments?post=537"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/537\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media\/55"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media?parent=537"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/categories?post=537"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/tags
?post=537"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}