Using RichEdit for Text Processing

Murray Sargent

Suppose you’re writing a program that needs to process rich text. You could write your own functions. Alternatively, you could have RichEdit do the processing. For example, you might want to search for mathematical expressions in an RTF or HTML file or convert text in one math format to another format. Or change the kind of list numbering. Or recognize URLs, telephone numbers, etc. These manipulations don’t need a display. This post describes how to create a RichEdit instance to do the processing.

Load the RichEdit dll

The first thing is to load the RichEdit dll. You can use the system \windows\system32\msftedit.dll unless you need features that have been added more recently to the Office riched20.dll. One such dll is located on my laptop in C:\Program Files\Microsoft Office\root\vfs\ProgramFilesCommonX64\Microsoft Shared\OFFICE16\riched20.dll. Another is a recent Office RichEdit that the Windows 11 Notepad uses. On my laptop, its path is C:\Program Files\WindowsApps\Microsoft.WindowsNotepad_11.2210.5.0_x64__8wekyb3d8bbwe\riched20.dll. You can find the installation location via PowerShell:

Get-AppxPackage | ?{ $_.Name.Contains("Notepad") } | %{ $_.InstallLocation }

To load the dll, execute

HINSTANCE hRE = LoadLibrary(L"riched20.dll");

You may need to give the full path to the desired RichEdit dll. Note that \windows\system32\riched20.dll is old and exists for backward compatibility with very old programs: it’s Version 3 from Windows 2000 and has only been updated for security fixes. It has no features added in this century, so it’s missing a lot of functionality!
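Since the preferred dll may not be installed on every machine, one option is to try several paths in order and fall back to the system msftedit.dll. Here is a minimal sketch of that idea. The helper name LoadFirstAvailable and its parameters are hypothetical, not part of any Windows API, and the loader is passed in as a callable so that the fallback logic stays independent of LoadLibrary (requires C++14 or later).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: try each candidate path in order and return the first
// handle that loads. `load` is any callable taking a const wchar_t* path and
// returning a handle that is null on failure (LoadLibraryW fits this shape).
template <typename Loader>
auto LoadFirstAvailable(const std::vector<std::wstring>& candidates,
                        Loader load,
                        std::wstring* loadedPath = nullptr)
{
    using Handle = decltype(load(L""));
    for (const std::wstring& path : candidates) {
        Handle handle = load(path.c_str());
        if (handle) {
            if (loadedPath)
                *loadedPath = path;   // report which dll was actually loaded
            return handle;
        }
    }
    return Handle{};                  // null handle: nothing could be loaded
}
```

On Windows you could call it as LoadFirstAvailable(paths, [](const wchar_t* p) { return LoadLibraryW(p); }), with paths listing the full Office or Notepad riched20.dll path first and L"msftedit.dll" last. A null result means none of the candidates loaded.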

Create a RichEdit control

A RichEdit control is an ITextServices2 object. To create it, query the RichEdit dll for the exported CreateTextServices() function:

PCreateTextServices pfnCreateTextServices =
   (PCreateTextServices)GetProcAddress(hRE, "CreateTextServices");
CNoShowHost noShowHost;   // ITextHost object described below
IUnknown *pUnk;
HRESULT hr = pfnCreateTextServices(nullptr, &noShowHost, &pUnk);
if (hr != NOERROR)
   return hr;

ITextServices2* pserv;
hr = pUnk->QueryInterface(IID_ITextServices2, (void **)&pserv);
if (hr != NOERROR)
   return hr;

ITextDocument2* pdoc;
hr = pserv->QueryInterface(IID_ITextDocument2, (void **)&pdoc);
if (hr != NOERROR)
    return hr;
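The early returns above leave previously obtained interface pointers unreleased. One way to handle that is a small owning wrapper that calls Release() when it goes out of scope. The sketch below is a hypothetical helper (the name ComRef is an assumption, not part of RichEdit or TOM). It only assumes that the wrapped type has a Release() method, as every COM interface does.

```cpp
// Hypothetical minimal owning wrapper for a COM-style interface pointer,
// i.e. any type that exposes Release(). Not part of RichEdit or TOM.
template <typename T>
class ComRef {
public:
    ComRef() = default;
    ~ComRef() { Reset(); }
    ComRef(const ComRef&) = delete;             // single owner, no copies
    ComRef& operator=(const ComRef&) = delete;

    // Address to pass as an out-parameter, e.g. to CreateTextServices() or
    // QueryInterface(). Releases whatever was held before.
    T** Put() { Reset(); return &p_; }

    T* Get() const { return p_; }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

    // Release the held interface, if any.
    void Reset() {
        if (p_) {
            p_->Release();
            p_ = nullptr;
        }
    }

private:
    T* p_ = nullptr;
};
```

With this, ComRef<IUnknown> unk; followed by pfnCreateTextServices(nullptr, &noShowHost, unk.Put()) releases the object on every return path. In production code, existing smart pointers such as Microsoft::WRL::ComPtr or ATL's CComPtr serve the same purpose.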

ITextServices2 inherits from ITextServices. The method ITextServices::TxSendMessage() lets you send messages to the RichEdit control, and the pdoc pointer is the top TOM2 interface that lets you use TOM methods for processing text. For example,

LRESULT lres;
pserv->TxSendMessage(WM_SETTEXT, 0, reinterpret_cast<LPARAM>(szTextRtf), &lres);

inserts the RTF string szTextRtf into the control. You can also insert a Unicode string as the LPARAM.

Finding a math expression

An interesting case illustrating the TOM interfaces is finding a math formula in a document. Read in the RTF or HTML document and then search the document using the ITextRange2::Find() method. First, to get the user selection ITextSelection2* from the pdoc pointer, execute

ITextSelection2* psel;
pdoc->GetSelection2(&psel);

The math expression you want to find in the document should be in a separate RichEdit control with an ITextDocument2* pdocExpr pointer. Get an ITextRange2* from pdocExpr by executing

ITextRange2* pRangeExpr;
pdocExpr->Range2(0, tomForward, &pRangeExpr);
pRangeExpr->MoveEnd(tomCharacter, -1, nullptr);

Here the MoveEnd() call unselects the final CR; every rich-text control ends with a final CR (U+000D). Then, to find the math expression selected by pRangeExpr, starting in the document where psel is pointing, call

LONG Delta;
psel->Find(pRangeExpr, tomForward, Flags, &Delta);

Here the values for Flags are defined in ITextRange::FindText(), and Delta is set to the count of characters matched if a match is found. The search is “fuzzy”, that is, it ignores character format attributes like foreground/background color, font face name, and font height. But it distinguishes Unicode characters, so the following kinds of letters are all distinct: 𝐻𝑯𝐇ℌℋ. And math objects like fractions, subscripts, superscripts, integrals, matrices, equation arrays, etc., are all distinct. This approach is more general than searching for MathML, LaTeX, RTF, HTML, OMML, UnicodeMath, etc., since it first converts these formats to OfficeMath and then does a fuzzy search on the resulting OfficeMath. For a more detailed description of the algorithm, see Math Find/Replace and Rich Text Searches.
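To make the meaning of “fuzzy” concrete, here is a toy model of the character-level part of such a search. It is not RichEdit’s implementation and it ignores math object structure entirely. The StyledChar struct and the FindIgnoringFormat function are hypothetical names invented for this sketch. The point is only that formatting fields are never compared, while code points are.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of one rich-text character: a code point plus some formatting.
struct StyledChar {
    char32_t ch;          // Unicode code point
    std::uint32_t color;  // foreground color (ignored by the search)
    float heightPt;       // font height in points (ignored by the search)
};

// Return the index of the first match of `pattern` in `text` at or after
// `start`, comparing code points only. Return -1 if there is no match.
long FindIgnoringFormat(const std::vector<StyledChar>& text,
                        const std::vector<StyledChar>& pattern,
                        std::size_t start = 0)
{
    if (pattern.empty() || pattern.size() > text.size())
        return -1;
    for (std::size_t i = start; i + pattern.size() <= text.size(); ++i) {
        std::size_t j = 0;
        while (j < pattern.size() && text[i + j].ch == pattern[j].ch)
            ++j;
        if (j == pattern.size())
            return static_cast<long>(i);
    }
    return -1;
}
```

In this model, a red 14-point 𝐻 (U+1D43B) in the text matches a green 9-point 𝐻 in the pattern, but it does not match ℋ (U+210B), because the two are different Unicode characters.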

No-display ITextHost

A RichEdit control needs a host given by an implementation of ITextHost or ITextHost2. If no display is needed, ITextHost is adequate. Here’s a skeleton version of such a host. You can use it as is, or enhance it as desired.

class CNoShowHost : public ITextHost
{
public:
   // IUnknown methods. The host is not reference counted; it must outlive
   // the RichEdit control that uses it.
   STDMETHODIMP QueryInterface(REFIID, void **) noexcept {return E_NOTIMPL;}
   STDMETHODIMP_(ULONG) AddRef(void) noexcept {return 1;}
   STDMETHODIMP_(ULONG) Release(void) noexcept {return 1;}

   // ITextHost Methods
   HDC  TxGetDC() {return nullptr;}
   INT  TxReleaseDC(HDC) {return 0;}
   BOOL TxShowScrollBar(INT, BOOL) {return false;}
   BOOL TxEnableScrollBar(INT, INT) {return false;}
   BOOL TxSetScrollRange(INT, LONG, INT, BOOL) {return false;}
   BOOL TxSetScrollPos(INT, INT, BOOL) {return false;}
   void TxInvalidateRect(LPCRECT, BOOL) {}
   void TxViewChange(BOOL) {}
   BOOL TxCreateCaret(HBITMAP, INT, INT) {return false;}
   BOOL TxShowCaret(BOOL) {return false;}
   BOOL TxSetCaretPos(INT, INT) {return false;}
   BOOL TxSetTimer(UINT, UINT) {return false;}
   void TxKillTimer(UINT) {}
   void TxScrollWindowEx(INT, INT, LPCRECT, LPCRECT, HRGN, LPRECT, UINT) {}
   void TxSetCapture(BOOL) {}
   void TxSetFocus() {}
   void TxSetCursor(HCURSOR, BOOL) {}
   BOOL TxScreenToClient(LPPOINT) {return false;}
   BOOL TxClientToScreen(LPPOINT) {return false;}
   HRESULT TxActivate(LONG *) {return E_UNEXPECTED;}
   HRESULT TxDeactivate(LONG) {return E_UNEXPECTED;}
   HRESULT TxGetClientRect(LPRECT) {return E_UNEXPECTED;}
   HRESULT TxGetViewInset(LPRECT) {return E_UNEXPECTED;}
   HRESULT TxGetCharFormat(const CHARFORMATW **) {return E_UNEXPECTED;}
   HRESULT TxGetParaFormat(const PARAFORMAT **) {return E_UNEXPECTED;}
   COLORREF TxGetSysColor(int) {return 0;}
   HRESULT TxGetBackStyle(TXTBACKSTYLE *) {return E_UNEXPECTED;}
   HRESULT TxGetMaxLength(DWORD *) {return S_OK;}
   HRESULT TxGetScrollBars(DWORD *) {return E_UNEXPECTED;}
   HRESULT TxGetPasswordChar(_Out_ WCHAR *) {return E_UNEXPECTED;}
   HRESULT TxGetAcceleratorPos(LONG *) {return E_UNEXPECTED;}
   HRESULT TxGetExtent(LPSIZEL) {return E_UNEXPECTED;}
   HRESULT OnTxCharFormatChange(const CHARFORMATW *) {return E_UNEXPECTED;}
   HRESULT OnTxParaFormatChange(const PARAFORMAT *) {return E_UNEXPECTED;}
   // Report a multiline rich-text control
   HRESULT TxGetPropertyBits(DWORD /* dwMask */, DWORD *pdwBits) {*pdwBits = TXTBIT_RICHTEXT | TXTBIT_MULTILINE; return S_OK;}
   HRESULT TxNotify(DWORD, void *) {return E_UNEXPECTED;}
   HIMC TxImmGetContext() {return nullptr;}
   void TxImmReleaseContext(HIMC) {}
   HRESULT TxGetSelectionBarWidth(LONG *) {return E_UNEXPECTED;}
};