RichEdit has two kinds of hyperlinks: automatic hyperlinks (autoURLs) and friendly-name hyperlinks. As its name suggests, an autoURL is automatically recognized by RichEdit as a hyperlink and is displayed as a URL. A friendly-name hyperlink has a name, which is displayed, and a hidden instruction part that contains the URL. This post describes these hyperlinks and explains how to manipulate them programmatically. The descriptions include some features that have been added recently.
Automatic URLs
The first autoURLs appeared in RichEdit 2.0, which shipped with Office 97, and have the usual web form, such as http://www.msn.com. The permitted URL schemes were http:, file:, mailto:, ftp:, https:, gopher:, nntp:, prospero:, telnet:, news:, wais:, and outlook:. To include spaces in the URL, the whole URL had to be enclosed in an angle bracket pair, as in <http://www.xxx.com/fun computing>. RichEdit 3.0, which shipped with Windows 2000 up through Windows 7, added the capability to recognize URLs of the form www.msn.com and ftp.unicode.org. RichEdit 4.1, which shipped with Windows XP up through Windows 7, added friendly-name hyperlinks as well as autoURLs of the form \\word\richedit2\murrays. RichEdit 7, which shipped with Office 2010, added recognition for spaces in URLs without needing enclosure in <>. It also added recognition of telephone numbers, drive-letter paths, email addresses, and URLs enclosed in ASCII double quotes (""). It made all of these recognitions optional, since you might not want to recognize, for example, phone numbers, or you might want to recognize telephone numbers exclusively.
The recognition is dynamic and fast, and recognized URLs are displayed by default with an underline and a blue text color. The autoURL notifications can be sent to the client application by user actions such as pressing the Enter key or clicking the left mouse button.
To enable or disable recognition of URLs and file paths in a RichEdit control, send the control the message EM_AUTOURLDETECT with lparam = 0 and wparam = 1 or 0, respectively. When autoURL recognition and link notifications are enabled, mouse movement over a link or clicking on a link sends an EN_LINK notification with the URL start and end character positions to the client.
More generally, wparam can have any combination of the following flags:
Flag | Value | Meaning |
AURL_ENABLEURL | 1 | Recognize standard web URLs and file paths |
AURL_ENABLEEMAILADDR | 2 | Recognize email addresses |
AURL_ENABLETELNO | 4 | Recognize telephone numbers |
AURL_ENABLEEAURLS | 8 | Recognize East Asian URLs |
AURL_ENABLEDRIVELETTERS | 16 | Recognize file paths that start with a drive letter |
AURL_DISABLEMIXEDLGC | 32 | Disable mixed Latin Greek Cyrillic IDNs |
AURL_DISABLEAUTOFORMAT | 64 | Disable auto URL formatting |
AURL_URLRTFHTMLSTRICT | 128 | Only encode URLs defined in RTF/HTML source |
AURL_NOINITIALSCAN | 256 | Don’t scan the document when enabling autoURL recognition |
AURL_ENABLEGETURL | 512 | Make ITextRange2::GetURL() return autoURLs |
AURL_ENABLEEAURLS is the preferred way to enable East Asian URL recognition. For compatibility with older software, lparam = 1 also enables East Asian URL recognition. Alternatively, lparam can point to a client null-terminated string specifying the URL schemes to recognize. The string consists of URI scheme names, each terminated by a ‘:’. See https://www.ietf.org/rfc/rfc2396.txt for validation criteria. The default string is “:callto:file:ftp:gopher:http:https:mailto:news:nntp:notes:onenote:outlook:prospero:read:tel:telnet:wais:webcal:”. The message EM_GETAUTOURLDETECT (WM_USER + 92) gets the flags, but not the scheme string.
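As a rough illustration of how the wparam flags and the lparam scheme string might be assembled, here is a minimal sketch. The constants in the `aurl` namespace simply restate values from the table above (real code would use the AURL_* definitions from richedit.h), and `BuildSchemeList` is a hypothetical helper, not a RichEdit API. The sketch assumes the scheme string is wide-character and mirrors the shape of the default string quoted above, including its leading ‘:’; both points are assumptions to check against your RichEdit version.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Flag values restated from the table above (illustrative names).
// Real code would use the AURL_* definitions in richedit.h.
namespace aurl {
constexpr std::uint32_t kEnableUrl       = 1;   // AURL_ENABLEURL
constexpr std::uint32_t kEnableEmailAddr = 2;   // AURL_ENABLEEMAILADDR
constexpr std::uint32_t kEnableTelNo     = 4;   // AURL_ENABLETELNO
constexpr std::uint32_t kEnableGetUrl    = 512; // AURL_ENABLEGETURL
}  // namespace aurl

// Example wparam: recognize web URLs, email addresses, and phone numbers.
constexpr std::uint32_t kExampleFlags =
    aurl::kEnableUrl | aurl::kEnableEmailAddr | aurl::kEnableTelNo;

// Hypothetical helper: joins scheme names into a colon-terminated list.
// The leading ':' copies the shape of the default string (an assumption).
std::wstring BuildSchemeList(const std::vector<std::wstring>& schemes) {
    std::wstring result = L":";
    for (const std::wstring& scheme : schemes) {
        // A scheme name must be non-empty and must not contain ':' itself.
        assert(!scheme.empty() && scheme.find(L':') == std::wstring::npos);
        result += scheme;
        result += L':';
    }
    return result;
}
```

On Windows, the two values would then be passed along the lines of SendMessage(hwndRichEdit, EM_AUTOURLDETECT, kExampleFlags, (LPARAM)schemes.c_str()), where hwndRichEdit is a placeholder for your control’s window handle and schemes is the string returned by the helper.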
In memory, autoURLs are identified by the CFE_LINK character formatting attribute. You can retrieve this attribute using the EM_GETCHARFORMAT message or ITextFont2::GetEffects(). Alternatively, you can use the tomLink unit with the TOM (Text Object Model) ITextRange::StartOf(), EndOf(), Expand(), Move(), MoveEnd(), and MoveStart() methods to navigate and select autoURLs as well as friendly-name links. ITextFont2::SetEffects(Value, Mask) with Value = 0 and Mask = CFM_LINK turns off autoURL detection for the range associated with the ITextFont2 (sets the link type to tomNoAutoLink) provided AURL_URLRTFHTMLSTRICT is active.
RichEdit Friendly-Name Hyperlinks
A friendly-name hyperlink has a name, which is displayed, and a hidden instruction part that contains the URL. Such hyperlinks are commonly used when an author wants to display an informative name for a link rather than the URL itself. It can be hard to read URLs these days what with all the protection built into them. So, friendly-name URLs are much nicer 😊
A friendly-name hyperlink is essentially a field with two parts: an instruction part containing the URL and a result part containing the name. In fact, that’s the way it appears in RTF, which has the syntax {\field{\*\fldinst {HYPERLINK "…"}}{\fldresult{…}}}, and in HTML, which has the syntax <a href="url">name</a>.
In RichEdit, a hyperlink is represented by character formatting effects rather than by delimiters, which are used for math and other in-line objects. As such, hyperlinks cannot be nested, although friendly-name hyperlinks can be located next to one another. In contrast, autoURLs need to be separated by at least one character. The whole friendly-name hyperlink has the character formatting effects CFE_LINK and CFE_LINKPROTECTED, whereas autoURLs only have the CFE_LINK attribute. CFE_LINKPROTECTED is included so that the autoURL scanner skips over friendly-name links. The instruction part, i.e., the URL, has the CFE_HIDDEN attribute as well, since it’s not supposed to be displayed. The URL itself is enclosed in ASCII double quotes and preceded by the string “HYPERLINK ” (with a trailing space). Since CFE_HIDDEN plays an integral role in friendly-name hyperlinks, it cannot be used in the name.
For example, in WordPad, which uses RichEdit, a hyperlink with the name MSN would have the plain text
HYPERLINK "http://www.msn.com"MSN
The whole link would have CFE_LINK and CFE_LINKPROTECTED character formatting attributes and all but the “MSN” would have the CFE_HIDDEN attribute.
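To make the plain-text layout concrete, the following sketch splits such a string into its URL and friendly name. `FriendlyLink` and `ParseFriendlyLink` are hypothetical names introduced here for illustration; the sketch assumes the text holds exactly one link in the form shown above, with no sentinel prefix and no double quote inside the URL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical holder for the two parts of a friendly-name hyperlink.
struct FriendlyLink {
    std::string url;   // hidden instruction part, without HYPERLINK or quotes
    std::string name;  // visible friendly name
};

// Parses plain text of the form: HYPERLINK "url"name
// Returns false if the text does not have that shape.
bool ParseFriendlyLink(const std::string& text, FriendlyLink* out) {
    assert(out != nullptr);
    const std::string prefix = "HYPERLINK \"";
    if (text.compare(0, prefix.size(), prefix) != 0) {
        return false;  // missing the HYPERLINK keyword and opening quote
    }
    const std::size_t closeQuote = text.find('"', prefix.size());
    if (closeQuote == std::string::npos) {
        return false;  // URL is not terminated by a closing quote
    }
    out->url = text.substr(prefix.size(), closeQuote - prefix.size());
    out->name = text.substr(closeQuote + 1);
    return true;
}
```

In a live control you would normally rely on the character formatting (CFE_HIDDEN, CFE_LINK) or the TOM methods rather than string parsing; this only shows how the pieces sit in the backing text.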
You can insert a friendly-name hyperlink by reading in the corresponding RTF or by sending the RTF in a WM_SETTEXT or EM_SETTEXTEX message. For the example above, the RTF could be
{\rtf1{\field{\*\fldinst{ HYPERLINK "http://www.msn.com"}}{\fldresult{MSN}}}}
Note that if you encode a path name in the fldinst part, each backslash has to be doubled. In a C++ string, this means each backslash has to be quadrupled.
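The backslash doubling is easy to get wrong by hand, so here is a small sketch that builds the RTF from a URL and a name. `EscapeForFldinst` and `MakeHyperlinkRtf` are hypothetical helpers written for this example. It assumes the URL contains no double quote and the name contains no RTF special characters (backslash or braces), since it does not escape those.

```cpp
#include <cassert>
#include <string>

// Doubles every backslash, as required inside the \fldinst part of the RTF.
std::string EscapeForFldinst(const std::string& url) {
    std::string out;
    for (char c : url) {
        if (c == '\\') {
            out += "\\\\";  // one backslash in the URL becomes two in the RTF
        } else {
            out += c;
        }
    }
    return out;
}

// Builds the RTF for a friendly-name hyperlink in the form shown above.
std::string MakeHyperlinkRtf(const std::string& url, const std::string& name) {
    // This sketch does not escape quotes in the URL or RTF syntax in the name.
    assert(url.find('"') == std::string::npos);
    assert(name.find_first_of("\\{}") == std::string::npos);
    return "{\\rtf1{\\field{\\*\\fldinst{ HYPERLINK \"" +
           EscapeForFldinst(url) + "\"}}{\\fldresult{" + name + "}}}}";
}
```

Letting the helper do the doubling means a path such as \\word\richedit2\murrays is written in C++ source with only the usual literal escaping, instead of quadrupled backslashes.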
If the friendly name is the same as the URL, the link is converted to an autoURL unless the autoURL recognizer fails to recognize the URL completely. The reason for this conversion is so that the user can edit the URL and have it be the same as what gets launched when the user clicks on the URL. In Word, you can change the friendly name without updating the URL, which can be misleading in this case. The problem is mitigated in Word since Word has an edit-link dialog that shows both the URL and the friendly name. RichEdit is a component and doesn’t have dialogs, so it’s more secure to convert such links to autoURLs.
Using RTF to insert links works well, but there are also programmatic approaches. The ITextRange2::SetURL(BSTR bstr) method applies the URL in the bstr to the range of text selected by the ITextRange2. The text in the bstr needs to start and end with ASCII double quotes. The SetURL() method inserts the word “HYPERLINK” in front of the URL. You can remove the link status from a friendly-name hyperlink by calling SetURL() with a NULL bstr or one that has only the start and end quotes, signifying an empty string.
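Since forgetting the quotes is an easy mistake, a thin wrapper can add them. The sketch below is only a sketch: `QuoteForSetUrl` and `ApplyUrlToRange` are hypothetical names, the Windows-only part assumes a valid ITextRange2 pointer obtained elsewhere, and that part has not been exercised against a real control here.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <oleauto.h>
#include <tom.h>
#endif

// Wraps a URL in the ASCII double quotes that SetURL() expects.
// An empty URL yields just the two quotes, which removes the link status.
std::wstring QuoteForSetUrl(const std::wstring& url) {
    // This sketch does not handle URLs that themselves contain a quote.
    assert(url.find(L'"') == std::wstring::npos);
    return L"\"" + url + L"\"";
}

#ifdef _WIN32
// Hypothetical wrapper: applies url to the text selected by range.
HRESULT ApplyUrlToRange(ITextRange2* range, const std::wstring& url) {
    if (range == nullptr) {
        return E_POINTER;
    }
    const std::wstring quoted = QuoteForSetUrl(url);
    BSTR bstr = SysAllocString(quoted.c_str());
    if (bstr == nullptr) {
        return E_OUTOFMEMORY;
    }
    const HRESULT hr = range->SetURL(bstr);
    SysFreeString(bstr);  // the caller owns the BSTR it passes in
    return hr;
}
#endif
```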
To retrieve the URL, select it and then call ITextRange2::GetURL(&bstr). The returned bstr doesn’t include the word HYPERLINK or the quotes. To get the friendly name, use ITextRange2::GetText2(tomNoHidden, &bstr). The client can then add whatever surrounding characters it desires. If you call GetURL() for an autoURL like www.msn.com, it returns http://www.msn.com (assuming you have included the AURL_ENABLEGETURL flag in your EM_AUTOURLDETECT message).
As with autoURLs, the RichEdit client enables hyperlink notifications (EN_LINK) for friendly-name hyperlinks by including the ENM_LINK flag in the mask sent with the EM_SETEVENTMASK message. The client can enable tooltips displaying the URLs by sending the EM_SETEDITSTYLE message with the SES_HYPERLINKTOOLTIPS (8) flag.
To find out what kind of link a range is in or selects, get an ITextFont2 from the range (call ITextRange2::GetFont2(ppFont)) and then call ITextFont2::GetLinkType. The value returned has the following semantics:
Link type | Value | Meaning |
tomNoLink | 0 | Not any kind of link |
tomClientLink | 1 | Client link |
tomFriendlyLinkName | 2 | Friendly name of a friendly-name link |
tomFriendlyLinkAddress | 3 | Address of a friendly-name link |
tomAutoLinkURL | 4 | Auto URL |
tomAutoLinkEmail | 5 | Email address |
tomAutoLinkPhone | 6 | Phone number |
tomAutoLinkPath | 7 | File path |
tomNoAutoLink | 15 | Auto link recognition suppressed |
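Client code often only needs to know which family a link type belongs to. The sketch below groups the values from the table accordingly. The `LinkType` enum restates those values under illustrative names (real code would use the tom* constants from tom.h), and the three functions are hypothetical helpers operating on the long value that GetLinkType returns.

```cpp
#include <cassert>

// Values restated from the table above (illustrative names).
// Real code would use the tom* constants from tom.h.
enum LinkType : long {
    kNoLink = 0,
    kClientLink = 1,
    kFriendlyLinkName = 2,
    kFriendlyLinkAddress = 3,
    kAutoLinkUrl = 4,
    kAutoLinkEmail = 5,
    kAutoLinkPhone = 6,
    kAutoLinkPath = 7,
    kNoAutoLink = 15
};

// True for either part (name or address) of a friendly-name link.
bool IsFriendlyLink(long type) {
    return type == kFriendlyLinkName || type == kFriendlyLinkAddress;
}

// True for any automatically recognized link (URL, email, phone, path).
bool IsAutoLink(long type) {
    return type >= kAutoLinkUrl && type <= kAutoLinkPath;
}

// Short description matching the table, or "Unknown" for other values.
const char* DescribeLinkType(long type) {
    switch (type) {
        case kNoLink:              return "Not any kind of link";
        case kClientLink:          return "Client link";
        case kFriendlyLinkName:    return "Friendly name of a friendly-name link";
        case kFriendlyLinkAddress: return "Address of a friendly-name link";
        case kAutoLinkUrl:         return "Auto URL";
        case kAutoLinkEmail:       return "Email address";
        case kAutoLinkPhone:       return "Phone number";
        case kAutoLinkPath:        return "File path";
        case kNoAutoLink:          return "Auto link recognition suppressed";
        default:                   return "Unknown";
    }
}
```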
User Interface
If you only select part of the friendly name, the URL isn’t included. If you select the whole friendly name, the URL is included. If you select from inside the friendly name up through a character outside the friendly name, the whole link is selected along with whatever is selected outside the link.
Consider the friendly hyperlink for the text “Hello” pointing to the URL “http://www.hello.com”. It has the following dispatch behavior:
- If the cursor is inside the word “Hello” (between “H” and “e”, “e” and “l”, “l” and “l”, or “l” and “o”) and you press Enter, an EN_LINK notification is sent and the client launches the link.
- However, if the cursor precedes the “H” or follows the “o”, no EN_LINK notification is sent and an end of paragraph is inserted.
Word has the same behavior for the Enter key. The problem is that the Enter key is also used for inserting a paragraph break. So in these edge cases, a choice had to be made and the usual meaning of Enter prevailed. If there’s pointer access to a link, e.g., mouse or touch, the hyperlink can be launched easily that way.
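The Enter-key rule described above boils down to a strict-interior test on the caret position. The following sketch models that rule; it is not RichEdit’s implementation. `EnterAction` and `ActionForEnter` are hypothetical names, and the positions are assumed to be character offsets with the visible friendly name occupying the half-open interval from nameStart to nameEnd.

```cpp
#include <cassert>

// What pressing Enter does at a given caret position (model only).
enum class EnterAction { LaunchLink, InsertParagraph };

// The link launches only when the caret is strictly inside the friendly
// name; at either edge, Enter keeps its usual paragraph-break meaning.
EnterAction ActionForEnter(long caret, long nameStart, long nameEnd) {
    assert(nameStart <= nameEnd);
    const bool strictlyInside = caret > nameStart && caret < nameEnd;
    return strictlyInside ? EnterAction::LaunchLink
                          : EnterAction::InsertParagraph;
}
```

For “Hello” occupying offsets 0 through 5, caret positions 1 through 4 launch the link, while 0 (before the “H”) and 5 (after the “o”) insert a paragraph break.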
What was the rationale for requiring quotes on the URL passed to SetURL? Future expansion like “http:…”nohandcursor=1 ?
MSDN mentions a 0xFDDF sentinel prefix, what is that and why would you want to use it?
Quotes appear in the internal format. Probably SetURL() shouldn’t require them in the BSTR since SetURL() could add them. OneNote requested the sentinel prefix, perhaps to find friendly-name hyperlinks by examining the plain-text. I didn’t notice that MSDN mentioned it, or I would have documented it in the post.