Raymond’s complete guide to HSTRING semantics
The title of today’s article is a blatant ripoff of Eric Lippert’s complete guide to BSTR semantics.
I’m going to start with a lie: An
HSTRING is a reference-counted Unicode string.
Work with me here.
The string is immutable, and it uses the UTF-16LE encoding, as is traditional in Windows.
Here are the basic operations on
WindowsCreateString creates an
HSTRING from a UTF-16LE-encoded buffer and a specified length. The buffer does not require a terminating null. If the buffer contains embedded null characters, then the resulting
HSTRING will have embedded null characters. (In particular, if you pass a null-terminated string and you include the null terminator in the length, then the resulting string has an embedded null character. Note also that the length is in
wchar_t code units, not in bytes.)
WindowsDuplicateString increments the reference count on an
HSTRING, and returns a new
HSTRING which you should use to refer to the string.
WindowsDeleteString decrements the reference count on an
HSTRING. If the reference count drops to zero, then the string is destroyed. You shouldn’t use the
HSTRING after passing it to WindowsDeleteString.
There are a small number of string manipulation functions like WindowsSubstring and WindowsConcatString which create new strings from old strings. The set of operations is rather limited, however. If you want to perform fancy operations on
HSTRINGs, you’ll probably need to do them yourself. (Of course, if you’re using a projected language, the
HSTRING will project as something closer to what your projected language operates on natively, at which point you will most likely have a rich collection of library functions available to do advanced manipulations.)
To access the characters in the
HSTRING, use the WindowsGetStringRawBuffer function, which gives you two things: The return value is a pointer to the first character in the
HSTRING, and the optional output parameter is the number of code units. The buffer should be treated as read-only because
HSTRINGs are immutable.
The string contents in the buffer are always followed by a null character (which doesn’t count toward the string length); as a result, you can treat the string buffer as if it were a null-terminated string and get away with it most of the time.
The time you don’t get away with it is if the string contains embedded null characters. In that case, treating it as a null-terminated string will stop prematurely, mistaking the embedded null for the terminal null. You can use the WindowsStringHasEmbeddedNull function to detect whether an
HSTRING contains an embedded null and reject the operation if you don’t support embedded nulls.
One of the special rules for
HSTRING is similar to the corresponding rule for
BSTR, namely that a null pointer is equivalent to a zero-length string. But
HSTRING takes it further: Not only is a null pointer equivalent to a zero-length string, but in fact a null pointer is the representation of a zero-length string. In other words, if you call WindowsCreateString and specify that the string has length zero, then out will come a null pointer. It is legal to assume that a non-null
HSTRING represents a non-empty string. Conversely, it is legal to test an
HSTRING against a null pointer to see whether the string is empty.
Okay, so now I cop to the lie: An
HSTRING is not always a reference-counted string.
There are these things called fast-pass strings. Fast-pass strings are
HSTRINGs that involve no memory allocation. If you have a buffer that you want to turn into an
HSTRING, and you promise not to modify the buffer for the lifetime of your
HSTRING, then you can use the WindowsCreateStringReference function to create an
HSTRING around your buffer. The resulting
HSTRING is a legal
HSTRING, but instead of allocating memory on the heap for a reference-counted object, it uses the
HSTRING_HEADER structure which you passed to the WindowsCreateStringReference function to store the metadata, and it uses the buffer you passed to the function to store the string contents.
It’s called a fast-pass string because this special string doesn’t require any memory allocation, and no data copying occurs.
When you are finished with a fast-pass string, you just abandon the
HSTRING. The underlying memory for the fast-pass string was provided by you, so you are still on the hook for freeing that memory as appropriate.
The existence of fast-pass strings explains why the WindowsDuplicateString function returns you another
HSTRING: If the original string is fast-pass, then the WindowsDuplicateString function needs to convert it to a true reference-counted heap-allocated object, and then it returns an
HSTRING to that heap-allocated string. (On the other hand, if the
HSTRING is already a heap-allocated string with a reference count, then the WindowsDuplicateString function merely increments the reference count and returns the same
The rules for managing
HSTRINGs therefore go like this:
- If you receive an
HSTRINGas a function parameter, you are welcome to use it as-is until your function returns, but don’t call WindowsDeleteString on that string, because you are not the owner of the string. It was merely lent to you. (This is the same rule that applies to COM reference counts.)²
- If you need to keep using the
HSTRINGafter the function returns (say, because you’re saving it in a member variable), you must use
WindowsDuplicateStringand use the duplicate.
- Each call to WindowsCreateString or WindowsDuplicateString (or one of the helper functions that creates a string) should be matched to exactly one call to WindowsDeleteString which is passed the same handle that WindowsCreateString or WindowsDuplicateString returned.
You can think of fast-pass strings as lazy-heap-allocated strings: They get copied to the heap only if somebody needs to extend the lifetime of the string beyond the lifetime of the function.
The WRL library has wrapper classes for
HString class manages an
HSTRING, and the
HStringReference manages a fast-pass
¹ In theory, a debugging version of the WindowsDuplicateString function could create a full duplicate of the string anyway. That way, when you have an
HSTRING leak, you can use heap leak tools to find the code that duplicated the string and failed to destroy it. I don’t know if this theory actually occurs in practice.
² COM violates its own rule with the
CoGetInterfaceAndReleaseStream function, and that lapse came back to bite us.