The empty set contains nothing. This sounds really silly, but it’s actually really nice.
The Windows Runtime has a policy that if a method returns a collection (such as an IVector), and the method produces no results, then it should return an empty collection rather than a null reference. That way, consumers can just iterate over the collection without having to deal with a null test.
For example, suppose you have a method Widget::GetAssociatedDoodads, which returns an IVectorView<Doodad> representing the Doodad objects that have been associated with a Widget object. If no Doodads have been associated with the Widget, then it should return an empty vector, not a null pointer. That allows developers to write the natural-looking code:
```
// C#
foreach (var doodad in widget.GetAssociatedDoodads())
{
    // process each doodad
}

// C++/WinRT
for (auto&& doodad : widget.GetAssociatedDoodads())
{
    // process each doodad
}

// JavaScript
widget.GetAssociatedDoodads().forEach(doodad => {
    // process each doodad
});
```
rather than having to insert a null test (which is easily forgotten):
```
// C#
var doodads = widget.GetAssociatedDoodads();
if (doodads != null) { // annoying null test
    foreach (var doodad in doodads)
    {
        // process each doodad
    }
}

// C++/WinRT
auto doodads = widget.GetAssociatedDoodads();
if (doodads) { // annoying null test
    for (auto&& doodad : doodads)
    {
        // process each doodad
    }
}

// JavaScript
var doodads = widget.GetAssociatedDoodads();
if (doodads) { // annoying null test
    doodads.forEach(doodad => {
        // process each doodad
    });
}
```
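The same idea in standard C++ (rather than C++/WinRT) is a function that returns its collection by value. In this minimal sketch, Widget and Doodad are hypothetical stand-ins for the Windows Runtime types in the example; an empty result is just an empty std::vector, so there is no null state for the caller to test.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the Widget and Doodad types in the text.
// These are plain C++ types, not the Windows Runtime ones.
struct Doodad {
    std::string name;
};

class Widget {
public:
    void Associate(Doodad d) { doodads_.push_back(std::move(d)); }

    // Returns the associated doodads by value. When there are none, the
    // result is an empty vector; there is no "null" result to check for.
    std::vector<Doodad> GetAssociatedDoodads() const { return doodads_; }

private:
    std::vector<Doodad> doodads_;
};

// Iterates with a plain range-for and no null test; an empty result
// simply runs the loop body zero times.
std::size_t CountDoodads(const Widget& widget) {
    std::size_t count = 0;
    for (auto&& doodad : widget.GetAssociatedDoodads()) {
        (void)doodad;  // process each doodad
        ++count;
    }
    return count;
}
```

A freshly constructed Widget yields an empty vector, and CountDoodads returns 0 for it without any special case.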
The principle of the empty collection applies to other types of collections, like IMap<K, V> and arrays. You can think of strings as collections of characters, and you can think of memory buffers (such as IBuffer) as collections of bytes.
An example of a poor design is the CryptographicBuffer class. (Sorry, CryptographicBuffer, for throwing you under the bus.)
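Method | Expected Result | Actual Result
---|---|---
buffer = ConvertStringToBinary(""); | buffer != null; buffer.Length == 0 | buffer == null; buffer.Length /* crashes */
buffer = CreateFromByteArray(new byte[0]); | buffer != null; buffer.Length == 0 | buffer == null; buffer.Length /* crashes */
buffer = DecodeFromBase64String(""); | buffer != null; buffer.Length == 0 | buffer == null; buffer.Length /* crashes */
buffer = DecodeFromHexString(""); | buffer != null; buffer.Length == 0 | buffer == null; buffer.Length /* crashes */
buffer = GenerateRandom(0); | buffer != null; buffer.Length == 0 | buffer != null; buffer.Length == 0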
If the ConvertStringToBinary, CreateFromByteArray, DecodeFromBase64String, or DecodeFromHexString methods are given empty strings or arrays, you would expect them to produce an empty buffer, but instead they return no buffer at all.
This means that code like this looks correct:
```csharp
// Write the string to a file as UTF-8
var buffer = CryptographicBuffer.ConvertStringToBinary(
    message, BinaryStringEncoding.Utf8);
await FileIO.WriteBufferAsync(storageFile, buffer);
```
but then you discover (probably at a very inconvenient moment) that it crashes if the message is an empty string, because ConvertStringToBinary returned null (instead of a non-null reference to an empty buffer), and then WriteBufferAsync threw an invalid parameter exception because the buffer cannot be null.
On the other hand, if you ask GenerateRandom to generate zero random bytes, it correctly gives you an empty buffer rather than a null pointer. So at least one of the methods in the CryptographicBuffer class understands how empty collections work.
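Treating empty input as ordinary input costs nothing in an implementation. As a sketch, here is a hex decoder in standard C++; DecodeHex and HexDigitValue are hypothetical names, not part of CryptographicBuffer. Empty input produces an empty buffer without any special case, and only malformed input (an odd number of digits or a non-hex character) is treated as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical helper: converts one hex digit to its numeric value.
int HexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical decoder. An empty string is valid input and decodes to an
// empty buffer; only malformed input is reported as an error.
std::vector<std::uint8_t> DecodeHex(std::string_view hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        bytes.push_back(static_cast<std::uint8_t>(
            HexDigitValue(hex[i]) * 16 + HexDigitValue(hex[i + 1])));
    }
    return bytes;  // empty when hex is empty: the loop never runs
}
```

The design choice is that "no bytes" is a successful result, distinct from "bad input", which is signaled by an exception rather than by a missing buffer.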
As a bonus insult, the CryptographicBuffer.Compare method requires that both buffers be non-null, so you can’t even do this:
```csharp
// Do it twice and confirm the results are the same
var buffer1 = CryptographicBuffer.ConvertStringToBinary(
    message, BinaryStringEncoding.Utf8);
var buffer2 = CryptographicBuffer.ConvertStringToBinary(
    message, BinaryStringEncoding.Utf8);
if (CryptographicBuffer.Compare(buffer1, buffer2))
{
    // the buffers are equal
}
```
The code crashes if the message is an empty string, because buffer1 and buffer2 will be null, which is not a valid parameter to CryptographicBuffer.Compare. It’s a bit ironic that the CryptographicBuffer class can dish out null buffers but can’t take them.
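The receiving side can be forgiving, too. Here is a sketch in standard C++ in which a possibly-null buffer is modeled as a pointer to a byte vector; OrEmpty and CompareBuffers are hypothetical helpers, not the CryptographicBuffer API. A null buffer compares equal to an empty one, so a "convert twice and compare" check on an empty message would succeed instead of failing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: treats a missing (null) buffer as an empty one.
const Bytes& OrEmpty(const Bytes* buffer) {
    static const Bytes empty{};
    return buffer ? *buffer : empty;
}

// Hypothetical compare that accepts null and treats it as empty,
// so null == null and null == empty both hold.
bool CompareBuffers(const Bytes* a, const Bytes* b) {
    return OrEmpty(a) == OrEmpty(b);
}
```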
Cryptography in general seems to have a hard time with the concept of zero. The UserDataProtectionManager.ProtectBufferAsync method, for example, rejects attempts to protect an empty buffer, so if you want to protect a buffer that might be empty, you need to special-case the empty buffer.
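```csharp
using System;
using System.Threading.Tasks;
using Windows.Security.DataProtection;
using Windows.Storage.Streams;

// This version crashes if the buffer is empty.
static class Protector
{
    static readonly UserDataProtectionManager manager =
        UserDataProtectionManager.TryGetDefault();

    public static async Task<IBuffer> ProtectBufferAsync(IBuffer buffer)
    {
        if (manager != null) {
            return await manager.ProtectBufferAsync(buffer,
                UserDataAvailability.AfterFirstUnlock);
        } else {
            // No protection available - leave unprotected.
            return buffer;
        }
    }

    public static async Task<IBuffer> UnprotectBufferAsync(IBuffer buffer)
    {
        if (manager != null) {
            // UnprotectBufferAsync returns a result object, not a buffer.
            var result = await manager.UnprotectBufferAsync(buffer);
            if (result.Status != UserDataBufferUnprotectStatus.Succeeded) {
                throw new InvalidOperationException("Unable to unprotect the buffer.");
            }
            return result.UnprotectedBuffer;
        } else {
            // No protection available - it was left unprotected.
            return buffer;
        }
    }
}
```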
A naïve way of fixing this is to detect an empty buffer and just skip the ProtectBufferAsync
call, letting an empty buffer be its own protected buffer. This is a bad idea, however, because a bad guy who sees an empty protected buffer will know that this represents an empty unprotected buffer. If the buffer represents a password, then they will know that the password is blank!
If you choose some sentinel non-empty buffer value to represent an empty buffer, you then have to have some way of distinguishing it from a genuine non-empty buffer that happens to match your sentinel. In mathematical terms, your function that converts buffers to non-empty buffers needs to be injective. One way is to append a dummy byte to the buffer, and remove the dummy byte when unprotecting.
```
// C#
// Work around inability to protect empty buffers
// by appending a dummy byte to all buffers.
var paddedBuffer = WindowsRuntimeBuffer.Create((int)buffer.Length + 1);
paddedBuffer.Length = paddedBuffer.Capacity;
buffer.CopyTo(paddedBuffer);
var protectedBuffer = await manager.ProtectBufferAsync(
    paddedBuffer, UserDataAvailability.AfterFirstUnlock);

// Reverse the workaround by removing the dummy byte
// after unprotecting.
var result = await manager.UnprotectBufferAsync(protectedBuffer);
if (result.Status == UserDataBufferUnprotectStatus.Succeeded)
{
    var trimmedBuffer = result.UnprotectedBuffer;
    trimmedBuffer.Length = trimmedBuffer.Length - 1;
    // do something with the trimmed buffer
}

// C++
// Work around inability to protect empty buffers
// by appending a dummy byte to all buffers.
auto length = buffer.Length();
auto paddedBuffer = winrt::Buffer(length + 1);
paddedBuffer.Length(length + 1);
memcpy_s(paddedBuffer.data(), length + 1, buffer.data(), length);
auto protectedBuffer = co_await manager.ProtectBufferAsync(
    paddedBuffer, winrt::UserDataAvailability::AfterFirstUnlock);

// Reverse the workaround by removing the dummy byte
// after unprotecting.
auto result = co_await manager.UnprotectBufferAsync(protectedBuffer);
if (result.Status() == winrt::UserDataBufferUnprotectStatus::Succeeded)
{
    auto trimmedBuffer = result.UnprotectedBuffer();
    trimmedBuffer.Length(trimmedBuffer.Length() - 1);
    // do something with the trimmed buffer
}
```
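Stripped of the Windows Runtime types, the dummy-byte trick is a pair of functions: one that maps every buffer to a non-empty buffer, and one that undoes it. The sketch below uses std::vector in standard C++ with hypothetical Pad and Unpad names, and leaves out the protection call itself. Because Pad is injective, an empty buffer and a buffer holding a single zero byte still produce different padded outputs.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical Pad: appends one dummy byte. Every output is non-empty,
// and distinct inputs give distinct outputs (the mapping is injective).
Bytes Pad(const Bytes& buffer) {
    Bytes padded(buffer);
    padded.push_back(0);  // the dummy byte's value is arbitrary
    return padded;
}

// Hypothetical Unpad: reverses Pad by dropping the trailing dummy byte.
Bytes Unpad(const Bytes& padded) {
    if (padded.empty()) {
        throw std::invalid_argument("padded buffer must contain the dummy byte");
    }
    return Bytes(padded.begin(), padded.end() - 1);
}
```

Only Pad(buffer) would be handed to the protector, so it never sees an empty buffer, and Unpad(Pad(x)) == x for every x, including the empty one.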
The inability to handle zero-byte buffers makes everybody’s life harder.
Zero. It’s a valid number. Please support it.
I propose a corollary: any method that receives a collection should treat null as an empty collection.
One question I have here is: how might you address fixing `CryptographicBuffer`?
I think that the four versions that produce the null buffer could likely just be changed to return the empty result, because you're going from an error case to a success-but-empty case. But, it's a functional change, and it makes me think back to all of the work that you've done historically to ensure compatibility.
Another option might be to deprecate the methods and introduce...
This is at least one area where C++ idioms still shine.
If a function returns a collection by value (or by const ref in case of getters), all it can do is return a collection, no null shenanigans.
And thanks to move-semantics and [N]RVO, it should generally be pretty cheap to return by value.
Personally, I'm a big fan of Rust's decision to make Option and Result iterable. They both yield one object on the happy path, and zero objects on the unhappy path, which can then be composed with the rest of Rust's iterator calculus. This may sound like an awkward and fiddly way of doing things... except in cases where the underlying object either is a collection, or you plan to convert it into a collection in...
That's not what Rust (like all strongly-typed functional programming languages) is fundamentally about, though. In Rust a value must be explicitly made "nullable" by giving it a type of Option<T>, and a value of type T can never be "null". Good programming practice in Rust involves properly typing infallible computations as (or at least ) as opposed to . A well-designed Rust library will not have functions returning Options all over the place, regardless...
It is not either-or. There are situations where the correct API type is an Option of some kind (to distinguish between e.g. "there are no active frobnicators right now" and "frobnicator support is disabled, so there is no list of active frobnicators"), but also a particular caller wants to map None to the empty collection (and a different caller might not want to do that). Option::into_iter() lets both callers get what they want with minimal...
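The "Option, but this caller wants an empty collection" pattern has a rough C++ counterpart in std::optional::value_or (a mapping, not iteration like Rust's Option::into_iter). In this minimal sketch, ActiveFrobnicators is a hypothetical function borrowed from the frobnicator example above, not a real API: it returns std::nullopt when support is disabled and an empty vector when support is enabled but nothing is active. A caller that does not care about the difference collapses both cases to an empty list.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical API that distinguishes "support disabled" (std::nullopt)
// from "support enabled, but nothing active" (an empty vector).
std::optional<std::vector<std::string>> ActiveFrobnicators(
    bool supportEnabled, std::vector<std::string> active) {
    if (!supportEnabled) {
        return std::nullopt;
    }
    return active;
}

// A caller that treats "no list" the same as "empty list".
std::size_t CountActive(
    const std::optional<std::vector<std::string>>& frobnicators) {
    return frobnicators.value_or(std::vector<std::string>{}).size();
}
```

A different caller can still inspect has_value() when the distinction matters, which is the point the comment makes about not forcing one interpretation on everyone.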