Embracing the power of the empty set in API design (and applying this principle to selectors and filters)

Raymond Chen

In mathematics, the empty set is a set with no members. This sounds totally useless, but it’s actually quite powerful.

Suppose you have a Widget object and a method GetChildren() that returns the child widgets. What should you do if the Widget has no children?

Sometimes, teams try to be clever and say, “Oh, if there are no children, then I will just return a null pointer instead of a list of child elements.” For example, IShellFolder::EnumObjects has the rule that if it returns S_FALSE, then it is allowed to return a null pointer instead of an enumerator.

As a result, this code is subtly wrong:

ComPtr<IEnumIDList> e;
if (SUCCEEDED(shellFolder->EnumObjects(hwnd, SHCONTF_FOLDERS, &e)) {
    for (CComHeapPtr<IDLIST_RELATIVE> child;
         e->Next(1, &child, nullptr) == S_OK;
         child.Free()) {
        // process the child folder
    }
}

The above code is at risk of crashing if there are no child folders: The IShellFolder implementation may chose to use the special case carve-out and return S_FALSE from the EnumObjects method, and set e = nullptr. If that happens, then the enumeration loop crashes on a null pointer.

The correct code would have to check for this corner case and skip the enumeration loop. One way is to check for S_FALSE explicitly:

ComPtr<IEnumIDList> e;
HRESULT hr = shellFolder->EnumObjects(hwnd, SHCONTF_FOLDERS, &e);
if (SUCCEEDED(hr) && (hr != S_FALSE)) {                          
    for (CComHeapPtr<IDLIST_RELATIVE> child;
         e->Next(1, &child, nullptr) == S_OK;
         child.Free()) {
        // process the child folder
    }
}

Another is to check the enumerator object for null.

ComPtr<IEnumIDList> e;
if (SUCCEEDED(shellFolder->EnumObjects(hwnd, SHCONTF_FOLDERS, &e) && e) {
    for (CComHeapPtr<IDLIST_RELATIVE> child;
         e->Next(1, &child, nullptr) == S_OK;
         child.Free()) {
        // process the child folder
    }
}

Another is to check the return value for S_OK exactly:

ComPtr<IEnumIDList> e;
if (shellFolder->EnumObjects(hwnd, SHCONTF_FOLDERS, &e) == S_OK) {
    for (CComHeapPtr<IDLIST_RELATIVE> child;
         e->Next(1, &child, nullptr) == S_OK;
         child.Free()) {
        // process the child folder
    }
}

Regardless of how you deal with the issue, it doesn’t hide the fact that this special case is an extra bit of hassle, and more importantly, it’s the sort of hassle that is easily overlooked when writing code.

The obvious-looking code is wrong. The weird-looking code is correct. This is the opposite of what we want: We want the natural code to be the correct one.

The general rule for methods that return collections is therefore that if there are no items to return, then the methods should return empty collections, rather than null references. The empty set is a perfectly valid set, and existing algorithms operate properly on an empty set. For example, you iterate over an empty collection the same way as a nonempty collection.

The empty set also follows mathematical rules when applied to union and intersection. If you have collection of sets that happens to be empty, then mathematically, the union of the elements of the empty set is another empty set. And the intersection of the elements of the empty set is the entire universe.

When applied to functions, this means that if a method takes a set of criteria with the requirement that elements must match at least one of the criteria (a “selector” pattern), then an empty set selects no elements. On the other hand, if the requirement is that elements must match all of the criteria (a “filter” pattern), then an empty set selects all elements.¹

For example, suppose we have a method that finds all widgets of the specified colors:

IVectorView<Widget>
    FindWidgetsOfColors(IIterable<Color> colors);

What should happen if you pass an empty list of colors?

Well, since you didn’t specify any colors, then are no matching widgets, so mathematically speaking, this should return an empty vector. You might also choose to declare that choosing widgets by color must provide at least one color, so calling FindWidgetsOfColors with an empty set of colors is allowed to fail with E_INVALIDARG. But if you’re going to return something, you had better return the empty set. Do not design the function so that passing an empty list of colors returns all the widgets.

Conversely, suppose you have a method that finds widgets subject to filter criteria:

enum WidgetFilter
{
    Active,
    Connected,
};

IVectorView<Widget>
    FindWidgetsWithFilter(IIterable<WidgetFilter> filters);

What should happen if you pass an empty list of filters?

Mathematically, an empty list of filters is an unfiltered query, and you should return all the widgets. Again, you might decide that a filtered query must provide at least one filter, but if you choose to allow an empty list of filters, it had better return all the widgets.

These powers of the empty set mean that many operations have their natural meanings. For example, if you have a filter set and decide that you want to allow inactive widgets, you can remove WidgetFilter.Active from your filters. If FindWidgetsWithFilter treated an empty filter list as meaning “Return no widgets at all”, then you’d need a special case for the possibility that the Active filter was the last filter.

Bonus chatter: Most programming languages understand the power of the empty set. For example, C++ std::all_of and JavaScript Array.prototype.every return true when given an empty collection, whereas C++ std::any_of and JavaScript Array.prototype.some return false when given an empty collection.

¹ This naming distinction between “selector” patterns (where each selector adds to the list of results) and “filter” patterns (where each filter shrinks the list of results) is my own and is not officially part of the Windows API design patterns.

My proposal that was accepted is that filters should be named after what they allow to pass through, not by what they remove. For example, passing the WidgetFilter.Active filter means that you want the active widgets. It doesn’t mean that you want to filter out the active widgets (and return only the inactive ones). If you want a filter to remove active widgets, then the filter should be named something like WidgetFilter.Inactive. For adjectives that don’t have an obvious antonym, you can use the prefix Not, as in WidgetFilter.NotBlinking.

Bonus chatter: We saw another application of this principle some time ago when we looked at what a timeout of zero should mean.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

14 comments

Discussion is closed. Login to edit/delete existing comments.

John Elliott August 21, 2024

I remember having this argument with a company that published a REST API, which would return a collection if there was a result, but a 404 error rather than an empty collection if there wasn’t.
Michael Dunn August 13, 2024 · Edited

Ruby works the same way — `[].all?` returns true and `[].any?` returns false — but every time I need to know which function returns what, I have to run them both in the console. I guess I’m not good enough at set theory to remember which is which.
Ian Boyd August 13, 2024

We all know that languages should not have `null`. (and then Javascript goes ahead and has two types of nulls).

But then you'll have people insist that you cannot program without `null`. Similar to how in the early 1990s you still had people arguing that they **need** `goto`.

So it was amazing when C# 8 just let you say:

> Nulls: off

And that's that. No more nulls. And socity didn't collapse. And it turns out, like `goto`, you don't need `null`.

It was was then that i realized that i should **never** return an `null` to mean 'no results', but instead it should return...
Read more
We all know that languages should not have `null`. (and then Javascript goes ahead and has two types of nulls).

But then you’ll have people insist that you cannot program without `null`. Similar to how in the early 1990s you still had people arguing that they **need** `goto`.

So it was amazing when C# 8 just let you say:

> Nulls: off

And that’s that. No more nulls. And socity didn’t collapse. And it turns out, like `goto`, you don’t need `null`.

It was was then that i realized that i should **never** return an `null` to mean ‘no results’, but instead it should return an empty iterator/collection/list.

It’s all about getting developers to fall into the pit-of-success.

Read less
Simon Geard August 13, 2024
It's stupid — if I search for customers with the name "Smith" and there aren't any, that's not a special case... it's a set of all customers named Smith, just as it would be if there were a thousand of them.

Seriously, the number of times I've seen functions that do something like:

<code>

...and then seen the caller do:

<code>

Why do people think they need to write unnecessary extra code which has no impact other than forcing other people to unnecessary extra code?

Read more
It’s stupid — if I search for customers with the name “Smith” and there aren’t any, that’s not a special case… it’s a set of all customers named Smith, just as it would be if there were a thousand of them.

Seriously, the number of times I’ve seen functions that do something like:
```
List findCustomers(String name) {
    var customers = //do search
    return customers.isEmpty() ? null : customers; // Return null to signal nothing found
}
```
…and then seen the caller do:
```
var customers = findCustomer(name);
if (customers == null) {
    customers = emptyList();//Null safety, because the author of findCustomer is an idiot.
}
```
Why do people think they need to write unnecessary extra code which has no impact other than forcing other people to unnecessary extra code?
Read less
Bastian Weßling August 12, 2024 · Edited
Interesting post, but I am confused since I see no difference between:
```
// No color passed => return empty set
IVectorView FindWidgetsOfColors(IIterable colors);

// No filter passed => return all widgets
IVectorView FindWidgetsWithFilter(IIterable filters);
```
To me, both of these are filter functions. A color is a filter that matches widgets with that color. But maybe that is more of a philosophical question.
- Matthew Barnett August 13, 2024
  
  I believe the point is that ‘FindWidgetsOfColors’ would be written to return _any_ that match and ‘FindWidgetsWithFilter’ would be written to return _all_ that match, although the names of the functions don’t make that explicit (it puzzled me too for a while!). Perhaps a better solution would be to write them to accept a predicate to be passed so that the user could choose how it should select the widgets.
  - Raymond Chen Author August 13, 2024
    
    Yeah, I didn't name FindWidgetsOfColors well.
    
    The predicate is basically the same as "FindAllWidgets" and then postfiltering the list. The idea behind passing your criteria up front is that the system can optimize the search. E.g. "I already index the widgets by color, so if you ask for the red ones, I can give you those immediately without having to search through the boxes where I keep the blue and green ones, only for your predicate to say 'Nope, wrong color' for all of them." That's just the for-if antipattern.
    
    Read more
    Yeah, I didn’t name FindWidgetsOfColors well.
    
    The predicate is basically the same as “FindAllWidgets” and then postfiltering the list. The idea behind passing your criteria up front is that the system can optimize the search. E.g. “I already index the widgets by color, so if you ask for the red ones, I can give you those immediately without having to search through the boxes where I keep the blue and green ones, only for your predicate to say ‘Nope, wrong color’ for all of them.” That’s just the for-if antipattern.
    
    Read less
- Raymond Chen Author August 13, 2024 · Edited
  
  If FindWidgetsOfColors were a filter function, then FindWidgetsOfColors({ Red, Blue }) would mean “find widgets that are both red and blue” (because filters combine via intersection: you have to pass all the filters). If we assume that each widget is only one color, then there would be no point to passing more than one color to a color filter, since it would never return any results.
  - Bastian Weßling August 14, 2024
    
    Okay, if we are talking about the zero element of or versus and, then we are on the same page. The difference between the semantics of both Find… functions didn’t register with me.
- Alexander Buynyachenko August 13, 2024 · Edited
  I read it this way:
```
FindWidgedWithColors
```
  always has explicit filter that is required to be passed in – it is the color filter.
```
FindWidgetsWithFilter
```
  gives you an option of not passing filter at all.
Dan Bugglin August 12, 2024 · Edited

.NET provides the APIs Array.Empty and Enumerable.Empty specifically for generating empty sets. Array.Empty even caches the array so you can reuse it everywhere you need that empty set of a specific type and aren't recreating it every time. I assume Enumerable.Empty does something similar.

There's also string.Empty but I'm not sure if that counts. Regardless .NET will deduplicate constant strings anyway so string.Empty won't get you that extra benefit since you were already getting it. I just looked it up and it sounds like the deduplication was introduced in .NET Framework 2.0 so in 1.x using string.Empty would have gotten you...
Read more
.NET provides the APIs Array.Empty and Enumerable.Empty specifically for generating empty sets. Array.Empty even caches the array so you can reuse it everywhere you need that empty set of a specific type and aren’t recreating it every time. I assume Enumerable.Empty does something similar.

There’s also string.Empty but I’m not sure if that counts. Regardless .NET will deduplicate constant strings anyway so string.Empty won’t get you that extra benefit since you were already getting it. I just looked it up and it sounds like the deduplication was introduced in .NET Framework 2.0 so in 1.x using string.Empty would have gotten you that benefit, hence its existence.

Edit: This comment is intended for any visitors looking for additional info more so than Mr Chen.

Read less
- Michael Taylor August 12, 2024 · Edited
  `String.Empty` exists in .NET for one reason - not all languages support strings or even empty strings. This constant was defined for languages that don't support string literals. That was back when .NET was introduced as a new runtime for any language. Behind the scenes it is no different than using an empty string literal. Early on (.NET 2) I saw a lot of "recommendations" to use it over a string literal yet none of the reasons actually made sense. I heard things like "it is faster" or "it is more readable" but none of that is true, in my...
  Read more
  `String.Empty` exists in .NET for one reason – not all languages support strings or even empty strings. This constant was defined for languages that don’t support string literals. That was back when .NET was introduced as a new runtime for any language. Behind the scenes it is no different than using an empty string literal. Early on (.NET 2) I saw a lot of “recommendations” to use it over a string literal yet none of the reasons actually made sense. I heard things like “it is faster” or “it is more readable” but none of that is true, in my experience. There are even analyzers that recommend it yet if you read the discussions in the C# language design they all same the same thing – they are identical and dedupped by the JIT (or compiler perhaps). The only real difference is a literal is a const whereas `String.Empty` is a readonly value and therefore cannot be used in cases where a const expression is needed.
  
  As for `Array.Empty` or `Enumerable.Empty`, they do the same thing. Under the hood they rely on the compiler’s behavior to generate code only once per (ref) type and so they define the instance once and just reuse it vs you creating your own instance each time. In theory the compiler could rewrite empty array declarations to just use the methods but it might break code. In the newer language versions though you don’t even need to use these types. Just use an empty initializer expression.
  
  MyData[] data = []; IEnumerable data2 = [];
  Read less
- Kevin Norris August 12, 2024
  
  In Rust, an "array" has a known size at compile time, and an "empty array" is therefore a zero-sized type (ZST). The compiler knows that every instance of a given ZST is identical, so it automatically constant-folds all ZST instances out of existence as a guaranteed optimization. This also means that a pointer to a ZST does not (need to) point to a real allocation, it can just point to a made-up aligned and non-null value (the compiler promises to never actually dereference the pointer, because there would be no data to load or store anyway, but it may optimize...
  Read more
  In Rust, an “array” has a known size at compile time, and an “empty array” is therefore a zero-sized type (ZST). The compiler knows that every instance of a given ZST is identical, so it automatically constant-folds all ZST instances out of existence as a guaranteed optimization. This also means that a pointer to a ZST does not (need to) point to a real allocation, it can just point to a made-up aligned and non-null value (the compiler promises to never actually dereference the pointer, because there would be no data to load or store anyway, but it may optimize on the assumption that the pointer is aligned and non-null). In practice, you’re more likely to see a slice or Vec than an array (in the sort of API that Raymond is describing, anyway)… but slices can play the same game, because they’re just fat pointers, and if the size is zero, then the pointee is again a ZST. So you can construct a slice pointer whose size is zero and whose data pointer is NonNull::dangling().as_ptr(), and the language considers that a valid empty slice, no need for a full allocation. std::slice::from_raw_parts() explicitly calls this out as a thing that is legal to do.
  
  In principle, this can also be done for str, but nobody would bother because you can just write “” and the compiler will do it for you.
  
  Read less
  - 紅樓鍮 August 15, 2024
    
    Note that Rust supports &[] (and &mut []) as well.