We just shipped an update to our experimental implementation of a multi value dictionary. In this post, our software developer intern Ian Hays talks about the changes. — Immo
Goodbye MultiDictionary
In my last post I went over MultiDictionary, officially available on NuGet as the prerelease package Microsoft.Experimental.Collections. We received great feedback, questions and commentary in the comments, and it was clear that this was something that a lot of you felt passionately about (70 comments? Awesome!). We’ve read all of your comments and taken them into consideration for this next iteration of Microsoft.Experimental.Collections.
You should also check out our interview on Channel 9:
Hello MultiValueDictionary
First off, let’s talk about the name. It was a bit ambiguous what the “Multi” in “MultiDictionary” referred to: at first glance, “multi” could mean there were multiple keys per value, or a dictionary of dictionaries, or that it was a bi-directional dictionary. To make it explicit and leave room for other variants in the future, we’ve renamed the type to MultiValueDictionary to clarify that the type allows multiple values for a single key.
Let’s get right to the meat of the post: what’s changed? We’ll go into some of the major design decisions and changes that make up the new MultiValueDictionary in the next sections.
IEnumerable of…?
MultiDictionary could be thought of as a Dictionary<TKey, TValue> where we could have multiple elements with the same TKey. MultiValueDictionary is more akin to a Dictionary<TKey, IReadOnlyCollection<TValue>> with a number of methods to enable easy modification of the internal IReadOnlyCollections. This distinction may seem subtle, but it affects how you consume the data structure.
For example, let’s look at the Count and Values properties. MultiDictionary would return the number of values and a collection of values, while MultiValueDictionary returns the number of keys and a collection of IReadOnlyCollections of values.
// MultiDictionary
var multiDictionary = new MultiDictionary<string, int>();
multiDictionary.Add("key", 1);
multiDictionary.Add("key", 2);
// multiDictionary.Count == 2
// multiDictionary.Values contains elements [1,2]

// MultiValueDictionary
var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 2);
// multiValueDictionary.Count == 1
// multiValueDictionary.Values contains elements [[1,2]]
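If you relied on the old flattened Count and Values, LINQ can presumably recover them. The sketch below uses a plain Dictionary<string, IReadOnlyCollection<int>> as a stand-in (an assumption, so that it runs without the NuGet package); the same Sum and SelectMany calls should apply to a collection of IReadOnlyCollections like the one Values is described as returning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in with the shape the post describes: key -> read-only collection of values.
var grouped = new Dictionary<string, IReadOnlyCollection<int>>
{
    ["key"] = new List<int> { 1, 2 },
    ["anotherKey"] = new List<int> { 3 },
};

int keyCount = grouped.Count;                       // number of keys (new behavior)
int valueCount = grouped.Values.Sum(v => v.Count);  // total number of values (old behavior)

// Flatten the per-key collections back into a single list of values.
List<int> flatValues = grouped.Values.SelectMany(v => v).ToList();

Console.WriteLine(keyCount);   // 2
Console.WriteLine(valueCount); // 3
Console.WriteLine(flatValues.Count); // 3
```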
This behavioral change also affects the enumerator in the same way that it affects the Values property. Previously the dictionary was flattened when enumerating, as it implemented IEnumerable<KeyValuePair<TKey, TValue>>. MultiValueDictionary now implements IEnumerable<KeyValuePair<TKey, IReadOnlyCollection<TValue>>>.
var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 2);
multiValueDictionary.Add("anotherKey", 3);
foreach (KeyValuePair<string, IReadOnlyCollection<int>> pair in multiValueDictionary)
{
    foreach (int value in pair.Value)
    {
        Console.WriteLine("{0}, {1}", pair.Key, value);
    }
}
// key, 1
// key, 2
// anotherKey, 3
As Sinix pointed out in the previous blog post comments, this is very similar to another type in the .NET Framework, ILookup<TKey, TValue>. MultiValueDictionary shouldn’t implement both the dictionary and lookup interfaces, because interface inheritance would then make it implement two different versions of IEnumerable: IEnumerable<KeyValuePair<TKey, IReadOnlyCollection<TValue>>> and IEnumerable<IGrouping<TKey, TValue>>. It wouldn’t be clear which version you would get when using foreach. But since MultiValueDictionary logically implements the lookup concept, we’ve added an AsLookup() method to MultiValueDictionary that returns an implementation of the ILookup interface.
var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 2);
multiValueDictionary.Add("anotherKey", 3);
var lookup = multiValueDictionary.AsLookup();
foreach (IGrouping<string, int> group in lookup)
{
    foreach (int value in group)
    {
        Console.WriteLine("{0}, {1}", group.Key, value);
    }
}
// key, 1
// key, 2
// anotherKey, 3
Indexing and TryGetValue
In the first iteration of the MultiDictionary we followed the precedent of the Lookup returned by LINQ’s ToLookup() with regard to the way indexing into the MultiDictionary worked. In a regular Dictionary, if you attempt to index into a key that isn’t present you’ll get a KeyNotFoundException, but like a Lookup, the MultiDictionary returned an empty list instead. This was mostly to match the functionality of the Lookup class that is conceptually similar to the MultiDictionary, but also because this behavior was more practical for the kinds of things you’d be using the MultiDictionary for.
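The two precedents being compared can be seen with framework types alone. This is a small sketch, not the MultiDictionary code itself; the variable names are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A regular Dictionary throws when indexing a key that is not present.
var dictionary = new Dictionary<string, int> { ["key"] = 1 };
bool dictionaryThrew = false;
try
{
    _ = dictionary["notkey"];
}
catch (KeyNotFoundException)
{
    dictionaryThrew = true;
}

// A Lookup built with LINQ's ToLookup returns an empty sequence instead.
var pairs = new[]
{
    new KeyValuePair<string, int>("key", 1),
    new KeyValuePair<string, int>("key", 2),
};
ILookup<string, int> lookup = pairs.ToLookup(p => p.Key, p => p.Value);
int missingCount = lookup["notkey"].Count();

Console.WriteLine(dictionaryThrew); // True
Console.WriteLine(missingCount);    // 0
```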
With the behavior changes brought on by the MultiValueDictionary and the addition of the AsLookup() method, this old functionality doesn’t quite fit anymore. We heard feedback that this inconsistency between MultiDictionary and Dictionary was confusing, so the MultiValueDictionary will now throw a KeyNotFoundException when indexing on a key that isn’t present. We’ve also added a TryGetValue method to accommodate the new behavior.
var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
//multiValueDictionary["notkey"] throws a KeyNotFoundException
IReadOnlyCollection<int> collection = multiValueDictionary["key"];
multiValueDictionary.Add("key", 2);
//collection contains values [1,2]
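The post doesn’t show TryGetValue in use, so here is a hedged sketch of the pattern. It uses a Dictionary<string, IReadOnlyCollection<int>> stand-in and assumes that MultiValueDictionary’s TryGetValue follows the same bool-plus-out-parameter shape as Dictionary’s; check the package for the exact signature.

```csharp
using System;
using System.Collections.Generic;

// Stand-in (assumption): same key -> read-only collection shape as MultiValueDictionary.
var standIn = new Dictionary<string, IReadOnlyCollection<int>>
{
    ["key"] = new List<int> { 1, 2 },
};

// TryGetValue reports a missing key through its return value instead of throwing.
bool foundKey = standIn.TryGetValue("key", out IReadOnlyCollection<int> values);
bool foundMissing = standIn.TryGetValue("notkey", out IReadOnlyCollection<int> none);

Console.WriteLine(foundKey);     // True
Console.WriteLine(values.Count); // 2
Console.WriteLine(foundMissing); // False
Console.WriteLine(none == null); // True
```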
Another related change with the MultiValueDictionary on the topic of the indexer is the return value. Previously we returned a mutable ICollection<TValue>. Adding and removing values from the returned ICollection<TValue> updated the MultiDictionary. While there are uses for this functionality, it can be unexpected and create unintentional coupling between parts of an application. To address this we’ve changed the return type to IReadOnlyCollection<TValue>. The read-only collection will still update with changes to the MultiValueDictionary.
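The “read-only but still live” behavior is the same idea as viewing a List<T> through its IReadOnlyCollection<T> interface. The sketch below shows only that general idea with framework types; the inner list is a hypothetical stand-in for the dictionary’s storage, not the package’s actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the storage behind one key.
var inner = new List<int> { 1 };

// List<T> implements IReadOnlyCollection<T>, so this is a read-only view, not a copy.
IReadOnlyCollection<int> view = inner;
int countBefore = view.Count;

// The view exposes no Add method, but changes made to the underlying list show through it.
inner.Add(2);
int countAfter = view.Count;

Console.WriteLine($"{countBefore} -> {countAfter}"); // 1 -> 2
```

In this sketch a caller could still cast the view back to List<int>; a dedicated read-only wrapper would be needed to rule that out.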
When a List just doesn’t cut it
One limitation of the MultiDictionary was that internally, it used a Dictionary<TKey, List<TValue>> and there was no way to change the inner collection type. With the MultiValueDictionary we’ve added the ability to specify your own inner collection.
Showing a simple example of how custom inner collections work is probably easier than trying to describe them first, so let’s do that.
var multiValueDictionary = MultiValueDictionary<string, int>.Create<HashSet<int>>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 1);
//multiValueDictionary["key"].Count == 1
Above, we replace the default List<TValue> with a HashSet<TValue>. As the example shows, HashSet<TValue> discards duplicate values, so adding 1 twice under the same key leaves a single element.
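The difference comes entirely from the inner collection’s own Add semantics, which can be checked with framework types alone. In this sketch, CountAfterAddingTwice is a hypothetical helper, not part of the package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: adds the same value twice and returns the resulting count.
static int CountAfterAddingTwice(ICollection<int> inner)
{
    inner.Add(1);
    inner.Add(1);
    return inner.Count;
}

int listCount = CountAfterAddingTwice(new List<int>());   // a list keeps both copies
int setCount = CountAfterAddingTwice(new HashSet<int>()); // a set ignores the duplicate

Console.WriteLine(listCount); // 2
Console.WriteLine(setCount);  // 1
```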
For every constructor there is a parallel generic static Create method that takes the same parameters but allows specification of the interior collection type. It’s important to point out that this doesn’t affect the return value of the indexer or TryGetValue, though: they return very limited IReadOnlyCollections regardless of the inner collection type.
If you want a little bit more control over how your custom collection is instantiated, there are also the more specific Create methods that allow you to pass a delegate to specify the inner collection type:
var multiValueDictionary = MultiValueDictionary<string, int>.Create<HashSet<int>>(myHashSetFactory);
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 1);
//multiValueDictionary["key"].Count == 1
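The snippet above doesn’t define myHashSetFactory. One plausible shape, assuming the delegate is a parameterless Func that returns a new collection, is sketched below with string values so that a custom comparer has a visible effect. The storage dictionary and the Add local function are hypothetical stand-ins for what the real type does internally.

```csharp
using System;
using System.Collections.Generic;

// Assumed factory shape: a parameterless delegate that returns a fresh inner collection.
Func<HashSet<string>> myHashSetFactory =
    () => new HashSet<string>(StringComparer.OrdinalIgnoreCase);

// Hypothetical stand-in for the dictionary: one inner collection is created per new key.
var storage = new Dictionary<string, HashSet<string>>();

void Add(string key, string value)
{
    if (!storage.TryGetValue(key, out HashSet<string> inner))
    {
        inner = myHashSetFactory(); // the factory runs once for each new key
        storage[key] = inner;
    }
    inner.Add(value);
}

Add("key", "Value");
Add("key", "VALUE"); // treated as a duplicate by the case-insensitive comparer
Add("other", "value");

Console.WriteLine(storage["key"].Count); // 1
Console.WriteLine(storage.Count);        // 2
```

A factory is useful precisely for cases like this, where the inner collection needs constructor arguments that a parameterless Create cannot supply.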
In either case, the specified collection type must implement ICollection<TValue> and must not have IsReadOnly set to true by default.
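To check whether a candidate collection type meets the IsReadOnly requirement, you can inspect the property through the ICollection<T> interface. A quick sketch with three framework types:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// IsReadOnly is often implemented explicitly, so read it through ICollection<T>.
ICollection<int> list = new List<int>();
ICollection<int> set = new HashSet<int>();
ICollection<int> wrapper = new ReadOnlyCollection<int>(new List<int>());

Console.WriteLine(list.IsReadOnly);    // False
Console.WriteLine(set.IsReadOnly);     // False
Console.WriteLine(wrapper.IsReadOnly); // True
```

By this check, List<int> and HashSet<int> qualify as inner collections, while a ReadOnlyCollection<int> would not.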
And that’s all!
You can download the new MultiValueDictionary from NuGet and try it out for yourself! If you have any questions or if you just want to give feedback, please leave a comment or contact us.