August 5th, 2014

MultiDictionary becomes MultiValueDictionary

Immo Landwerth
Program Manager

We just shipped an update to our experimental implementation of a multi value dictionary. In this post, our software developer intern Ian Hays talks about the changes. — Immo

Goodbye MultiDictionary

In my last post I went over MultiDictionary, officially available on NuGet as the prerelease package Microsoft.Experimental.Collections. We received great feedback, questions and commentary in the comments, and it was clear that this was something that a lot of you felt passionately about (70 comments? Awesome!). We’ve read all of your comments and taken them into consideration for this next iteration of Microsoft.Experimental.Collections.

You should also check out our interview on Channel 9.

Hello MultiValueDictionary

First off, let’s talk about the name. It was ambiguous what the “Multi” in “MultiDictionary” referred to: at first glance, it could mean multiple keys per value, a dictionary of dictionaries, or a bidirectional dictionary. To make the meaning explicit and leave room for other variants in the future, we’ve renamed the type to MultiValueDictionary, which states plainly that the type allows multiple values for a single key.

Let’s get right to the meat of the post: what’s changed? We’ll go into some of the major design decisions and changes that make up the new MultiValueDictionary in the next sections.

IEnumerable of…?

MultiDictionary could be thought of as Dictionary<TKey, TValue> where we could have multiple elements with the same TKey. MultiValueDictionary is more akin to a Dictionary<TKey, IReadOnlyCollection<TValue>> with a number of methods to enable easy modification of the internal IReadOnlyCollections. This distinction may seem subtle, but it affects how you consume the data structure.
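To make the “Dictionary of IReadOnlyCollections” model concrete, here is a minimal C# sketch. It is not the Microsoft.Experimental.Collections source: SketchMultiValueDictionary is a hypothetical name, and the sketch assumes one List<TValue> per key, handed to callers through a ReadOnlyCollection<TValue> wrapper.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical sketch, NOT the Microsoft.Experimental.Collections source:
// a map from each key to a read-only view of that key's values.
public sealed class SketchMultiValueDictionary<TKey, TValue>
    : IEnumerable<KeyValuePair<TKey, IReadOnlyCollection<TValue>>>
{
    // Mutable storage: one List<TValue> per key.
    private readonly Dictionary<TKey, List<TValue>> _lists =
        new Dictionary<TKey, List<TValue>>();

    // Read-only wrappers handed out to callers; each reflects later changes to its list.
    private readonly Dictionary<TKey, ReadOnlyCollection<TValue>> _views =
        new Dictionary<TKey, ReadOnlyCollection<TValue>>();

    // Number of distinct keys, not the total number of values.
    public int Count => _lists.Count;

    // One read-only collection per key, e.g. [[1, 2]] rather than [1, 2].
    public IEnumerable<IReadOnlyCollection<TValue>> Values => _views.Values;

    public void Add(TKey key, TValue value)
    {
        if (!_lists.TryGetValue(key, out List<TValue> list))
        {
            list = new List<TValue>();
            _lists.Add(key, list);
            _views.Add(key, list.AsReadOnly());
        }
        list.Add(value);
    }

    // Enumerates one pair per key instead of flattening to one pair per value.
    public IEnumerator<KeyValuePair<TKey, IReadOnlyCollection<TValue>>> GetEnumerator()
    {
        foreach (KeyValuePair<TKey, ReadOnlyCollection<TValue>> pair in _views)
        {
            yield return new KeyValuePair<TKey, IReadOnlyCollection<TValue>>(pair.Key, pair.Value);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With this shape, adding 1 and 2 under "key" and 3 under "anotherKey" leaves Count at 2 (two keys), and a foreach yields one pair per key rather than one pair per value.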

For example, let’s look at the Count and Values properties. MultiDictionary would return the number of values and a collection of values, while MultiValueDictionary returns the number of keys and a collection of IReadOnlyCollections of values.

// MultiDictionary
var multiDictionary = new MultiDictionary<string, int>();
multiDictionary.Add("key", 1);
multiDictionary.Add("key", 2);
//multiDictionary.Count == 2
//multiDictionary.Values contains elements [1,2]

// MultiValueDictionary
var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 2);
//multiValueDictionary.Count == 1
//multiValueDictionary.Values contains elements [[1,2]]

This behavioral change also affects the enumerator in the same way that it affects the Values property. Previously the dictionary was flattened when enumerating, as it implemented IEnumerable<KeyValuePair<TKey, TValue>>. MultiValueDictionary now implements IEnumerable<KeyValuePair<TKey, IReadOnlyCollection<TValue>>>.

var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 2);
multiValueDictionary.Add("anotherKey", 3);

foreach (KeyValuePair<string, IReadOnlyCollection<int>> pair in multiValueDictionary)
{
    foreach (int value in pair.Value)
    {
        Console.WriteLine("{0}, {1}", pair.Key, value);
    }
}
// key, 1
// key, 2
// anotherKey, 3

As Sinix pointed out in the comments on the previous post, this is very similar to another type in the .NET Framework, ILookup<TKey, TValue>. MultiValueDictionary shouldn’t implement both the dictionary and lookup interfaces, because through interface inheritance it would then implement two different instantiations of IEnumerable: IEnumerable<KeyValuePair<TKey, IReadOnlyCollection<TValue>>> and IEnumerable<IGrouping<TKey, TValue>>. It wouldn’t be clear which one you would get when using foreach. But since MultiValueDictionary logically implements the lookup concept, we’ve added an AsLookup() method that returns an implementation of the ILookup interface.

var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 2);
multiValueDictionary.Add("anotherKey", 3);

var lookup = multiValueDictionary.AsLookup();
foreach (IGrouping<string, int> group in lookup)
{
    foreach (int value in group)
    {
        Console.WriteLine("{0}, {1}", group.Key, value);
    }
}
// key, 1
// key, 2
// anotherKey, 3
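An AsLookup()-style method can be little more than an adapter over the per-key collections. The sketch below is a hypothetical illustration rather than the package’s code: DictionaryLookup and its nested Grouping class are invented names, and it assumes the data is reachable as an IReadOnlyDictionary<TKey, IReadOnlyCollection<TValue>>.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical AsLookup()-style adapter; NOT the Microsoft.Experimental.Collections code.
// Presents per-key collections through the standard ILookup<TKey, TValue> interface.
public sealed class DictionaryLookup<TKey, TValue> : ILookup<TKey, TValue>
{
    private readonly IReadOnlyDictionary<TKey, IReadOnlyCollection<TValue>> _source;

    public DictionaryLookup(IReadOnlyDictionary<TKey, IReadOnlyCollection<TValue>> source)
    {
        _source = source;
    }

    // Number of keys (groups), as ILookup defines it.
    public int Count => _source.Count;

    // ILookup convention: a missing key yields an empty sequence instead of throwing.
    public IEnumerable<TValue> this[TKey key]
    {
        get
        {
            if (_source.TryGetValue(key, out IReadOnlyCollection<TValue> values))
            {
                return values;
            }
            return Enumerable.Empty<TValue>();
        }
    }

    public bool Contains(TKey key) => _source.ContainsKey(key);

    // One IGrouping per key, so foreach over the lookup has a single meaning.
    public IEnumerator<IGrouping<TKey, TValue>> GetEnumerator()
    {
        foreach (KeyValuePair<TKey, IReadOnlyCollection<TValue>> pair in _source)
        {
            yield return new Grouping(pair.Key, pair.Value);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Grouping : IGrouping<TKey, TValue>
    {
        private readonly IReadOnlyCollection<TValue> _values;

        public Grouping(TKey key, IReadOnlyCollection<TValue> values)
        {
            Key = key;
            _values = values;
        }

        public TKey Key { get; }

        public IEnumerator<TValue> GetEnumerator() => _values.GetEnumerator();

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }
}
```

Note that this adapter follows ILookup’s own convention of returning an empty sequence for a missing key; that convention is exactly what the next section moves away from for the dictionary’s own indexer.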

Indexing and TryGetValue

In the first iteration of MultiDictionary, we followed the precedent set by LINQ’s ToLookup() for indexing into the collection. In a regular Dictionary, indexing with a key that isn’t present throws a KeyNotFoundException, but MultiDictionary, like the lookup returned by ToLookup(), returned an empty collection instead. This was mostly to match the behavior of the Lookup class, which is conceptually similar to MultiDictionary, but also because this behavior was more practical for the kinds of things you’d use MultiDictionary for.

With the behavior changes brought on by the MultiValueDictionary and the addition of the AsLookup() method, this old functionality doesn’t quite fit anymore. We heard feedback that this inconsistency between MultiDictionary and Dictionary was confusing, so the MultiValueDictionary will now throw a KeyNotFoundException when indexing on a key that isn’t present. We’ve also added a TryGetValue method to accommodate the new behavior.

var multiValueDictionary = new MultiValueDictionary<string, int>();
multiValueDictionary.Add("key", 1);
//multiValueDictionary["notkey"] throws a KeyNotFoundException
IReadOnlyCollection<int> collection = multiValueDictionary["key"];
multiValueDictionary.Add("key", 2);
//collection contains values [1,2]
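A minimal sketch of the Dictionary-style contract described above, assuming one List<TValue> per key (IndexingSketch is a hypothetical type, not the package’s API surface): the indexer throws KeyNotFoundException for a missing key, and TryGetValue reports the miss through its return value instead.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical sketch of Dictionary-style indexing for a multi-value dictionary;
// NOT the Microsoft.Experimental.Collections implementation.
public sealed class IndexingSketch<TKey, TValue>
{
    private readonly Dictionary<TKey, List<TValue>> _lists =
        new Dictionary<TKey, List<TValue>>();
    private readonly Dictionary<TKey, ReadOnlyCollection<TValue>> _views =
        new Dictionary<TKey, ReadOnlyCollection<TValue>>();

    public void Add(TKey key, TValue value)
    {
        if (!_lists.TryGetValue(key, out List<TValue> list))
        {
            list = new List<TValue>();
            _lists.Add(key, list);
            _views.Add(key, list.AsReadOnly());
        }
        list.Add(value);
    }

    // Matches Dictionary<TKey, TValue>: a missing key throws instead of returning an empty collection.
    public IReadOnlyCollection<TValue> this[TKey key]
    {
        get
        {
            if (_views.TryGetValue(key, out ReadOnlyCollection<TValue> view))
            {
                return view;
            }
            throw new KeyNotFoundException($"The key '{key}' was not present.");
        }
    }

    // Non-throwing alternative: returns false (and sets values to null) when the key is missing.
    public bool TryGetValue(TKey key, out IReadOnlyCollection<TValue> values)
    {
        if (_views.TryGetValue(key, out ReadOnlyCollection<TValue> view))
        {
            values = view;
            return true;
        }
        values = null;
        return false;
    }
}
```

As in the example above, the collection returned for "key" keeps reflecting later calls to Add, because it wraps the stored list rather than copying it.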

Another related change with the MultiValueDictionary on the topic of the indexer is the return value. Previously we returned a mutable ICollection<TValue>. Adding and removing values from the returned ICollection<TValue> updated the MultiDictionary. While there are uses for this functionality, it can be unexpected and create unintentional coupling between parts of an application. To address this we’ve changed the return type to IReadOnlyCollection<TValue>. The read-only collection will still update with changes to the MultiValueDictionary.
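The return value can be read-only and still live if it is a thin wrapper that forwards Count and enumeration to the stored collection instead of copying it. This is a hypothetical sketch (ReadOnlyCollectionView is an invented name, not necessarily how the package does it); because it only requires ICollection<T>, it also works for inner collections that are not lists.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical live, read-only view; NOT necessarily how the package implements it.
// Works for any ICollection<T> (List<T>, HashSet<T>, ...) without copying it.
public sealed class ReadOnlyCollectionView<T> : IReadOnlyCollection<T>
{
    private readonly ICollection<T> _inner;

    public ReadOnlyCollectionView(ICollection<T> inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Reads go straight to the wrapped collection, so later Add/Remove calls are visible.
    public int Count => _inner.Count;

    public IEnumerator<T> GetEnumerator() => _inner.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the wrapper implements only IReadOnlyCollection<T>, a caller cannot cast it back to ICollection<T> and modify the dictionary’s contents from elsewhere in the application.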

When a List just doesn’t cut it

One limitation of the MultiDictionary was that internally, it used a Dictionary<TKey, List<TValue>> and there was no way to change the inner collection type. With the MultiValueDictionary we’ve added the ability to specify your own inner collection.

Showing a simple example of a custom inner collection is probably easier than trying to describe the feature first, so let’s do that.

var multiValueDictionary = MultiValueDictionary<string, int>.Create<HashSet<int>>();
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 1);
//multiValueDictionary["key"].Count == 1

Above, we replace the default List<TValue> with a HashSet<TValue>. As the example shows, the HashSet discards the duplicate value, so “key” maps to a single 1.

For every constructor there is a parallel generic static Create method that takes the same parameters but also lets you specify the inner collection type. Note that this doesn’t affect the return type of the indexer or TryGetValue: they return very limited IReadOnlyCollections regardless of the inner collection type.
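The generic Create method can be sketched with a new() constraint on the collection type. This is a hypothetical illustration, not the package’s implementation: PluggableMultiValueDictionary and its CountValues helper are invented for the example, and the sketch stores a factory delegate that each new key uses to create its inner collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a Create<TValueCollection>() factory;
// NOT the Microsoft.Experimental.Collections implementation.
public sealed class PluggableMultiValueDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, ICollection<TValue>> _collections =
        new Dictionary<TKey, ICollection<TValue>>();

    // Called once per new key to create that key's inner collection.
    private readonly Func<ICollection<TValue>> _newCollection;

    private PluggableMultiValueDictionary(Func<ICollection<TValue>> newCollection)
    {
        _newCollection = newCollection;
    }

    // Default constructor: one List<TValue> per key.
    public PluggableMultiValueDictionary()
        : this(() => new List<TValue>())
    {
    }

    // Parallel static factory: same (empty) parameter list as the constructor,
    // plus a type argument that chooses the inner collection.
    public static PluggableMultiValueDictionary<TKey, TValue> Create<TValueCollection>()
        where TValueCollection : ICollection<TValue>, new()
    {
        return new PluggableMultiValueDictionary<TKey, TValue>(() => new TValueCollection());
    }

    public void Add(TKey key, TValue value)
    {
        if (!_collections.TryGetValue(key, out ICollection<TValue> values))
        {
            values = _newCollection();
            _collections.Add(key, values);
        }
        values.Add(value);
    }

    // Hypothetical helper for the demo: number of values stored under key (0 if absent).
    public int CountValues(TKey key)
    {
        return _collections.TryGetValue(key, out ICollection<TValue> values) ? values.Count : 0;
    }
}
```

With Create<HashSet<int>>(), adding 1 twice under "key" leaves one value, while the default constructor keeps both because it uses a List<int>.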

If you want more control over how your custom collection is instantiated, there are also more specific Create overloads that take a delegate for creating the inner collections:

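// myHashSetFactory is any Func<HashSet<int>>; a trivial one might be:
Func<HashSet<int>> myHashSetFactory = () => new HashSet<int>();

var multiValueDictionary = MultiValueDictionary<string, int>.Create<HashSet<int>>(myHashSetFactory);
multiValueDictionary.Add("key", 1);
multiValueDictionary.Add("key", 1);
//multiValueDictionary["key"].Count == 1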

In either case, the specified collection type must implement ICollection<TValue> and must not have IsReadOnly set to true by default.
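The delegate overload and the IsReadOnly rule can be sketched the same way. Again this is hypothetical: FactoryMultiValueDictionary and CountValues are invented names, and probing one freshly created collection for IsReadOnly is an assumption about how such a check might be done, not a description of the package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a delegate-based Create overload with a read-only check;
// NOT the Microsoft.Experimental.Collections implementation.
public sealed class FactoryMultiValueDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, ICollection<TValue>> _collections =
        new Dictionary<TKey, ICollection<TValue>>();

    private readonly Func<ICollection<TValue>> _newCollection;

    private FactoryMultiValueDictionary(Func<ICollection<TValue>> newCollection)
    {
        _newCollection = newCollection;
    }

    public static FactoryMultiValueDictionary<TKey, TValue> Create<TValueCollection>(
        Func<TValueCollection> collectionFactory)
        where TValueCollection : ICollection<TValue>
    {
        if (collectionFactory == null)
        {
            throw new ArgumentNullException(nameof(collectionFactory));
        }

        // Probe one fresh instance: a collection that starts out read-only could never accept values.
        if (collectionFactory().IsReadOnly)
        {
            throw new InvalidOperationException("The inner collection must not be read-only.");
        }

        // Wrap the delegate so each new key gets its own collection, typed as ICollection<TValue>.
        return new FactoryMultiValueDictionary<TKey, TValue>(() => collectionFactory());
    }

    public void Add(TKey key, TValue value)
    {
        if (!_collections.TryGetValue(key, out ICollection<TValue> values))
        {
            values = _newCollection();
            _collections.Add(key, values);
        }
        values.Add(value);
    }

    // Hypothetical helper for the demo: number of values stored under key (0 if absent).
    public int CountValues(TKey key)
    {
        return _collections.TryGetValue(key, out ICollection<TValue> values) ? values.Count : 0;
    }
}
```

A delegate helps when the inner collection needs constructor arguments that a new() constraint cannot supply, such as a comparer: with Create(() => new HashSet<string>(StringComparer.OrdinalIgnoreCase)), adding "a" and then "A" under the same key stores a single value.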

And that’s all!

You can download the new MultiValueDictionary from NuGet and try it out for yourself! If you have any questions or if you just want to give feedback, please leave a comment or contact us.
