November 8th, 2011

ConcurrentDictionary Performance Improvements in .NET 4.5

ConcurrentDictionary is a popular concurrent data structure that was introduced in .NET 4. In the .NET 4.5 release, ConcurrentDictionary gets two performance improvements.

One optimization is related to the way ConcurrentDictionary avoids torn reads and writes. To explain the background, all reference types and some value types are guaranteed to be read and written atomically by the Common Language Runtime, as defined in section 12.6.6 of the ECMA CLI specification. However, large value types such as System.Guid are not read and written atomically. So, if a thread writes values into a System.Guid field while another thread reads the values, the reader thread might observe a partly updated value that consists of bytes from the ‘old’ value mixed with bytes from the ‘new’ value. The reason behind this is that a write or a read of a System.Guid value is not atomic on X86/X64 machines.

To avoid torn reads and writes, the .NET 4 ConcurrentDictionary wraps all values in a node object. To modify a value associated with some key, ConcurrentDictionary allocates a new node object.

In .NET 4.5, ConcurrentDictionary<TKey,TValue> now avoids reallocating nodes on updates if TValue is a reference type or a small primitive value type (Int32, Single, Byte, etc.)

The example below demonstrates the improvement:

static void Example1(int key)
{
    var d = new ConcurrentDictionary<int,int>();
    for(int i = 0; i<10000000; i++)
    {
        d[key] = i;
    }
}

Example1(0) executes about 10% faster in .NET 4.5 than it did in .NET 4.

The difference becomes more prominent when multiple threads are involved:

static void Example2()
{
    Parallel.Invoke(
        () => Example1(0),
        () => Example1(1),
        () => Example1(2),
        () => Example1(3));
}

Example2() calls Example1() on four different threads with different key values. In .NET 4.5, Example2 executes about 30-40% faster than it did .NET 4.

Another ConcurrentDictionary optimization that we introduced in .NET 4.5 is dynamic addition of locks. In .NET 4, the number of locks that protect a ConcurrentDictionary instance is fixed over the lifetime of the instance. By default, the number of locks is based on the number of processors, or alternatively, the user can provide a custom concurrency level.

In practice, a large number of locks is often desirable for maximum throughput. On the other hand, we don’t want to allocate too many lock objects, especially if the ConcurrentDictionary only ends up storing only a small number of items.

In .NET 4.5, ConcurrentDictionary will automatically add more locks as more values are stored in the dictionary. (Note that ConcurrentDictionary will not add more locks if the instance was initialized using one of the constructors that accept a concurrencyLevel argument.)

This example demonstrates the benefit of dynamic lock addition:

static void Example3()
{
    var dict = new ConcurrentDictionary<int, int>();
    Action<int> helper = constant =>
    {
        for (int i = 0; i < 10000000; i++)
        {
            dict[(constant + i) % 100000] = i;
        }
    };

    Parallel.Invoke(
        () => helper(0),
        () => helper(10000),
        () => helper(20000),
        () => helper(30000));
}

This example executes approximately 2.5 – 3X faster in .NET 4.5 compared to .NET 4 and demonstrates the combined effect of both optimizations (cheaper atomic updates and dynamic number of locks) acting together.

0 comments

Discussion are closed.