New Feature? :: ThreadLocal.Values

We’ve been considering adding a Values property to System.Threading.ThreadLocal<T>. Values would return a collection of all current values from all threads (i.e., what you’d get if you evaluated Value from each thread). This would allow for easy aggregations, and in fact in our Parallel Extensions Extras we have a wrapper around ThreadLocal<T> called ReductionVariable<T> that exists purely to provide this functionality. For example:

var localResult = new ThreadLocal<int>(() => 0);
Parallel.For(0, N, i =>
{
   localResult.Value += Compute(i);
});

int result = localResult.Values.Sum();

If you’re familiar with the Parallel Patterns Library (PPL) in Visual C++ 2010, this feature would make ThreadLocal<T> very similar in capability to the combinable class.
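For readers who want to experiment on .NET 4 before any such property exists, here is a minimal sketch of a wrapper that exposes a Values collection on top of ThreadLocal<T>. The name TrackedLocal<T> and its members are hypothetical and chosen only for illustration; this is not the ReductionVariable<T> implementation from the Extras, just one way the idea could be approximated, assuming Values is only read after the parallel work has finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper (illustration only): tracks every per-thread value
// so they can be enumerated later. Not the Extras' ReductionVariable<T>.
public sealed class TrackedLocal<T>
{
    // Each thread gets its own Box; boxing the value lets us keep a
    // reference to it in a shared collection.
    private sealed class Box { public T Value; }

    private readonly ConcurrentBag<Box> _boxes = new ConcurrentBag<Box>();
    private readonly ThreadLocal<Box> _local;

    public TrackedLocal(Func<T> seedFactory)
    {
        // The factory runs once per thread, on first access to Value.
        _local = new ThreadLocal<Box>(() =>
        {
            var box = new Box { Value = seedFactory() };
            _boxes.Add(box);
            return box;
        });
    }

    // The calling thread's own value.
    public T Value
    {
        get { return _local.Value.Value; }
        set { _local.Value.Value = value; }
    }

    // Snapshot of the values from all threads that have touched Value.
    // Assumption: callers read this only after the parallel work is done;
    // reading while other threads are still writing is not synchronized.
    public IEnumerable<T> Values
    {
        get { return _boxes.Select(b => b.Value).ToList(); }
    }
}

public static class TrackedLocalDemo
{
    // Sums 0..n-1 using per-thread partial sums.
    public static int SumTo(int n)
    {
        var localResult = new TrackedLocal<int>(() => 0);
        Parallel.For(0, n, i => { localResult.Value += i; });
        return localResult.Values.Sum();
    }
}
```

Calling TrackedLocalDemo.SumTo(1000) performs the same kind of aggregation as the example above, with the final reduction done sequentially over the per-thread partial sums. The sketch never removes boxes for threads that have exited, which a production implementation would likely need to consider.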

In .NET 4, it’s already possible to do aggregations using the thread-local data support that’s built in to the parallel loops:

int result = 0;
Parallel.For(0, N,
   () => 0,
   (i, loopState, localResult) =>
   {
      return localResult + Compute(i);
   },
   localResult => Interlocked.Add(ref result, localResult));

This approach of using Parallel.For has less overhead than accessing the ThreadLocal<T> instance on each iteration, which is one of the reasons Parallel.For has the support built in. However, there are some advantages to also having the ThreadLocal<T> approach available:

Fewer delegates to understand. Wrapping your head around three different delegates (and how data is passed between them) in a single method call can be tough. It may also be unintuitive that an interlocked operation is required for the final step (though this approach has performance benefits, as each thread gets to perform its final reduction in parallel).
Certain scenarios may enjoy less overhead. There is potentially a subtle performance issue with the Parallel.For approach, depending on why the local support is being used. In an effort to be fair to other users of the ThreadPool, the Tasks that service a parallel loop will periodically (every few hundred milliseconds) retire and reschedule replicas of themselves. In this way, the threads that were processing the loop’s tasks get a breather to optionally process work in other AppDomains if the ThreadPool deems it necessary. The ThreadPool may also choose to remove the thread from active duty if it believes the active thread count is too high. Consequently, the number of Tasks created to service the loop may be greater than the number of threads. In turn, the delegates that initialize and finalize/aggregate the local states will be executed more often, because they are run once for each new Task rather than once for each new Thread. Of course, this would only be an issue if the initializer and finalizer delegates are very expensive, but it’s worth noting that the ThreadLocal<T> approach does not suffer from this.
Usable in places where built-in local support isn’t available. You can do aggregations where we do not have built-in thread-local data support. For example, Parallel.Invoke does not provide local support.
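To see the Task-versus-Thread effect described above on your own machine, a rough sketch like the following counts how many times the initializer and finalizer delegates run and compares that with the number of distinct threads that executed iterations. The class name, the iteration count, and the SpinWait amount are arbitrary choices for illustration; the actual counts depend on the machine and on ThreadPool behavior, so no particular output is implied.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class LocalInitCounter
{
    // Runs a loop with thread-local state and reports how often the
    // local-init and local-finally delegates ran, plus distinct threads seen.
    public static void Run(int iterations,
                           out int initCalls, out int finallyCalls,
                           out int distinctThreads, out long total)
    {
        int inits = 0, finals = 0;
        long sum = 0;
        var threadIds = new ConcurrentDictionary<int, bool>();

        Parallel.For(0, iterations,
            // Initializer: runs once per Task servicing the loop.
            () => { Interlocked.Increment(ref inits); return 0L; },
            (i, loopState, local) =>
            {
                threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
                Thread.SpinWait(50000); // simulate work so the loop runs a while
                return local + i;
            },
            // Finalizer: also runs once per Task.
            local =>
            {
                Interlocked.Increment(ref finals);
                Interlocked.Add(ref sum, local);
            });

        initCalls = inits;
        finallyCalls = finals;
        distinctThreads = threadIds.Count;
        total = sum;
    }

    public static void Main()
    {
        int inits, finals, threads;
        long total;
        Run(2000, out inits, out finals, out threads, out total);
        Console.WriteLine("init calls: {0}, finally calls: {1}, threads: {2}, total: {3}",
                          inits, finals, threads, total);
    }
}
```

If the loop runs long enough for Tasks to retire and be replaced, the init and finally counts may exceed the thread count; with short loops they will often be equal.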

Your input could help! If you’ve got a minute, feel free to answer the following questions and/or provide other thoughts:

Would you find writing aggregations easier with ThreadLocal<T>.Values compared to using the support built in to the parallel loops? If so, does the convenience make the feature worthwhile?
Do you need/want support for writing aggregations outside of parallel loops?
When you do aggregations, are the routines for initializing and finalizing local state expensive? Examples would be great.

Thanks!