PLINQ changes since the MSDN Magazine article
I posted about changes we’ve made to the Task Parallel Library since we published the MSDN Magazine article outlining its design. In this post, I’ll do the same thing for PLINQ.
Most of the October 2007 article on PLINQ is still accurate. After all, PLINQ is largely an implementation of the .NET Standard Query Operators, and thus its API is predominantly dictated by the API shipped in the .NET Framework 3.5. However, there are APIs exposed by PLINQ that are not part of the .NET Standard Query Operators (such as the most visible member of PLINQ, AsParallel), and there are also behavioral differences that manifest due to its very nature as a parallel implementation versus the sequential implementation that exists today. Since the MSDN article was published, there have been a few changes to both of these, and I’ll outline those differences here.
First, as with the Task Parallel Library, PLINQ is not contained in System.Concurrency.dll as the article claims. We’ve since moved all of this functionality to the System.Threading.dll assembly, which is what you’ll need to reference to access PLINQ functionality in your .NET projects as of this CTP.
Next, the article shows the AsParallel extension method as living on the System.Linq.ParallelEnumerable class. This method has migrated to the new System.Linq.ParallelQuery class. Of course, given its primary use as an extension method for IEnumerable<T>, and given that ParallelQuery is in the same namespace and assembly as ParallelEnumerable, the change in declaring type doesn’t really affect the programming model, as the change isn’t at all visible in code using AsParallel as an extension method.
Moving on, the article states that “parallelism doesn’t get introduced until you start processing the output of the query. If you’re familiar with IEnumerable<T>, this equates to calling the GetEnumerator method.” Of course, if you’re very familiar with LINQ, you’ll know that the query isn’t actually evaluated until MoveNext() is called on the IEnumerator<T> returned from GetEnumerator. PLINQ behaves in the same way… none of the data in the query will be processed until MoveNext() is called.
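To make the deferred-execution point concrete, here is a minimal sketch. The class name, the counter, and the query itself are hypothetical; the only thing assumed about PLINQ is that AsParallel is available as an extension method on IEnumerable<T>, as described above. The counter should still be zero after GetEnumerator returns. How far it advances after the first MoveNext() depends on how PLINQ schedules and buffers work, so no particular value is implied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class DeferredExecutionSketch
{
    // Hypothetical counter: incremented each time the Select delegate runs.
    static int evaluated;

    static void Main()
    {
        // Building the query only describes the work; nothing runs yet.
        IEnumerable<int> query = Enumerable.Range(0, 100)
            .AsParallel()
            .Select(x => { Interlocked.Increment(ref evaluated); return x * x; });

        Console.WriteLine("After building the query: " + evaluated);

        using (IEnumerator<int> e = query.GetEnumerator())
        {
            // Obtaining the enumerator does not start processing either.
            Console.WriteLine("After GetEnumerator: " + evaluated);

            // The first MoveNext() is what kicks off query execution.
            e.MoveNext();
            Console.WriteLine("After the first MoveNext: " + evaluated);
        }
    }
}
```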
The next change is important to keep in mind. Here’s the original MSDN article text:
“…pipelined processing, in which case the thread doing the enumeration is separate from the threads devoted to running the query. PLINQ will use many threads for query execution, but will reduce the degree of parallelism by one so that the enumerating thread is not interfered with. For instance, if you have eight processors available, seven of them will run the PLINQ query while the remaining processor runs the foreach loop on the output of the PLINQ query as elements become available. This carries the benefit of allowing more incremental processing of output, thus reducing the memory requirements necessary to hold the results; however, having many producer threads and only a single consumer thread can often lead to uneven work distribution, resulting in processor inefficiency.”
This was true at the time the article was written, but for a variety of reasons we’ve since changed the behavior. Now, when using pipelining (the default when iterating over a query with foreach), we don’t decrease the degree of parallelism. If you still want the old behavior, you can use the AsParallel overload that accepts the degree of parallelism as an argument and pass it Environment.ProcessorCount-1. Why did we change this? The reasoning described in the article is sound; however, there were other competing factors that, in the end, we decided were more important. As an example of one of these, consider the computer you’re currently reading this on… is it a dual-core? A quad-core? One of those is likely. And if you were to iterate over a PLINQ query with the aforementioned pipelining behavior on a dual-core machine, guess how many threads would be used to process the query. One. That’s very likely not what you’d expect from a parallel LINQ implementation.
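For anyone who wants the old "leave one processor for the consumer" behavior back, a rough sketch follows. It assumes the degree-of-parallelism overload of AsParallel takes a single int, which is how it is described above; the class name, the sample data, and the query are made up for illustration. The Math.Max guard is my own addition so that a single-processor machine doesn't end up requesting a degree of zero.

```csharp
using System;
using System.Linq;

class PipeliningDegreeSketch
{
    static void Main()
    {
        // Hypothetical input data.
        int[] data = Enumerable.Range(0, 1000).ToArray();

        // Leave one processor free for the thread running the foreach loop.
        // The guard keeps the degree at 1 or more on a single-processor machine.
        int degree = Math.Max(1, Environment.ProcessorCount - 1);

        // Assumption: the CTP overload is AsParallel(int degreeOfParallelism).
        var query = data.AsParallel(degree)
                        .Where(x => x % 2 == 0)
                        .Select(x => x * x);

        // Iterating with foreach uses pipelining by default.
        foreach (int value in query)
        {
            Console.WriteLine(value);
        }
    }
}
```

On a dual-core machine this reproduces the situation described above: the query itself is processed by a single thread.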
Next (and very minor), the article describes an overload of IParallelEnumerable.GetEnumerator that accepts a Boolean parameter named pipelined that determines whether pipelining will be used when executing the query. This parameter has been renamed to usePipelining. Note, however, that we’ve already received feedback about this design (e.g. using a Boolean argument to enable/disable pipelining), and we’ll be reevaluating it in the future.
In the section of the article covering concurrent exceptions, System.Concurrency.MultipleFailuresException is mentioned. This exception type still exists and behaves as described in the article, but it has been renamed. The exception type is now System.Threading.AggregateException and is the same exception type used for this purpose across all of the Parallel Extensions to the .NET Framework. Additionally, the article mentions that the InnerExceptions property is an array of Exception instances; the AggregateException.InnerExceptions property is now a ReadOnlyCollection<Exception>.
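As an illustration of what handling the renamed exception might look like, here is a small sketch. The class name and the failing query are hypothetical. Since InnerExceptions is a ReadOnlyCollection<Exception>, it can be enumerated with foreach but not modified. How many inner exceptions show up in a given run depends on how much work was in flight when the first failure occurred, so the sketch makes no claim about the count.

```csharp
using System;
using System.Linq;
using System.Threading; // AggregateException lives here as of this CTP

class AggregateExceptionSketch
{
    static void Main()
    {
        try
        {
            // Hypothetical query in which some elements fail.
            int[] results = Enumerable.Range(0, 100)
                .AsParallel()
                .Select(x =>
                {
                    if (x % 10 == 0)
                        throw new InvalidOperationException("bad element " + x);
                    return x * x;
                })
                .ToArray();

            Console.WriteLine(results.Length);
        }
        catch (AggregateException ae)
        {
            // Failures from the worker threads are gathered into one exception.
            foreach (Exception inner in ae.InnerExceptions)
            {
                Console.WriteLine(inner.Message);
            }
        }
    }
}
```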
When discussing order preservation in PLINQ queries, the article mentions that QueryOptions.PreserveOrdering can be provided as a parameter to the AsParallel method. This capability still exists, but the name has been changed slightly to ParallelQueryOptions.PreserveOrdering.
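A short sketch of the renamed option in use is below. It assumes there is an AsParallel overload that takes just the ParallelQueryOptions value, which is how the capability is described above; the class name and the data are hypothetical.

```csharp
using System;
using System.Linq;

class PreserveOrderingSketch
{
    static void Main()
    {
        // Assumption: AsParallel accepts ParallelQueryOptions.PreserveOrdering directly.
        var query = Enumerable.Range(0, 20)
            .AsParallel(ParallelQueryOptions.PreserveOrdering)
            .Select(x => x * 2);

        // With ordering preserved, results come back in source order
        // even though the elements are processed in parallel.
        foreach (int value in query)
        {
            Console.WriteLine(value);
        }
    }
}
```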
I believe those are the primary API/behavioral differences between the article and the current CTP release of PLINQ. If you find any other significant differences, please let us know, and we’ll augment the information provided here.