October 30th, 2009

PLINQ Queries That Run Sequentially

The goal of PLINQ is to execute computationally intensive LINQ to Objects queries efficiently by splitting up the work across multiple cores on multi-core machines. However, not all queries are equally appropriate for parallelism.

Usually, the best way to use PLINQ is to write short, simple queries with an expensive delegate. This is one example of such query:

    var q = src.AsParallel()

        .Where(x => ExpensiveFilter(x));

    foreach(var x in q) { … }

And here is another:

    int sum = src.AsParallel()
        .Select(x => ExpensiveFunc(x))
        .Sum();

One design goal behind PLINQ is maximum parity with LINQ to Objects. So, you can combine LINQ operators in all kinds of ways, and PLINQ will correctly execute the query. However, performance characteristics get trickier as queries get more complex.

In some cases, PLINQ may decide to run the query in its entirety or in part sequentially. For example, this query will execute sequentially up to and including the TakeWhile operator:

    src.AsParallel()

        .Select(x => Foo(x))

        .TakeWhile(x => Filter(x))

        .Select(x => Bar(x))

        .ToArray();

The TakeWhile operator introduces a tricky sequential dependency – whether an element is or isn’t included in the output depends on the result of Filter() on all previous elements in the sequence. There are various ways to execute this query partly in parallel that take different trade-offs. Depending on the selectivity of Filter and the costs of Foo, Bar and Filter, there are different algorithms which may be appropriate.

Whether a particular query executes in part sequentially depends on the combination of operators in the query. The precise rules are fairly complex, but they can be summarized in a simple way. If a query contains one of these operators, PLINQ may decide to run it sequentially:

·         First, Last

·         Take, Skip

·         TakeWhile, SkipWhile

·         Concat

·         Special overloads of Select, Where and SelectMany that pass position indices into the user delegate

·         ElementAt

·         Zip

For example, if the Take operator follows a Where operator, PLINQ will execute the Where and the Take sequentially by default. Filtering followed by a Take is a tricky operation to parallelize – different algorithms are appropriate depending on the selectivity of the filter, the size of the argument passed to Take, and other details. However, if the Take is applied straight to an array (or an array followed by a Select), PLINQ can handle the Take operator efficiently simply by restricting execution to a section of the array.

Also, in some queries, SelectMany causes the part of the query that comes before the SelectMany to run sequentially. I haven’t seen a practical example where this would be an issue, though. SelectMany produces multiple output elements for each input, and the part of the query before the SelectMany is normally not the computationally expensive part.

You can prevent PLINQ from falling back to sequential execution by turning on the ForceParallelism mode. In the ForceParallelism mode, PLINQ will always use parallel algorithms to execute the query, even if potentially expensive algorithms will be used. This is desirable if the query contains an expensive delegate:

    src.AsParallel()

        .WithExecutionMode(
            ParallelExecutionMode.ForceParallelism)

        .Select(x => Foo(x))

        .TakeWhile(x => Filter(x))

        .Select(x => Bar(x))

        .ToArray();

Alternatively, you can try breaking up the query so that only the simple but computationally expensive part of the query is done in PLINQ, and the rest of the processing is done in LINQ to Objects:

    src.Select(x => Foo(x))

        .TakeWhile(x => Filter(x))

        .AsParallel() // <- only parallelize here

        .Select(x => Bar(x))

        .ToArray();

This implementation will scale well if the calls to Bar() represent the bulk of the work in the query. This solution is often preferable, as the part of the query that executes in parallel is clearly marked, and thus the code is easier to understand.

Author

0 comments

Discussion are closed.

Feedback