Taking advantage of the ordering guarantees of the LINQ GroupBy method

Raymond Chen

A customer wanted to group a set of data by one field, and within each group, sort the data by another field, and then sort the groups by that second field.

For example, given the following data set:

Name     Time
Charles  11
Charles  21
Alice    20
Charles  23
Alice    29
Alice    13
Charles  17
Bob      20
Alice    13
Bob      12
Alice    26
Bob      18
Charles  18
Bob      28
Alice    23
Bob      13

We group by name:

Name     Time
Alice    20
Alice    29
Alice    13
Alice    13
Alice    26
Alice    23
Bob      20
Bob      12
Bob      18
Bob      28
Bob      13
Charles  11
Charles  21
Charles  23
Charles  17
Charles  18

And then we sort each person’s time, shortest first.

Name     Time
Alice    13
Alice    13
Alice    20
Alice    23
Alice    26
Alice    29
Bob      12
Bob      13
Bob      18
Bob      20
Bob      28
Charles  11
Charles  17
Charles  18
Charles  21
Charles  23

And then we sort the people by their best time. Charles’s best time is 11 seconds, which is best overall, so his times go first. Bob’s best time is 12 seconds, so his group goes next. Alice’s best time is 13 seconds, so her group is last.

Name     Time
Charles  11
Charles  17
Charles  18
Charles  21
Charles  23
Bob      12
Bob      13
Bob      18
Bob      20
Bob      28
Alice    13
Alice    13
Alice    20
Alice    23
Alice    26
Alice    29

So we have a three-step LINQ query, where we group, and then sort each group, and then sort the groups.

var results =
    data.GroupBy(x => x.Name) // group by name
        .Select(g => g.OrderBy(x => x.Time)) // sort each group
        .OrderBy(g => g.First().Time) // sort the groups by best time
        .SelectMany(g => g);  // flatten the groups

The last step is to use SelectMany to convert the groups back into their individual members. This takes advantage of the fact that IGrouping<out TKey, out TElement> derives from IEnumerable<TElement>, so you can use the group as a collection.
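To try the three-step query, here is a minimal self-contained sketch. It assumes the sample data is held as an array of (Name, Time) tuples, which is a choice made for this illustration rather than anything the customer's code is known to use.

```csharp
using System;
using System.Linq;

// Sample data from the table above, as (Name, Time) tuples.
// The tuple representation is an assumption for this sketch.
var data = new[]
{
    (Name: "Charles", Time: 11), (Name: "Charles", Time: 21),
    (Name: "Alice", Time: 20), (Name: "Charles", Time: 23),
    (Name: "Alice", Time: 29), (Name: "Alice", Time: 13),
    (Name: "Charles", Time: 17), (Name: "Bob", Time: 20),
    (Name: "Alice", Time: 13), (Name: "Bob", Time: 12),
    (Name: "Alice", Time: 26), (Name: "Bob", Time: 18),
    (Name: "Charles", Time: 18), (Name: "Bob", Time: 28),
    (Name: "Alice", Time: 23), (Name: "Bob", Time: 13),
};

var results =
    data.GroupBy(x => x.Name)                 // group by name
        .Select(g => g.OrderBy(x => x.Time))  // sort each group
        .OrderBy(g => g.First().Time)         // sort the groups by best time
        .SelectMany(g => g)                   // flatten the groups
        .ToList();

// Print one "Name Time" pair per line, in the final order.
foreach (var (name, time) in results)
{
    Console.WriteLine($"{name} {time}");
}
```

The output should list Charles's five times, then Bob's, then Alice's, each in ascending order, matching the final table above.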

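But you can reduce this to a two-step operation: First sort globally by time, and then group by name. The GroupBy method is documented as reporting the groups in the order of first appearance, and the elements within each group in the order they appear in the source. Since the source is already sorted by time, each group comes out sorted, and the group with the fastest time comes first.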

var results =
    data.OrderBy(x => x.Time) // sort globally by time
        .GroupBy(x => x.Name) // group by name (best time first)
        .SelectMany(g => g);  // flatten the groups
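The ordering guarantee is easier to see if you stop before the flattening step and look at the groups themselves. The sketch below uses a six-row subset of the sample data in a shuffled order, again stored as (Name, Time) tuples; both the subset and the tuple representation are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// A subset of the sample rows, deliberately not in sorted order.
// (Hypothetical input for this sketch.)
var data = new[]
{
    (Name: "Alice", Time: 20), (Name: "Charles", Time: 21),
    (Name: "Bob", Time: 12), (Name: "Charles", Time: 11),
    (Name: "Alice", Time: 13), (Name: "Bob", Time: 18),
};

var groups =
    data.OrderBy(x => x.Time)  // sort globally by time
        .GroupBy(x => x.Name)  // groups are reported in order of first appearance
        .ToList();

// Print each group's key followed by its times, in the order GroupBy yields them.
foreach (var g in groups)
{
    Console.WriteLine($"{g.Key}: {string.Join(", ", g.Select(x => x.Time))}");
}
// Prints:
// Charles: 11, 21
// Bob: 12, 18
// Alice: 13, 20
```

Charles comes first because his 11 is the first element of the sorted sequence, and each group's times are ascending because GroupBy keeps elements in source order.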

It does slightly more work than the three-step query because it sorts the entire collection, even though it needed only to sort each group. But it looks slicker, and might even be easier to understand, provided you understand that the grouping is stable.
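If you want some reassurance that the two queries agree, you can run both on the sample data and compare the results. This is only a spot check on one data set, not a proof, and the (Name, Time) tuple representation is again an assumption of the sketch.

```csharp
using System;
using System.Linq;

// Sample data from the first table, as (Name, Time) tuples.
var data = new[]
{
    (Name: "Charles", Time: 11), (Name: "Charles", Time: 21),
    (Name: "Alice", Time: 20), (Name: "Charles", Time: 23),
    (Name: "Alice", Time: 29), (Name: "Alice", Time: 13),
    (Name: "Charles", Time: 17), (Name: "Bob", Time: 20),
    (Name: "Alice", Time: 13), (Name: "Bob", Time: 12),
    (Name: "Alice", Time: 26), (Name: "Bob", Time: 18),
    (Name: "Charles", Time: 18), (Name: "Bob", Time: 28),
    (Name: "Alice", Time: 23), (Name: "Bob", Time: 13),
};

// Three-step version: group, sort each group, sort the groups.
var threeStep =
    data.GroupBy(x => x.Name)
        .Select(g => g.OrderBy(x => x.Time))
        .OrderBy(g => g.First().Time)
        .SelectMany(g => g)
        .ToList();

// Two-step version: sort globally, then group.
var twoStep =
    data.OrderBy(x => x.Time)
        .GroupBy(x => x.Name)
        .SelectMany(g => g)
        .ToList();

// Element-by-element comparison of the two flattened sequences.
Console.WriteLine(threeStep.SequenceEqual(twoStep)); // True for this data set
```

Note that no two people in this data set share a best time. If best times tie, the relative order of the tied groups may differ between the two queries, because the three-step version breaks the tie by which name appears first in the original data, while the two-step version breaks it by which tied time appears first.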
