November 29th, 2018

Taking advantage of the ordering guarantees of the LINQ GroupBy method

A customer wanted to group a set of data by one field, and within each group, sort the data by another field, and then sort the groups by that second field.

For example, given the following data set:

Name Time
Charles 11
Charles 21
Alice 20
Charles 23
Alice 29
Alice 13
Charles 17
Bob 20
Alice 13
Bob 12
Alice 26
Bob 18
Charles 18
Bob 28
Alice 23
Bob 13

We group by name:

Name Time
Alice 20
Alice 29
Alice 13
Alice 13
Alice 26
Alice 23
Bob 20
Bob 12
Bob 18
Bob 28
Bob 13
Charles 11
Charles 21
Charles 23
Charles 17
Charles 18

And then we sort each person’s time, shortest first.

Name Time
Alice 13
Alice 13
Alice 20
Alice 23
Alice 26
Alice 29
Bob 12
Bob 13
Bob 18
Bob 20
Bob 28
Charles 11
Charles 17
Charles 18
Charles 21
Charles 23

And then we sort the people by their best time. Charles’s best time is 11 seconds, which is best overall, so his times go first. Bob’s best time is 12 seconds, so his group goes next. Alice’s best time is 13 seconds, so her group is last.

Name Time
Charles 11
Charles 17
Charles 18
Charles 21
Charles 23
Bob 12
Bob 13
Bob 18
Bob 20
Bob 28
Alice 13
Alice 13
Alice 20
Alice 23
Alice 26
Alice 29

So we have a three-step LINQ query, where we group, and then sort each group, and then sort the groups.

var results =
    data.GroupBy(x => x.Name) // group by name
        .Select(g => g.OrderBy(x => x.Time)); // sort each group
        .OrderBy(g => g.First()) // sort the groups by best time
        .SelectMany(g => g);  // flatten the groups

The last step is to use SelectMany to convert the groups back into their individual members. This takes advantage of the fact that IGrouping<TKey, out TElement>, derives from IEnumerable<TElement>, so you can use the group as a collection.

But you can reduce this to a two-step operation: First sort globally by time, and then group them. The Group­By method is documented as reporting the groups in the order of first appearance, so this ensures that the fastest group comes first.

var results =
    data.OrderBy(x => x.Time) // sort globally by time
        .GroupBy(x => x.Name) // group by name (best time first)
        .SelectMany(g => g);  // flatten the groups

It does slightly more work than the three-step query because it sorts the entire collection, even though it needed only to sort each group. But it looks slicker, and might even be easier to understand. Provided you understand that the grouping is stable.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

0 comments

Discussion are closed.

Feedback