May 18th, 2022

0 reactions

Writing a sort comparison function, part 2: avoid unnecessary expense

Raymond Chen

Last time, we wrote a basic multi-level sort. I reiterate that the best way to do this is not to write your own multi-level comparison function but rather to rely on std::pair or std::tuple to do the work for you.

It may be that calculating one of the secondary keys is expensive, and you don’t want to do it unless it turns out to be necessary. In that case, you’ll have to break down the comparison manually into components. But don’t try to be clever about it. Just write the most obvious version:

// three-way comparison
int compare_3way_for_sorting(T const& a, T const& b)
{
    // First compare by name
    if (a.name < b.name) return -1;
    if (a.name > b.name) return +1;

    // Names are equal, check connector names
    auto&& a_connector = a.GetConnector();
    auto&& b_connector = b.GetConnector();

    if (a_connector.name < b_connector.name) return -1;
    if (a_connector.name > b_connector.name) return +1;

    // Names and connector names are equal,
    // check manufacturing date
    auto&& a_date = LookupManufacturingDate(a.part_number);
    auto&& b_date = LookupManufacturingDate(b.part_number);

    if (a_date < b_date) return -1;
    if (a_date > b_date) return +1;

    // All keys match
    return 0;
}

// less-than comparison
bool compare_less_for_sorting(T const& a, T const& b)
{
    // First compare by name
    if (a.name < b.name) return true;
    if (a.name > b.name) return false;

    // Names are equal, check connector names
    auto&& a_connector = a.GetConnector();
    auto&& b_connector = b.GetConnector();

    if (a_connector.name < b_connector.name) return true;
    if (a_connector.name > b_connector.name) return false;

    // Names and connector names are equal,
    // check manufacturing date
    auto&& a_date = LookupManufacturingDate(a.part_number);
    auto&& b_date = LookupManufacturingDate(b.part_number);

    if (a_date < b_date) return true;
    if (a_date > b_date) return false;

    // All keys match
    return false;
}

I’ve seen code that tried to do the multi-level comparison manually, but they were too clever and tried to cram it all into one line for style points, but messed it up. Resist the temptation to earn style points. Write the simplest, most straightforward code. Not only is it easier for humans to understand, it’s also easier for the compiler to understand.

Next time, we’ll look at how to do this with spaceships.

Bonus chatter: I considered embracing the “Don’t do this, just use tuples” principle by creating a delayed-comparison wrapper:

template<typename Lambda>
struct defer_comparison
{    
    defer_comparison(Lambda lambda) : key(std::move(lambda)){}
    Lambda key;

    auto operator<=>(defer_comparison const& other) const
        { return compare_3way(key(), other.key() ); }
};
 
auto key(T const& t)
{
    return std::make_tuple(std::ref(t.name),
                          defer_comparison([&] { return t.GetConnector(); }),
                          defer_comparison([&] { return LookupManufacturingDate(t.part_number); }));
}

std::weak_ordering
compare_3way_for_sorting(T const& a, T const& b)
{
    return key(a) <=> key(b);
}

bool compare_less_for_sorting(T const& a, T const& b)
{
    return key(a) < key(b);
}

However, this generates unnecessary calls to GetConnector() and LookupManufacturingsDate() because it breaks down as

bool compare_less_for_sorting(T const& a, T const& b)
{
    auto a_tuple = key(a);
    auto b_tuple = key(b);

    if (std::get<0>(a) < std::get<0>(b)) return true;
    if (std::get<0>(a) > std::get<0>(b)) return false;

    if (std::get<1>(a) < std::get<1>(b)) return true;
    if (std::get<1>(a) > std::get<1>(b)) return false;

    if (std::get<2>(a) < std::get<2>(b)) return true;
    if (std::get<2>(a) > std::get<2>(b)) return false;

    return false;
}

In our case, std::get<1>() returns a defer_comparison, so we end up comparing the same two defer_comparison objects twice.

We could try to solve this by using a std::async with deferred execution to memoize the result of the lambda, but this introduces extra memory allocations and virtual function tables and memory barriers, so it feels like we’d be heading in the wrong direction.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

5 comments

Discussion is closed. Login to edit/delete existing comments.

Neil Rashbrook May 19, 2022 · Edited
Whereas in JavaScript the || operator does exactly what you want, so you can just write
```
arr.sort((a, b) => key1(a) - key1(b) || key2(a) - key2(b) || key3(a) - key3(b));
```
R Samuel Klatchko May 19, 2022
Any reason you didn’t write compare_less_for_sorting using compare_3way_for_sorting:
```
bool compare_less_for_sorting(T const& a, T const& b)
{
    return compare_3way_for_sorting(a, b) == -1;
}
```
Patrick May 18, 2022 · Edited

I haven’t written real C++ code in about a decade. Is it just me or has C++ std::gotten<a>(std::ref([&] lot) harder to read <=> it used to?
- Me Gusta May 18, 2022
  
  The other option is that if you have been out of the loop for that length of time then you wouldn't have kept up with the changes. C++11 added a lot, C++14, 17 and 20 added more.
  Think about it, it is the same with English sentences. The ability to read it and how easy it is to read it is based upon your understanding of grammar and word meanings. If you come across unfamiliar words, if there is a sentence structure that you don't quite understand or if you don't get some grammatical construct then it makes reading harder.Read more
  The other option is that if you have been out of the loop for that length of time then you wouldn’t have kept up with the changes. C++11 added a lot, C++14, 17 and 20 added more.
  Think about it, it is the same with English sentences. The ability to read it and how easy it is to read it is based upon your understanding of grammar and word meanings. If you come across unfamiliar words, if there is a sentence structure that you don’t quite understand or if you don’t get some grammatical construct then it makes reading harder.
  I’d guess that this is the position you are in. If that is so then C++ would be harder to read when you see lambdas, tuples, r-value references and the 3 way comparison operator which was never part of C++98/03.
  
  Read less

紅樓鍮 May 18, 2022 · Edited

I considered... creating a delayed-comparison wrapper... we end up comparing the same two objects twice
You could use some kind of "heterogeneous generator" coroutine:
<code>
This would even have the advantage that intermediate results can be reused to compute multiple subkeys:
<code>

I considered… creating a delayed-comparison wrapper… we end up comparing the same two defer_comparison objects twice

You could use some kind of “heterogeneous generator” coroutine:

unsafe_heterogeneous_generator<
    const string & /* name */,
    const string & /* connector name */,
    year_month_day /* manufacturing date */>
keys(const T &t) {
  co_yield t.name;

  auto connector = t.GetConnector();
  co_yield connector.name;

  co_yield LookupManufacturingDate(t.part_number);
}

template <typename T>
weak_ordering
compare_3way_for_sorting(const T &a, const T &b) {
  auto result = weak_ordering::equivalent;
  zip(keys(a), keys(b)).for_each([&](auto kab) {
    result = kab.first <=> kab.second;
    if (result == weak_ordering::equivalent)
      return control_flow::continue_;
    else
      return control_flow::break_;
  });
  return result;
}

This would even have the advantage that intermediate results can be reused to compute multiple subkeys:

auto connector = t.GetConnector();
co_yield connector.name;
co_yield connector.LookupManufacturingDate(t.part_number);

Writing a sort comparison function, part 2: avoid unnecessary expense

Author

5 comments

Read next

Writing a sort comparison function, part 3: spaceships

Writing a sort comparison function, part 4: descending sort

Author

5 comments

Read next

Writing a sort comparison function, part 3: spaceships

Writing a sort comparison function, part 4: descending sort

Stay informed