Writing a sort comparison function, part 2: avoid unnecessary expense

Raymond Chen

Last time, we wrote a basic multi-level sort. I reiterate that the best way to do this is not to write your own multi-level comparison function but rather to rely on std::pair or std::tuple to do the work for you.

It may be that calculating one of the secondary keys is expensive, and you don’t want to do it unless it turns out to be necessary. In that case, you’ll have to break down the comparison manually into components. But don’t try to be clever about it. Just write the most obvious version:

// three-way comparison
int compare_3way_for_sorting(T const& a, T const& b)
{
    // First compare by name
    if (a.name < b.name) return -1;
    if (a.name > b.name) return +1;

    // Names are equal, check connector names
    auto&& a_connector = a.GetConnector();
    auto&& b_connector = b.GetConnector();

    if (a_connector.name < b_connector.name) return -1;
    if (a_connector.name > b_connector.name) return +1;

    // Names and connector names are equal,
    // check manufacturing date
    auto&& a_date = LookupManufacturingDate(a.part_number);
    auto&& b_date = LookupManufacturingDate(b.part_number);

    if (a_date < b_date) return -1;
    if (a_date > b_date) return +1;

    // All keys match
    return 0;
}

// less-than comparison
bool compare_less_for_sorting(T const& a, T const& b)
{
    // First compare by name
    if (a.name < b.name) return true;
    if (a.name > b.name) return false;

    // Names are equal, check connector names
    auto&& a_connector = a.GetConnector();
    auto&& b_connector = b.GetConnector();

    if (a_connector.name < b_connector.name) return true;
    if (a_connector.name > b_connector.name) return false;

    // Names and connector names are equal,
    // check manufacturing date
    auto&& a_date = LookupManufacturingDate(a.part_number);
    auto&& b_date = LookupManufacturingDate(b.part_number);

    if (a_date < b_date) return true;
    if (a_date > b_date) return false;

    // All keys match
    return false;
}

I’ve seen code that tried to do the multi-level comparison manually, but they were too clever and tried to cram it all into one line for style points, but messed it up. Resist the temptation to earn style points. Write the simplest, most straightforward code. Not only is it easier for humans to understand, it’s also easier for the compiler to understand.

Next time, we’ll look at how to do this with spaceships.

Bonus chatter: I considered embracing the “Don’t do this, just use tuples” principle by creating a delayed-comparison wrapper:

template<typename Lambda>
struct defer_comparison
{    
    defer_comparison(Lambda lambda) : key(std::move(lambda)){}
    Lambda key;

    auto operator<=>(defer_comparison const& other) const
        { return compare_3way(key(), other.key() ); }
};
 
auto key(T const& t)
{
    return std::make_tuple(std::ref(t.name),
                          defer_comparison([&] { return t.GetConnector(); }),
                          defer_comparison([&] { return LookupManufacturingDate(t.part_number); }));
}

std::weak_ordering
compare_3way_for_sorting(T const& a, T const& b)
{
    return key(a) <=> key(b);
}

bool compare_less_for_sorting(T const& a, T const& b)
{
    return key(a) < key(b);
}

However, this generates unnecessary calls to Get­Connector() and Lookup­ManufacturingsDate() because it breaks down as

bool compare_less_for_sorting(T const& a, T const& b)
{
    auto a_tuple = key(a);
    auto b_tuple = key(b);

    if (std::get<0>(a) < std::get<0>(b)) return true;
    if (std::get<0>(a) > std::get<0>(b)) return false;

    if (std::get<1>(a) < std::get<1>(b)) return true;
    if (std::get<1>(a) > std::get<1>(b)) return false;

    if (std::get<2>(a) < std::get<2>(b)) return true;
    if (std::get<2>(a) > std::get<2>(b)) return false;

    return false;
}

In our case, std::get<1>() returns a defer_comparison, so we end up comparing the same two defer_comparison objects twice.

We could try to solve this by using a std::async with deferred execution to memoize the result of the lambda, but this introduces extra memory allocations and virtual function tables and memory barriers, so it feels like we’d be heading in the wrong direction.

5 comments

Discussion is closed. Login to edit/delete existing comments.

  • 紅樓鍮 0

    I considered… creating a delayed-comparison wrapper… we end up comparing the same two defer_comparison objects twice

    You could use some kind of “heterogeneous generator” coroutine:

    unsafe_heterogeneous_generator<
        const string & /* name */,
        const string & /* connector name */,
        year_month_day /* manufacturing date */>
    keys(const T &t) {
      co_yield t.name;
    
      auto connector = t.GetConnector();
      co_yield connector.name;
    
      co_yield LookupManufacturingDate(t.part_number);
    }
    
    template <typename T>
    weak_ordering
    compare_3way_for_sorting(const T &a, const T &b) {
      auto result = weak_ordering::equivalent;
      zip(keys(a), keys(b)).for_each([&](auto kab) {
        result = kab.first <=> kab.second;
        if (result == weak_ordering::equivalent)
          return control_flow::continue_;
        else
          return control_flow::break_;
      });
      return result;
    }

    This would even have the advantage that intermediate results can be reused to compute multiple subkeys:

    auto connector = t.GetConnector();
    co_yield connector.name;
    co_yield connector.LookupManufacturingDate(t.part_number);
  • Patrick 0

    I haven’t written real C++ code in about a decade. Is it just me or has C++ std::gotten<a>(std::ref([&] lot) harder to read <=> it used to?

    • Me Gusta 0

      The other option is that if you have been out of the loop for that length of time then you wouldn’t have kept up with the changes. C++11 added a lot, C++14, 17 and 20 added more.
      Think about it, it is the same with English sentences. The ability to read it and how easy it is to read it is based upon your understanding of grammar and word meanings. If you come across unfamiliar words, if there is a sentence structure that you don’t quite understand or if you don’t get some grammatical construct then it makes reading harder.
      I’d guess that this is the position you are in. If that is so then C++ would be harder to read when you see lambdas, tuples, r-value references and the 3 way comparison operator which was never part of C++98/03.

  • R Samuel Klatchko 0

    Any reason you didn’t write compare_less_for_sorting using compare_3way_for_sorting:

    bool compare_less_for_sorting(T const& a, T const& b)
    {
        return compare_3way_for_sorting(a, b) == -1;
    }
  • Neil Rashbrook 0

    Whereas in JavaScript the || operator does exactly what you want, so you can just write

    arr.sort((a, b) => key1(a) - key1(b) || key2(a) - key2(b) || key3(a) - key3(b));

Feedback usabilla icon