March 2nd, 2020

The Performance Benefits of Final Classes

Sy Brand
C++ Developer Advocate

The final specifier in C++ marks a class as one which cannot be derived from, or a virtual member function as one which cannot be overridden. For example, consider the following code:

struct base { 
  virtual void f() const = 0; 
}; 
 
struct derived final : base { 
  void f() const override {} 
};

If we attempt to write a new class which derives from `derived` then we get a compiler error: 

struct oh_no : derived { 
};
<source>(9): error C3246: 'oh_no': cannot inherit from 'derived' as it has been declared as 'final'
<source>(5): note: see declaration of 'derived'
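
The specifier can also be applied to a single virtual function rather than the whole class. The following is a minimal sketch of both forms; the `partly_sealed` and `child` types are hypothetical names introduced for illustration, and `std::is_final` from `<type_traits>` is used only to show that the class-level property can be checked at compile time.

```cpp
#include <type_traits>

struct base {
  virtual void f() const = 0;
};

// final on the class: nothing may derive from derived.
struct derived final : base {
  void f() const override {}
};

// final on the function only (hypothetical type): partly_sealed can still be
// derived from, but f cannot be overridden again.
struct partly_sealed : base {
  void f() const final {}
};

struct child : partly_sealed {
  // void f() const override {}  // would be rejected: f is final in partly_sealed
};

// std::is_final (C++14) reports whether a class was declared final.
static_assert(std::is_final<derived>::value, "derived is final");
static_assert(!std::is_final<partly_sealed>::value, "only f is final here");
```

Uncommenting the override in `child` should produce an error similar in spirit to the one above, though the exact diagnostic depends on the compiler.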

The final specifier is useful for expressing to readers of the code that a class is not to be derived from and having the compiler enforce this, but it can also improve performance through aiding devirtualization. 

Devirtualization 

Virtual functions require an indirect call through the vtable, which is more expensive than a direct call due to interactions with branch prediction and the instruction cache, and also the prevention of further optimizations which could be carried out after inlining the call.  
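
To make the cost concrete, here is a rough hand-written model of what a vtable-based call involves. This is only a sketch under the assumption of a typical vtable implementation: the C++ standard does not mandate vtables, real layouts are compiler-specific, and every name below (`animal`, `animal_vtable`, `call_speak`, and so on) is hypothetical.

```cpp
// Hand-rolled model of vtable dispatch; not how any particular compiler
// lays things out, just the general shape.
struct animal_vtable {
  const char* (*speak)(const void* self);  // one slot per virtual function
};

struct animal {
  const animal_vtable* vptr;  // stands in for the hidden vtable pointer
};

static const char* dog_speak(const void*) { return "woof"; }
static const char* cat_speak(const void*) { return "meow"; }

static const animal_vtable dog_vtable = {&dog_speak};
static const animal_vtable cat_vtable = {&cat_speak};

const char* call_speak(const animal& a) {
  // Load the vtable pointer, load the slot, then call through it. The
  // target is only known at runtime, so it cannot be inlined here.
  return a.vptr->speak(&a);
}
```

The two dependent loads and the indirect call in `call_speak` correspond to the work that devirtualization tries to remove.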

Devirtualization is a compiler optimization which attempts to resolve virtual function calls at compile time rather than runtime. This eliminates all the issues noted above, so it can greatly improve the performance of code which uses many virtual calls [1].

Here is a minimal example of devirtualization: 

#include <iostream>

struct dog { 
  virtual void speak() { 
    std::cout << "woof"; 
  } 
}; 

int main() { 
  dog fido; 
  fido.speak(); 
}

In this code, even though dog::speak is a virtual function, the only possible result of main is to output “woof”. Because fido is an object rather than a pointer or reference, its dynamic type is known to be exactly dog. If you look at the compiler output you’ll see that MSVC, GCC, and Clang all recognize this and inline the definition of dog::speak into main, avoiding the need for an indirect call.
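
For contrast, here is a sketch of the harder case, where the call goes through a reference and the dynamic type is not visible inside the function. The `puppy` type and both helper functions are hypothetical additions, and `speak` returns a string here (instead of printing) purely to keep the example easy to check.

```cpp
#include <string>

struct dog {
  virtual std::string speak() const { return "woof"; }
  virtual ~dog() = default;
};

struct puppy : dog {  // hypothetical derived type
  std::string speak() const override { return "yip"; }
};

// The dynamic type of d is unknown here: it could be dog, puppy, or a type
// from another translation unit. Unless the compiler can see every caller,
// this typically stays an indirect call.
std::string speak_via_reference(const dog& d) { return d.speak(); }

// The dynamic type is known to be exactly dog, so the call can be resolved
// at compile time, as in the example above.
std::string speak_via_object() {
  dog fido;
  return fido.speak();
}
```

Calls like the one in `speak_via_reference` are the ones where extra information about the class hierarchy can help the compiler.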

The Benefit of final 

The final specifier can provide the compiler with more opportunities for devirtualization by helping it identify more cases where virtual calls can be resolved at compile time. Coming back to our original example: 

struct base { 
  virtual void f() const = 0; 
}; 
 
struct derived final : base { 
  void f() const override {} 
};

Consider this function: 

void call_f(derived const& d) { 
  d.f(); 
}

Since derived is marked final the compiler knows it cannot be derived from further. This means that the call to f will only ever call derived::f, so the call can be resolved at compile time. As proof, here is the compiler output for call_f on MSVC when either derived or derived::f is marked as final:

ret 0 

You can see that derived::f has been inlined into the definition of call_f; because its body is empty, only the return instruction remains. If we were to take the final specifier off the definition, the assembly would look like this:

mov rax, QWORD PTR [rcx] 
rex_jmp QWORD PTR [rax]

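This code loads the vtable pointer from d, then jumps indirectly through the function pointer stored in the vtable slot for f. Without final, that slot could hold derived::f or an override from a further-derived class, so the target is only known at runtime.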

The cost of a pointer load and jump may not look like much since it’s just two instructions, but remember that this may involve a branch misprediction and/or instruction cache miss, which would result in a pipeline stall. Furthermore, if there were more code in call_f or in the functions which call it, the compiler may be able to optimize it much more aggressively given the full visibility of the code which will be executed and the additional analysis which this enables.
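
As an illustration of that last point, consider a sketch where the virtual call sits inside a loop. The `shape`, `triangle`, and `total_sides` names are hypothetical, and whether a given compiler performs the simplification described in the comments depends on the compiler and optimization level.

```cpp
struct shape {
  virtual int sides() const = 0;
};

struct triangle final : shape {
  int sides() const override { return 3; }
};

// Because triangle is final, t.sides() can only be triangle::sides. Once
// that call is inlined, the optimizer is free to treat the loop body as
// "total += 3" and may simplify the whole loop further.
int total_sides(const triangle& t, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    total += t.sides();
  }
  return total;
}
```

Without final on `triangle` (or on `triangle::sides`), each iteration would generally need to go through the vtable, and the loop could not be simplified in the same way.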

Conclusion 

Marking your classes or member functions as final can improve the performance of your code by giving the compiler more opportunities to resolve virtual calls at compile time. 

Consider if there are any places in your codebases which would benefit from this and measure the impact!  

 

[1] http://assemblyrequired.crashworks.org/how-slow-are-virtual-functions-really/
    https://sites.cs.ucsb.edu/~urs/oocsb/papers/oopsla96.pdf
    https://stackoverflow.com/questions/449827/virtual-functions-and-performance-c

Sy Brand is Microsoft’s C++ Developer Advocate. Their background is in compilers and debuggers for embedded accelerators, but they’re also interested in generic library design, metaprogramming, functional-style C++, undefined behaviour, and making our communities more inclusive and welcoming.

8 comments


  • Andrey Koshelev

    Is it sufficient to mark class as final to get potential benefits of devirtualization, or should be every virtual function inside this class marked as final as well?

  • Andy Webber

    Recently we discussed using final to discourage people from deriving from a type that was never intended to be a base class. One of our colleagues discouraged that saying that adding final to a class with no virtual functions would have the effect of creating a vtable. Do you know if this is really the case?

    For instance, imagine adding final to std::string as an indication that users should not derive from it. Yes, I know...

    • Me Gusta

      First, and most importantly, this post is talking about benefits for classes that already have virtual functions. If you are avoiding the use of virtual functions in your class hierarchy then the use of final will not introduce any of these benefits because the compiler will be doing direct function calls anyway.
      But anyway, when doing side by side tests, I don't see any class size differences at all. Two classes/structs with the exact same...

      • Andy Webber

        Thanks. That makes sense. I hadn't tried it myself yet, but it went against my intuition that marking the class itself final would necessitate a vtable. So I think it's correct to say that final itself has no bearing on vtable construction. In fact, 12.3 in the C++17 standard simply says that deriving from a base class that is marked as final is ill-formed.

        In the case of member functions, I agree that final implies virtual...

      • Tanveer Badar

        Also consider what happens when you derive from a type which does not have a virtual destructor. You will not be able to provide proper RAII for any derived types when using dynamically allocated objects. No derived destructor will run for such objects if they were ever to be upcast to base pointer/reference and then went out of scope.

        The absence of a virtual destructor itself is a big indicator that the type is not meant...

  • Yehezkel Bernat

    Nice article.
    Just please fix the dog example. Virtual functions aren’t relevant anyway when the call is done by an actual object. You have to use a pointer or reference for virtual call to be considered.