{"id":25640,"date":"2020-03-02T15:07:18","date_gmt":"2020-03-02T15:07:18","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=25640"},"modified":"2020-03-02T15:10:02","modified_gmt":"2020-03-02T15:10:02","slug":"the-performance-benefits-of-final-classes","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/the-performance-benefits-of-final-classes\/","title":{"rendered":"The Performance Benefits of Final Classes"},"content":{"rendered":"<p>The <code>final<\/code> specifier in C++ marks a class or virtual member function as one which cannot be derived from or overridden. For example, consider the following code:<\/p>\n<pre class=\"lang:default decode:true \">struct base {\r\n  virtual void f() const = 0;\r\n};\r\n\r\nstruct derived final : base {\r\n  void f() const override {}\r\n};<\/pre>\n<p>If we attempt to write a new class which derives from <code>derived<\/code>, we get a compiler error:<\/p>\n<pre class=\"lang:default decode:true\">struct oh_no : derived {\r\n};<\/pre>\n<pre class=\"lang:default highlight:0 decode:true\">&lt;source&gt;(9): error C3246: 'oh_no': cannot inherit from 'derived' as it has been declared as 'final'\r\n&lt;source&gt;(5): note: see declaration of 'derived'<\/pre>\n<p>The <code>final<\/code> specifier is useful for telling readers of the code that a class is not meant to be derived from, and for having the compiler enforce this, but it can also improve performance by aiding <i>devirtualization<\/i>.<\/p>\n<h2>Devirtualization<\/h2>\n<p>Virtual functions require an indirect call through the vtable, which is more expensive than a direct call. The indirect branch may be mispredicted or miss in the instruction cache, and because the compiler cannot see which function will be called, it cannot inline the call or carry out the further optimizations that inlining would enable.<\/p>\n<p><span data-contrast=\"auto\">Devirtualization is a compiler optimization which attempts to resolve virtual function calls at compile time rather than at run time. 
<\/span>This eliminates the indirect call and allows the callee to be inlined, so it can greatly improve the performance of code which makes many virtual calls<sup>1<\/sup>.<\/p>\n<p>Here is a minimal example of devirtualization:<\/p>\n<pre class=\"lang:default decode:true\">#include &lt;iostream&gt;\r\n\r\nstruct dog {\r\n  virtual void speak() {\r\n    std::cout &lt;&lt; \"woof\";\r\n  }\r\n};\r\n\r\nint main() {\r\n  dog fido;\r\n  fido.speak();\r\n}<\/pre>\n<p>In this code, even though <code>dog::speak<\/code> is a virtual function, the only possible result of <code>main<\/code> is to output <code>\"woof\"<\/code>, because <code>fido<\/code> is a complete object whose dynamic type is known to be <code>dog<\/code><span data-contrast=\"none\">. 
If you look at the <\/span><a href=\"https:\/\/godbolt.org\/z\/_ZJqvN\">compiler output<\/a>, you\u2019ll see that MSVC, GCC, and Clang all recognize this and inline the definition of <code>dog::speak<\/code> into <code>main<\/code>, avoiding the need for an indirect call.<\/p>\n<h2>The Benefit of <code>final<\/code><\/h2>\n<p>The <code>final<\/code> specifier can give the compiler more opportunities for devirtualization by helping it identify more cases where virtual calls can be resolved at compile time. Coming back to our original example:<\/p>\n<pre class=\"lang:default decode:true\">struct base {\r\n  virtual void f() const = 0;\r\n};\r\n\r\nstruct derived final : base {\r\n  void f() const override {}\r\n};<\/pre>\n<p>Consider this function:<\/p>\n<pre class=\"lang:default decode:true\">void call_f(derived const&amp; d) {\r\n  d.f();\r\n}<\/pre>\n<p><span data-contrast=\"none\">Since <code>derived<\/code> is marked <code>final<\/code>, the compiler knows it cannot be derived from further. 
This means that the call to <code>f<\/code> can only ever invoke <code>derived::f<\/code>, so it can be resolved at compile time.<\/span> As proof, here is the compiler output for <code>call_f<\/code> on MSVC when either <code>derived<\/code> or <code>derived::f<\/code> is marked as <code>final<\/code> (the latter by declaring the override as <code>void f() const final {}<\/code>):<\/p>\n<pre class=\"lang:default highlight:0 decode:true\">ret 0\r\n<\/pre>\n<p>You can see that <code>derived::f<\/code> has been inlined into the definition of <code>call_f<\/code>; since its body is empty, only the return remains. If we were to remove the <code>final<\/code> specifier, the assembly would look like this:<\/p>\n<pre class=\"lang:default highlight:0 decode:true \">mov rax, QWORD PTR [rcx]\r\nrex_jmp QWORD PTR [rax]<\/pre>\n<p>This code loads the vtable pointer from <code>d<\/code>, then jumps indirectly through the function pointer stored in the vtable slot for <code>f<\/code>, reaching whichever override the dynamic type of <code>d<\/code> provides.<\/p>\n<p><span data-contrast=\"auto\">The cost of a pointer load and jump may not look like much since 
it\u2019s just two instructions, but remember that the indirect jump may be mispredicted or miss in the instruction cache, either of which can stall the pipeline. Furthermore, if there were more code in <code>call_f<\/code> or in the functions which call it, the compiler might be able to optimize it much more aggressively, given full visibility of the code which will be executed and the additional analysis this enables.<\/span><\/p>\n<h2>Conclusion<\/h2>\n<p>Marking your classes or member functions as <code>final<\/code> can improve the performance of your code by giving the compiler more opportunities to resolve virtual calls at compile time.<\/p>\n<p>Consider whether there are places in your codebase which would benefit from this, and measure the impact!<\/p>\n<p>1 <a href=\"http:\/\/assemblyrequired.crashworks.org\/how-slow-are-virtual-functions-really\/\">http:\/\/assemblyrequired.crashworks.org\/how-slow-are-virtual-functions-really\/<\/a><\/p>\n<p>\u00a0 <a href=\"https:\/\/sites.cs.ucsb.edu\/~urs\/oocsb\/papers\/oopsla96.pdf\">https:\/\/sites.cs.ucsb.edu\/~urs\/oocsb\/papers\/oopsla96.pdf<\/a><\/p>\n<p>\u00a0 <a 
href=\"https:\/\/stackoverflow.com\/questions\/449827\/virtual-functions-and-performance-c\">https:\/\/stackoverflow.com\/questions\/449827\/virtual-functions-and-performance-c<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The final specifier in C++ marks a class or virtual member function as one which cannot be derived from or overridden. For example, consider the following code:\u00a0 \u00a0struct base {\u00a0 \u00a0 virtual void f() const = 0;\u00a0 };\u00a0 \u00a0 struct derived final : base {\u00a0 \u00a0 void f() const override {}\u00a0 }; If we attempt [&hellip;]<\/p>\n","protected":false},"author":706,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,512],"tags":[],"class_list":["post-25640","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus","category-general-cpp-series"],"acf":[],"blog_post_summary":"<p>The final specifier in C++ marks a class or virtual member function as one which cannot be derived from or overridden. 
For example, consider the following code:\u00a0 \u00a0struct base {\u00a0 \u00a0 virtual void f() const = 0;\u00a0 };\u00a0 \u00a0 struct derived final : base {\u00a0 \u00a0 void f() const override {}\u00a0 }; If we attempt [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/25640","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/706"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=25640"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/25640\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=25640"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=25640"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=25640"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}