A common execution path optimization

Sergey Tepliakov

Sergey

Today I want to talk about one interesting optimization pattern that you may face in framework code or in high-performance libraries.

The idea is simple: suppose you have a commonly used method that has two execution paths – one is very common and simple, and the second one takes longer to execute, has more steps but happens not that often.

As an example, let’s consider a List<T>.Add method. On the “happy-path” Add method is very simple and efficient: if there is enough space it adds an item to an internal array. But if the internal array is full, Add method resizes the underlying array (by allocating another array double the size) and then adds an element to it.

Here is a simple implementation of this method:

Unfortunately, this implementation is not eligible for method inlining by the JIT compiler, because it is too large. Even if we try to “force” the method inlining by using MethodImplOptions.AggressiveInlining nothing good will happen. As we’ll see at the end of the post, inlining big methods doesn’t improve performance.

There is no official documentation regarding the JIT compiler optimizations. But there are enough unofficial documents (like blog-posts) that cover current behavior. The JIT compiler won’t inline a method in the following cases:

  • Method is marked with MethodImplOptions.NoInlining
  • Method body is larger than 32 bytes of the IL code
  • Virtual method calls including interface method invocation
  • Method has complex flow control like switch, while
  • Method has exception handling logic

Add method shown above fits into the second rule and won’t be inlined due to its size.

We know that the method has two cases: light-weight common path when the list has enough space and the rare case when the list should be resized. Based on this knowledge we can extract the logic for a rare case and leave the happy-path code as is:

The change is very simple, but it makes the method small enough for method inlining by the JIT compiler. We can check the difference using the amazing benchmarking tool – BenchmarkDotNet:

As you can see the difference is pretty significant (about 20% for a method that just adds a 500 000 elements). You may wander, why the difference is so big? The asymptotic complexity for the Add method is O(N), but the amortized complexity is O(1). The underlying array grows exponentially, so in the vast majority of cases adding an element to the list so cheap that the method call overhead starts playing a reasonable role.

In Visual Studio, you can check that the JIT compiler actually inlines the method. To do that, add a breakpoint to the both benchmark method, run the app in Release mode, and then switch to Disassembly window (Debug => Windows => Disassembly). But be aware, that you need to disable two options Tools => Options => Debugging => General menu: ‘Suppress JIT optimization on module load’ and ‘Enable Just My Code’.

When to use the pattern?

This is not a common pattern that would be helpful in every application. It may be useful for extremely hot paths in end-user production code. However it is more likely to be helpful for high-performance libraries and frameworks where even slight overhead is critical since they may be called on the hot path.

For instance, this pattern is used in the BCL and several places in TPL Dataflow (StartTaskSafe, OfferAsyncIfNececssary, ProcessAsyncIfNecessary etc):

You may consider using this pattern if you have a highly-used member with two distinct scenarios: fast and common, and heavyweight and rare. If the heavyweight part is not inline friendly due to its size or language constructs (like, try/finally) then splitting methods in two will help the CLR JIT to inline the whole method.

Beware of too much inlining

You may think that the method inlining is such a good thing, that frameworks and libraries should enforce the runtime to inline as much as possible using MethodImplOptions.AggressiveInlining that was added in the .NET 4.5. First of all, the attribute removes only the restriction for method inlining based on a method size and won’t force the runtime to inline virtual methods (even ‘sealed’ ones), or methods with complicated control flow or exception handling.

But the main reason for not using the attribute without careful measurement is the speed. The method inlining is double-edged sword. Method inlining reduces number of instructions but can make the resulting code bigger.

For instance, we can force method inlining for our original implementation of the Add method, but it won’t yield any performance benefits. Actually, the overall performance would be slightly worse (for more information see “To Inline or not to Inline: That is the question” by Vance Morrison):

Additional links and resources

Sergey Tepliakov
Sergey Tepliakov

Senior Software Engineer, Tools for Software Engineers

Follow Sergey   

0 comments

    Leave a comment