November 1st, 2024

Analyzing the Performance of the “Proxy” Library

Mingxin Wang
Senior Software Engineer

Since the recent announcement of the Proxy 3 library, we have received much positive feedback, and there have been numerous inquiries regarding the library’s actual performance. Although the “Proxy” library is designed to be fast, fulfilling one of our six core missions, it is not immediately clear how fast “Proxy” can be across different platforms and scenarios.

To better understand the performance of the “Proxy” library, we designed 15 benchmarks, ran them in four different environments, and automated them in our GitHub pipeline so that a benchmarking report is generated for every future code change. Anyone can download the reports and the raw benchmarking data attached to each build. The rest of this article delves into the benchmarking details; the numbers shown below were generated from a recent CI build.

Indirect Invocation

Both proxy objects and virtual functions can perform indirect invocations. However, since they have different semantics and memory layouts, it is interesting to see how they compare to each other.
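
To make the comparison concrete, here is a minimal sketch of the two invocation styles. It assumes the library's header is installed as <proxy/proxy.h>; the facade, types, and values are illustrative and are not the actual benchmark code:

```cpp
#include <memory>

#include <proxy/proxy.h>

// A facade with a single member function "Fun" that returns int.
PRO_DEF_MEM_DISPATCH(MemFun, Fun);
struct Callable : pro::facade_builder
    ::add_convention<MemFun, int()>
    ::build {};

// A plain 4-byte type used with proxy; no base class or vtable is required.
struct SmallValue {
  int value;
  int Fun() { return value; }
};

// The virtual-function counterpart needs a polymorphic base class.
struct CallableBase {
  virtual ~CallableBase() = default;
  virtual int Fun() = 0;
};
struct SmallValueVirtual : CallableBase {
  int value;
  explicit SmallValueVirtual(int v) : value(v) {}
  int Fun() override { return value; }
};

int main() {
  // Indirect invocation through proxy: the small payload is stored inside
  // the proxy object itself, next to its metadata.
  pro::proxy<Callable> p = pro::make_proxy<Callable>(SmallValue{42});
  int a = p->Fun();

  // Indirect invocation through a virtual function via a base-class pointer.
  std::unique_ptr<CallableBase> q = std::make_unique<SmallValueVirtual>(42);
  int b = q->Fun();

  return a == b ? 0 : 1;
}
```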

Because make_proxy can effectively place a small object alongside metadata (similar to the “small buffer optimization” found in some other C++ libraries), the benchmarks are divided into two categories: invocation on small objects (4 bytes) and on large objects (48 bytes). By invoking 1,000,000 objects of 100 different types, we obtained the first two rows of the report:

| Benchmark | MSVC on Windows Server 2022 (x64) | GCC on Ubuntu 24.04 (x64) | Clang on Ubuntu 24.04 (x64) | Apple Clang on macOS 15 (ARM64) |
| --- | --- | --- | --- | --- |
| Indirect invocation on small objects via proxy vs. virtual functions | 🟢proxy is about 261.7% faster | 🟢proxy is about 44.6% faster | 🟢proxy is about 71.6% faster | 🟡proxy is about 4.0% faster |
| Indirect invocation on large objects via proxy vs. virtual functions | 🟢proxy is about 186.1% faster | 🟢proxy is about 15.5% faster | 🟢proxy is about 17.0% faster | 🟢proxy is about 10.5% faster |

From the report, proxy is faster in all four environments, especially on Windows Server. This result is expected because the implementation of proxy directly stores the metadata of the underlying object, making it more cache-friendly.

Lifetime Management

In many applications, the lifetime management of various objects can become a bigger performance hotspot than indirect invocation itself. We benchmarked this scenario by creating 600,000 small or large objects within a single std::vector (with reserved space).

Besides proxy, there are three typical standard options for storing arbitrary types: std::unique_ptr, std::shared_ptr, and std::any. std::variant is not included because it is essentially a tagged union and can only provide storage for a known set of types (though it is still useful when the set of types is fixed in advance).
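
As a rough sketch of the workload being measured (the payload type, element count, and header path <proxy/proxy.h> are assumptions for illustration; this is not the actual benchmark code):

```cpp
#include <any>
#include <cstddef>
#include <memory>
#include <vector>

#include <proxy/proxy.h>

// A facade with no conventions; only construction and destruction matter here.
struct Storable : pro::facade_builder::build {};

int main() {
  constexpr std::size_t kCount = 600'000;

  // Lifetime management with proxy: a small (4-byte) payload is stored
  // inline, next to the metadata, so no separate heap allocation is needed.
  std::vector<pro::proxy<Storable>> proxies;
  proxies.reserve(kCount);
  for (std::size_t i = 0; i < kCount; ++i) {
    proxies.push_back(pro::make_proxy<Storable>(static_cast<int>(i)));
  }

  // The same workload with std::unique_ptr: one heap allocation per object.
  std::vector<std::unique_ptr<int>> uniques;
  uniques.reserve(kCount);
  for (std::size_t i = 0; i < kCount; ++i) {
    uniques.push_back(std::make_unique<int>(static_cast<int>(i)));
  }

  // ... and with std::any, which can also store a small int without allocating.
  std::vector<std::any> anys;
  anys.reserve(kCount);
  for (std::size_t i = 0; i < kCount; ++i) {
    anys.emplace_back(static_cast<int>(i));
  }
}
```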

For small objects, proxy and std::any usually won’t allocate additional storage. For large objects, proxy and std::shared_ptr offer allocator support (via pro::allocate_proxy and std::allocate_shared) to improve performance, while there is no direct API to customize allocation for std::unique_ptr or std::any.
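
For the “with memory pool” rows below, a pooled allocator is passed to the creation function. Here is a minimal sketch of that usage with a std::pmr pool and one of the large types listed below (again assuming the <proxy/proxy.h> header path; not the actual benchmark code):

```cpp
#include <array>
#include <memory>
#include <memory_resource>

#include <proxy/proxy.h>

// A facade with no conventions, used only for lifetime management.
struct Storable : pro::facade_builder::build {};

int main() {
  // A memory pool shared by all allocations made below.
  std::pmr::unsynchronized_pool_resource pool;
  std::pmr::polymorphic_allocator<> alloc{&pool};

  using Large = std::array<char, 100>;  // 100 bytes: too big for inline storage

  // proxy: pro::allocate_proxy constructs the payload in storage obtained
  // from the supplied allocator (here, the pool).
  pro::proxy<Storable> p = pro::allocate_proxy<Storable, Large>(alloc);

  // shared_ptr: std::allocate_shared obtains the control block and payload
  // from the same pool.
  std::shared_ptr<Large> sp = std::allocate_shared<Large>(alloc);
}
```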

Here are the types we used in the benchmarks:

| Small types | Large types |
| --- | --- |
| int | std::array<char, 100> |
| std::shared_ptr<int> | std::array<std::string, 3> |
| std::unique_lock<std::mutex> | std::unique_lock<std::mutex> + void*[15] |

By comparing proxy with other solutions, we got the following numbers:

| Benchmark | MSVC on Windows Server 2022 (x64) | GCC on Ubuntu 24.04 (x64) | Clang on Ubuntu 24.04 (x64) | Apple Clang on macOS 15 (ARM64) |
| --- | --- | --- | --- | --- |
| Basic lifetime management for small objects with proxy vs. std::unique_ptr | 🟢proxy is about 467.0% faster | 🟢proxy is about 413.0% faster | 🟢proxy is about 430.1% faster | 🟢proxy is about 341.1% faster |
| Basic lifetime management for small objects with proxy vs. std::shared_ptr (without memory pool) | 🟢proxy is about 639.2% faster | 🟢proxy is about 509.3% faster | 🟢proxy is about 492.5% faster | 🟢proxy is about 484.2% faster |
| Basic lifetime management for small objects with proxy vs. std::shared_ptr (with memory pool) | 🟢proxy is about 198.4% faster | 🟢proxy is about 696.1% faster | 🟢proxy is about 660.0% faster | 🟢proxy is about 188.5% faster |
| Basic lifetime management for small objects with proxy vs. std::any | 🟢proxy is about 55.3% faster | 🟢proxy is about 311.0% faster | 🟢proxy is about 323.0% faster | 🟢proxy is about 18.3% faster |
| Basic lifetime management for large objects with proxy (without memory pool) vs. std::unique_ptr | 🟢proxy is about 17.4% faster | 🟢proxy is about 14.8% faster | 🟢proxy is about 29.7% faster | 🔴proxy is about 6.3% slower |
| Basic lifetime management for large objects with proxy (with memory pool) vs. std::unique_ptr | 🟢proxy is about 283.6% faster | 🟢proxy is about 109.6% faster | 🟢proxy is about 204.6% faster | 🟢proxy is about 88.6% faster |
| Basic lifetime management for large objects with proxy vs. std::shared_ptr (both without memory pool) | 🟢proxy is about 29.2% faster | 🟢proxy is about 6.4% faster | 🟢proxy is about 6.5% faster | 🟡proxy is about 4.8% faster |
| Basic lifetime management for large objects with proxy vs. std::shared_ptr (both with memory pool) | 🟢proxy is about 10.8% faster | 🟢proxy is about 9.9% faster | 🟢proxy is about 8.3% faster | 🟢proxy is about 53.2% faster |
| Basic lifetime management for large objects with proxy (without memory pool) vs. std::any | 🟢proxy is about 13.4% faster | 🟡proxy is about 1.3% slower | 🟡proxy is about 0.9% faster | 🟢proxy is about 9.5% faster |
| Basic lifetime management for large objects with proxy (with memory pool) vs. std::any | 🟢proxy is about 270.7% faster | 🟢proxy is about 80.1% faster | 🟢proxy is about 136.9% faster | 🟢proxy is about 120.4% faster |

From the benchmarking results:

  • proxy is much faster than the other three options when the underlying object is small or is managed with a memory pool.
  • proxy can be slightly slower than std::unique_ptr when the underlying object is large and is not managed with a memory pool (observed only with Apple Clang on macOS in our runs).
  • The performance of std::any varies in different environments, but is generally slower than proxy.

Summary

Although the test environments (GitHub-hosted runners) may differ from actual production environments, the test results show significant performance advantages of proxy in both indirect invocations and lifetime management. If you have more ideas for benchmarking the “Proxy” library, we welcome contributions to our GitHub repository.

