Since we announced the Proxy 3 library, we have received a lot of positive feedback, along with numerous inquiries about the library’s actual performance. Although the “Proxy” library is designed to be fast (one of our six core missions), it is not immediately clear how fast “Proxy” can be across different platforms and scenarios.
To better understand the performance of the “Proxy” library, we designed 15 benchmarks, ran them in four different environments, and automated them in our GitHub pipeline so that a benchmarking report is generated for every future code change. Anyone can download the reports and raw benchmarking data attached to each build. The rest of this article delves into the benchmarking details; the numbers shown below were generated from a recent CI build.
Indirect Invocation
Both `proxy` objects and virtual functions can perform indirect invocations. However, since they have different semantics and memory layouts, it should be interesting to see how they compare to each other.
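The layout difference can be sketched in standard C++. Below, `Square` uses a classic virtual function, so each object carries a hidden vtable pointer and a call must first load that pointer from the object. The hypothetical `AreaRef` handle instead keeps a pointer to a table of function pointers (`AreaTable`) next to the object pointer, so `PlainSquare` needs no vtable pointer at all. All names here are made up for illustration; this is a simplified model of the general idea, not the Proxy library's actual implementation.

```cpp
#include <cassert>

// Classic virtual dispatch: every Square object embeds a hidden vtable
// pointer, and a call through Shape& loads it from the object first.
struct Shape {
  virtual ~Shape() = default;
  virtual int Area() const = 0;
};

struct Square : Shape {
  explicit Square(int s) : side(s) {}
  int Area() const override { return side * side; }
  int side;
};

// Hypothetical alternative: the concrete type has no virtual functions,
// so it carries no vtable pointer.
struct PlainSquare {
  int side;
  int Area() const { return side * side; }
};

// Table of function pointers shared by all handles to the same type.
struct AreaTable {
  int (*area)(const void* self);
};

// Recovers the concrete type and calls its Area() member.
template <class T>
int CallArea(const void* self) {
  return static_cast<const T*>(self)->Area();
}

// One table instance per concrete type T.
template <class T>
inline constexpr AreaTable kAreaTableFor{&CallArea<T>};

// Handle that stores the metadata (table pointer) beside the object pointer.
class AreaRef {
 public:
  template <class T>
  explicit AreaRef(const T& obj) : object_(&obj), table_(&kAreaTableFor<T>) {}

  // The function pointer comes from the table referenced by the handle;
  // the object itself is only read inside the called function.
  int Area() const { return table_->area(object_); }

 private:
  const void* object_;
  const AreaTable* table_;
};
```

Besides the layout, the semantics differ: a handle like `AreaRef` can wrap any type with a suitable `Area()` member without that type deriving from a common base class.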
Because `make_proxy` can effectively place a small object alongside its metadata (similar to the “small buffer optimization” in some other C++ libraries), the benchmarks are divided into two categories: invocation on small objects (4 bytes) and on large objects (48 bytes). By invoking 1,000,000 objects of 100 different types, we got the first two rows of the report:
| | MSVC on Windows Server 2022 (x64) | GCC on Ubuntu 24.04 (x64) | Clang on Ubuntu 24.04 (x64) | Apple Clang on macOS 15 (ARM64) |
|---|---|---|---|---|
| Indirect invocation on small objects via proxy vs. virtual functions | 🟢proxy is about 261.7% faster | 🟢proxy is about 44.6% faster | 🟢proxy is about 71.6% faster | 🟡proxy is about 4.0% faster |
| Indirect invocation on large objects via proxy vs. virtual functions | 🟢proxy is about 186.1% faster | 🟢proxy is about 15.5% faster | 🟢proxy is about 17.0% faster | 🟢proxy is about 10.5% faster |
From the report, `proxy` is faster in all four environments, especially with MSVC on Windows Server. This result is expected because the implementation of `proxy` directly stores the metadata of the underlying object, making it more cache-friendly.
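The small-object category above relies on storing an object inline, next to its metadata, instead of on the heap. The following minimal sketch shows the general small buffer optimization technique with a hypothetical `SboBox` type: objects that fit in a 16-byte buffer (an arbitrary size chosen for illustration) are constructed in place, and larger ones are heap-allocated. It is not how `make_proxy` is implemented, and it omits copy and move support for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical type-erased holder with a small inline buffer.
class SboBox {
 public:
  static constexpr std::size_t kBufferSize = 16;  // arbitrary illustrative size

  template <class T, class... Args>
  explicit SboBox(std::in_place_type_t<T>, Args&&... args) {
    if constexpr (FitsInline<T>()) {
      // Small object: construct it inside the buffer, no heap allocation.
      object_ = ::new (static_cast<void*>(buffer_)) T(std::forward<Args>(args)...);
      is_inline_ = true;
    } else {
      // Large object: fall back to a heap allocation.
      object_ = new T(std::forward<Args>(args)...);
      is_inline_ = false;
    }
    destroy_ = &Destroy<T>;
  }

  SboBox(const SboBox&) = delete;  // copy/move support omitted
  SboBox& operator=(const SboBox&) = delete;

  ~SboBox() { destroy_(object_, is_inline_); }

  bool is_inline() const { return is_inline_; }

  // The caller must name the type the box was constructed with.
  template <class T>
  T& get() { return *static_cast<T*>(object_); }

 private:
  template <class T>
  static constexpr bool FitsInline() {
    return sizeof(T) <= kBufferSize && alignof(T) <= alignof(std::max_align_t);
  }

  template <class T>
  static void Destroy(void* p, bool in_buffer) {
    if (in_buffer) {
      static_cast<T*>(p)->~T();  // destroy in place; the buffer is not freed
    } else {
      delete static_cast<T*>(p);
    }
  }

  alignas(std::max_align_t) unsigned char buffer_[kBufferSize];
  void* object_ = nullptr;
  bool is_inline_ = false;
  void (*destroy_)(void*, bool) = nullptr;
};
```

With this sketch, a 4-byte `int` lands in the buffer while a 48-byte `std::array<char, 48>` goes to the heap, mirroring the two benchmark categories.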
Lifetime Management
In many applications, lifetime management of various objects can become a performance hotspot, even more so than indirect invocations. We benchmarked this scenario by creating 600,000 small or large objects within a single `std::vector` (with reserved space).
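For readers who want to try a similar measurement locally, a minimal timing loop might look like the sketch below. `TimeLifetime`, `MakeUnique`, and `MakeAny` are hypothetical helpers, not the benchmark code from the repository; the helper reserves the vector up front so that reallocation stays out of the measurement, then times creating and destroying every object. A real benchmark would use a benchmarking framework, repeat the runs, and keep the optimizer from discarding the work.

```cpp
#include <any>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: times creating `count` handles with `make` inside a
// pre-reserved std::vector and then destroying them.
template <class Handle, class MakeFn>
std::chrono::nanoseconds TimeLifetime(std::size_t count, MakeFn make) {
  std::vector<Handle> handles;
  handles.reserve(count);  // reserve first so vector growth is not measured

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < count; ++i) {
    handles.push_back(make(i));  // construction (and any allocation)
  }
  handles.clear();  // destruction (and any deallocation)
  const auto stop = std::chrono::steady_clock::now();

  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

// Example factories for two of the standard options, using a small type (int).
inline std::unique_ptr<int> MakeUnique(std::size_t i) {
  return std::make_unique<int>(static_cast<int>(i));
}

inline std::any MakeAny(std::size_t i) {
  return std::any(static_cast<int>(i));
}
```

For example, `TimeLifetime<std::unique_ptr<int>>(600'000, MakeUnique)` times the `std::unique_ptr` case with the same object count used in the post.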
Besides `proxy`, there are three typical standard options for storing objects of arbitrary types: `std::unique_ptr`, `std::shared_ptr`, and `std::any`. `std::variant` is not included because it is essentially a tagged union and can only provide storage for a known set of types (though it is useful in data context management).
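To illustrate the “known set of types” restriction, the hypothetical `SmallValue` alias below can hold only an `int` or a `std::string`, and every visitor has to handle exactly those alternatives. Adding a type means editing the alias and recompiling every consumer, which is the scope difference that kept `std::variant` out of this comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>

// A std::variant can only hold one of the alternatives listed in its type,
// so every consumer must know the full set of types at compile time.
using SmallValue = std::variant<int, std::string>;

// Visiting must cover every alternative; adding a new alternative means
// changing SmallValue and recompiling all code that visits it.
inline std::size_t PayloadSize(const SmallValue& v) {
  return std::visit(
      [](const auto& x) -> std::size_t {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) {
          return sizeof(int);
        } else {
          return x.size();  // the only other alternative: std::string
        }
      },
      v);
}
```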
For small objects, `proxy` and `std::any` usually won’t allocate additional storage. For large objects, `proxy` and `std::shared_ptr` offer allocator support (via `pro::allocate_proxy` and `std::allocate_shared`) to improve performance, while there is no direct API to customize allocation for `std::unique_ptr` or `std::any`.
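For `std::shared_ptr`, the allocator hook mentioned above is `std::allocate_shared`. The sketch below uses a hypothetical `CountingAllocator` that simply forwards to `::operator new` and counts calls; a real memory pool would sit behind the same allocator interface. It does not show `pro::allocate_proxy`, whose exact signature is best taken from the library's own documentation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical global counter shared by all CountingAllocator instantiations.
inline std::size_t g_allocation_count = 0;

// Minimal allocator that forwards to ::operator new and counts calls.
// A pool allocator would plug in through the same interface.
template <class T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() = default;
  template <class U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}  // needed for rebinding

  T* allocate(std::size_t n) {
    ++g_allocation_count;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return false;
}

using Large = std::array<char, 100>;

inline std::shared_ptr<Large> MakeLargeShared() {
  // std::allocate_shared obtains its storage (typically one block holding
  // both the control block and the object) from the given allocator.
  return std::allocate_shared<Large>(CountingAllocator<Large>{});
}
```

`std::unique_ptr` has no counterpart to `std::allocate_shared`; routing its allocations through a pool requires manual allocation paired with a custom deleter.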
Here are the types we used in the benchmarks:
| Small types | Large types |
|---|---|
| `int` | `std::array<char, 100>` |
| `std::shared_ptr<int>` | `std::array<std::string, 3>` |
| `std::unique_lock<std::mutex>` | `std::unique_lock<std::mutex>` + `void*[15]` |
By comparing `proxy` with the other solutions, we got the following numbers:
| | MSVC on Windows Server 2022 (x64) | GCC on Ubuntu 24.04 (x64) | Clang on Ubuntu 24.04 (x64) | Apple Clang on macOS 15 (ARM64) |
|---|---|---|---|---|
| Basic lifetime management for small objects with proxy vs. std::unique_ptr | 🟢proxy is about 467.0% faster | 🟢proxy is about 413.0% faster | 🟢proxy is about 430.1% faster | 🟢proxy is about 341.1% faster |
| Basic lifetime management for small objects with proxy vs. std::shared_ptr (without memory pool) | 🟢proxy is about 639.2% faster | 🟢proxy is about 509.3% faster | 🟢proxy is about 492.5% faster | 🟢proxy is about 484.2% faster |
| Basic lifetime management for small objects with proxy vs. std::shared_ptr (with memory pool) | 🟢proxy is about 198.4% faster | 🟢proxy is about 696.1% faster | 🟢proxy is about 660.0% faster | 🟢proxy is about 188.5% faster |
| Basic lifetime management for small objects with proxy vs. std::any | 🟢proxy is about 55.3% faster | 🟢proxy is about 311.0% faster | 🟢proxy is about 323.0% faster | 🟢proxy is about 18.3% faster |
| Basic lifetime management for large objects with proxy (without memory pool) vs. std::unique_ptr | 🟢proxy is about 17.4% faster | 🟢proxy is about 14.8% faster | 🟢proxy is about 29.7% faster | 🔴proxy is about 6.3% slower |
| Basic lifetime management for large objects with proxy (with memory pool) vs. std::unique_ptr | 🟢proxy is about 283.6% faster | 🟢proxy is about 109.6% faster | 🟢proxy is about 204.6% faster | 🟢proxy is about 88.6% faster |
| Basic lifetime management for large objects with proxy vs. std::shared_ptr (both without memory pool) | 🟢proxy is about 29.2% faster | 🟢proxy is about 6.4% faster | 🟢proxy is about 6.5% faster | 🟡proxy is about 4.8% faster |
| Basic lifetime management for large objects with proxy vs. std::shared_ptr (both with memory pool) | 🟢proxy is about 10.8% faster | 🟢proxy is about 9.9% faster | 🟢proxy is about 8.3% faster | 🟢proxy is about 53.2% faster |
| Basic lifetime management for large objects with proxy (without memory pool) vs. std::any | 🟢proxy is about 13.4% faster | 🟡proxy is about 1.3% slower | 🟡proxy is about 0.9% faster | 🟢proxy is about 9.5% faster |
| Basic lifetime management for large objects with proxy (with memory pool) vs. std::any | 🟢proxy is about 270.7% faster | 🟢proxy is about 80.1% faster | 🟢proxy is about 136.9% faster | 🟢proxy is about 120.4% faster |
From the benchmarking results:

- `proxy` is much faster than the other three options when the underlying object is small or managed with a memory pool.
- When the underlying object is large and not managed with a memory pool, `proxy` is slightly slower than `std::unique_ptr` with Apple Clang on macOS (about 6.3%), but faster in the other three environments.
- The performance of `std::any` varies across environments, but it is generally slower than `proxy`.
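The report does not spell out how the percentages are computed. Assuming “proxy is about p% faster” is measured relative to `proxy`'s running time (the reading consistent with values above 100%, such as 639.2%, which could not be a share of time saved), a reported figure translates into a speedup ratio and a time fraction as follows:

$$
\frac{t_{\text{baseline}}}{t_{\text{proxy}}} = 1 + \frac{p}{100},
\qquad
\frac{t_{\text{proxy}}}{t_{\text{baseline}}} = \frac{1}{1 + p/100}
$$

For example, p = 261.7 gives a ratio of about 3.62, meaning `proxy` would need roughly 28% of the baseline's time under this assumption.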
Summary
Although the test environments (GitHub-hosted runners) may differ from actual production environments, the test results show significant performance advantages of `proxy` in both indirect invocations and lifetime management. If you have more ideas for benchmarking the “Proxy” library, we welcome contributions to our GitHub repository.
Hi Mingxin,
You did not include std::variant in your benchmark because it is bound to a list of types and serves as a tagged union for storage.
I can understand that.
However, it is widely used in designs that strive for static polymorphism.
In that regard, to help evaluate the benefit of the proxy library, it would be really interesting to also compare it to a std::variant approach.
Thank you,
Jean
Hi Jean,
Thank you for your interest in our work. Like you said, was excluded from the benchmarking because "it is bound to a (fixed) list of types". To avoid misleading readers, we decided not to compare the performance of due to the scope differences. However, it is true that virtual functions and can theoretically be replaced by within a single binary, at the cost of engineering effort in managing templates.
I forked the repository and created a branch to add benchmarks for with the existing configurations. For your reference, here's the result generated from my branch (copied...
Hi Mingxin,
I am not surprised by the result and I feel the benchmark is more “complete” with all known alternatives to “classic” polymorphism.
A huge (IMO) benefit of Proxy over variant is that it “decouples” the implementation and the consumers of an ABI, like virtual functions do (and std::variant does not).
We are looking forward to using proxy in our experiments soon!
Thank you very much for taking the time for this additional benchmark.
Jean