C++/WinRT performance trap: Switching to Windows Runtime too soon

Raymond Chen

The Windows Runtime contains equivalents to C++ vectors and maps, namely IVector and IMap. If you have a choice, stick with the C++ versions, and if you have to produce a Windows Runtime version, delay the conversion as long as possible.

The reason is that the Windows Runtime types are all virtual interfaces, which means vtable calls for all the methods which cannot be inlined or optimized. Whereas C++ library types have a richer set of available methods to allow you to write simpler code, and since they are written in C++ itself, the compiler can perform optimizations that aren’t available to virtual methods.

For example, suppose you have a method that has to return an IVector<Widget>. I see a lot of people write it like this:

winrt::IVector<Widget> GetWidgets()
{
    winrt::IVector<Widget> widgets =
        winrt::multi_threaded_vector<Widget>();

    widgets.Append(GetFirstWidget());
    widgets.Append(GetSecondWidget());
    widgets.Append(GetThirdWidget());

    return widgets;
}

This creates an empty Windows Runtime vector and adds three widgets to it. Each of those Append calls is a virtual method call that the compiler may not be able to devirtualize, and since you have a multi-threaded vector, it’s going to do some locking to protect against concurrent access, even though no concurrency is possible here because the vector hasn’t been shared with anyone else yet.

More efficient would be

winrt::IVector<Widget> GetWidgets()
{
    std::vector<Widget> widgets;
    widgets.emplace_back(GetFirstWidget());
    widgets.emplace_back(GetSecondWidget());
    widgets.emplace_back(GetThirdWidget());

    return winrt::multi_threaded_vector<Widget>(
        std::move(widgets));
}

Creating the vector is done in the C++ world, where the compiler can do all sorts of clever optimizations. Only at the end is this vector converted to a Windows Runtime vector to be returned to the caller.

And now that you’ve gotten the vector-building in C++, you can take advantage of additional C++ features, like reservation to avoid reallocation:

winrt::IVector<Widget> GetWidgets()
{
    std::vector<Widget> widgets;
    widgets.reserve(3);

    widgets.emplace_back(GetFirstWidget());
    widgets.emplace_back(GetSecondWidget());
    widgets.emplace_back(GetThirdWidget());

    return winrt::multi_threaded_vector<Widget>(
        std::move(widgets));
}

Or construction from an initializer-list.

winrt::IVector<Widget> GetWidgets()
{
    std::vector<Widget> widgets{
        GetFirstWidget(),       
        GetSecondWidget(),      
        GetThirdWidget(),       
    };                          

    return winrt::multi_threaded_vector(
        std::move(widgets));
}

Bonus reading: On the virtues of the trailing comma.

Once we construct the vector from an initializer-list, we can let CTAD save us some more typing:

winrt::IVector<Widget> GetWidgets()
{
    std::vector widgets{
        GetFirstWidget(),
        GetSecondWidget(),
        GetThirdWidget(),
    };

    return winrt::multi_threaded_vector(
        std::move(widgets));
}

And finally, we can just in-place construct the vector in the parameter list for multi_threaded_vector.

winrt::IVector<Widget> GetWidgets()
{
    return winrt::multi_threaded_vector(
        std::vector{
            GetFirstWidget(),
            GetSecondWidget(),
            GetThirdWidget(),
        });
}

A similar shortcut applies to maps.

winrt::IMap<winrt::hstring, int32_t> GetCounts()
{
    return winrt::multi_threaded_map(
        std::unordered_map<winrt::hstring, int32_t>{
            { L"triangles", GetTriangleCount() },
            { L"rectangles", GetCircleCount() },
            { L"circles", GetCircleCount() },
        });
}

Even if you can’t reduce your function to a one-liner, it’s more efficient (and probably a lot easier) to collect the contents into a C++ vector or map (or unordered map) and then wrap it inside a Windows Runtime interface as a final step.

winrt::IVector<Widget> GetWidgets()
{
    std::vector<Widget> widgets;

    // Collect all the widgets from enabled doodads
    for (auto&& doodad : m_doodads) {
        if (doodad.m_enabled) {
            widgets.insert(widgets.end(),
                doodad.m_widgets.begin(),
                doodad.m_widgets.end());
        }
    }

    // Sort by population descending
    std::sort(widgets.begin(), widgets.end(),
        [](Widget const& left, Widget const& right) {
            return left.Population() > right.Population();
        });

    return winrt::multi_threaded_vector(
        std::move(widgets));
}

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

14 comments

Discussion is closed. Login to edit/delete existing comments.

Simona Koníčková March 9, 2024 · Edited

So that’s why the WinRT APIs are so enormously slow as compared to Win32, sometimes a factor of more than a 100.
Brian Boorman March 4, 2024

Rectangle is the new Circle.
Neil Rashbrook March 2, 2024
Is it possible to initialise the vector parameter to winrt::single_threaded_vector directly with an initialiser list?
```
winrt::IVector<Widget> GetWidgets()
{
    return winrt::multi_threaded_vector({
        GetFirstWidget(),
        GetSecondWidget(),
        GetThirdWidget(),
    });
}
```
- Me Gusta March 3, 2024
  Since you asked about single_threaded_vector but showed the code snipped with multi_threaded_vector, I'm assuming that you are asking whether the single_threaded_vector version constructs in the same way.
  Look inside the C++/WinRT generated windows.foundation.collections.h.
  <code>
  Tracing further back, both of these template classes are instantiations of vector_impl with different parameters. The only difference in the template parameters is the base which controls the locking, both being lightweight structs with the default constructor. This means that both versions should construct equally.
  
  Read more
  Since you asked about single_threaded_vector but showed the code snipped with multi_threaded_vector, I’m assuming that you are asking whether the single_threaded_vector version constructs in the same way.
  Look inside the C++/WinRT generated windows.foundation.collections.h.
  
  template <typename T, typename Allocator = std::allocator<T>> Windows::Foundation::Collections::IVector<T> single_threaded_vector(std::vector<T, Allocator>&& values = {}) { return make<impl::input_vector<T, std::vector<T, Allocator>>>(std::move(values)); } template <typename T, typename Allocator = std::allocator<T>> Windows::Foundation::Collections::IVector<T> multi_threaded_vector(std::vector<T, Allocator>&& values = {}) { return make<impl::multi_threaded_vector<T, std::vector<T, Allocator>>>(std::move(values)); }
  
  Tracing further back, both of these template classes are instantiations of vector_impl with different parameters. The only difference in the template parameters is the base which controls the locking, both being lightweight structs with the default constructor. This means that both versions should construct equally.
  Read less
  - Neil Rashbrook March 5, 2024
    
    Sorry, that was a typo in my part. Thanks for trying to answer anyway. As it happens, the declarations you pasted show the default parameter value using an initialiser list, which suggests that you can use one directly without having to specifically name the class.
Sigge Mannen March 2, 2024

I agree with the general sentiment here. Perhaps it’s not super helpful in this specific case though, since i guess people write WinRT code mostly to do some gui stuff in windows, and not for trading software or whatever where performance of virtual methods make any difference.
Isn’t it better to stick with uniform code than some weird mismatch of raw c++ sprinkled with RT specific classes. And if you’re writing performance sensitive library code you’re unlikely to even return the wrapped classes anyway
- Paulo Pinto March 2, 2024
  
  After the downfall of UWP, the only reason for non-Microsoft employees to write WinRT code is to interoperate with Windows APIs only exposed via WinRT.
  
  For GUI code, you’re better off keeping wiht the classics Windows Forms, WPF, or in C++’s case, VCL, Firemonkey, Qt, wxWidgets.
  - Mark out West March 7, 2024
    
    Maybe an outlier but I write WRL/WinRT in-proc DLLs all the time. One ABI, multiple languages supported. And distribution and installation each time is a cinch. But yes, desktop app and UI dev in WinRT is a strain.
    
    Funny, I haven’t touched an HWND or HANDLE in years.
  - Daniel Roskams March 6, 2024
    
    A lot of of WinRT is garbage and is just a bloated COM-based wrapper around existing Win32 APIs that involves a couple of heap allocations, IInspectable, 20 function pointers and a bunch of calls to QueryInterface for what should really just be a single direct function call into user32 or whatever. Devs should really stop using that rubbish but for some reason it’s still used here and there for such trivial purposes as getting the user’s display language or the size of the mouse cursor. WinRT should never have been made in the first place… disgraceful.
  - Joe Beans March 3, 2024
    
    WPF is GOAT even though MS is slow-walking any fixes to let Google and Apple take the lead on desktop (every MS corporate officer bringing a MacBook to meetings should be fired immediately as a spy). If WPF were unviable then you wouldn’t have Avalonia XPF making a business off of the fork. UWP is a technical sore that only existed for Windows Phone, and then they got rid of Windows Phone for no reason — the people who would have written the phone apps are now expected to write the Store apps, bringing us full-circle. What a dumpster fire.
紅樓鍮 March 1, 2024

Imagine if winrt::single_threaded_vector implemented a C++/WinRT-specific COM interface (say, IVectorCppInterop), that allows you to get access to the underlying std::vector. This would allow your class to retain an IVectorCppInterop reference, which allows your own code to call std::vector methods directly, while exposing the same vector as an IVector to the WinRT world.
- Raymond Chen Author March 2, 2024
  
  Is there a way to compile-time generate an IID based on the specific version of the C++ standard library you are building with? Because IVectorCppInterop for MSVC 14.38 needs a different IID from IVectorCppInterop for libc++ 19.0.0. Otherwise, another component might QI for IVectorCppInterop, see that it succeeds, and mistakenly believe that it’s one of its own vectors. (Too bad that other component was compiled with gcc and you compiled with clang.)
  - Joshua Hudson March 2, 2024
    
    Yes, and it's a royal pain.
    
    The better solution is to ship the interop library as a mostly-header library. Something like this:
    
```
template virtual VectorWrapper IVector::GetVector(); /* This is the COM method */

template class VectorWrapper {
private:
IVector*_owner; /* I might have this type wrong; on implementing it you will find out */
size_t *_actual_size;
T **_first;
VectorWrapper(IVector owner, size_t **size, T **first)...
Read more
Yes, and it’s a royal pain.

The better solution is to ship the interop library as a mostly-header library. Something like this:

“`
template virtual VectorWrapper IVector::GetVector(); /* This is the COM method */

template class VectorWrapper {
private:
IVector*_owner; /* I might have this type wrong; on implementing it you will find out */
size_t *_actual_size;
T **_first;
VectorWrapper(IVector owner, size_t **size, T **first) : _owner(owner), _actual_size(size), _first(first) { _owner.AddRef(); }
public:
~VectorWrapper() { _owner.Release(); }
bool empty() const { return size() == 0 }
size_t size() const { return _size; }

void clear() { _owner.clear(); }
void insert(size_t pos, T value) { _owner.insert(pos, value); }
void push_back(size_t pos, T value) { _impl.insert(pos, value); }
void pop_back() { _owner.pop_back(); }
/* omitted the rest of the mutator methods */

/* Here’s the gains */
T *data() { return _first; }
const T *data() const { return _first; }
T &at(ptrdiff_t index) { /* bounds check omitted for brevity */ return _first[index]; }
const T &at(ptrdiff_t index) const { /* bounds check omitted for brevity */ return _first[index]; }
T &operator[](ptrdiff_t index) { return _first[index]; }
const T &operator[](ptrdiff_t index) const { return _first[index]; }
T *begin() { return _first; }
T *end() { return _first + size(); }
const T *begin() const { return _first; }
const T *end() const { return _first + size(); }
/* rbegin and rend omitted for brevity; they’re just run your mouth off code not hard */
}
“`

What this does is answer all reader calls out of pointers to the actual backing store, which it was given pointers to in its constructor call, while forwarding the writers to the COM implementation so it can use the provder’s grow/shrink methods.

Read less
- Henke37 March 2, 2024
  
  That would be awesome. Provided that you could ensure that both the WinRt implementation and your code are talking about the same std::vector. Which is a pain and not part of any contract.