If we can have `std::atomic<std::shared_ptr>`, why not `std::atomic<com_ptr>`?

Raymond Chen

Some time ago, we peeked inside the atomic shared_ptr to see how it worked. Can we apply these same principles to create an atomic com_ptr?

Recall that the atomic shared_ptr operates by using the bottom bit of the control block pointer as a lock flag, so that nobody can change the value while we’re copying the pointer and incrementing the reference count. Can we do this with a com_ptr?

We could use the same trick of using the bottom bit of the raw COM pointer as a lock flag. This is acceptable because COM pointers must be pointer-aligned (since they point to a vtable), so we know that the bottom bit of a valid COM pointer is clear. However, we run into trouble when trying to increment the reference count: The call to IUnknown::AddRef happens while the lock is held, but the AddRef is a call out to external code, and we don’t know what it’s going to do. We know what it’s supposed to do (namely, increment the reference count), but it may take a circuitous route to get there, including passing through aggregated controlling unknowns, tear-off stubs, tear-offs of aggregated objects, weak outer pointers, and other fanciful characters.
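
A minimal sketch of the lock-bit idea may make the problem easier to see. Everything named here is an assumption for illustration: `FakeUnknown` is a hypothetical stand-in for a COM object (a real one implements `IUnknown`), and `locked_slot` is a hypothetical holder, not a real library type.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a COM object. A real AddRef is external code
// that may do far more than bump a counter.
struct FakeUnknown {
    std::atomic<unsigned long> refs{1};
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() { return --refs; }  // a real object would destroy itself at 0
};
static_assert(alignof(FakeUnknown) >= 2, "bit 0 of a valid pointer must be clear");

// Hypothetical slot that keeps the raw pointer and the lock flag in one word.
class locked_slot {
    static constexpr std::uintptr_t lock_bit = 1;
    std::atomic<std::uintptr_t> bits_;

    // Spin until we change "unlocked value" into "same value, locked".
    std::uintptr_t lock() {
        for (;;) {
            std::uintptr_t unlocked = bits_.load(std::memory_order_relaxed) & ~lock_bit;
            if (bits_.compare_exchange_weak(unlocked, unlocked | lock_bit,
                                            std::memory_order_acquire)) {
                return unlocked;
            }
        }
    }

    // Publishing a value with bit 0 clear also releases the lock.
    void unlock(std::uintptr_t value) { bits_.store(value, std::memory_order_release); }

public:
    explicit locked_slot(FakeUnknown* p) : bits_(reinterpret_cast<std::uintptr_t>(p)) {
        if (p) p->AddRef();
    }
    ~locked_slot() { store(nullptr); }

    // Returns the current pointer with one extra reference owned by the caller.
    FakeUnknown* load() {
        std::uintptr_t value = lock();
        auto* p = reinterpret_cast<FakeUnknown*>(value);
        if (p) p->AddRef();  // the problem: external code runs while the lock is held
        unlock(value);
        return p;
    }

    // Replaces the pointer; the old reference is released after unlocking.
    void store(FakeUnknown* desired) {
        if (desired) desired->AddRef();
        std::uintptr_t old = lock();
        unlock(reinterpret_cast<std::uintptr_t>(desired));
        if (auto* p = reinterpret_cast<FakeUnknown*>(old)) p->Release();
    }
};
```

The `Release` of the old pointer can be moved outside the lock, but the `AddRef` in `load` cannot: it has to finish before anybody else is allowed to swap the pointer out and drop the last reference.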

We know that holding a lock while calling out to external code is a source of deadlocks, so holding a lock while calling out to a mystery implementation of IUnknown::AddRef is probably not a good idea.

Sorry.


11 comments


Александр Гутенев May 7, 2025 · Edited

Author of `atomic::wait` and `atomic<shared_ptr<T>>::wait` PRs here.

I’m afraid that `atomic<T>::wait` is problematic too, but for a different reason.

Turns out that `atomic<shared_ptr<T>>::wait` should also treat the value as changed if the value pointer is not changed, but the control block is different. This could have been achieved with `WaitOnAddress` on two pointers residing contiguously in memory, but the maximum size for `WaitOnAddress` is 8 bytes. Another contributor fixed that by adding timed backoff in his PR.

I’ve created an issue to revisit that in the future. But unless `WaitOnAddress` with 16 bytes emerges, any other solution, such as using an “indirect” wait (what the STL currently does for `atomic<16 bytes>`), using mutexes, or keeping the timed backoff approach, would be suboptimal, partly defeating the purpose of having `atomic<shared_ptr<T>>::wait` in the C++ Standard.

Igor Levicki April 28, 2025

@Raymond Chen

Sorry for going off-topic, Raymond, but I want to let you know that:

- Turning off email notifications in this site profile doesn’t work (I still get them)
- This commenting system is broken:

post A -> post B (reply to A) -> post C (reply to B) <-- you are here and you can't reply to it -- why not just drop the pretense of supporting threaded conversations and allow for post quoting?

- Sign in (with Microsoft at least) doesn't keep you signed in and you have to sign in every time you visit if you want to leave a comment or vote

Would be nice if you could get the owners to fix any of it.

- Raymond Chen Author April 28, 2025
  
  I’ve asked the site maintainers to look into it.
Neil Rashbrook April 26, 2025 · Edited

Calling AddRef is plain sailing compared to calling Release, which can run destructors.
- Kevin Norris April 26, 2025
  
  I don’t think this is likely to be an issue, at least for atomic std::shared_ptr, because C++ specifies that the destruction of the underlying object is sequenced after the atomic operations on the refcount. Effectively, your weak pointers all see the object as “already gone” even if its destructor has not yet run (and C++ specifies that behavior even for non-atomic std::weak_ptr). So you don’t need to hold a lock when you run the destructor, because by the time you’re destroying the pointee, you’ve already prevented anyone else from getting a pointer to it, and you are explicitly not required to give weak_ptr exact knowledge of when the object has been fully destroyed. I don’t pretend to understand how COM handles this, but I can’t imagine they would do it all that differently.
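
  The "already gone" behavior can be observed with a small experiment. This is only a sketch: `Probe` and `weak_ptr_expired_during_destructor` are hypothetical names made up for the illustration, and the result is what the mainstream standard library implementations report, since they drop the strong count before running the destructor.

  ```cpp
  #include <cassert>
  #include <memory>

  // Hypothetical probe: its destructor records what a weak_ptr to it reports.
  struct Probe {
      std::weak_ptr<Probe>* watcher = nullptr;
      bool* saw_expired = nullptr;
      ~Probe() {
          if (watcher && saw_expired) *saw_expired = watcher->expired();
      }
  };

  // Returns what the weak_ptr reported while ~Probe was still running.
  inline bool weak_ptr_expired_during_destructor() {
      bool saw_expired = false;
      std::weak_ptr<Probe> watcher;
      auto strong = std::make_shared<Probe>();
      watcher = strong;
      strong->watcher = &watcher;
      strong->saw_expired = &saw_expired;
      strong.reset();  // last strong reference: the count drops, then ~Probe runs
      return saw_expired;
  }
  ```

  No lock is held while `~Probe` runs, yet nobody can obtain a new strong reference through `watcher` at that point.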
  
Matt McCutchen April 25, 2025 · Edited

Update (2025-04-29): The following argument was wrong as pointed out in Mike Winterberg’s reply. But if you don’t specialize std::atomic<com_ptr<T>>, then the default implementation uses a lock anyway, so you have exactly the same risk of deadlock, right? So what’s the benefit of declining to provide the specialization (and the resulting small performance improvement)? Is the existence of a specialization going to make a difference in whether users realize there is a deadlock risk unless they incorporate the hidden lock and AddRef into their lock hierarchy? It may be worth noting that while this article apparently focuses on use cases in which a com_ptr<T> is all the user needs to access from multiple threads, I imagine there are many use cases in which access to a com_ptr<T> is one step of a larger method call on a threadsafe COM object that takes a lock for the entire method call. Users must already have some way of doing that safely, so I’d think they could apply the same techniques to a std::atomic<com_ptr<T>> if they were aware that they needed to.

In better news, I think there actually is a way to implement an atomic com_ptr that doesn’t complicate the lock hierarchy (at least compared to a non-atomic one). IIUC, the basic race condition we need to avoid is the following: Thread A reads the com_ptr, but before thread A can call AddRef, thread B updates the com_ptr, which releases a reference to the old target object that happens to be the last reference. Now thread A is calling AddRef on a freed object. Somehow, we need to delay the release when a read of the com_ptr is pending. In general, an update to the com_ptr could occur while multiple reads are pending, and we need to ensure that the corresponding release occurs exactly once, when all those reads are done. Perhaps you can see the solution I’m building up to: another level of reference count aggregation. Specifically, use a std::atomic<std::shared_ptr<com_ptr<T>>>. To read the shared variable, first copy the shared_ptr and then copy the com_ptr, which does the AddRef outside the shared_ptr lock; finally, destruct the temporary shared_ptr, which releases the object if appropriate. To write the shared variable, replace the entire shared_ptr with a new make_shared of the desired com_ptr. One could encapsulate this logic in a shareable_com_ptr wrapper class and then define a std::atomic specialization for it. I don’t know whether this solution should be taken seriously, but if I’m not missing anything, it should work.
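
A rough sketch of that idea follows, under several assumptions: `FakeObject` and `toy_com_ptr` are hypothetical stand-ins for a COM object and for `com_ptr<T>`, `shareable_com_ptr` is the wrapper name proposed above, and `snapshot`/`publish` are made-up helper names. Where the library does not provide `std::atomic<std::shared_ptr>` (a C++20 feature), the sketch falls back to the older `std::atomic_load`/`std::atomic_store` overloads for `shared_ptr`.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical COM-like object that deletes itself when the count reaches zero.
struct FakeObject {
    std::atomic<long> refs{1};
    static std::atomic<int>& live() { static std::atomic<int> n{0}; return n; }
    FakeObject() { ++live(); }
    ~FakeObject() { --live(); }
    void AddRef() { ++refs; }
    void Release() { if (--refs == 0) delete this; }
};

// Hypothetical minimal com_ptr: copying calls AddRef, destruction calls Release.
template <typename T>
class toy_com_ptr {
    T* p_ = nullptr;
public:
    toy_com_ptr() = default;
    explicit toy_com_ptr(T* adopted) : p_(adopted) {}  // takes over the caller's reference
    toy_com_ptr(const toy_com_ptr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    toy_com_ptr& operator=(toy_com_ptr other) { std::swap(p_, other.p_); return *this; }
    ~toy_com_ptr() { if (p_) p_->Release(); }
    T* get() const { return p_; }
};

template <typename T>
class shareable_com_ptr {
    using holder = std::shared_ptr<toy_com_ptr<T>>;
#ifdef __cpp_lib_atomic_shared_ptr
    std::atomic<holder> slot_;
    holder snapshot() const { return slot_.load(); }
    void publish(holder h) { slot_.store(std::move(h)); }
#else
    holder slot_;
    holder snapshot() const { return std::atomic_load(&slot_); }
    void publish(holder h) { std::atomic_store(&slot_, std::move(h)); }
#endif
public:
    toy_com_ptr<T> load() const {
        holder h = snapshot();         // the shared_ptr's internal lock is released here
        if (!h) return toy_com_ptr<T>();
        return *h;                     // AddRef runs with no lock held; h keeps the target alive
    }
    void store(const toy_com_ptr<T>& desired) {
        // Allocates, so unlike std::atomic::store this can throw std::bad_alloc.
        publish(std::make_shared<toy_com_ptr<T>>(desired));
    }
};
```

A reader that is still holding the old `shared_ptr` delays the `Release` of the old target until it has finished its own `AddRef`, which is the exactly-once release described above.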

(The usual disclaimer: My comment is based on bits of information I picked up from reading this blog for fun, not from actually using the relevant technologies.)

- Mike Winterberg April 29, 2025 · Edited
  
  Keeping in mind your disclaimer… std::atomic<com_ptr> won’t compile, since the primary std::atomic template requires the object it contains to be trivially copyable. Since com_ptr defines a copy constructor, it isn’t.
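
  The requirement can be checked with a type trait. This is only a sketch: `toy_com_ptr` is a hypothetical stand-in that copies just the one relevant property of a real `com_ptr`, namely a user-provided copy constructor.

  ```cpp
  #include <atomic>
  #include <cassert>
  #include <type_traits>

  // Hypothetical stand-in for com_ptr<T>; only the copy constructor matters here.
  template <typename T>
  struct toy_com_ptr {
      T* ptr = nullptr;
      toy_com_ptr() = default;
      toy_com_ptr(const toy_com_ptr& other) : ptr(other.ptr) {
          // A real com_ptr would call ptr->AddRef() here.
      }
  };

  constexpr bool raw_pointer_ok = std::is_trivially_copyable<int*>::value;
  constexpr bool toy_com_ptr_ok = std::is_trivially_copyable<toy_com_ptr<int>>::value;

  static_assert(raw_pointer_ok, "a raw pointer can go in the primary std::atomic");
  static_assert(!toy_com_ptr_ok, "a user-provided copy constructor rules it out");

  std::atomic<int*> fine{nullptr};
  // std::atomic<toy_com_ptr<int>> rejected;  // ill-formed: T is not trivially copyable
  ```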
  - Matt McCutchen April 29, 2025
    
    You are right. My knowledge of the primary template for std::atomic<T> was based on another article on this blog, and I neglected to look up whether there were restrictions on T. That invalidates my first point, so I crossed it out. Thanks.
- lxndrrbrtgtsch2 April 26, 2025 · Edited
  
  std::atomic<shareable_com_ptr<T>> does not behave like std::atomic<com_ptr<T>> since std::atomic::store is noexcept. Having the lower bits of the pointer act as a counter of outstanding or excess add_ref calls would work, the only problem being a deadlock if the add_ref call calls std::atomic<com_ptr<T>>::load on its owner. A full split refcount using a 128-bit CAS avoids this.
  - Matt McCutchen April 29, 2025
    
    noexcept is another issue I didn’t research; thanks for pointing it out. To confirm, what potential exception(s) from std::atomic<shareable_com_ptr<T>>::store are you concerned about? Failure to allocate heap memory for the new shared_ptr? From what I read, COM methods aren’t supposed to throw C++ exceptions, so calling AddRef in a noexcept function might technically be OK, though I’m starting to think it’s inconsistent with the spirit of std::atomic.
    
    Let’s put aside for now the question of whether it’s reasonable to expose the solution as a std::atomic specialization and suppose our goal is just to provide a thread-safe com_ptr wrapper that (1) doesn’t take a lock around a call to external code and (2) doesn’t do anything that could throw a C++ exception aside from possibly AddRef/Release. (I don’t know if this is useful, but it’s a fun exercise for me.) So we need to eliminate the heap allocation, and it sounds like that’s what you’re trying to do in your solution. I’m not clear on the details, and I’m doubtful that it can work. Suppose the shareable_com_ptr points to COM object O1 and there are N1 pending AddRefs. Now one thread changes the shareable_com_ptr to point to O2, and before any of the previous AddRefs finish, another N2 AddRefs accumulate against O2. If we make no assumption of fairness in thread scheduling, then we can accumulate an unbounded number of old values of the shareable_com_ptr, each with a corresponding group of threads that have to coordinate for the last thread that completes its AddRef to do the Release. I don’t see how you can track that in a constant amount of memory. If we can’t allocate heap memory, it seems like the only solution left is to embed the data structure in the stacks of the threads, like WaitOnAddress and slim reader-writer locks do according to 20191101-00/?p=103046 . I would guess it’s possible, but I won’t try to work out the details.
    
- Raymond Chen Author April 25, 2025
  
  My take is that std::atomic<com_ptr> is also a mistake for the same reason. Most usages of COM pointers are part of larger operations, so you already need a lock and also can assess the dangers of calling into T while holding that lock. But I don’t like std::atomic<com_ptr> making that risk assessment for me.