September 2nd, 2022

The case of the recursively-acquired non-recursive lock, and how to avoid the unintentional reentrancy

A customer encountered a deadlock due to unexpected reentrancy, and they were looking for guidance in fixing it.

Here’s the code in question:

struct WidgetTracker : IWidgetChangeNotificationSink
{
    /* other stuff not relevant here */

    /// IWidgetChangeNotificationSink
    STDMETHODIMP OnCurrentWidgetChanged();

private:
    WRL::ComPtr<IWidget> m_currentWidget;
    std::mutex m_mutex;
};

HRESULT WidgetTracker::OnCurrentWidgetChanged()
{
    auto guard = std::lock_guard(m_mutex);
    RETURN_IF_FAILED(GetCurrentWidget(&m_currentWidget));
    return S_OK;
}

The idea here is that the WidgetTracker listens for notifications that the current widget has changed, and when it receives that notification, it updates its local cache to hold the new current widget.

The hang occurred with this stack:

ntdll!ZwWaitForAlertByThreadId
ntdll!RtlAcquireSRWLockExclusive
contoso!WidgetTracker::OnCurrentWidgetChanged
rpcrt4!Invoke
rpcrt4!Ndr64StubWorker
rpcrt4!NdrStubCall3
combase!CStdStubBuffer_Invoke
combase!InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_...>::operator()
combase!ObjectMethodExceptionHandlingAction<<lambda_...> >
combase!InvokeStubWithExceptionPolicyAndTracing
combase!DefaultStubInvoke
combase!SyncServerCall::StubInvoke
combase!StubInvoke
combase!ServerCall::ContextInvoke
combase!DefaultInvokeInApartment
combase!ReentrantSTAInvokeInApartment
combase!ComInvokeWithLockAndIPID
combase!ThreadDispatch
combase!ThreadWndProc
user32!UserCallWinProcCheckWow
user32!DispatchMessageWorker
combase!CCliModalLoop::MyDispatchMessage
combase!CCliModalLoop::PeekRPCAndDDEMessage
combase!CCliModalLoop::BlockFn
combase!ModalLoop
combase!ThreadSendReceive
combase!CSyncClientCall::SwitchAptAndDispatchCall
combase!CSyncClientCall::SendReceive2
combase!SyncClientCallRetryContext::SendReceiveWithRetry
combase!CSyncClientCall::SendReceiveInRetryContext
combase!ClassicSTAThreadSendReceive
combase!CSyncClientCall::SendReceive
combase!CClientChannel::SendReceive
combase!NdrExtpProxySendReceive
rpcrt4!Ndr64pSendReceive
rpcrt4!NdrpClientCall3
combase!ObjectStublessClient
combase!ObjectStubless
litware!Widget::~Widget
litware!Widget::`scalar deleting destructor'
litware!Widget::Release
contoso!Microsoft::WRL::ComPtr<IWidget>::InternalRelease
contoso!Microsoft::WRL::ComPtr<IWidget>::ReleaseAndGetAddressOf
contoso!Microsoft::WRL::Details::ComPtrRef<...>::operator struct IWidget **
contoso!WidgetTracker::OnCurrentWidgetChanged
rpcrt4!Invoke
rpcrt4!Ndr64StubWorker
rpcrt4!NdrStubCall3
combase!CStdStubBuffer_Invoke
combase!InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_...>::operator()
combase!ObjectMethodExceptionHandlingAction<<lambda_...> >
combase!InvokeStubWithExceptionPolicyAndTracing
...

Reading from the bottom up, what happened is that the current widget changed, and the Widget­Tracker received the change notification. The Widget­Tracker locks the mutex, and then wants to get the new current Widget, but first it releases the old Widget.

It is that release of the old widget that causes trouble, because it makes a cross-process call, and while waiting for the cross-process call to complete, the current widget changes again, and the On­Current­Widget­Changed method gets called again. (It is evident that this code is running on a single-threaded apartment. If it were running in a multi-threaded apartment, the second call would have arrived on a different thread.)

The problem is that we are releasing our reference to the old widget while holding a lock, and that creates the opportunity for mayhem, since we don’t control what the widget will do when it is released. And if this is the final release of the widget, it will probably do a lot of work.

This is another case of the hidden callout: The destructor.

And the solution is the same: Destruct the reference to the old widget outside the lock.

HRESULT WidgetTracker::OnCurrentWidgetChanged()
{
    WRL::ComPtr<IWidget> widget;
    auto guard = std::lock_guard(m_mutex);
    RETURN_IF_FAILED(GetCurrentWidget(&widget));
    m_currentWidget.Swap(widget);
    return S_OK;
}

We declare a ComPtr<IWidget> before taking the lock, so that it destructs after the lock is released. (Remember that in C++, local variables are destructed in reverse order of construction.) After we get the current widget into the local widget, we swap it with the old one, and then return.

The lock guard destructs first, which exits the lock. and then the ComPtr<IWidget> destructs, which releases the old widget. This release occurs outside the lock, so any re-entrancy is not going to create a deadlock.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

7 comments

Discussion is closed. Login to edit/delete existing comments.

  • Swap Swap

    The OnCurrentWidgetChanged declaration should be STDMETHOD, not STDMETHODIMP.

  • Henke37

    If the variable order matters then I’d stick a comment pointing it out, this seems like a trap for developers in a hurry.

    • Tom Mason

      IMO it would be more appropriate to create a scope explicitly. Something like

      HRESULT WidgetTracker::OnCurrentWidgetChanged()
      {
          WRL::ComPtr widget;
          {
              auto guard = std::lock_guard(m_mutex);
              RETURN_IF_FAILED(GetCurrentWidget(&widget));
              m_currentWidget.Swap(widget);
          }
          return S_OK;
      }
    • sugrob 9000

      Indeed, moreover a lock guard that isn't at the top of its scope immediately sticks out as a "bug" that the next guy may be tempted to "fix", since it is the usual assumption that you want to hold the lock for the entire scope.

      So, while the blog post is discussing the technical aspect of this solution, in practice there probably should also be a comment. (That sounds like an entirely appropriate thing to point...

      Read more
      • AlanW · Edited

        this comment has been deleted.

      • Ivan K

        More cookies plz

    • Greg Knox

      > If the variable order matters then I’d stick a comment pointing it out,

      Henke37, you may have missed the rest of this blog post that comments on that very same topic. I would have assumed the code samples are kept succinct to demonstrate the issue encountered and how to solve them.