The hidden callout: The destructor

Raymond Chen

This is a general problem, but I’m going to give a specific example.

C++ standard library collections do not support concurrent mutation. It is the caller’s responsibility to serialize mutation operations, and to ensure that no mutation occurs at the same time as any other operation. And the usual way of accomplishing this is to use a mutex of some kind.

class ThingManager
{
private:
  std::mutex things_lock_;
  std::vector<std::shared_ptr<Thing>> things_;

public:
  void AddThing(std::shared_ptr<Thing> thing)
  {
    std::lock_guard guard(things_lock_);
    things_.push_back(std::move(thing));
  }

  void RemoveThingById(int32_t id)
  {
    std::lock_guard guard(things_lock_);
    auto it = std::find_if(things_.begin(), things_.end(),
      [&](auto&& thing)
      {
        return thing->id() == id;
      });
    if (it != things_.end()) {
      things_.erase(it);
    }
  }
};

The idea here is that you give the ThingManager a bunch of things, and then you can later remove them by providing the Thing‘s ID. Presumably there are also methods to search for Things or to perform some operation across all Things, but those are just distractions from the exercise.

This particular object wants to support concurrent operation, so it internally uses a mutex to ensure safe operation.

Now, you can quibble about the use of find_if instead of remove_if, or using a std::vector instead of a std::map, but let’s set that aside.

The question is: What’s wrong with this code?

I sort of gave it away in the title: We are calling out to external code while holding our lock.

You probably know not to call out to external code when holding an internal lock, and the act of invoking a method on an object may remind you of that fact. But destructors just run by themselves. You don’t typically write code the trigger the destruction of an object. The object just destructs when it destructs.

Removing the shared_ptr<Thing> from our vector could result in the Thing‘s destruction if this was the last strong reference to the Thing. And that destructor runs while the things_lock_ is still locked.

Now things get interesting.

You may not know all that happens inside the Thing destructor. It may have been written by another team, or by you, six months ago. Or somebody could have derived from Thing and given you a shared pointer to the derived object.¹ Or you might be given a shared pointer to a Thing embedded inside a larger object.

Let’s do that thing with the derived type:

class SuperThing : Thing
{
private:
  ThingManager& manager_;
  int32_t helper_id_ = 0;

public:
  SuperThing(ThingManager& manager) :
    manager_(manager)
  {
    auto helper = std::make_shared<Thing>();
    helper_id_ = helper->id();
    manager_.AddThing(helper);
  }

  ~SuperThing()
  {
    manager_.RemoveThingById(helper_id_);
  }
};

The SuperThing object is itself a Thing, but it also uses a helper thing. At construction, it creates a helper thing and registers it with the manager, retaining the ID. And at destruction, it removes its helper thing.

And then this happens:

void test(ThingManager& manager)
{
  auto s = std::make_shared<SuperThing>();
  auto id = s->id();
  manager.AddThing(s);
  s = nullptr;

  manager.RemoveThingById(id);
}

This little test function creates a SuperThing, adds it to the thing manager, and then immediately removes it.

The RemoveThingById function looks for a matching Id and finds it, so it erases the corresponding Thing from the vector. That erasure destroys the shared_ptr, and since this is the last strong reference, the underlying Thing is also destroyed.

This runs the destructor of our SuperThing, which tries to remove its helper Thing. And that calls back into the ThingManager, which gets stuck trying to acquire a mutex that is already held (unwittingly, by itself).

This is not a purely theoretical exercise. This sort of thing happens, and it’s a source of bugs.

Next time, we’ll look at how to address these types of problems.

¹ If you work in Windows, a common scenario for this is that the shared_ptr in the example above takes the form of a COM reference-counted pointer.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

6 comments

Discussion is closed. Login to edit/delete existing comments.

Neil Rashbrook November 14, 2020

The mutex is irrelevant here; I’ve seen an example where object A releases a reference to object B, which causes code to run which tries to mutate object A, which is in an inconsistent state because it’s in the middle of releasing its reference to object B. It’s particularly fun when it tries to release its reference to object B again…

The solution is to transfer the reference to a temporary, so that by the time the reference is released, object A is in a consistent state again.

Nathan Williams November 11, 2020

This is probably only a solution for the specific example and not the general problem, but one thing you could do is this:

void RemoveThingById(int32_t id)
{
  std::shared_ptr removed_thing;
  {
    std::lock_guard guard(things_lock_);
    auto it = std::find_if(things_.begin(), things_.end(), ...);
    if (it != things_.end()) {
      removed_thing = *it;
      things_.erase(it);
    }
  }
}

Making a local copy of the shared_ptr before erasing it postpones the destructor call.

Joshua Hudson November 11, 2020

I can has recursive mutex?

This is a perfectly serious comment. I’m accustomed to recursive mutexes being the natural solution to this.
- Gwen Roelants November 12, 2020
  
  Recursive mutexes actually have similar problems. While holding the lock you typically also assume some pre- and post-conditions, but if in the middle of such a function you call another function that requires the lock those conditions may not be met.
  
  You might also have an iterator for example, that is invalidated when the function you call also modifies the same vector.
  
  Finally when you have more than one mutex in your program recursive mutexes make it easier to get caught in a situation where you did not expect to hold both, the order in which they are acquired is wrong and...
  Read more
  Recursive mutexes actually have similar problems. While holding the lock you typically also assume some pre- and post-conditions, but if in the middle of such a function you call another function that requires the lock those conditions may not be met.
  
  You might also have an iterator for example, that is invalidated when the function you call also modifies the same vector.
  
  Finally when you have more than one mutex in your program recursive mutexes make it easier to get caught in a situation where you did not expect to hold both, the order in which they are acquired is wrong and you get a deadlock in another thread trying to enter the mutex while already holding another one.
  
  Read less
Adam Rosenfield November 11, 2020

Yup, I’ve run into this very problem many times of “Thing tries to remove itself from the list of Things managed by the ThingManager in its destructor”. IIRC I don’t think I’ve hit this specific issue of deadlock when trying to lock a mutex, but I’ve definitely run into other sorts of reentrancy problems. Care is very much needed here.

(I won’t spoil the answer and steal Raymond’s thunder from tomorrow.)
R. Samuel Klatchko November 11, 2020
Minor issue, in
```
test()
```
I believed you meant to write
```
auto s = std::make_shared(manager);
```