If you call a wait function like WaitForSingleObject and receive the code WAIT_ABANDONED, what does it mean and what should you do?
The documentation says that WAIT_ABANDONED means that you successfully claimed a mutex, but the thread that previously owned the mutex failed to release the mutex before it exited. This could be an oversight because the code encountered a code path that forgot to release the mutex. Or it could be because the thread crashed before it could release the mutex.
The documentation also suggests that “If the mutex was protecting persistent state information, you should check it for consistency.” This is to handle the second case: The thread crashes before it can release the mutex. If the purpose of the mutex was to prevent other threads from accessing the data while it is in an inconsistent state, then the fact that the thread crashed while holding the mutex means that the data might still be in that inconsistent state.
Now, maybe you have no way to check whether the data is in an inconsistent state or have no way to repair it if such an inconsistent state is discovered. (Most people don’t bother to design their data structures with rollback or transactions, because the point of the mutex was to avoid having to write that fancy code in the first place!) In that case, you really have only two choices.
One option is to just cover your ears and pretend you didn’t hear anything. Just continue operating normally and hope that any latent corruption is not going to cause major problems.
Another option is to give up and abandon the operation. However, if that’s your choice, you have to give up properly.
The abandoned state is not sticky; it is reported only to the first person to wait for the mutex after it was abandoned. Subsequent waits succeed normally. Therefore, if you decide, “Oh it’s corrupted, I’m not touching it,” and release the mutex and walk away, then the next person to wait for the mutex will receive a normal successful wait, and they will dive in, unaware that the data structures are corrupted!
One solution is to add a flag inside your data that says “Possibly corrupted.” The code that detects the WAIT_ABANDONED can set that flag, and everybody who acquires the mutex can check the flag to decide if they want to take a chance by operating on corrupted data.
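Here is a minimal sketch of the flag idea. It is only an illustration, not a prescribed implementation: `SharedData`, `Access`, and `OnMutexAcquired` are hypothetical names, and the two constants restate the numeric values of `WAIT_OBJECT_0` and `WAIT_ABANDONED` so that the logic compiles without `<windows.h>`. It assumes the flag lives in the same storage the mutex protects (for a cross-process mutex, that would presumably be shared memory).

```cpp
#include <cassert>
#include <cstdint>

// Same numeric values as the Win32 constants WAIT_OBJECT_0 and WAIT_ABANDONED,
// restated here only so the sketch compiles without <windows.h>.
constexpr std::uint32_t kWaitObject0   = 0x00000000;
constexpr std::uint32_t kWaitAbandoned = 0x00000080;

// Hypothetical state protected by the mutex.
struct SharedData {
    bool possiblyCorrupted = false; // the "Possibly corrupted" flag
    int  value = 0;                 // stand-in for the real payload
};

enum class Access { Ok, PossiblyCorrupted, NotOwned };

// Pass in the result of the wait on the mutex. The return value tells the
// caller whether it owns the mutex and whether the data can be trusted.
inline Access OnMutexAcquired(std::uint32_t waitResult, SharedData& data)
{
    if (waitResult == kWaitAbandoned) {
        // We own the mutex, but the previous owner never released it.
        // Record that fact, because the OS will not report it again.
        data.possiblyCorrupted = true;
    } else if (waitResult != kWaitObject0) {
        // Timeout or failure: we do not own the mutex, so do not touch data.
        return Access::NotOwned;
    }
    return data.possiblyCorrupted ? Access::PossiblyCorrupted : Access::Ok;
}

// On Windows, the call site might look roughly like this (hMutex is a
// hypothetical handle to the mutex):
//   DWORD r = WaitForSingleObject(hMutex, INFINITE);
//   Access a = OnMutexAcquired(r, data);
//   ... use or avoid the data depending on a ...
//   if (a != Access::NotOwned) ReleaseMutex(hMutex);

// Usage example: an abandoned wait followed by an ordinary one.
inline void ExampleSequence()
{
    SharedData data;
    assert(OnMutexAcquired(kWaitAbandoned, data) == Access::PossiblyCorrupted);
    // The next wait is reported as a plain success, but the flag persists.
    assert(OnMutexAcquired(kWaitObject0, data) == Access::PossiblyCorrupted);
}
```

The point of the sketch is that the warning travels with the data rather than with the wait result, so later acquirers still see it even though their own waits succeed normally.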
I’m not saying that you have to do it that way, but it’s a choice you’re making. In for a penny, in for a pound.
In summary, here are some options when you encounter an abandoned mutex:
- Try to fix the problem.
- Ignore the problem.
- Give up and create a warning to others.
- Give up and make everybody else think that everything is fine.
The final choice doesn’t make sense, because if you’re going to make everybody else think that everything is fine, then that’s the same as having everybody else simply ignore the problem. In which case, you may as well ignore the problem too!
Related reading: Understanding the consequences of WAIT_ABANDONED.
Bonus chatter: Don’t forget that if you get WAIT_ABANDONED, the mutex is owned by you, so make sure to release it.
Choice 5: ExitProcess()
That’s what I did for the abandoned mutex case on an anonymous wait handle that was created by my own process and shouldn’t be visible to other processes.