What does it mean when my attempt to stop a Windows NT service fails with ERROR_BROKEN_PIPE?
A customer reported that they had a sporadic problem: Their product includes a Windows NT service, and when their client program tries to stop the service, it sometimes fails with ERROR_BROKEN_PIPE. Their client program is written in C#, so it uses the ServiceController.Stop method to stop the service, and the failure is reported in the form of an exception. In Win32, this turns into a call to the ControlService function with the SERVICE_CONTROL_STOP code.
Under what conditions would an attempt to stop a service result in the error ERROR_BROKEN_PIPE?
One of the developer support escalation engineers used psychic powers:
Does your service terminate itself before the call to its HandlerEx routine returns from the SERVICE_CONTROL_STOP request, or before the call to StartServiceCtrlDispatcher returns?
I’m guessing that the ERROR_BROKEN_PIPE arises because the service process terminated itself while the Service Control Manager was still talking to it, waiting for the service to report that it finished processing the SERVICE_CONTROL_STOP request. The error is ERROR_BROKEN_PIPE because the process on the other end of the pipe (the service) died.
The customer agreed that this was a possibility: When the service receives the SERVICE_CONTROL_STOP request, it signals a helper thread to clean up, and that helper thread may finish its cleanup and terminate the service process before the main thread can report a successful stop to the Service Control Manager.
A short time later, the customer reported back and confirmed that when they forced the race condition to occur, they indeed got the ERROR_BROKEN_PIPE error code.
I like this example of psychic debugging because it demonstrates how you can take something you know (ERROR_BROKEN_PIPE means that two processes were talking to each other over a pipe, and one side suddenly terminated), and think about how it could apply to something you don’t know (surmising that the Service Control Manager uses a pipe to talk to the service).
5 comments
Hi, been a while since you wrote about psychic debugging! I just want to thank you for introducing that method of debugging; it’s been useful for me too. It’s kind of like taking a holistic view of the whole problem, while also paying attention to all the small details that surround it.
One additional method I resort to: “Whatever remains, however improbable, is the solution.” That quote from Sherlock Holmes has also served me well over the years.
I’ve gotten this specific error before, so when I see something like this, it’s usually more a matter of remembering how I fixed it in the first place. Unfortunately, I never got around to putting together a personal knowledge base for these errors, so I have to count on my unreliable memory, but in this case, I remembered the error message immediately.
I tend to use psychic (or intuitive) debugging more for multithreading, looping errors, bad IDisposable usages, and stack overflows (the latter of which stick out like a sore thumb... nothing kills the entire debug session like SOE). But all those errors tend to have certain behaviors, which, if you put them together with a recent code change, usually solve the mystery.
It is also extraordinarily satisfying when you have an actual solution that explains the problem. Computers aren’t magical: they’re doing something entirely rational and explainable.
What isn’t fun is when the diagnostic steps are: Have you tried running a virus scan? Have you tried sfc /scannow? Have you tried rebooting? Have you tried deleting your user profile and creating a new one? Have you tried reinstalling Windows?
Aside from the last two (which I simply will not do), the hope of them is to make the problem unreproducible – you don’t know the problem, so you didn’t really fix it.
It’s like randomly replacing parts on your car, or your 737 MAX, and hoping the problem goes away.
Deleting your user profile is unlikely to help (and painful to reverse), but creating a new, clean user profile is a cheap and easy diagnostic step if you’ve exhausted other options.
I once ran into a problem with Visual Studio that I couldn’t explain. (I forget the details, but they’re not important.) Uninstalling and reinstalling VS didn’t resolve it. But when I created a new user profile and logged in as that user, the problem went away.
After dumping both users’ registry hives and painstakingly comparing them, I finally narrowed down the problem: a buggy VS extension.
I deleted the registry keys related to that extension under my original user profile, and everything was well again.
This actually is the tip of a really bad design in the service manager.
StopService(…);
CreateFile(serviceprocessbinary, … CREATE_ALWAYS …); // Error file in use.
The correct fix would be to call TerminateProcess(GetCurrentProcess(), 0) on getting SERVICE_CONTROL_STOP, but then the service manager reports ERROR_BROKEN_PIPE rather than success. I could handle it, but services.msc doesn’t. Hint: if you get ERROR_BROKEN_PIPE, that service isn’t running anymore.
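Following the hint above, a client could treat ERROR_BROKEN_PIPE from a stop request as a sign that the service process is already gone, rather than as a hard failure. Here is a minimal sketch in C of that classification. The names stop_outcome, classify_stop_error and stop_outcome_name are hypothetical, and the error values are copied by number (ERROR_BROKEN_PIPE is 109, ERROR_SERVICE_NOT_ACTIVE is 1062) so the sketch compiles without the Windows headers. Grouping ERROR_SERVICE_NOT_ACTIVE with the broken pipe is an addition here, not something the article discusses, and reading a broken pipe as "stopped" rests on the commenter's observation rather than on any documented guarantee.

```c
#include <assert.h>

/* Win32 error values, copied by value so this sketch compiles without
   <windows.h>. A real client would use the constants from the Windows
   headers instead. */
enum {
    STOP_ERR_SUCCESS            = 0,    /* ERROR_SUCCESS */
    STOP_ERR_BROKEN_PIPE        = 109,  /* ERROR_BROKEN_PIPE */
    STOP_ERR_SERVICE_NOT_ACTIVE = 1062  /* ERROR_SERVICE_NOT_ACTIVE */
};

/* Hypothetical classification of a stop attempt; not a Win32 type. */
typedef enum {
    STOP_ACCEPTED,      /* the stop request went through */
    STOP_ALREADY_GONE,  /* the service is not running (any more) */
    STOP_FAILED         /* anything else: report it to the caller */
} stop_outcome;

/* Hypothetical helper: map the error code seen after a stop request
   (for example, GetLastError() after ControlService fails) to an
   outcome the caller can act on. */
stop_outcome classify_stop_error(unsigned long err)
{
    switch (err) {
    case STOP_ERR_SUCCESS:
        return STOP_ACCEPTED;
    case STOP_ERR_BROKEN_PIPE:        /* service process died mid-request */
    case STOP_ERR_SERVICE_NOT_ACTIVE: /* service was not running at all */
        return STOP_ALREADY_GONE;
    default:
        return STOP_FAILED;
    }
}

/* Human-readable name for logging. */
const char *stop_outcome_name(stop_outcome outcome)
{
    switch (outcome) {
    case STOP_ACCEPTED:     return "accepted";
    case STOP_ALREADY_GONE: return "already gone";
    case STOP_FAILED:       return "failed";
    }
    assert(!"unknown stop_outcome");
    return "unknown";
}
```

A caller that gets STOP_ALREADY_GONE would probably still want to confirm the service state (for example with QueryServiceStatus) before doing something like overwriting the service binary, since the process may take a moment to disappear completely.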