What does it mean when my attempt to stop a Windows NT service fails with ERROR_BROKEN_PIPE?

Raymond Chen

A customer reported that they had a sporadic problem: Their product includes a Windows NT service, and when their client program tries to stop the service, it sometimes fails with ERROR_BROKEN_PIPE. Their client program is written in C#, so it uses the ServiceController.Stop method to stop the service, and the failure is reported in the form of an exception. In Win32, this turns into a call to the ControlService function with the SERVICE_CONTROL_STOP code.

Under what conditions would an attempt to stop a service result in the error ERROR_BROKEN_PIPE?

One of the developer support escalation engineers used psychic powers:

Does your service terminate itself before the call to its HandlerEx routine returns from the SERVICE_CONTROL_STOP request, or before the call to StartServiceCtrlDispatcher returns?

I’m guessing that the ERROR_BROKEN_PIPE arises because the service process terminated itself while the Service Control Manager was still talking to it, waiting for the service to report that it finished processing the SERVICE_CONTROL_STOP request. The error is ERROR_BROKEN_PIPE because the process on the other end of the pipe (the service) died.

The customer agreed that this was a possibility: When the service receives the SERVICE_CONTROL_STOP request, it signals a helper thread to clean up, and that helper thread may finish its cleanup and terminate the service process before the main thread can report a successful stop to the Service Control Manager.

A short time later, the customer reported back and confirmed that when they forced the race condition to occur, they indeed got the ERROR_BROKEN_PIPE error code.
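
To see why the ordering matters, here is a minimal Python model of the race. It is not the customer's code and not a real service: the names simulate_stop, handle_stop, and cleanup_helper are made up for illustration, and appending to a list stands in for reporting the stop and for terminating the process. The "fixed" ordering, in which the helper waits for the stop to be reported before ending the process, is one assumed way to close the race; the customer's actual fix isn't described.

```python
import threading


def simulate_stop(helper_waits_for_report: bool) -> list:
    """Model one SERVICE_CONTROL_STOP request; return the order of events."""
    events = []                          # trace of what happened, in order
    begin_cleanup = threading.Event()    # handler -> helper: start cleaning up
    stop_reported = threading.Event()    # handler -> helper: stop was reported
    process_exited = threading.Event()   # helper -> handler: process is gone

    def cleanup_helper():
        begin_cleanup.wait()
        events.append("cleanup finished")
        if helper_waits_for_report:
            # Assumed fix: do not end the process until the stop is reported.
            stop_reported.wait()
        events.append("process exits")   # stands in for real process termination
        process_exited.set()

    def handle_stop():
        begin_cleanup.set()
        if not helper_waits_for_report:
            # Force the bad interleaving: let the helper win the race.
            process_exited.wait()
        events.append("stop reported")   # stands in for reporting success
        stop_reported.set()

    helper = threading.Thread(target=cleanup_helper)
    helper.start()
    handle_stop()
    helper.join()
    return events


racy = simulate_stop(helper_waits_for_report=False)
fixed = simulate_stop(helper_waits_for_report=True)
print("racy: ", racy)
print("fixed:", fixed)
```

In the racy run, "process exits" lands before "stop reported", which is the window in which the Service Control Manager would find its conversation cut off. In a real service, that late report would never happen at all, because the process is already gone. In the fixed run, "process exits" is always the last event.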

I like this example of psychic debugging because it demonstrates how you can take something you know (ERROR_BROKEN_PIPE means that two processes were talking to each other over a pipe, and one side suddenly terminated), and think about how it could apply to something you don’t know (surmising that the Service Control Manager uses a pipe to talk to the service).
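
The "something you know" half is easy to reproduce. The following Python sketch is an analogy, not a description of what the Service Control Manager does: it starts a child process that exits without reading its input, then tries to write to the pipe that led to it. The use of Python and the variable names are assumptions made for illustration, and the exception Python raises is its own report of the condition, not the Win32 error code.

```python
import subprocess
import sys

# Start a child that exits immediately without ever reading its stdin.
# bufsize=0 makes the pipe unbuffered, so the write below hits the pipe directly.
child = subprocess.Popen(
    [sys.executable, "-c", "pass"],
    stdin=subprocess.PIPE,
    bufsize=0,
)
child.wait()  # the process on the other end of the pipe is now gone

try:
    child.stdin.write(b"are you still there?")
    outcome = "write succeeded"
except OSError as exc:  # BrokenPipeError is a subclass of OSError
    outcome = type(exc).__name__
finally:
    child.stdin.close()

print(outcome)
```

On a POSIX system the write is expected to fail with BrokenPipeError; the exact exception type may differ on other platforms, which is why the sketch catches OSError in general.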

5 Comments
Henry Skoglund 2019-04-05 08:46:56
Hi, been a while since you wrote about psychic debugging! I just want to thank you for introducing that method of debugging; it's been useful for me too. It's kind of taking a holistic view of the whole problem, while also paying attention to all the small details that surround it. One additional method I resort to: "Whatever remains, however improbable, is the solution." That quote from Sherlock Holmes has also served me well during the years.
Keith Patrick 2019-04-05 11:17:30
I've gotten this specific error before, so when I see something like this, it's usually more of a matter of remembering how I fixed it in the first place. Unfortunately, I never got around to putting together a personal knowledge base for these errors, so I have to count on my unreliable memory, but in this case, I remembered the error message immediately. I tend to use psychic (or intuitive) debugging more for multithreading, looping errors, bad IDisposable usages, and stack overflows (the latter of which sticks out like a sore thumb... nothing kills the entire debug session like SOE). But all those errors tend to have certain behaviors, which, if you put them together with a recent code change, usually solve the mystery.
Ian Boyd 2019-04-08 09:25:04
It is also extraordinarily satisfying when you have an actual solution that explains the problem. Computers aren't magical: they're doing something exactly rational and explainable. What isn't fun is when the diagnostic steps are: Have you tried running a virus scan? Have you tried sfc /scannow? Have you tried rebooting? Have you tried deleting your user profile and creating a new one? Have you tried reinstalling Windows? Aside from the last two (which I simply will not do), the hope of them is to make the problem unreproducible: you don't know the problem, so you didn't really fix it. It's like randomly replacing parts on your car, or your 737 MAX, and hoping the problem goes away.
Joshua Hudson 2019-04-10 20:25:44
This is actually the tip of a really bad design in the service manager: StopService(...); CreateFile(serviceprocessbinary, ... CREATE_ALWAYS ...); // Error: file in use. The correct fix would be to call TerminateProcess(GetCurrentProcess(), 0) on getting SERVICE_CONTROL_STOP, but the service manager reports ERROR_BROKEN_PIPE rather than success. I could handle it, but services.msc doesn't. Hint: if you get ERROR_BROKEN_PIPE, that service isn't running anymore.