A customer reported that they found a bug in shell32. The customer liaison forwarded the question to the shell team, with the caution that since he is currently on vacation, he hadn’t validated the legitimacy of the report, or its quality or correctness.
The customer was kind enough to reduce the program to its essence, and that made zeroing in on the problem much easier.
#import "shell32.dll" // Code in italics is wrong void demonstrate_problem() { auto hr = CoInitialize(nullptr); Shell32::IDispatchPtr shell_application("Shell.Application"); ... use the shell_application to do stuff ... CoUninitialize(); }
The customer’s analysis of the crash was rather lengthy, but hidden inside is this fragment:
If I remove the call to
CoÂUninitalize
, then the crash disappears, but from what I understand, I’m doing everything by the rules. Once IRelease
all the COM pointers I own, I have the right to callCoÂUninitialize
.
Do you see the problem?
This is another case of paying attention to when your destructors run.
The shell_
application
object is a smart pointer object, so it will release the raw COM pointer at destruction. When does it destruct?
It destructs after the CoÂUninitialize
call.
The customer belived that they had Release
all of their COM pointers at the point they called CoÂUninitialize
, but in fact they hadn’t. There was an unreleased COM pointer inside the shell_
application
object, as well as other objects in the code I elided.
The fix is to ensure that everything is properly released before uninitializing COM. One way is to force the destruction of all relevant objects by introducing a nested scope:
void demonstrate_problem() { auto hr = CoInitialize(nullptr); { // nested scope Shell32::IDispatchPtr shell_application("Shell.Application"); ... use the shell_application to do stuff ... } // force smart objects to destruct CoUninitialize(); }
Another is to put the initialize of COM into its own RAII object, so that it destructs last.
void demonstrate_problem() { CCoInitialize init; Shell32::IDispatchPtr shell_application("Shell.Application"); ... use the shell_application to do stuff ... // CoUninitialize(); }
Hello, customer here.
The crash obviously happens without the premature CoUninitialize. Otherwise I would not have contacted Microsoft.
Code example is at: https://github.com/conioh/shell32_git_crash
Only 32-bit binaries crash (either on native 32-bit OS or under WoW64).
Crash still happens on 18363.592 but fixed on Insider versions (which is why I was told by a Microsoft employee the issue won’t be fixed or even explained to me). The bug might be in combase or a related module rather than in shell32.
I have dates and complete emails in case anyone requires evidence I am indeed said customer, but it’s easier to just build, run and see the crash.
It’s unfortunate to see that Raymond Chen is no better than the so many other Microsoft employees I had to endure dealing with during all those years.
I wasn’t the one assigned to investigate your ticket, but I’ll look at it anyway. Though if it is fixed on Insider builds (which is what I have ready at hand), then there won’t be much for me to debug. Sorry.
I’d appreciate it.
Even if it’s fixed in the Insider version (at least that code doesn’t reproduce the crash anymore) it’s still interesting and useful to understand why a seemingly innocuous call such as Invoke causes a many-minutes-long windows_storage!CNamespaceWalk::s_AsyncWalkThreadProc in a worker thread, why there are unmatched calls to Register and Revoke in the GIT, etc.
More importantly, the question is what to do now. The reply I received on Septmber 27 is: “I don’t get a crash. All the worker threads appear to have exited cleanly. It may be a bug we fixed after build 18362.”
And while true that on 18362 (and later version I’ve tested) there’s no crash, I still have over a billion Windows machine where this does crash. I don’t necessarily expect a backport of the fix, but knowing the details of the issue might help pick a workaround.
I’m sure it’s not that hard to get a non-Insider VM just to check that.
I can set up a non-Insider VM; it’s just that I don’t have one ready to go, so I’d have to go create one. (time passes) I can’t repro on 17134 either. The AsyncWalkThread is to populate the “Send To” submenu on the context menu of the item you are invoking. If it takes minutes, then there’s something weird about your SendTo folder. The cheap way to avoid the problem is simply not to uninitialize the last COM apartment (to avoid the ProcessUninitialize code path). Most apps initialize COM and startup and leave it initialized.
Are you sure he is talking about your report here? Because the articles are usually written more than than a year in advance.
this comment has been deleted.
“If I remove the call to CoÂUninitalize, then the crash disappears”
“The crash obviously happens without the premature CoUninitialize. Otherwise I would not have contacted Microsoft.”
Hmmm….
I think these aren’t the same reports…
Does it crash in that 1-minute sleep, or after?
It crashes during the sleep, in a worker thread executing shcore!WorkThreadManager::CThread::s_ExecuteThreadProc IIRC.
Under real-world circumstances there’s usually no need for the sleep since the process is alive long enough for the crash to happen. Since here the process ends after the call to Invoke I had to keep it alive for the worker thread to fail.
0:009> kL
# ChildEBP RetAddr
00 1c76fd0c 757ba70a combase!wil::details::in1diag3::FailFast_Unexpected
01 (Inline) ——– combase!CGIPTable::OnProcessUninitialize+0xd62b4
02 (Inline) ——– combase!GIPTableProcessUninitialize+0xd62b4
03 1c76fd28 75741407 combase!ProcessUninitialize+0xd62f3
04 1c76fd38 75741262 combase!DecrementProcessInitializeCount+0x50
05 1c76fd58 757417d9 combase!wCoUninitialize+0xac
06 1c76fdd4 76c61cb0 combase!CoUninitialize+0xf9
07 1c76fef4 76c61aca shcore!WorkThreadManager::CThread::ThreadProc+0x1c0
08 1c76ff00 00000000 shcore!WorkThreadManager::CThread::s_ExecuteThreadProc+0xf
It’s easy to debug/trace and see the mismatched calls to combase!CGIPTable::RegisterInterfaceInGlobal and combase!CGIPTable:: RevokeInterfaceFromGlobal but is seems at least some Microsoft personnel find it easier to post nonsense blog posts than actually debugging the Windows components they’re supposed to be responsible for.