The COM_INTERFACE_ENTRY must be a COM interface, but nobody actually checks

Raymond Chen

A customer had some code written with the Active Template Library, more commonly known as ATL. Apparently, ATL is still a thing!

Anyway, their problem was that their component that had been working just fine for many years started crashing. Their object went something like this:

class ATL_NO_VTABLE CAwesomeWidget :
    public CComObjectRootEx<CComMultiThreadModel>,
    public CComCoClass<CAwesomeWidget, &CLSID_AwesomeWidget>,
    public IWidgetProviderInfo,
    public IWidget
    /// ... other interfaces ...

      // ... other interfaces ...

    // IWidgetProviderInfo
    IFACEMETHODIMP GetProviderId(GUID *pguidId);

    // IWidget

Yes, I’m making you look at ancient ATL code, with all its wacky macros. Product maintenance is like that. Deal with it.

Clients can do

void Sample()
  CComPtr<IWidget> widget;

to create the CAwesome­Widget as a widget, and then ask the widget to do something.

The CAwesome­Widget did its job very well for over five years, and then suddenly it started crashing in Get­Provider­Id, even though nobody called Get­Provider­Id.

They traced it back to this call:

void Sample2()
  CComPtr<IUnknown> unk;

  CComQIPtr<IWidget> widget(unk); // ← here

Obviously, this wasn’t literally what their code did, but it boils the problem down to its essence.

The idea was that instead of creating the object for its IWidget interface, the object was created with the generic IUnknown interface, and then it was converted to an IWidget. The reason for this extra step isn’t important, but you can come up with scenarios where this might happen. (For example, maybe the Co­Create­Instance is coming from a generic “creation helper” function.)

The problem was in the definition of the IWidget­Provider­Info:

// widget.h
interface IWidgetProviderInfo
    STDMETHOD(GetProviderId)(GUID *pguidId) PURE;

Do you see something?

Actually, more honestly, I should be asking, “Do you not see something?”

The IWidget­Provider­Info interface does not derive from IUnknown!

Once you realize this, everything unravels.

The COM_INTERFACE_ENTRY macro assumes that the thing it’s given is indeed a COM interface. This assumption is made in two places:

  1. A request for IUnknown returns the first interface in the list.
  2. Any successful request will be accompanied by a call to IUnknown::Add­Ref, per COM rules.

They listed IWidget­Provider­Info as the first “interface”, so it was returned as in response to a request for IUnknown, even though it wasn’t IUnknown.

The attempt to convert the IUnknown to an IWidget involves a call to IUnknown::Query­Interface, but remember, that thing which the code thinks is an IUnknown is really an IWidget­Provider­Info. The call to IUnknown::Query­Interface actually called IWidget­Provider­Info::Get­ProviderId. With the wrong number and types of arguments, of course. The crash occurred when the Get­Provider­Id method tried to write to what it thought was a pguidId, but which was actually a riid from the Query­Interface.

The customer would have noticed that something was wrong with IWidget­Provider­Info had they ever tried to use it!

CComPtr<IWidgetProviderInfo> info;
// ^^ error: class "IWidgetProviderInfo" has no "Release" member

Merely talking about IWidget­Provider­Info causes the compiler to get upset because the CComPtr destructor needs to call the Release method, which doesn’t exist. The customer never noticed this because they never used the IWidget­Provider­Info interface at all! It was presumably added in anticipation of a feature that never materialized.

Okay, so now we understand why this crashes and how it eluded compile-time detection, but how did it ever work? After all, the implementation of Create­Instance needs to do a Query­Interface to return the proper pointer back to the caller.

Here’s a simplified version of how ATL implements Create­Instance:

template <class T1>
class CComCreator
  static HRESULT WINAPI CreateInstance(void* pv, REFIID riid, LPVOID* ppv)
    T1* p = NULL;
    ATLTRY(p = new T1(pv))
    if (p != NULL)
      hRes = p->_AtlInitialConstruct();
      if (SUCCEEDED(hRes))
        hRes = p->FinalConstruct();
      if (hRes == S_OK)
        hRes = p->QueryInterface(riid, ppv);
      if (hRes != S_OK)
        delete p;
    return hRes;

Why doesn’t the call to p->Query­Interface crash?

Because it’s being called from a pointer to a T1, not from a pointer to an IUnknown. The compiler therefore knows that the Query­Interface method is in fact not at slot 0 in vtable 0, but rather is at slot 0 in vtable 1 (taking it from IWidget).

So how about fixing the problem?

One fix is to delete the unused IWidget­Provider­Info interface. However, when working with legacy code, you may be averse to taking such a drastic step, because there might be somebody who is actually using that interface in a way you failed to detect. Or maybe you want to keep the interface around because you really do plan on using it soon. In that case, you can make the interface derive from IUnknown, like it should have in the first place:

interface IWidgetProviderInfo : IUnknown
    STDMETHOD(GetProviderId)(GUID *pguidId) PURE;

Customer problem solved.

Next time, we’ll look at how to catch this problem at compile time.


Discussion is closed. Login to edit/delete existing comments.

  • Neil Rashbrook 0

    Just use your time machine to change the second parameter of IUnknown::Query­Interface from void** to IUnknown**.

  • Raymond ChenMicrosoft employee 0

    That doesn’t really help. You’d still need to cast it, because there is no standard conversion from T** to IUnknown**, not even if T derives from IUnknown. (Unless T = IUnknown, but that’s not an interesting case.)

Feedback usabilla icon