The case of the crash when destructing a std::map

A customer reported that they were getting crashes while destructing a std::map.

Here’s the point of the crash:

eax=245bd25c ebx=00004d33 ecx=31feece4 edx=00c40000 esi=00000000 edi=31feece4
eip=563397db esp=245bd250 ebp=245bd268 iopl=0         nv up ei ng nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010286
contoso!std::_Tree<⟦...⟧>::~_Tree+0x2b:
563397db cmp     byte ptr [esi+0Dh],0                       ds:002b:0000000d=??

0:054> k9
ChildEBP RetAddr      
(Inline) --------     contoso!std::_Tree_val<⟦...⟧>::_Erase_tree [xtree @ 1073]
(Inline) --------     contoso!std::_Tree_val<⟦...⟧>::_Erase_head+0x5 [xtree @ 1073]
245bd268 56338313     contoso!std::_Tree<⟦...⟧>::~_Tree+0x2b [xtree @ 1073]
245bd290 56362695     contoso!Contoso::IdCollection::~IdCollection+0x163
245bd2b4 564c8510     contoso!Contoso::LogEntry::~LogEntry+0xd5
245bd2e0 5691a999     contoso!Contoso::LogEntries::Cleanup+0x90
(Inline) --------     contoso!Contoso::Log::Flush+0x8
245be134 56916406     contoso!Contoso::TaskRunner::RunTasks+0x3949

Okay, so we are trying to destruct a _Tree, which is the internal class that acts as the basis for map, multimap, set, and multiset. In this case, we are destructing a std::map.

I wasted invested a good amount of time reading the STL source code in order to figure out the internal structure of std::map. You can read the linked article for the details, but the short version is that the map consists of a sentinel node where

sentinel.left = first node in the tree
sentinel.right = last node in the tree
sentinel.parent = root node of the tree

For the root node and other nodes of the tree, the left, right, and parent have their normal meanings.

If there is no root node, left subtree, or right subtree, then the corresponding member contains a pointer to the sentinel node. (The use of a sentinel node is a standard computer science trick which removes the need to add null pointer checks everywhere.)

We can read the STL code to see how tree destruction occurs.

We start with the _Tree destructor:

    ~_Tree() noexcept {
        const auto _Scary = _Get_scary();
        _Scary->_Erase_head(_Getal());
    }

The Scary stuff is just to scare you. It’s just getting a value from the tree.

    _Scary_val* _Get_scary() noexcept {
        return _STD addressof(_Mypair._Myval2._Myval2);
    }

Meanwhile, _Erase_head goes like this:

    template <class _Alnode>
    void _Erase_head(_Alnode& _Al) noexcept {
        this->_Orphan_all();
        _Erase_tree(_Al, _Myhead->_Parent);
        _Alnode::value_type::_Freenode0(_Al, _Myhead);
    }

This just erases the tree and then frees the sentinel node. So the real excitement is in _Erase_tree:

    template <class _Alnode>
    void _Erase_tree(_Alnode& _Al, _Nodeptr _Rootnode) noexcept {
        while (!_Rootnode->_Isnil) { // free subtrees, then node
            _Erase_tree(_Al, _Rootnode->_Right);
            _Alnode::value_type::_Freenode(_Al,
                _STD exchange(_Rootnode, _Rootnode->_Left));
        }
    }

Erasing the tree consists of recursively erasing the right node, freeing the node, and tail recursing on the left node.

Now, the crashing instruction is cmp byte ptr [esi+0Dh],0, which is obviously the _Rootnode->_Isnil. “Obviously” because it is the only place we check a byte. All other operations are on pointers.

But let’s look at the crash in the context of the whole function just to make sure, and see what else we can learn.

// Function prologue nonsense

contoso!std::_Tree<⟦...⟧>::~_Tree
563397b0 push    ebp
563397b1 mov     ebp,esp
563397b3 push    0FFFFFFFFh
563397b5 push    offset contoso!___guard_check_icall_thunk+0x25d0 (56d0e690)
563397ba mov     eax,dword ptr fs:[00000000h]
563397c0 push    eax
563397c1 push    esi
563397c2 push    edi
563397c3 mov     eax,dword ptr [contoso!__security_cookie (5427c040)]
563397c8 xor     eax,ebp
563397ca push    eax
563397cb lea     eax,[ebp-0Ch]
563397ce mov     dword ptr fs:[00000000h],eax

// inlined _Erase_head

563397d4 mov     edi,ecx                ; edi = this
563397d6 mov     esi,dword ptr [edi]    ; esi = _Myhead
563397d8 mov     esi,dword ptr [esi+4]  ; esi = _Myhead.Parent

// inlined _Erase_tree (esi = _Rootnode)

563397db cmp     byte ptr [esi+0Dh],0   ; while (!_Rootnode->_Isnil) ← CRASHED HERE
563397df jne     contoso!std::_Tree<⟦...⟧>::~_Tree+0x51 (56339801) ; break out of loop

loop:
// Recursive call to _Erase_tree
563397e1 push    dword ptr [esi+8]      ; _Rootnode->_Right
563397e4 mov     ecx,edi                ; outbound this = inbound this
563397e6 push    edi                    ; allocator
563397e7 call    contoso!std::_Tree_val<⟦...⟧>::_Erase_tree (5633b450) ; erase the subtree

563397ec mov     eax,esi                ; delete the old _Rootnode
563397ee mov     esi,dword ptr [esi]    ; fetch _Rootnode->_Left for tail recursion
563397f0 push    18h
563397f2 push    eax
563397f3 call    contoso!operator delete (56d08334) ; free the old _Rootnode
563397f8 add     esp,8

563397fb cmp     byte ptr [esi+0Dh],0   ; while (!_Rootnode->_Isnil)
563397ff je      contoso!std::_Tree<⟦...⟧>::~_Tree+0x31 (563397e1)

// end of _Erase_head, now free the sentinel node

56339801 push    18h
56339803 push    dword ptr [edi]
56339805 call    contoso!operator delete (56d08334)
5633980a add     esp,8

5633980d mov     ecx,dword ptr [ebp-0Ch]
56339810 mov     dword ptr fs:[0],ecx
56339817 pop     ecx
56339818 pop     edi
56339819 pop     esi
5633981a mov     esp,ebp
5633981c pop     ebp
5633981d ret

Okay, so we were right that the crash was on the test of _Rootnode->_Isnil, but we also learned that this is the test that occurs before entering the loop body for the first time. (The tests that occur on subsequent iterations come later in the function.)

This is great, because it tells us that no changes to the tree have been made yet. We aren’t looking at a tree in a temporarily invalid state because the destructor is messing with it. Instead, the tree is still its originally-corrupted state.

The crash is on a null _Rootnode, and that came from _Myhead._Parent, so our tree must have a null _Myhead._Parent, which is not allowed. (An empty tree has a _Myhead._Parent that points back to the sentinel node.)

Let’s see what we have in the tree.

0:054> ?? this->_Mypair._Myval2._Myval2
class std::_Tree_val<⟦...⟧>
   +0x000 _Myhead          : 0x1ca4f280 std::_Tree_node<⟦...⟧>
   +0x004 _Mysize          : 0

Okay, so this tree is empty (_Mysize is zero).

0:054> ?? this->_Mypair._Myval2._Myval2._Myhead
struct std::_Tree_node<⟦...⟧> * 0x1ca4f280
   +0x000 _Left            : 0xc00000b0 std::_Tree_node<⟦...⟧>
   +0x004 _Parent          : (null) 
   +0x008 _Right           : 0x1ca4f280 std::_Tree_node<⟦...⟧>
   +0x00c _Color           : 1 ''
   +0x00d _Isnil           : 1 ''
   +0x010 _Myval           : std::pair<int const, enum ChannelType>

As expected, this is the sentinel node, (_Isnil is 1). What’s not expected is that the _Parent is null, and the _Left is corrupted. The _Right is okay: It points back to the sentinel node.

That corrupted value for _Left looks really suspicious: It is of the form 0xc000nnnn, which is the range used by NTSTATUS codes. And if we dump the node as bytes, we can see that the corruption is restricted to just those first two dwords.

0:054> dc 1ca4f280 L10
1ca4f280  c00000b0 00000000 1ca4f280 00000101  ................
          ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
         corrupted corrupted    okay     okay

What is the NTSTATUS code that got written?

C:\>certutil /error 0xc00000b0
0xc00000b0 (NT: 0xc00000b0 STATUS_PIPE_DISCONNECTED) -- 3221225648 (-1073741648)
Error message text: The specified named pipe is in the disconnected state.
CertUtil: -error command completed successfully.

To me, this looks like what happens when an overlapped I/O completes. The first two fields of the OVERLAPPED structure are updated by the kernel at the completion of the I/O, and the two things it writes are the status code and the number of bytes transferred (which is unsurprisingly zero seeing as an error occurred).

My theory was that this program at some point issued an overlapped I/O and freed the OVERLAPPED structure associated with the I/O before the I/O completed. That memory then got reused to hold the std::map sentinel node, and then the I/O completed, and the kernel wrote the I/O result into what it thought was the OVERLAPPED structure (but is now the std::map sentinel node), thereby corrupting the sentinel node.

The customer said, “We don’t use overlapped I/O, but maybe one of the libraries we use does.”

They provided their source code in the form of a massive 5 gigabyte ZIP file. Thankfully, they also gave me access to their online repo, so I could use the search functionality in their repo hosting provider.

I searched their code for OVERLAPPED and found a few references. A lot of them were just the word “overlapped” being used in a comment, but it wasn’t long before I found an actual OVERLAPPED structure, and here it is.

void Channel::ReadData(⟦...⟧)
{
    ⟦...⟧

    OVERLAPPED o{};
    o.hEvent = m_readCompleteEvent;

    if (ReadFile(m_file, m_buffer, m_bufferSize, &actual, &o)) {
        // completed synchronously
        ⟦...⟧
    } else if (GetLastError() != ERROR_IO_PENDING) {
        ⟦ handle various error conditions ⟧
    } else {
        // Wait for I/O to complete.
        switch (WaitForSingleObject(o.hEvent, IO_TIMEOUT)) {
        case WAIT_OBJECT_0:
            ⟦... process the results ...⟧
            break;

        case WAIT_ABANDONED:
            ⟦... deal with the error ...⟧
            break;

        case WAIT_TIMEOUT:
            break;

        default:
            ⟦... unexpected error ...⟧
            break;
        }
    }
}

After they issue the overlapped read, they wait up to IO_TIMEOUT (1000) milliseconds for an answer. If there is no answer after that time, they just give up and return.¹

Do you see the problem?

They never cancel the I/O and wait for it complete. They just abandon the I/O and return immediately.

When the function returns, the OVERLAPPED structure on the stack becomes available for reuse, and then when the I/O finally does complete, the kernel writes the I/O status to memory that has since been repurposed for something else. (It also writes the data to the original m_buffer which might also have been freed by the time the I/O completes.)

I’m not sure what they were thinking here. They started an I/O and just walked away. How does the kernel know that it should stop executing the I/O and stop writing the I/O results back into application memory?

It’s like booking a demolition company to knock down your house, and they say, “We’re really busy right now, but we’ve added you to our schedule. We can’t promise an exact date, but trust us, we’ll show up to knock down your house when it’s your turn.” You get tired of waiting for them and just sell the house and move somewhere else. Eventually, that demolition company will show up and knock down that house, even though it now belongs to somebody else.

When I discussed this bug investigation with some colleagues, one of them remarked, “Wow, how lucky you were! The very first hit was the memory corruption bug you were looking for.”

I replied, “As it turns out, it wasn’t luck.” This code base was a target-rich environment. Every single overlapped I/O had this same bug: Nobody ever cancelled I/O before abandoning it! If the I/O didn’t complete within a specified timeout, the code always simply walked away from it.

(Note that this code is really lucky that the I/O eventually failed. If it had succeeded, they would also have corrupted whatever object was placed in the memory formerly used as the I/O output buffer!)

But wait, this is stack corruption. The original problem was heap corruption. Even though this is bad, it’s not the bug that caused the crash.

I found two places that performed asynchronous I/O into an OVERLAPPED structure on the heap. Here’s one of them:

class Writer
{
    ⟦...⟧

    OVERLAPPED m_overlapped;

    ⟦...⟧
};

ErrorCode Writer::Write(void* buffer, unsigned size)
{
    ⟦...⟧
    
    if (!WriteFile(m_target, buffer, size,
                &actual, &m_overlapped)) {
        return ErrorCode::WriteFailed;
    }

    if (GetLastError() != ERROR_IO_PENDING) {
        return ErrorCode::WriteFailed;
    }

    if (WaitForSingleObject(o.hEvent, 5000)
            == WAIT_TIMEOUT) {
        return ErrorCode::WriteTimeout;
    }
}

This issues an overlapped write to the m_target and waits 5 seconds for the write to complete. if it doesn’t complete, then it just abandons the operation and returns a failure code.

What’s happening is that if this write operation takes more than five seconds, the failure code propagates up the call stack, and I guess it destructs the Writer, allowing the memory for m_overlapped to be reused by the IdCollection, which then gets corrupted when the I/O finally completes.

Notice that the crash is in the logging code, and the log entry is probably created immediately after the Writer is freed, so it ends up reusing that memory. And then the overlapped I/O completes and updates what it thought was an OVERLAPPED structure but which is now the map sentinel node.

The fix is to make sure that when we decide to abandon an I/O operation, we cancel it and wait for the I/O to complete. (It will probably complete with ERROR_CANCELLED.)

For example, we could do this:

ErrorCode Writer::Write(void* buffer, unsigned size)
{
    ⟦...⟧
    
    if (!WriteFile(m_target, buffer, size,
                &actual, &m_overlapped)) {
        return ErrorCode::WriteFailed;
    }

    if (GetLastError() != ERROR_IO_PENDING) {
        return ErrorCode::WriteFailed;
    }

    if (WaitForSingleObject(o.hEvent, 5000)
            == WAIT_TIMEOUT) {
        CancelIoEx(m_target, &m_overlapped);        
        GetOverlappedResult(m_target, &m_overlapped,
            &actual, TRUE);                         
        return ErrorCode::WriteTimeout;
    }
}

¹ Yes, this code tests for WAIT_ABANDONED, even though that error code will never be returned when waiting on event. The WAIT_ABANDONED error code is used only by mutexes.

3 comments

Discussion is closed. Login to edit/delete existing comments.

Joshua Hudson September 30, 2024

I had the fun of debugging a simpler version of this. I had some code that was written for synchronous handles and using an OVERLAPPED structure on the stack for some other reason, called ReadFile, handled errors, and returned. Of course, ERROR_IO_PENDING was just another error in the default case of the switch statement so it did nothing in particular; and of course I managed at some point to pass an asynchronous handle to it. Boom stack corruption.
Kristof Roomp September 29, 2024

This kind of corruption is particularly hard to find because since this is the kernel updating these structures, enabling pageheap won’t find these. Same thing with the overlapped reads and misinterpreting the return code of ReadFileEx, I think you mentioned that in a previous post.
Robert Tischer September 28, 2024

This kind of in-depth analysis is so very valuable. I have the same kind of cancellation problem in my code.