{"id":7603,"date":"2012-05-17T07:00:00","date_gmt":"2012-05-17T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2012\/05\/17\/how-to-view-the-stack-of-threads-that-were-terminated-as-part-of-process-teardown-from-the-kernel-debugger\/"},"modified":"2012-05-17T07:00:00","modified_gmt":"2012-05-17T07:00:00","slug":"how-to-view-the-stack-of-threads-that-were-terminated-as-part-of-process-teardown-from-the-kernel-debugger","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20120517-00\/?p=7603","title":{"rendered":"How to view the stack of threads that were terminated as part of process teardown from the kernel debugger"},"content":{"rendered":"<p>\nAs we saw some time ago,\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2007\/05\/03\/2383346.aspx\">\nprocess shutdown is a multi-phase affair<\/a>.\nAfter you call <code>ExitProcess<\/code>,\nall the threads are forcibly terminated.\nAfter that&#8217;s done, each DLL is sent a <code>DLL_PROCESS_DETACH<\/code>\nnotification.\nYou may be debugging\na problem with <code>DLL_PROCESS_DETACH<\/code> handling\nthat suggests that some of those threads were not cleaned up properly.\nFor example, you might assert that a reference count is zero,\nand you find during process shutdown that this assertion sometimes fires.\nMaybe you terminated a thread before it got a chance to release\nits reference?\nHow can you test this theory if the thread is already gone?\n<\/p>\n<p>\nIt so happens that when all the threads are terminated during the\nearly phase of process shutdown,\nthe kernel is a bit lazy and doesn&#8217;t free their stacks.\nIt figures, hey, the entire process is going away soon,\nso the stack memory is going to be cleaned up as part of process\ntermination.\n(It&#8217;s sort of the kernel equivalent of\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2012\/01\/05\/10253268.aspx\">\nnot bothering to sweep the floor\nof a building 
that&#8217;s about to be demolished<\/a>.)\nYou can use this to your advantage by <i>grovelling the stacks\nthat were left behind<\/i>.\n<\/p>\n<p>\nHey, this is why you get called in to debug the hard stuff, right?\n<\/p>\n<blockquote CLASS=\"m\"><p>\nBefore continuing, I need to emphasize that this information is\n<b>for debugging purposes only<\/b>.\nThe structures and offsets are all implementation details\nwhich can change from release to release.\n<\/p><\/blockquote>\n<p>\nThe first step is to identify where all the stacks are.\nThe direct approach is difficult because the stacks can be all\ndifferent sizes, so it&#8217;s not easy to pick them out of a line-up.\nBut one thing does come in a consistent size: The\n<a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ff565433.aspx\">\nTEB<\/a>.\n<\/p>\n<p>\nFrom the kernel debugger, use the <code>!process<\/code> command\nto dump the process you are interested in,\nand from the header information, extract the <code>VadRoot<\/code>.\n<\/p>\n<pre>\n1: kd&gt; !process -1\nPROCESS 8731bd40  SessionId: 1  Cid: 0748    Peb: 7ffda000  ParentCid: 0620\n    DirBase: 4247b000  ObjectTable: 96f66de0  HandleCount: 104.\n    Image: oopsie.exe\n    <font COLOR=\"red\"><u>VadRoot 893de570<\/u><\/font> Vads 124 Clone 0 Private 518. Modified 643. Locked 0.\n    DeviceMap 995628c0\n<\/pre>\n<p>\nDump this VAD root with the <code>!vad<\/code> command,\nand pay attention only to the entries which say\n<code>1 Private READWRITE<\/code>.\n<\/p>\n<pre>\n1: kd&gt; !vad <u>893de570<\/u>\nVAD     level      start      end    commit\n... 
ignore everything except \"1 Private READWRITE\" ...\n8730a5f0 ( 6)         50       50         1 Private      READWRITE\n9ab0cb40 ( 5)         60       7f         1 Private      READWRITE\n893978b0 ( 6)         80       9f         1 Private      READWRITE\n87302d30 ( 5)        110      110         1 Private      READWRITE\n889693f8 ( 6)        120      121         1 Private      READWRITE\n872f3fb8 ( 6)        170      170         1 Private      READWRITE\n87089a80 ( 6)        1a0      1a0         1 Private      READWRITE\n8cbf1cb0 ( 5)        1c0      1df         1 Private      READWRITE\n88c079d0 ( 6)        1e0      1e0         1 Private      READWRITE\n9abc33e0 ( 6)        410      48f         1 Private      READWRITE\n873173b0 ( 7)        970      970         1 Private      READWRITE\n8ca1c158 ( 7)      7ffd5    7ffd5         1 Private      READWRITE\n88c02a78 ( 6)      7ffd6    7ffd6         1 Private      READWRITE\n872f9298 ( 5)      7ffd7    7ffd7         1 Private      READWRITE\n8750d210 ( 7)      7ffd8    7ffd8         1 Private      READWRITE\n87075ce8 ( 6)      7ffda    7ffda         1 Private      READWRITE\n87215da0 ( 4)      7ffdc    7ffdc         1 Private      READWRITE\n872f2200 ( 6)      7ffdd    7ffdd         1 Private      READWRITE\n8730a670 ( 5)      7ffdf    7ffdf         1 Private      READWRITE\n<\/pre>\n<p>\n(If you are debugging from user mode, then you can use\n<code>!vadump<\/code> but the output format is different.)\n<\/p>\n<p>\nEach of these is a candidate TEB.\nIn practice, TEBs tend to be allocated at the high end of memory,\nso the ones with a low <code>start<\/code> value are probably\nred herrings.\nTherefore, you should investigate these candidates in reverse order.\n<\/p>\n<p>\nFor each candidate, take the <code>start<\/code> address and append\nthree zeroes.\n(Each page on x86 is 4KB, which conveniently maps to 1000 in hex.)\nDump the first seven\npointers of the TEB with the <code>dp xxxxx000 
L7<\/code>\ncommand.\n<\/p>\n<pre>\n1: kd&gt; dp 7ffdf000 L7\n7ffdf000  0016fbb0 00170000 0016b000 00000000\n7ffdf010  00001e00 00000000 7ffdf000 &larr; hit\n<\/pre>\n<p>\nIf the TEB is valid, then the seventh pointer points back\nto the start of the TEB.\nIn a valid TEB,\nthe second and third values are the\nstack limits; in this case, the candidate stack lives between\n<code>0016b000<\/code> and <code>00170000<\/code>.\n(As a double-check, you can verify that the upper limit of the\nstack, <code>00170000<\/code> in this case, matches up with\nthe end of a VAD allocation in the <code>!vad<\/code> output above.)\n<\/p>\n<p>\nNow that you know where the stack is, you can <code>dps<\/code> it\nand\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2011\/03\/09\/10138401.aspx\">\nlook for EBP frames<\/a>.\n(I usually start about two to four pages below the upper limit of the stack.)\nTest out each candidate EBP frame with the <code>k=<\/code> command\nuntil you find one that seems to be solid.\nRecord this candidate stack trace in a text file for further study.\n<\/p>\n<p>\nRepeat for each candidate TEB, and you will eventually reconstruct\nwhat each thread in the process was doing at the moment it was\nterminated.\nIf you&#8217;re really lucky, you might even see the code that incremented\nthe reference count\nbut was terminated before it could release it.\n<\/p>\n<p>\nThe above discussion also applies to debugging 64-bit processes.\nHowever, instead of looking for\n<code>1 Private READWRITE<\/code> pages, you want to look for\n<code>2 Private READWRITE<\/code> pages.\nAs an additional wrinkle, if you are debugging ia64, then converting\na page frame to a linear address is sadly not as simple as appending\nthree zeroes.\nPages on ia64 are 8KB, not 4KB, so you need to shift the value left\nby 13 bits: Add three zeroes and then multiply by two.\n<\/p>\n<p>\nAnd finally, if you are debugging a 32-bit process on x64,\nthen you want to look for <code>3 Private 
READWRITE<\/code> pages,\nbut add 2 before appending the three zeroes.\nThat&#8217;s because the TEB for a 32-bit process on x64 is really two\nTEBs glued together: A 64-bit TEB followed by a 32-bit TEB.\n<\/p>\n<p>\n<b>Note<\/b>:\nI did not come up with this debugging technique on my own.\nI learned it from an even greater debugging genius.\n<\/p>\n<p>\n<a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2012\/05\/18\/10306501.aspx\">\nNext time<\/a>, we&#8217;ll look at debugging this issue from a user-mode\ndebugger.\n<\/p>\n<p>\n<b>Trivia<\/b>:\nThe informal term for these terminated-but-not-yet-completely-destroyed\nthreads is <i>ghost threads<\/i>.\nThe term was coined by the Exchange support team,\nbecause they often have to study server failures\nthat require them to do this type of investigation,\nand they needed a cute name for it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As we saw some time ago, process shutdown is a multi-phase affair. After you call ExitProcess, all the threads are forcibly terminated. After that&#8217;s done, each DLL is sent a DLL_PROCESS_DETACH notification. You may be debugging a problem with DLL_PROCESS_DETACH handling that suggests that some of those threads were not cleaned up properly. For example, [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-7603","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>As we saw some time ago, process shutdown is a multi-phase affair. After you call ExitProcess, all the threads are forcibly terminated. After that&#8217;s done, each DLL is sent a DLL_PROCESS_DETACH notification. 
You may be debugging a problem with DLL_PROCESS_DETACH handling that suggests that some of those threads were not cleaned up properly. For example, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/7603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=7603"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/7603\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=7603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=7603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=7603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}