{"id":10503,"date":"2011-06-03T07:00:00","date_gmt":"2011-06-03T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2011\/06\/03\/swamping-the-thread-pool-with-work-faster-than-it-can-drain\/"},"modified":"2011-06-03T07:00:00","modified_gmt":"2011-06-03T07:00:00","slug":"swamping-the-thread-pool-with-work-faster-than-it-can-drain","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20110603-00\/?p=10503","title":{"rendered":"Swamping the thread pool with work faster than it can drain"},"content":{"rendered":"<p>\nThis scenarios is real,\nbut details have been changed to protect the guilty.\n<\/p>\n<p>\nConsider a window showing the top of a multi-page document.\nThe developers found that when the user clicks the down-arrow\nbutton on the scroll bar,\nthe program locks up for 45 seconds,\nover a dozen threads are created,\nand then suddenly everything clears up and the window\ndisplays the final paragraph of the document\n(i.e., it scrolled all the way to the bottom).\n<\/p>\n<p>\nThe problem was traced to queueing tasks to\nthe thread pool faster than they can drain.\n<\/p>\n<p>\nThe document is an object which, unlike a window, has no thread\naffinity.\n(Naturally, there are critical sections around the various document\nmethods so you don&#8217;t have corruption if two threads try to modify\nthe document at the same time.)\nThe way to display a different portion of\nthe document is to call a method which\nchanges the viewport location.\n<\/p>\n<p>\nWhen the user clicks the down-arrow button on the scroll bar,\nthe main thread creates a periodic timer\nat\n<a HREF=\"http:\/\/blogs.msdn.com\/oldnewthing\/archive\/2008\/04\/23\/8417521.aspx\">\nfour fifths of the double-click speed<\/a>,\nand each time the timer elapses,\nit does the equivalent of\n<code>document.ScrollDown()<\/code>.\nThe code cancels the timer once the user releases the mouse button.\n<\/p>\n<p>\nThe catch is that the document was so complicated that\nit takes a long time to change the view top and redraw the new view.\n(I don&#8217;t remember exactly how long, but let&#8217;s say it was 700ms.\nThe important thing is that it takes longer than 400ms.)\n<\/p>\n<p>\nGiven that set-up, you can see what happens when the user clicks\nthe scroll down-arrow.\nThe initial scroll is initiated,\nand before it can complete,\nanother scroll is queued to the thread pool.\nThe document view keeps trying to update its position, but the\nperiodic timer generates scroll requests faster than the document\nview can keep up.\n<\/p>\n<p>\nIf that description was too terse, here&#8217;s a longer version.\n<\/p>\n<p>\nThe code for scrolling went something like this:\n<\/p>\n<pre>\nOnBeginScrollDown()\n{\n \/\/ Start a timer to do the scrolling\n <a HREF=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms682485(VS.85).aspx\">CreateTimerQueueTimer<\/a>(&amp;htimer, NULL, ScrollAgain, NULL,\n     0, GetDoubleClickTime() * 4 \/ 5, WT_EXECUTEDEFAULT);\n}\nOnEndScrollDown()\n{\n if (htimer != NULL) {\n  DeleteTimerQueueTimer(NULL, htimer, INVALID_HANDLE_VALUE);\n  htimer = NULL;\n }\n}\nScrollAgain(...)\n{\n document.ScrollDown();\n}\n<\/pre>\n<p>\n(In reality, the program didn&#8217;t use the\n<code>CreateTimerQueueTimer<\/code> function&mdash;it\nhad a custom timer queue and a custom thread pool&mdash;but\nthe effect is the same.)\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;0,\nthe user clicks on the scroll bar down-arrow.\nThe UI thread starts the timer\nwith an initial delay of zero and a period of 400ms.\nThe timer fires immediately, and a thread pool thread is\nasked to run <code>ScrollAgain<\/code>.\nThe <code>ScrollAgain<\/code> function calls\n<code>ScrollDown<\/code>,\nwhich begins the process of scrolling the document.\n<\/p>\n<p>At time <i>T<\/i>&nbsp;=&nbsp;400ms,\nthe periodic timer fires, and a new thread pool thread is created\nto service it.\nPool thread&nbsp;2 calls <code>ScrollDown()<\/code> and blocks.\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;700ms,\nthe <code>ScrollDown<\/code> call\non pool thread&nbsp;1 completes,\nand now pool thread&nbsp;2 can begin its call to\n<code>ScrollDown()<\/code>.\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;800ms,\nthe periodic timer fires again,\nand pool thread&nbsp;1 (now idle) is asked to handle it.\nPool thread&nbsp;1 calls <code>ScrollDown()<\/code> and blocks.\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;1200ms,\nthe periodic timer fires yet again.\nThis time,\nthere are no idle threads in the thread pool, so the thread\npool manager creates yet another thread to service the timer.\nPool thread&nbsp;3 calls <code>ScrollDown()<\/code> and blocks.\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;1400ms,\nthe <code>ScrollDown()<\/code>\ncall issued by pool thread&nbsp;2 completes.\nPool thread&nbsp;2 now returns to idle.\nNow the call to\n<code>ScrollDown()<\/code> from pool thread&nbsp;1\n(issued at\ntime <i>T<\/i>&nbsp;=&nbsp;800ms) can start.\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;1600ms,\nthe periodic timer fires <i>again<\/i>,\nand pool thread&nbsp;2 is chosen to service it.\nPool thread&nbsp;2 calls <code>ScrollDown()<\/code> and blocks.\n<\/p>\n<p>\nAt time <i>T<\/i>&nbsp;=&nbsp;2000ms,\nthe periodic timer fires again, and a new pool thread is created\nto service it.\nPool thread&nbsp;4 calls <code>ScrollDown()<\/code> and blocks.\n<\/p>\n<p>\nYou can see where this is going, I hope.\nWork is being generated by the periodic timer at a rate of one\nwork item per 400ms,\nbut it takes 700ms to carry out each work item,\nand the tasks are serialized on the document.\nIt&#8217;s like\n<a HREF=\"https:\/\/www.youtube.com\/watch?v=uztA6JCKB4s\">\nLucy in the chocolate factory<\/a>.\nThe document is frantically trying to carry out all the\nwork, and it never manages to catch up.\nEventually, the document scrolls all the way to the bottom,\nand the mass of\npent-up calls to <code>ScrollDown()<\/code> all return immediately since\nthere is no more scrolling possible.\n<\/p>\n<p>\nNow that the document is idle,\nit can paint,\nand that&#8217;s where the user finally sees the document,\nscrolled all the way to the bottom.\n<\/p>\n<p>\nThere are a number of possible solutions here.\n<\/p>\n<p>\nOne way is\nnot to queue up another scroll while an old one is still running.\nInstead, just wait for it to finish, and then issue a new scroll that\naccumulates all the scrolling that had taken place while you were waiting\nfor the first to complete.\nThis results in jerky scrolling, however,\nand it creates a lag of up to 700ms between the user releasing the mouse\nbutton and scrolling actually stopping.\n<\/p>\n<p>\nAnother approach is to\ndisable repainting the entire document when you detect that you are\nin the <i>document is too complex to scroll quickly<\/i> case\nand just scroll the scrollbar thumb.\nWhen the user stops scrolling, re-enable painting and <i>boom<\/i>\nthe document appears at the user&#8217;s chosen location.\nThis preserves responsiveness, but you lose the ability to see the\ndocument as you scroll it.\n<\/p>\n<p>\nI don&#8217;t know what solution the customer finally went with.\nI was just there to help with the debugging.\n<\/p>\n<p>\n<b>Bonus example<\/b>:\nLarry Osterman describes\n<a HREF=\"http:\/\/blogs.msdn.com\/larryosterman\/archive\/2009\/06\/25\/what-s-wrong-with-this-code-part-26-a-real-world-example.aspx\">\nanother situation<\/a>\nwith\n<a HREF=\"http:\/\/blogs.msdn.com\/larryosterman\/archive\/2009\/06\/29\/what-s-wrong-with-this-code-part-26-the-answer.aspx\">\nthe same underlying cause<\/a>.\n<\/p>\n<p>\n<b>Hidden take-away<\/b>:\nObserve that both of these examples illustrate\none of the subtle consequences of a design which\nmoves all processing off the UI thread.\n<\/p>\n<p>\n<b>Update<\/b>:\nNote that <code>Set&shy;Timer<\/code> wouldn&#8217;t have helped here.\n<\/p>\n<pre>\ncase WM_TIMER:\n  if (wParam == SCROLLTIMER) {\n    QueueUserWorkItem(ScrollAgain, NULL, WT_EXECUTEDEFAULT);\n  }\n  ...\n<\/pre>\n<p>\nSince the processing has been moved off the UI thread, the\n<code>WM_TIMER<\/code> messages are free to keep flowing in\nand queue up work faster than the background thread can keep up.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This scenarios is real, but details have been changed to protect the guilty. Consider a window showing the top of a multi-page document. The developers found that when the user clicks the down-arrow button on the scroll bar, the program locks up for 45 seconds, over a dozen threads are created, and then suddenly everything [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-10503","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>This scenarios is real, but details have been changed to protect the guilty. Consider a window showing the top of a multi-page document. The developers found that when the user clicks the down-arrow button on the scroll bar, the program locks up for 45 seconds, over a dozen threads are created, and then suddenly everything [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/10503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=10503"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/10503\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=10503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=10503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=10503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}