{"id":103720,"date":"2020-05-01T07:00:00","date_gmt":"2020-05-01T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=103720"},"modified":"2020-05-16T09:03:53","modified_gmt":"2020-05-16T16:03:53","slug":"20200501-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20200501-00\/?p=103720","title":{"rendered":"Diagnosing a hang: Everybody stuck in <CODE>Win&shy;Http&shy;Get&shy;Proxy&shy;For&shy;Url<\/CODE>"},"content":{"rendered":"<p>A customer reported that their program eventually ground to a halt with over 750 threads stuck in <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code>:<\/p>\n<pre>ntdll!ZwWaitForSingleObject+0x14\r\nKERNELBASE!WaitForSingleObjectEx+0x8f\r\nwinhttp!OutProcGetProxyForUrl+0x160\r\nwinhttp!WinHttpGetProxyForUrl+0x349\r\ncontoso!submit_web_request+0x232\r\nntdll!TppWorkpExecuteCallback+0x35e\r\nntdll!TppWorkerThread+0x474\r\nkernel32!BaseThreadInitThunk+0x14\r\nntdll!RtlUserThreadStart+0x21\r\n<\/pre>\n<p>(I&#8217;ve simplified the stack trace for expository purposes.)<\/p>\n<p>What&#8217;s happening here is that you put some work on the thread pool, and that work called <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code>. This function is synchronous, but it makes HTTP network requests, which are asynchronous. To bridge the gap, the <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code> function performs a synchronous wait for the asynchronous work to complete.<\/p>\n<p>And my guess is that <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code> itself uses the thread pool to complete its asynchronous work.<\/p>\n<p>The problem is that the program flooded the thread pool with <code>submit_<\/code><code>web_<\/code><code>request<\/code> work items. Those work items called <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code>, which queues its own work item and waits for it to complete. 
But those work items can&#8217;t run because the thread pool&#8217;s threads are all busy handling <code>submit_<\/code><code>web_<\/code><code>request<\/code> work items.<\/p>\n<p>Eventually, the thread pool may realize that it&#8217;s not making progress and spin up a new thread to deal with the work that has been piling up. Maybe that thread will finish the work begun by <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code>, and that will allow one of the <code>submit_<\/code><code>web_<\/code><code>request<\/code> threads to continue. Once that thread is finished with the <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code> work item, it will go pull another work item from the queue, and odds are that it&#8217;s going to get another <code>submit_<\/code><code>web_<\/code><code>request<\/code> work item, so now we&#8217;re back where we started, except with one more stuck thread in the thread pool.<\/p>\n<p>If the <code>submit_<\/code><code>web_<\/code><code>request<\/code> work items come in faster than <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code> can retire its own work items, the thread pool will fill up with threads blocked inside <code>submit_<\/code><code>web_<\/code><code>request<\/code>, and eventually the thread pool will reach its thread limit, and everything stops.<\/p>\n<p>You&#8217;re basically starving the thread pool by hijacking it with requests that themselves require the thread pool. All of the thread pool threads are stuck handling your requests, and none are left to do the work that your requests generated.<\/p>\n<p>It&#8217;s like you have a lot of heavy equipment that you want to move, so you hire every moving company in the city to move it. Company A shows up, and they say, &#8220;Hm, this is too big for us to move by ourselves. Let me call Company B. Maybe they can help us.&#8221; Company B says, &#8220;Sorry, I can&#8217;t help you now. 
I just got an order to move a heavy piece of equipment.&#8221; By starving out all of the available moving companies, you manage to prevent any of them from completing the job.<\/p>\n<p>I suspect that this system is running in a network environment where <a href=\"http:\/\/en.wikipedia.org\/wiki\/Web_Proxy_Autodiscovery_Protocol\">WPAD<\/a> is slow, which makes <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code>&#8217;s work item take longer to finish its job, and that makes it more likely that <code>submit_<\/code><code>web_<\/code><code>request<\/code> work items will arrive faster than <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code> work items can be retired.<\/p>\n<p>Now that we&#8217;ve diagnosed the problem, what can we do to fix it?<\/p>\n<p>One idea is to hire just one moving company and let them decide how many more moving companies they need. Put all your calls to <code>submit_<\/code><code>web_<\/code><code>request<\/code> on a single thread and retire them one at a time. This clogs up just one thread, leaving the others available to assist. On the other hand, this means that the requests cannot be handled in parallel.<\/p>\n<p>A better fix is to change the way you use the thread pool so you don&#8217;t keep a thread hostage for a long time.<\/p>\n<p>I&#8217;m not an expert on Win\u00adHttp, but other people had some ideas on how to do this.<\/p>\n<p>You can switch to <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl\u00adEx<\/code>, which returns immediately and calls you back when it has an answer. The <code>submit_<\/code><code>web_<\/code><code>request<\/code> function could call <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl\u00adEx<\/code> and return immediately. This releases the thread pool thread to do other work\u2014possibly even the work that <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl\u00adEx<\/code> needs to do in order to complete. 
When <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl\u00adEx<\/code> finishes its asynchronous work, it calls the callback, and the callback can do whatever work <code>submit_<\/code><code>web_<\/code><code>request<\/code> was planning to do after getting the proxy information.<\/p>\n<p>Basically, go asynchronous all the way. It&#8217;s not an unreasonable approach for this program, since <code>submit_<\/code><code>web_<\/code><code>request<\/code> itself models an asynchronous request: It initiates the request and later calls a caller-provided callback with the response from the server. Since it&#8217;s already behaving asynchronously, you may as well make it even <i>more<\/i> asynchronous.<\/p>\n<p>Another suggestion was to skip <code>Win\u00adHttp\u00adGet\u00adProxy\u00adFor\u00adUrl<\/code>\u00a0entirely and just pass the <code>WIN\u00adHTTP_<\/code><code>ACCESS_<\/code><code>TYPE_<\/code><code>AUTOMATIC_<\/code><code>PROXY<\/code> flag to <code>Win\u00adHttp\u00adOpen<\/code>. This defers the proxy work to Win\u00adHttp itself, which can do it as part of its other asynchronous activities. This seems like a good idea because it gets you out of the proxy business entirely, and you still get the asynchronous behavior. 
It also gives you the satisfaction of fixing a bug by deleting code.<\/p>\n<p>The customer confirmed that switching to the <code>WIN\u00adHTTP_<\/code><code>ACCESS_<\/code><code>TYPE_<\/code><code>AUTOMATIC_<\/code><code>PROXY<\/code> flag fixed the problem.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Putting on your thinking cap.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-103720","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Putting on your thinking cap.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/103720","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=103720"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/103720\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=103720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=103720"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v
2\/tags?post=103720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}