{"id":110601,"date":"2024-12-03T07:00:00","date_gmt":"2024-12-03T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110601"},"modified":"2024-12-03T07:26:30","modified_gmt":"2024-12-03T15:26:30","slug":"20241203-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241203-00\/?p=110601","title":{"rendered":"Tricks from product support: We&#8217;re not smart enough to debug the problem, can you help us?"},"content":{"rendered":"<p>Some time ago, I shared the trick of asking customers to <a title=\"Blow the dust out of the connector\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20040303-00\/?p=40423\"> blow the dust out of the connector<\/a>. Today I&#8217;m sharing a trick I learned from the enterprise product support team.<\/p>\n<p>It can happen that investigating a problem reveals that a problem occurred when calling a function that has been patched or hooked. (In the case of enterprise customers, the offender is typically some &#8220;advanced anti-malware software&#8221; that they paid a lot of money for.) The code running in the hook ends up doing something sketchy, the most common example of which is hooking a low-level function and then having the hook call a higher-level function, resulting in a deadlock. A ridiculous example would be hooking <code>Heap\u00adAlloc<\/code> (a low-level memory allocation function) and calling <code>Message\u00adBox<\/code> (a high-level user interface function). Another example would be hooking a function in a way that changes unspecified but observable state, such as changing the value returned by <code>Get\u00adLast\u00adError<\/code> when the function succeeds.\u00b9<\/p>\n<p>The trick here is not to tell the customer, &#8220;We think the problem is being caused by your anti-malware software.&#8221; That is something they don&#8217;t want to hear. 
After all, they paid a lot of money for that anti-malware software, and a recommendation of the form &#8220;throw away a lot of money you already spent&#8221; is not going to land well. (See also: sunk cost fallacy.)<\/p>\n<p>Instead, tell the customer, &#8220;It looks like the anti-malware software is interfering with our ability to debug the problem. Can you temporarily turn it off, then reproduce the problem following the same instructions, with the same tracing and crash dump collection steps? Once you&#8217;ve done that, you can turn the software back on.&#8221;<\/p>\n<p>In other words, &#8220;It&#8217;s not you. It&#8217;s me.&#8221; We are trying to debug the problem in our software, and we fully acknowledge that it&#8217;s a problem in our software, but we&#8217;re not smart enough to do it while that other software is running, so can you just help us out and remove some of the distractions?<\/p>\n<p>I&#8217;m told that what usually\u00b2 happens is that the customer, for some mysterious reason, is unable to get the problem to occur when the anti-malware software is disabled. &#8220;Wow, that&#8217;s weird.&#8221;<\/p>\n<p>Sometimes the customer gets the hint and opens a support ticket with the anti-malware vendor. Sometimes we have to suggest to them, &#8220;Why don&#8217;t you check if there&#8217;s an update available for your anti-malware software?&#8221;<\/p>\n<p>\u00b9 A common example of this is calling <code>Tls\u00adGet\u00adValue<\/code> from inside the hook, which has a documented side effect of clearing the last error code.<\/p>\n<p>\u00b2 Usually, but not always. Sometimes, the anti-malware software is not actually the source of the problem. But we&#8217;re not lying! 
Removing the anti-malware software from the equation does simplify the debugging: Since we don&#8217;t have the symbols for the anti-malware software, the stack traces are cluttered with mystery frames, and sometimes the frames are so badly messed up that the debugger can&#8217;t find the other end. Removing the anti-malware software produces cleaner and more complete stack frames, which definitely makes the analysis easier.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s not you, it&#8217;s me.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[104],"class_list":["post-110601","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-tipssupport"],"acf":[],"blog_post_summary":"<p>It&#8217;s not you, it&#8217;s me.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110601","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110601"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110601\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/d
evblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110601"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}