{"id":39895,"date":"2020-09-30T05:12:22","date_gmt":"2020-09-30T12:12:22","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/premier-developer\/?p=39895"},"modified":"2020-10-02T05:14:12","modified_gmt":"2020-10-02T12:14:12","slug":"collect-and-automate-diagnostic-actions-with-azure-app-services","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/collect-and-automate-diagnostic-actions-with-azure-app-services\/","title":{"rendered":"Collect and Automate Diagnostic Actions with Azure App Services"},"content":{"rendered":"<p><a href=\"https:\/\/www.linkedin.com\/in\/reedrobison\/\">Reed Robison<\/a> shares techniques to collect diagnostic data and automate recovery behavior with Azure App Services.<\/p>\n<hr \/>\n<p>Troubleshooting production systems is often a balance between restoring services quickly and trying to collect enough information to isolate what caused the issue. For complex application issues, it\u2019s almost always helpful to capture a memory dump. Memory dumps are a snapshot of a process in time and with them, you can see precisely what your app was doing when it experienced a problem. By examining call stacks, you can see exactly what every thread is doing \u2013 what they are waiting on, exceptions that were thrown, and sometimes even the data that is responsible for getting it into a bad state. Post-mortem debugging isn\u2019t for everyone, but for the most complex problems it\u2019s how you get answers. Luckily, there are some good tools that automate dump analysis, and you can always call Microsoft for deeper assistance.<\/p>\n<p>The biggest challenge is typically reacting fast enough to capture a dump while the problem is occurring. Once you recycle a process, that data (and opportunity) is gone. 
Sometimes manual memory dumps are possible, but frequently you must automate the process in order to get the data you need.<\/p>\n<p>Azure App Services provides a range of Diagnostic Services to choose from. This post will explore some of the tools available and ways to automate more complex scenarios.<\/p>\n<p>Let\u2019s consider the scenario where a web application instance gets into a \u201cbad\u201d state and is no longer serving requests properly. Requests routed to this one instance fail, but the other instances seem to be working fine. The goal is to quickly detect the condition, create a memory dump, and recycle only the instance that is causing the problem.<\/p>\n<h3>Manual Intervention<\/h3>\n<p>You could simply restart the App Service either through the Azure portal or through an automation script. The downside with this approach is that it restarts all the instances, and the impairment will persist until a human gets involved. That\u2019s not ideal for production scenarios. To capture a dump before you restart, you can navigate to <strong><em>Diagnose and solve problems<\/em><\/strong> in the Azure portal and choose Diagnostic Tools. 
Choose Collect Memory Dumps, pick a specific instance to dump, then save it to a designated storage account for further analysis.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-39896\" src=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/memorydump.png\" alt=\"Image memorydump\" width=\"623\" height=\"285\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/memorydump.png 623w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/memorydump-300x137.png 300w\" sizes=\"(max-width: 623px) 100vw, 623px\" \/><\/p>\n<h3>Trapping a Specific Exception Condition<\/h3>\n<p>Frequently, there is a specific exception or condition responsible for getting your app into a bad state. For example, you might see evidence of exceptions that occurred at some point in a memory dump or log file, but understanding how you got there means trapping the exception <em>as it occurs<\/em>. This can get tricky with multiple instances since you may not know which instance the error will occur on. In this scenario, you want to monitor the process for an exception to occur and trigger a memory dump at that moment in time. 
To do this with App Services, we\u2019ll typically use something like <em>procdumphelper<\/em> to set up an exception monitor and configure the monitoring rule via the Kudu console.<\/p>\n<p>There is a good overview of how to set this up <a href=\"https:\/\/techcommunity.microsoft.com\/t5\/apps-on-azure\/capturing-dumps-on-multiple-instances-automatically-using\/ba-p\/392394\">here<\/a>.<\/p>\n<p><em>Tip \u2013 when configuring a rule to dump on a specific exception, you need -g if you are triggering on native exceptions.\u00a0 If you are triggering on managed exceptions, remove the -g parameter (the rule will not trigger on managed exceptions if it is used).<\/em><\/p>\n<h3>Automating Rules<\/h3>\n<p>App Services allows you to define Auto-Heal rules to automate some types of recovery actions. You can configure these using the Azure portal under <strong><em>Diagnose and solve problems<\/em><\/strong>. The list of pre-defined conditions is limited, but Auto-Heal is easy to use and handy for some common scenarios.<\/p>\n<p>For instance, you can trigger an action based on Request Duration, Memory Limit, Request Count, or a specific Status Code returned from your app. 
You can choose to Recycle Process, Log an Event, or take a Custom Action (such as creating a memory dump, running a profiler, or even running a specific executable).<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-39900\" src=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/auotheal-1.png\" alt=\"Image auotheal\" width=\"1062\" height=\"629\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/auotheal-1.png 1062w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/auotheal-1-300x178.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/auotheal-1-1024x606.png 1024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/09\/auotheal-1-768x455.png 768w\" sizes=\"(max-width: 1062px) 100vw, 1062px\" \/><\/p>\n<p>While Auto-Heal rules make it easy to automate against these conditional triggers, you don\u2019t have a lot of additional options to customize them. In the scenario where you need to quickly identify a problem instance, dump it, and restore service, the default conditions (request duration, memory, count, or a status code) might not be enough. If your app could return a specific error code as a response, you could use that as the trigger, but that assumes your app knows it\u2019s in a bad state and has the ability to return a unique status code to serve as a trigger. That may not always be possible.<\/p>\n<p>Another automated option to restore service is <a href=\"https:\/\/azure.github.io\/AppService\/2020\/08\/24\/healthcheck-on-app-service.html\">Health Check<\/a>. It allows you to specify a path in your application to ping on a regular interval. The idea here is that if an instance fails to respond to a ping it can automatically be detected as unhealthy and removed from the load balancer. 
If it remains in an unhealthy state for an extended period of time, it is replaced with a new instance. More details can be found <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-monitor\/platform\/autoscale-get-started#health-check-path\">here<\/a>. Health Check does not (yet) provide any means to debug or dump the problem instance, and it doesn\u2019t remove it right away.<\/p>\n<p>If none of the above approaches provides the granularity to achieve the goal, you could consider writing your own automation script to control the actions. Azure exposes the ability to set up an alert rule (see the Monitoring option in your App Service) to trigger off a variety of conditions like Metrics and Logs. If there are characteristics (for example a Handle Count &gt; threshold) that indicate a \u201cknown\u201d bad state, you can configure an action group to kick off an Azure Function, Runbook, Logic App, etc., where you could control what happens next. If you can identify a way to trigger some kind of notification, then you can use PowerShell to author your own actions. There are a variety of ways to enumerate resources in your environment and take actions to restart services, recycle instances, and even create memory dumps.<\/p>\n<p>For example, <a href=\"https:\/\/docs.microsoft.com\/archive\/blogs\/david_burgs_blog\/powershell-script-to-restart-role-instances-for-webapp\">here is a PowerShell script to recycle a role instance of a WebApp<\/a>. You could use this technique to recycle a specific problem instance rather than restarting the entire App Service.<\/p>\n<p>I\u2019ll go into detail on some approaches to automating memory dumps via PowerShell and <a href=\"https:\/\/docs.microsoft.com\/rest\/api\/appservice\/\">Azure REST APIs<\/a> in the next post. 
You can use a combination of these techniques to fine tune an automated response to create memory dumps and quickly recycle problem instances.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Troubleshooting production systems is often a balance between restoring services quickly and trying to collect enough information to isolate what caused the issue.  For complex application issues, it\u2019s almost always helpful to capture a memory dump.  <\/p>\n","protected":false},"author":582,"featured_media":39898,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[25],"tags":[53,24,119,3],"class_list":["post-39895","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","tag-app-services","tag-azure","tag-diagnostics","tag-team"],"acf":[],"blog_post_summary":"<p>Troubleshooting production systems is often a balance between restoring services quickly and trying to collect enough information to isolate what caused the issue.  For complex application issues, it\u2019s almost always helpful to capture a memory dump.  
<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/39895","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/582"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=39895"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/39895\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/39898"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=39895"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=39895"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=39895"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}