{"id":14548,"date":"2023-03-07T00:01:00","date_gmt":"2023-03-07T08:01:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cse\/?p=14548"},"modified":"2024-07-18T11:52:25","modified_gmt":"2024-07-18T18:52:25","slug":"build-test-resilience-dotnet-functions","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/build-test-resilience-dotnet-functions\/","title":{"rendered":"A Hypothesis-Driven Approach to Building and Testing Resilience in .NET Azure Functions"},"content":{"rendered":"<p>Cloud-native architectures in Azure often bring together many services and dependencies &#8211; applications can read from and write to data stores, improve performance via external caches, and rely on message and event services to process data, among many other potential configurations of cloud components.\nWhile we&#8217;d hope that every component of our tech stack will work perfectly over the lifetime of our product, this ignores the realities of software development, particularly in the realm of cloud computing:\nnetwork connections flicker, infrastructure experiences a temporary blip, or load on shared public cloud resources leads to throttled requests.\nEven in the face of these challenges, we, as developers, still have a responsibility to our end users to make our application as available and reliable as we can.<\/p>\n<p>It might be impossible to foresee every problem that will arise, but we can follow the spirit of scientific inquiry to develop and test hypotheses about how our system might behave when different components fail.\nBy identifying potential points of failure in our system early on, we can take steps to mitigate problems that might arise and make our solutions more resilient overall.<\/p>\n<p>In this post, we&#8217;ll present an example of this journey through code snippets from an <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-functions\/functions-overview\">Azure Functions<\/a>-based data processing pipeline created 
as an artifact from our team&#8217;s work with customers.\nThe Functions relied heavily on <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/digital-twins\/overview\">Azure Digital Twins (ADT)<\/a> and Azure Blob Storage as part of their workflows. Encountering transient errors while interfacing with these dependencies, combined with our team&#8217;s prioritization of engineering fundamentals, inspired a deeper dive into strategies to build and test resilience in our .NET code.\nBy adopting this resilience-first mindset in our project, we were able to consider more carefully how we might vary error handling behavior across different failure scenarios and exception types.\nWe also needed to consider how we could design a thorough defense-in-depth strategy that integrates resilience options, both those built into the Azure SDKs and those implemented via external libraries, and that ultimately enables our customer to support and extend this application more effectively and reliably in production.<\/p>\n<p>You&#8217;ll be able to follow along as we go over some of the main concepts of resilience engineering and how they can be implemented.\nIn particular, we&#8217;ll discuss ways in which you can leverage the well-known <a href=\"https:\/\/github.com\/App-vNext\/Polly\/\">Polly<\/a> library in an Azure Function to implement patterns for resilience and transient fault-handling.\nWe will also discuss using these in conjunction with resilience options offered in some of the native Azure SDKs.\nWe&#8217;ll conclude by reviewing the use of Polly&#8217;s companion library, <a href=\"https:\/\/github.com\/Polly-Contrib\/Simmy\">Simmy<\/a>, to perform hypothesis-driven chaos testing against any external dependencies used in an application.<\/p>\n<h2>Implementing resilience with Polly<\/h2>\n<p><a href=\"https:\/\/github.com\/App-vNext\/Polly\">Polly<\/a> is a .NET library for resilience and transient fault-handling.\nIt implements several design 
patterns for this &#8211; including Retry, Circuit-breaker, Timeout, and Rate-limiting &#8211; through the use of composable <strong>policies<\/strong>.\nThese policies are highly configurable based on your particular business and error-handling requirements.\nYou can also combine multiple policies for a multi-layered approach to resilience.\nYou can check out <a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\/Transient-fault-handling-and-proactive-resilience-engineering\">this page in their documentation<\/a> for a more detailed discussion of when and why you should use each policy in a comprehensive resilience engineering strategy.<\/p>\n<h3>Built-in retry policies for Azure SDKs<\/h3>\n<p>Note that the services that your applications consume may have some resilience engineering built in by default.\nFor instance, many Azure SDKs and services include <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/best-practices\/retry-service-specific\">configurable retry mechanisms<\/a>,\nthough it&#8217;s also worth checking the individual product documentation of each service in case this behavior isn&#8217;t listed in that linked document (e.g. 
<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/digital-twins\/reference-service-limits#working-with-limits\">Azure Digital Twins (ADT)<\/a>).\nThese retry policies can be tuned, via configuration variables, to ensure that the behavior is best suited to the unique requirements and best practices for each service.<\/p>\n<p>This means that when this built-in Azure SDK client retry strategy is appropriate for your application scenario, you probably wouldn&#8217;t need to implement retries using Polly.\nIt might still be useful to do so when making requests against some other SDK\/API that doesn&#8217;t have that guarantee built in or when additional retry configuration complexity is required across multiple application layers.<\/p>\n<p>As with most design decisions, you should think about this carefully based on your business requirements, the services you&#8217;re using, your particular cloud architecture, and the expected usage patterns of your system.\nThe Azure Architecture Center has a great <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/best-practices\/transient-faults\">article on transient fault handling<\/a> that dives a little deeper on these considerations, common pitfalls, and best practices &#8211;\nwe highly encourage you to check it out for more guidance on this topic!<\/p>\n<h3>Using the Circuit Breaker pattern<\/h3>\n<p>Retries are a very useful tool in handling transient faults.\nHowever, in cases where a dependency is seriously struggling, and it&#8217;s evident that the issue is likely not just a minor blip, a large number of high-frequency requests or an over-enthusiastic retry policy (or both) can do more harm than good, potentially causing your users to wait even longer to get what they need from your application.\nThe <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/patterns\/circuit-breaker\">Circuit Breaker pattern<\/a>, used in conjunction with a well-crafted retry policy, is well-suited for 
these scenarios to regulate the flow of traffic and avoid overwhelming downstream servers with excessive retries.<\/p>\n<p>In the section below, we&#8217;ll walk through setting up a circuit breaker policy using <a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\/Circuit-Breaker\">the Polly implementation of this pattern<\/a>.\nCircuit Breaker is the only Polly policy implemented and discussed in this sample, but you can read more about the other options for implementing resilience and transient fault-handling in <a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\">their wiki<\/a>.\nThe patterns presented for registering policies via dependency injection and parameterization to enable easy configuration are applicable across these policy options.<\/p>\n<h3>Policy registration via dependency injection<\/h3>\n<p>For the rest of this blog post, we&#8217;ll assume some basic familiarity with .NET Azure Functions.\nHowever, if you&#8217;re less familiar with this topic, don&#8217;t worry &#8211;\nthe <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-functions\/\">Azure documentation<\/a> is a good place to get basic knowledge and get started developing .NET Azure Functions.\n<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-functions\/functions-dotnet-dependency-injection\">Dependency injection<\/a> in Azure Functions is key to building loosely-coupled, easily configurable applications that follow the principle of inversion of control (IoC).\nWe can inject a Polly <a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\/PolicyRegistry\"><code>PolicyRegistry<\/code><\/a> centrally on Function startup so that later, when we want to apply one of its policies elsewhere in the project, it can be extracted from this registry;\nthis pattern promotes the separation of policy definition and usage.<\/p>\n<p>In our sample, we define the methods for injecting policies in <code>PolicyExtensions.cs<\/code>:<\/p>\n<pre><code class=\"language-c#\">public 
static class PolicyExtensions\r\n{\r\n    public const string AdtPolicyName = \"adtPolicy\";\r\n\r\n    \/\/\/ &lt;summary&gt;\r\n    \/\/\/ Adds Polly policies to the service collection via a policy registry\r\n    \/\/\/ &lt;\/summary&gt;\r\n    \/\/\/ &lt;param name=\"services\"&gt;Service collection to add Polly policies to&lt;\/param&gt;\r\n    \/\/\/ &lt;param name=\"circuitBreakerAllowedExceptionCount\"&gt;Number of consecutive handled exceptions allowed before the circuit opens&lt;\/param&gt;\r\n    \/\/\/ &lt;param name=\"circuitBreakerWaitTimeSeconds\"&gt;How long, in seconds, the circuit stays open before allowing a trial request&lt;\/param&gt;\r\n    public static void AddPollyPolicies(this IServiceCollection services, int circuitBreakerAllowedExceptionCount, int circuitBreakerWaitTimeSeconds)\r\n    {\r\n        if (services is null)\r\n        {\r\n            throw new ArgumentNullException(nameof(services));\r\n        }\r\n\r\n        \/\/ AddPolicyRegistry (Microsoft.Extensions.Http.Polly) registers an empty registry and returns it\r\n        services\r\n            .AddPolicyRegistry()\r\n            .AddAdtPolicies(circuitBreakerAllowedExceptionCount, circuitBreakerWaitTimeSeconds);\r\n    }\r\n\r\n    \/\/\/ &lt;summary&gt;\r\n    \/\/\/ Adds a Circuit Breaker policy for ADT requests to the policy registry.\r\n    \/\/\/ &lt;\/summary&gt;\r\n    \/\/\/ &lt;param name=\"policyRegistry\"&gt;Policy registry to add the Circuit Breaker policy to&lt;\/param&gt;\r\n    \/\/\/ &lt;param name=\"circuitBreakerAllowedExceptionCount\"&gt;Number of consecutive handled exceptions allowed before the circuit opens&lt;\/param&gt;\r\n    \/\/\/ &lt;param name=\"circuitBreakerWaitTimeSeconds\"&gt;How long, in seconds, the circuit stays open before allowing a trial request&lt;\/param&gt;\r\n    \/\/\/ &lt;returns&gt;The same policy registry, to allow chaining&lt;\/returns&gt;\r\n    private static IPolicyRegistry&lt;string&gt; AddAdtPolicies(this IPolicyRegistry&lt;string&gt; policyRegistry, int circuitBreakerAllowedExceptionCount, int circuitBreakerWaitTimeSeconds)\r\n    {\r\n        if (policyRegistry is null)\r\n        {\r\n            throw new ArgumentNullException(nameof(policyRegistry));\r\n        }\r\n\r\n        \/\/ Open circuit based on specified exception 
types being thrown\r\n        var adtPolicy = GetCircuitBreakerPolicy(circuitBreakerAllowedExceptionCount, circuitBreakerWaitTimeSeconds);\r\n\r\n        policyRegistry\r\n            .Add(AdtPolicyName, adtPolicy);\r\n\r\n        return policyRegistry;\r\n    }\r\n\r\n    private static IAsyncPolicy GetCircuitBreakerPolicy(int allowedExceptionCount, int waitTimeSeconds)\r\n    {\r\n        \/\/ Only handle potentially transient errors: 5xx server errors, 408 Request Timeout, and 429 Too Many Requests\r\n        var policyBuilder = Policy\r\n            .Handle&lt;RequestFailedException&gt;(ex =&gt;\r\n                ex.Status &gt;= 500 ||\r\n                ex.Status == (int)HttpStatusCode.RequestTimeout ||\r\n                ex.Status == (int)HttpStatusCode.TooManyRequests);\r\n\r\n        \/\/ Open the circuit after allowedExceptionCount consecutive handled exceptions,\r\n        \/\/ and keep it open for waitTimeSeconds before allowing a trial request\r\n        var circuitBreakerPolicy = policyBuilder.CircuitBreakerAsync(allowedExceptionCount, TimeSpan.FromSeconds(waitTimeSeconds));\r\n\r\n        return circuitBreakerPolicy;\r\n    }\r\n}<\/code><\/pre>\n<p>There are a few things to call out here:<\/p>\n<ul>\n<li>The methods generating policies are grouped by the service they target (here, we show the ADT-related policies).\nThis was an intentional design choice, made not only as a logical grouping for readability, but also to centralize all policy-related setup in this class, so that each consuming SDK client only needs to extract a single policy.\nWe will see later in this post how this can be extended with <a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\/PolicyWrap\"><code>PolicyWrap<\/code><\/a> to flexibly combine multiple policies into one custom policy when performing chaos testing with Simmy.<\/li>\n<li>Note that our Circuit Breaker policy explicitly handles only <code>RequestFailedException<\/code>s, which is the exception type <a href=\"https:\/\/learn.microsoft.com\/en-us\/dotnet\/api\/overview\/azure\/digitaltwins.core-readme?view=azure-dotnet#troubleshooting\">thrown by the ADT .NET SDK<\/a> on service errors.\nWe 
introspect further on the status code of this error so that the policy is only triggered by errors that potentially indicate a transient HTTP issue (i.e. status codes &gt;= 500 for server errors such as Internal Server Error, request timeouts, or exceeded rate limits).\nMore examples of pattern matching in policy construction to enable the configuration best suited to your use case can be found in the <a href=\"https:\/\/github.com\/App-vNext\/Polly\/#circuit-breaker\">Polly Circuit Breaker quickstart<\/a>.<\/li>\n<li>We pass the allowed exception count and cool-off period of the circuit breaker through to these methods as parameters.\nThis parameterization allows us to configure these via the external app settings for easy tuning.<\/li>\n<\/ul>\n<p>We can call this extension method in the <code>Startup.cs<\/code> class of the Functions project:<\/p>\n<pre><code class=\"language-c#\">var config = this.GetConfiguration(builder);\r\n\r\n\/\/ Add Polly\r\nvar settings = new StreamingDataFlowSettings(config);\r\n\r\nbuilder.Services.AddPollyPolicies(settings.CircuitBreakerAllowedExceptionCount, settings.CircuitBreakerWaitTimeSec);<\/code><\/pre>\n<p>Here, <code>StreamingDataFlowSettings<\/code> is a POCO class that maps to the local or deployed settings for the function app, through which we can set the configuration for all components of the application.<\/p>\n<h3>Implementation in SDK clients<\/h3>\n<p>As discussed above, to use the policies registered in our startup class, we can retrieve them by name in the constructor of our consuming client classes (here, in <code>AdtClient.cs<\/code>):<\/p>\n<pre><code class=\"language-c#\">private readonly IAsyncPolicy policy;\r\n\r\n...\r\n\r\n  \/\/ In the AdtClient() constructor, where policyRegistry is the registry injected via DI\r\n  \/\/ (e.g. as an IReadOnlyPolicyRegistry&lt;string&gt;)\r\n  this.policy = policyRegistry.Get&lt;IAsyncPolicy&gt;(PolicyExtensions.AdtPolicyName);<\/code><\/pre>\n<p>We can then use this policy to execute the client requests provided by the ADT .NET SDK.\nAs an example:<\/p>\n<pre><code 
class=\"language-c#\">public async Task UpdateTwinAsync(string twinId, JsonPatchDocument patch)\r\n{\r\n    try\r\n    {\r\n        \/\/ Await the policy execution so that exceptions surface inside this try block.\r\n        \/\/ UpdateDigitalTwinAsync applies a JSON Patch document (Azure.JsonPatchDocument) to an existing twin.\r\n        await this.policy.ExecuteAsync(() =&gt;\r\n            this.client.UpdateDigitalTwinAsync(twinId, patch));\r\n    }\r\n    catch (RequestFailedException e)\r\n    {\r\n        this.logger.FailedToCreateTwin(...);\r\n    }\r\n}<\/code><\/pre>\n<p>All requests wrapped in this way will be subject to the circuit breaker policy we defined in the <code>PolicyExtensions<\/code> class, as intended.\nNote that while the circuit is open, Polly fails fast by throwing a <code>BrokenCircuitException<\/code> instead of calling ADT, so callers should be prepared to handle that exception type as well.<\/p>\n<h2>Testing resilience with Simmy<\/h2>\n<p>So far in this post, we&#8217;ve seen how to implement and configure Polly policies to improve resilience in our .NET Azure Functions applications.\nThis is great!\nA well-considered resilience engineering strategy can give us peace of mind that transient errors are being handled gracefully.\nHowever, we should also verify that this fault handling is correctly implemented and configured;\nwhile transient dependency failures are an inevitable part of software, it isn&#8217;t easy to replicate the conditions that lead to them, like network failures or host throttling, in a repeatable and systematic way.<\/p>\n<p>When using Polly, <a href=\"https:\/\/github.com\/Polly-Contrib\/Simmy\">Simmy<\/a> is a natural next step to achieve these goals.\nSimmy is a library for performing chaos engineering by using configurable fault injections in a policy-centric way.\nIt is built on top of, and integrated with, Polly.\nAs a bonus, this also means that we can reuse much of our existing code structure for policy registration and request execution.<\/p>\n<h3>Building a hypothesis<\/h3>\n<p>Resilience engineering and structuring chaos tests (and, really, most testing in software development) first require you to understand the expected behavior of your application at a steady state.\nFrom there, you need to understand where and how it can fail, 
and then craft your hypothesis about what will happen in each case, following an &#8220;if-then&#8221; format (e.g. <em>if X occurs at a rate of Y%, then Z will happen<\/em>).<\/p>\n<p>As a simple example, let&#8217;s look at the case of our application attempting to update twins in Azure Digital Twins.\nWe know that if ADT returns an error for an update request, the ADT .NET SDK will throw a <code>RequestFailedException<\/code>, which will be logged by our application.\nWe also know that we have configured our circuit breaker policy to handle only errors with status code 408 (Request Timeout), 429 (Too Many Requests), and anything &gt;= 500 (server errors, such as 500 Internal Server Error), so the circuit will open once the number of consecutive errors matching these criteria reaches <code>allowedExceptionCount<\/code> &#8211;\nfor the sake of this example, let&#8217;s say that <code>allowedExceptionCount<\/code> is 5.\nWe can form two hypotheses based on this information:<\/p>\n<ul>\n<li><em>If<\/em> the update request to ADT fails with a <code>RequestFailedException<\/code> carrying status code 500 randomly 5% of the time (simulating some kind of temporary blip), <em>then<\/em> we expect to see these exceptions occur occasionally and be logged, but we would rarely see the circuit open, since exceptions need to be consecutive for the circuit to open (assuming independent failures, the chance that five given consecutive requests all fail is 0.05<sup>5<\/sup>, or roughly 3 in 10 million).<\/li>\n<li><em>If<\/em> the update request to ADT fails with a <code>RequestFailedException<\/code> carrying status code 500 on 100% of requests (simulating an extended issue or outage), <em>then<\/em> we expect to see the exception logged 5 times, after which the circuit opens and further requests fail fast without calling ADT until the break duration has elapsed.<\/li>\n<\/ul>\n<p>From here, provided that you have a configurable framework for performing these tests, it&#8217;s easy to vary the parameters of the faults that you are injecting and run your series of experiments based on this list.\nWe will see how this can 
be implemented in the following section.<\/p>\n<h3>Putting it into practice<\/h3>\n<p>Remember, from what we&#8217;ve implemented <a href=\"#policy-registration-via-dependency-injection\">so far<\/a>, we already have a framework for injecting a <code>PolicyRegistry<\/code> into our application and using its policies to execute client SDK requests.\nFollowing this established pattern, simulating the exception cases detailed in the hypotheses above requires only a small extension: the new chaos policies themselves, plus a few new configuration variables:<\/p>\n<pre><code class=\"language-c#\">private static IPolicyRegistry&lt;string&gt; AddAdtPolicies(\r\n  this IPolicyRegistry&lt;string&gt; policyRegistry,\r\n  bool usingCircuitBreaker,\r\n  double simmyInjectionRate,\r\n  string chaosDependencyTestingKey,\r\n  int circuitBreakerAllowedExceptionCount,\r\n  int circuitBreakerWaitTimeSeconds)\r\n{\r\n    if (policyRegistry is null)\r\n    {\r\n        throw new ArgumentNullException(nameof(policyRegistry));\r\n    }\r\n\r\n    \/\/ By default, use a no-op policy\r\n    IAsyncPolicy adtPolicy = Policy.NoOpAsync();\r\n    List&lt;IAsyncPolicy&gt; allPolicies = new List&lt;IAsyncPolicy&gt; { adtPolicy };\r\n\r\n    \/\/ Add additional policies per config\r\n    if (usingCircuitBreaker)\r\n    {\r\n        var circuitBreakerPolicy = GetCircuitBreakerPolicy(circuitBreakerAllowedExceptionCount, circuitBreakerWaitTimeSeconds);\r\n        allPolicies.Add(circuitBreakerPolicy);\r\n    }\r\n\r\n    \/\/ Note that here, \"Adt\" is used as a key to toggle chaos testing for the ADT SDK client\r\n    \/\/ This can be extended to other dependency types to isolate chaos testing for each\r\n    \/\/ dependency via configurable app settings.\r\n    if (string.Equals(chaosDependencyTestingKey, \"Adt\"))\r\n    {\r\n        var adtFaultPolicy = GetRequestFailedExceptionFaultPolicy(simmyInjectionRate, chaosDependencyTestingKey);\r\n        allPolicies.Add(adtFaultPolicy);\r\n    }\r\n\r\n    \/\/ If we 
ended up adding more policies, combine them - otherwise, just return no-op.\r\n    \/\/ WrapAsync nests policies in array order (first = outermost), so the Simmy fault policy,\r\n    \/\/ added last, runs innermost and the circuit breaker counts the faults it injects.\r\n    if (allPolicies.Count &gt; 1)\r\n    {\r\n        adtPolicy = Policy.WrapAsync(allPolicies.ToArray());\r\n    }\r\n\r\n    policyRegistry\r\n        .Add(AdtPolicyName, adtPolicy);\r\n\r\n    return policyRegistry;\r\n}\r\n\r\nprivate static IAsyncPolicy GetRequestFailedExceptionFaultPolicy(double injectionRate, string serviceName)\r\n{\r\n    \/\/ When enabled, throws this RequestFailedException in place of the wrapped call\r\n    \/\/ with probability injectionRate (a value between 0.0 and 1.0, e.g. 0.05 for 5%)\r\n    var fault = new RequestFailedException(500, $\"Simmy: {serviceName} Internal Server Error\");\r\n\r\n    var chaosExceptionPolicy = MonkeyPolicy.InjectExceptionAsync(with =&gt;\r\n      with.Fault(fault)\r\n        .InjectionRate(injectionRate)\r\n        .Enabled());\r\n\r\n    return chaosExceptionPolicy;\r\n}<\/code><\/pre>\n<p>The main takeaways from this sample:<\/p>\n<ul>\n<li>We&#8217;ve added parameters, which can be stored in the function app settings, for the injection rate of the Simmy exception policies as well as a key to toggle chaos testing for this dependency (in this case, <code>\"Adt\"<\/code>) &#8211; if this key is unset, no chaos-related policies are injected.\nThis is crucial for containing the blast radius of the chaos testing; it&#8217;s important that testing of this nature has minimal impact on your end users in a production environment.\nThis approach can also be extended if you wish to run chaos testing on multiple dependencies &#8211; by varying the key, tests for each dependency can be isolated, leading to cleaner tests.<\/li>\n<li>This version also includes a toggle for the circuit breaker policy, mostly to illustrate the use of <a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\/NoOp\">no-op policies<\/a>, but also to support cases where we would want to use the same ADT SDK client with different error handling and resilience behavior.\nNo-op policies conform to the 
same interface as any other policy, but simply execute the underlying code without intervention, which is useful for keeping a consistent policy contract across a project.\nThey can also be useful in unit tests where you want to stub out policy behavior without affecting functionality.<\/li>\n<li>Multiple policies can be wrapped into one with the <code>Policy.WrapAsync()<\/code> method (or <code>Policy.Wrap()<\/code> for synchronous policies).\nThis can be used, as it is here, to combine Polly and Simmy policies, as well as to combine multiple Polly resilience strategies for a defense-in-depth approach.<\/li>\n<\/ul>\n<p>Now that we&#8217;ve implemented this chaos testing framework in our .NET code, we can vary the configuration of the Simmy fault injection to match each of the hypotheses to test.\nOnce that&#8217;s set, we can run our application normally with some simulated input (generated manually, via simple console app input generators, or with automated load testing tools) and observe the resulting behavior via the application logs or monitoring tools.\nNote that Simmy exceptions are injected in place of the SDK call wrapped by the policy,\nmeaning that they represent errors thrown by the SDK after it has already exhausted its internal retry policy.\nIf we instead injected faults at a layer where the SDK&#8217;s built-in retries could handle them, we would expect to see fewer exceptions logged.<\/p>\n<p>The ADT .NET SDK was just one of the external dependencies we used as part of our customer project, and we&#8217;ve highlighted its usage here as a representative sample of this approach to chaos testing.\nIn our project, we implemented a version of this for each external SDK client we used, fine-tuning the resilience policies and fault injections based on the usage and error handling requirements of each one.\nFor us, this testing revealed opportunities to add more fine-grained error handling in our circuit breaker logic to handle transient HTTP errors and temporary rate limit 
throttling differently from errors related more to input validation for ADT requests (the final result of which is shown in this sample);\nit also informed a more thoughtful custom error handling policy for a cache client that had different built-in retry configurations for its initial connection and its cache operations.\nMost importantly, carefully observing and documenting each failure scenario we identified enabled us to test and validate our assumptions about how our application behaves in the face of unexpected failures.\nBy using these results to identify bugs and make improvements across our system, we could leave our customer well-equipped to handle errors gracefully and to deliver a reliable system to their end users.<\/p>\n<p>While this is by no means an exhaustive example, we hope that you can extend it, in conjunction with the Polly and Simmy documentation and Azure resources, to develop a resilience and chaos testing strategy that&#8217;s best suited to your cloud application.<\/p>\n<h2>Conclusion<\/h2>\n<p>Resilience is a core tenet of building reliable cloud platforms, and in addition to Polly and Simmy for .NET, there are <a href=\"https:\/\/microsoft.github.io\/code-with-engineering-playbook\/automated-testing\/fault-injection-testing\/\">many tools available<\/a> for building resilience into your cloud applications and testing it.\nIt&#8217;s always important to keep engineering fundamentals in mind when designing and implementing systems;\nrigorous testing is critical not only for ensuring that our system meets its expected behavior, but also for encouraging careful thought about all of its potential inputs and failure points.\nBeing methodical and taking the time to systematically develop testing hypotheses helps to validate assumptions, promote code quality, and uncover the areas where your application can become more resilient 
overall.<\/p>\n<p>To see the code presented in this sample in context of the full data processing workflow, check out the <a href=\"https:\/\/github.com\/Azure-Samples\/aas-digital-factory\">AAS Digital Factory repo in Azure-Samples<\/a>.\nWe hope that you&#8217;ve found this to be a useful overview of some of the key concepts of resilience engineering and chaos testing and how Polly and Simmy can be used to achieve these in a real-world application.\nThis is obviously just one way of building and testing resilience in an Azure .NET application &#8211; we welcome any thoughts or feedback on ways you might have achieved this in your own cloud applications.<\/p>\n<h2>References and further reading<\/h2>\n<ul>\n<li><a href=\"https:\/\/github.com\/App-vNext\/Polly\/wiki\">Polly documentation<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/Polly-Contrib\/Simmy\">Simmy documentation<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/best-practices\/transient-faults\">Azure Architecture Center: guidance on transient fault-handling<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/framework\/resiliency\/chaos-engineering\">Azure Architecture Center: Chaos engineering<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/patterns\/retry\">Azure Architecture Center: the Retry pattern<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/patterns\/circuit-breaker\">Azure Architecture Center: the Circuit Breaker pattern<\/a><\/li>\n<li><a href=\"https:\/\/app.pluralsight.com\/course-player?clipId=3e5d5659-9e7a-4ecc-aeae-874b0c7f2dc9\">Pluralsight course &#8211; performing chaos in a serverless world<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/Azure-Samples\/aas-digital-factory\">Azure-Samples: AAS Digital Factory<\/a> shows how this approach to resilience and chaos testing presented can be put into practice, including all code snippets from 
this article.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>An overview of how to use the Polly and Simmy libraries for a hypothesis-driven resilience engineering and chaos testing approach to .NET Azure Functions.<\/p>\n","protected":false},"author":113738,"featured_media":14560,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[77,3380,3378,3381,3379,3382],"class_list":["post-14548","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cse","tag-azure-functions","tag-chaos-testing","tag-dotnet","tag-polly","tag-resilience","tag-simmy"],"acf":[],"blog_post_summary":"<p>An overview of how to use the Polly and Simmy libraries for a hypothesis-driven resilience engineering and chaos testing approach to .NET Azure Functions.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/14548","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/113738"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=14548"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/14548\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/14560"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=14548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=14548"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.
microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=14548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}