{"id":425,"date":"2026-05-28T10:00:00","date_gmt":"2026-05-28T17:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/aspire\/?p=425"},"modified":"2026-05-28T10:00:00","modified_gmt":"2026-05-28T17:00:00","slug":"hermetic-aspire-tests-chaos-studio","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/aspire\/hermetic-aspire-tests-chaos-studio\/","title":{"rendered":"How Azure Chaos Studio ships with hermetic Aspire end-to-end tests"},"content":{"rendered":"<p>End-to-end tests have a reputation problem.<\/p>\n<p>They&#8217;re slow, they&#8217;re flaky, and the moment you wire in a real cloud dependency, your CI builds become an exercise in waiting and retrying.\nMost teams I&#8217;ve talked to handle this one of two ways: they push everything down into integration tests and call it good (and lose confidence that the full system actually composes),\nor they spin up shared &#8220;dev&#8221; environments and run E2E against those (and accept the cross-talk, the flake, and the &#8220;who broke staging&#8221; group chat messages as the cost of doing business).<\/p>\n<p>We took a third path, and it&#8217;s paid off in a way I didn&#8217;t fully expect.<\/p>\n<h2>A quick word on what we&#8217;re building<\/h2>\n<p><a href=\"https:\/\/learn.microsoft.com\/azure\/chaos-studio\/\">Azure Chaos Studio<\/a> lets customers proactively break their own systems in safe, controlled ways \u2014 to validate that their resilience strategies actually hold up under real failure.\nUnder the hood we have four independently-shipped services: a control plane, an execution plane, a fault-execution plugin host, and a data plane.\nEach one is its own deployable, with its own dependencies, its own long-running-operation (LRO) semantics, and its own surface to test.<\/p>\n<p>We&#8217;re heading toward GA of our V2 platform later this year, and the velocity bar has gone up \u2014 particularly as the team has leaned heavily into agent-assisted development. 
We needed test coverage we could actually trust before we let agents make non-trivial changes.<\/p>\n<h2>The problem with our previous setup<\/h2>\n<p>For a long time, our end-to-end tests ran against shared environments. That&#8217;s a familiar story for anyone working on real cloud services:<\/p>\n<ul>\n<li>Even with stage-level serialization inside a single pipeline, separate release pipelines for each service all pointed at the same shared environment \u2014 and they ran in parallel<\/li>\n<li>Flake from real network conditions creeps in everywhere<\/li>\n<li>A regression in one service can block unrelated work<\/li>\n<li>Local repro is &#8220;spin up your own resource group and pray&#8221;<\/li>\n<\/ul>\n<p>We had a solid integration test layer too. Each service has its own <code>WebApplicationFactory<\/code>-style suite that spins the API up in-process, runs real HTTP through it, and stubs out the data and event collaborators with mocks. Those tests are fast, deterministic, and great at catching regressions inside a single service.<\/p>\n<p>But that&#8217;s exactly the limit. Integration tests of that shape can&#8217;t tell you whether the four services compose correctly \u2014 whether the LRO state machine in the control plane actually agrees with what the execution plane is polling for,\nwhether the auth flow holds across hops, whether a retry on one service surfaces sensibly two services downstream.\nThat&#8217;s the layer where most of our real bugs live, and that&#8217;s the layer we didn&#8217;t have a trustworthy story for.<\/p>\n<h2>Hermetic, ephemeral, per-test environments<\/h2>\n<p>About a year ago I read <a href=\"https:\/\/carloarg02.medium.com\/how-we-use-hermetic-ephemeral-test-environments-at-google-to-reduce-test-flakiness-a87be42b37aa\">this writeup on hermetic, ephemeral test environments<\/a>,\nand it stuck with me. 
The core idea: every test gets its own clean, isolated environment, brought up just for that test, with all dependencies running locally.\nNo shared state. No flake from neighbors. Failures are reproducible by construction.<\/p>\n<p>It&#8217;s a great vision. The hard part is getting there in a real system with deep cloud dependencies.<\/p>\n<p>Then a conversation with one of the .NET teams connected the dots for me: <a href=\"https:\/\/aspire.dev\/testing\/overview\/\"><code>Aspire.Hosting.Testing<\/code><\/a> was already most of the way there. Aspire already knows how to bring up your full service graph as a process tree. With the testing package, you can do it programmatically \u2014 from inside an xUnit fixture, on every PR, in your CI pipeline.<\/p>\n<h2>What the setup looks like<\/h2>\n<p>The hermetic model has three pieces:<\/p>\n<ol>\n<li><strong>The real service code<\/strong>, running as the real binaries, wired up the way it is in production.<\/li>\n<li><strong>Emulators or local stand-ins for external dependencies<\/strong> \u2014 Cosmos and Storage both have first-class emulator support in Aspire. Key Vault doesn&#8217;t have one out of the box, but James Gould&#8217;s excellent <a href=\"https:\/\/github.com\/james-gould\/azure-keyvault-emulator\">Azure Key Vault Emulator<\/a> plugs straight into Aspire&#8217;s <code>AddAzureKeyVault(...).RunAsEmulator()<\/code> flow.<\/li>\n<li><strong>A stub for anything that can&#8217;t be emulated.<\/strong> We use <a href=\"https:\/\/wiremock.org\/\">WireMock<\/a> for the one external dependency without a usable emulator.<\/li>\n<\/ol>\n<p>In Aspire, that&#8217;s all expressed as resources on the AppHost. 
The same model that drives our local dev loop drives our tests:<\/p>\n<pre><code class=\"language-csharp\">var builder = DistributedApplication.CreateBuilder(args);\n\n\/\/ Emulated dependencies\nvar cosmos = builder.AddAzureCosmosDB(\"cosmos\").RunAsEmulator();\nvar storage = builder.AddAzureStorage(\"storage\").RunAsEmulator();\n\n\/\/ Key Vault emulator comes from the community package\n\/\/ AzureKeyVaultEmulator.Aspire.Hosting (james-gould\/azure-keyvault-emulator)\nvar keyVault = builder.AddAzureKeyVault(\"kv\").RunAsEmulator();\n\n\/\/ Stub for the one dependency we can't emulate\nvar authStub = builder.AddContainer(\"auth-stub\", \"wiremock\/wiremock\")\n    .WithHttpEndpoint(targetPort: 8080, name: \"http\");\n\n\/\/ Our actual services\nvar controlPlane = builder.AddProject&lt;Projects.ControlPlane&gt;(\"control-plane\")\n    .WithReference(cosmos)\n    .WithReference(storage)\n    .WithReference(keyVault)\n    .WithReference(authStub.GetEndpoint(\"http\"));\n\nvar executionPlane = builder.AddProject&lt;Projects.ExecutionPlane&gt;(\"execution-plane\")\n    .WithReference(controlPlane);<\/code><\/pre>\n<p>The test fixture brings the whole graph up, hands the test a typed <code>HttpClient<\/code> for each service, and tears it all down when the test finishes:<\/p>\n<pre><code class=\"language-csharp\">public class HermeticFixture : IAsyncLifetime\n{\n    private DistributedApplication _app = null!;\n\n    public HttpClient ControlPlaneClient { get; private set; } = null!;\n\n    public async Task InitializeAsync()\n    {\n        var appHost = await DistributedApplicationTestingBuilder\n            .CreateAsync&lt;Projects.ChaosStudio_AppHost&gt;();\n\n        _app = await appHost.BuildAsync();\n        await _app.StartAsync();\n\n        ControlPlaneClient = _app.CreateHttpClient(\"control-plane\");\n        await _app.ResourceNotifications\n            .WaitForResourceHealthyAsync(\"control-plane\");\n    }\n\n    public async Task DisposeAsync() =&gt; 
await _app.DisposeAsync();\n}<\/code><\/pre>\n<p>That last <code>WaitForResourceHealthyAsync<\/code> call is one of those quietly important details. Tests don&#8217;t run until the service graph is actually ready, and &#8220;ready&#8221; means real health checks \u2014 not arbitrary sleeps that drift and flake.<\/p>\n<h2>What we&#8217;re actually testing<\/h2>\n<p>We&#8217;re up to roughly <strong>90 hermetic tests<\/strong>, and they cover meaningfully more than the original &#8220;do the services start up&#8221; check. The interesting ones are the scenario tests \u2014 end-to-end fault-injection flows driven through the real service graph:<\/p>\n<ul>\n<li>A <strong>zone-outage<\/strong> scenario, exercising the full LRO lifecycle from request through orchestration<\/li>\n<li>An <strong>identity-outage<\/strong> scenario, validating how the data plane behaves when an identity provider goes sideways<\/li>\n<li>A <strong>DNS-failure<\/strong> scenario, covering one of the trickiest classes of resilience bugs to catch in any other way<\/li>\n<li>A <strong>geo-replication-failure<\/strong> scenario, walking the cross-region paths end to end<\/li>\n<\/ul>\n<p>Each of those used to be a careful manual exercise in a shared environment. Now they run on every PR, in parallel, with no cross-talk.<\/p>\n<h2>The agent payoff<\/h2>\n<p>Here&#8217;s the part I genuinely didn&#8217;t see coming.<\/p>\n<p>When the team started leaning into agent-assisted development in earnest, this test suite quietly became our trust anchor. An agent can propose a meaningful refactor or a non-trivial feature, and we have a real signal \u2014 not just &#8220;the unit tests still pass&#8221; \u2014 that the change actually composes across services.<\/p>\n<blockquote>\n<p>Agents don&#8217;t have to be perfect. They have to be <strong>checkable<\/strong>.<\/p>\n<\/blockquote>\n<p>That distinction is the whole game. 
Perfection isn&#8217;t a realistic bar for any contributor, human or otherwise \u2014 and chasing it tends to slow the team down more than it helps. Checkability is. If the system can tell you, quickly and unambiguously, whether a proposed change holds up end to end, you can move fast and stay honest about it.<\/p>\n<p>Hermetic end-to-end tests turn out to be one of the highest-leverage checks you can give an agent, because:<\/p>\n<ul>\n<li>The feedback is <strong>structured<\/strong> \u2014 you can read the test output and see exactly what broke and where in the service graph<\/li>\n<li>The failure is <strong>reproducible<\/strong> \u2014 no &#8220;works on my machine&#8221; mystery, because there is no &#8220;my machine&#8221; state involved<\/li>\n<li>The signal is <strong>strong<\/strong> \u2014 these are real services exercising real flows, not mock-against-mock theater<\/li>\n<\/ul>\n<p>This isn&#8217;t a hypothetical. The last few months of our V2 push would have been a much scarier ride without it.<\/p>\n<h2>A few practical notes<\/h2>\n<p>If you&#8217;re considering something similar, a few things saved us time:<\/p>\n<ul>\n<li><strong>Start with one happy-path scenario, end to end.<\/strong> Don&#8217;t try to build a full test grid on day one. One working hermetic test is a much better foundation than a long list of half-wired ones.<\/li>\n<li><strong>Treat your AppHost as production code.<\/strong> Same resources, same wiring, same configuration shape. If your test AppHost drifts from your real one, your tests will quietly start lying to you.<\/li>\n<li><strong>Be honest about what you stub.<\/strong> A WireMock stub for a service you can&#8217;t emulate is fine \u2014 but write down what behavior you&#8217;re assuming, and revisit it when that service evolves.<\/li>\n<li><strong>Run them on every PR.<\/strong> Hermetic tests are only valuable as a feedback loop if they actually feed back. 
Ours run in Azure Pipelines on every change, and that&#8217;s where the velocity unlock really shows up.<\/li>\n<\/ul>\n<h2>Closing<\/h2>\n<p>Aspire didn&#8217;t just make hermetic testing possible for us \u2014 it made it the path of least resistance.<\/p>\n<p>If you&#8217;re building a distributed system and your end-to-end test story isn&#8217;t where you want it to be, give <a href=\"https:\/\/aspire.dev\/testing\/overview\/\"><code>Aspire.Hosting.Testing<\/code><\/a> a serious look. It&#8217;s quietly one of the most valuable things in the package.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how the Azure Chaos Studio team uses Aspire.Hosting.Testing to run hermetic, per-PR end-to-end tests across four services \u2014 with emulators, stubs, and no shared environments.<\/p>\n","protected":false},"author":214125,"featured_media":426,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,17],"tags":[25,26,9,60],"class_list":["post-425","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-aspire-category","category-deep-dives","tag-net","tag-apphost","tag-aspire","tag-hosting-integrations"],"acf":[],"blog_post_summary":"<p>Learn how the Azure Chaos Studio team uses Aspire.Hosting.Testing to run hermetic, per-PR end-to-end tests across four services \u2014 with emulators, stubs, and no shared 
environments.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/posts\/425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/users\/214125"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/comments?post=425"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/posts\/425\/revisions"}],"predecessor-version":[{"id":432,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/posts\/425\/revisions\/432"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/media\/426"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/media?parent=425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/categories?post=425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/aspire\/wp-json\/wp\/v2\/tags?post=425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}