AzureFunBytes Episode 36 - Intro to Chaos Engineering with Ana Margarita Medina!

AzureFunBytes is a weekly opportunity to learn more about the fundamentals and foundations that make up Azure. It’s a chance for me to understand more about what people across the Azure organization do and how they do it. Every week we get together at 11 AM Pacific on Microsoft LearnTV and learn more about Azure.

This week I welcomed Ana Margarita Medina, Senior Chaos Engineer and Developer Advocate from Gremlin, to discuss Chaos Engineering on Azure. I have been really lucky to become friends with Ana over the last few years. She’s so dedicated to helping the Chaos Engineering community!

Ana even provided us with a link to get some Gremlin Stickers! How cool is that?

2:33 – Intro.
5:53 – Let’s Meet Ana
11:26 – The Principles of Chaos Engineering.
18:01 – What’s Ana’s definition of Chaos Engineering?
20:56 – Chaos Engineering is thoughtful, planned experiments designed to reveal weaknesses in our systems.
25:59 – How to do Chaos Engineering.
31:12 – No time for excuses!
33:54 – We don’t need to break things. They break on their own!
36:21 – We test proactively, instead of waiting for an outage.
37:31 – Experimenting on Azure Kubernetes Service.
44:27 – Viewing impact via Azure Monitor.
50:21 – Should Chaos Engineering be part of our DevOps Pipeline?

What exactly is Chaos Engineering? Well, the Principles of Chaos Engineering paper defines it as follows:

Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

Sometimes we can prepare for the worst. This includes creating plans that help us mitigate failure, but much of that failure is difficult to predict in the context of a deployed application. Rather than leave things to chance, Chaos Engineering looks to increase the resiliency of your IT solutions by deliberately injecting failure in planned scenarios. These scenarios can be part of larger “game days” that set out to find single points of failure, determine impact across the application, and allow teams to solve problems before they cause an outage in production.
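
To make the idea of a planned experiment concrete, here is a minimal sketch in Python. It is only a toy simulation: ToyService, success_rate, and run_experiment are hypothetical names made up for illustration, not part of Gremlin, Azure, or any real library. It assumes a service counts as healthy as long as at least one replica is up, and it walks through the usual loop: check the steady state, inject a failure, observe, and always roll back.

```python
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Hypothetical stand-in for a replicated service (not a real Azure or Gremlin API)."""
    replicas: dict = field(default_factory=lambda: {"a": True, "b": True, "c": True})

    def handle_request(self) -> bool:
        # Assumption: a request succeeds if at least one replica is healthy.
        return any(self.replicas.values())

    def stop_replica(self, name: str) -> None:
        self.replicas[name] = False

    def start_replica(self, name: str) -> None:
        self.replicas[name] = True


def success_rate(service: ToyService, requests: int = 100) -> float:
    """Send a batch of requests and return the fraction that succeeded."""
    ok = sum(service.handle_request() for _ in range(requests))
    return ok / requests


def run_experiment(service: ToyService, target: str, threshold: float = 0.99) -> dict:
    """Run one planned experiment: verify steady state, inject failure, observe, roll back."""
    baseline = success_rate(service)
    if baseline < threshold:
        # Never inject failure into a system that is already unhealthy.
        return {"status": "aborted", "baseline": baseline}

    service.stop_replica(target)  # the planned failure
    try:
        during = success_rate(service)
    finally:
        service.start_replica(target)  # roll back even if the measurement raises

    status = "passed" if during >= threshold else "weakness_found"
    return {
        "status": status,
        "baseline": baseline,
        "during": during,
        "after": success_rate(service),
    }


# Three replicas: losing one should not affect users.
print(run_experiment(ToyService(), target="a"))

# One replica: the experiment reveals a single point of failure.
print(run_experiment(ToyService(replicas={"only": True}), target="only"))
```

In this sketch, the three-replica service passes, while the single-replica service reports a weakness. That is the kind of finding a game day is meant to surface before a real outage does. A real experiment would replace the toy service with actual infrastructure and the success rate with metrics from your monitoring.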

Testing in production can be critical for the long-term success of your application because you apply failure in real time, under real traffic and conditions. The Microsoft Docs page on Chaos Engineering recommends applying this methodology when you are:

Deploying new code.
Adding dependencies.
Observing changes in usage patterns.
Mitigating problems.

Join us for a discussion on Chaos Engineering, covering the tools, practices, and metrics you need to implement Chaos Engineering in your Azure Kubernetes environment. You’ll learn how you can use Chaos Engineering to modernize safely, ensure reliability, and reduce downtime.

Learn about Azure fundamentals with me! You can find the recordings here as well:

AzureFunBytes on Twitch
AzureFunBytes on Twitter
AzureFunBytes on YouTube
Azure DevOps YouTube Channel

Useful docs:

Microsoft Learn: Introduction to Azure Fundamentals
What is Kubernetes?
Azure Kubernetes Service (AKS) Docs
Microsoft Azure Well-Architected Framework
Overview of the performance efficiency pillar
Chaos engineering on Microsoft Docs
Principles of Chaos Engineering
Gremlin Inc.
Gremlin Stickers!