{"id":913,"date":"2021-05-24T20:13:38","date_gmt":"2021-05-24T19:13:38","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/sustainable-software\/?p=913"},"modified":"2021-05-24T20:38:34","modified_gmt":"2021-05-24T19:38:34","slug":"the-carbon-monkey","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/sustainable-software\/the-carbon-monkey\/","title":{"rendered":"The Carbon Monkey"},"content":{"rendered":"<p><a href=\"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/CarbonMonkey.jpg\"><img decoding=\"async\" class=\"size-full wp-image-918 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/CarbonMonkey.jpg\" alt=\"Image CarbonMonkey\" width=\"800\" height=\"490\" srcset=\"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/CarbonMonkey.jpg 800w, https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/CarbonMonkey-300x184.jpg 300w, https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/CarbonMonkey-768x470.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p>Can Chaos Engineering help us achieve our sustainable software engineering goals?<\/p>\n<p>According to <a href=\"https:\/\/principlesofchaos.org\/\">https:\/\/principlesofchaos.org\/<\/a>, Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system\u2019s capability to withstand turbulent conditions in production. I have followed this discipline over the years and find it fascinating, especially when it is applied to large-scale applications and systems. As the site explains:<\/p>\n<p><em><span style=\"font-size: 10pt;\">\u201cEven when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. 
Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic. <\/span><\/em><\/p>\n<p><em><span style=\"font-size: 10pt;\">We need to identify weaknesses before they manifest in system-wide, aberrant behaviors. Systemic weaknesses could take the form of improper fallback settings when a service is unavailable; retry storms from improperly tuned timeouts; outages when a downstream dependency receives too much traffic; cascading failures when a single point of failure crashes; etc. <\/span><\/em><em><span style=\"font-size: 10pt;\">We must address the most significant weaknesses proactively, before they affect our customers in production. <\/span><\/em><\/p>\n<p><em><span style=\"font-size: 10pt;\">We need a way to manage the chaos inherent in these systems, take advantage of increasing flexibility and velocity, and have confidence in our production deployments despite the complexity that they represent. An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering.\u201d<\/span><\/em><\/p>\n<h3><span style=\"font-size: 14pt;\">Build a Hypothesis around Steady State Behavior<\/span><\/h3>\n<p>Let\u2019s start with the first step. The steady state is the condition our application should aspire to be in. 
If we translate this principle into sustainability terms, the steady state becomes the most beautiful and efficient state of an application: <em>one where no energy is wasted, and efficiency and performance are at their best<\/em>.<\/p>\n<p><img decoding=\"async\" class=\"size-large wp-image-915 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/pexels-pixabay-235990-1024x576.jpg\" alt=\"Image pexels pixabay 235990\" width=\"640\" height=\"360\" srcset=\"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/pexels-pixabay-235990-1024x576.jpg 1024w, https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/pexels-pixabay-235990-300x169.jpg 300w, https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/pexels-pixabay-235990-768x432.jpg 768w, https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/pexels-pixabay-235990-1536x864.jpg 1536w, https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-content\/uploads\/sites\/60\/2021\/05\/pexels-pixabay-235990.jpg 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/p>\n<p>The most difficult part is measuring and setting this initial state. My colleagues have shared numerous ideas on the <a href=\"https:\/\/aka.ms\/sse\/blog\">Sustainable Software Engineering<\/a> blog that might help you jumpstart your measurement. However, I believe that at some point this will need to converge on a standardized, widely accepted form, such as a \u201ccarbon limit\u201d beyond which an application is considered inefficient and unsustainable.<\/p>\n<h3><span style=\"font-size: 14pt;\">Vary Real-world Events<\/span><\/h3>\n<p>This principle shows how closely chaos engineering and sustainable software engineering are related. No renewable source provides a steady, predictable flow of energy. 
From the big-picture challenge of using solar, wind, or hydro energy down to the moment we plug a device into the outlet, we still have limited ways to find out exactly how the energy powering that device is being produced at that moment (consider seasonality, time of day, peak hours, and the weather conditions that determine how much renewable supply is used). There are simply too many variables!<\/p>\n<p>Now imagine that your application is running in a virtual datacenter, where you have even less information about its carbon impact. We still need to start somewhere, though, and set a baseline amount of carbon usage for the application. This baseline lets us measure increases and decreases and use them to drive efficiency.<\/p>\n<p>Back to chaos engineering. Simulating power outages is only a start; we can think of it as the starting point for a sustainable application:<\/p>\n<ul>\n<li>What if renewable power sources suddenly become unavailable and, as a result, I face spikes in energy consumption that I could not foresee, even in the greenest application?<\/li>\n<li>What if my application has become a \u201ccarbon monster,\u201d greedy for energy because a query has gone wrong and is suddenly consuming most of its energy just to find an item in your cart? Or because the network path has changed after an outage along the route, and latency spikes? 
Replicating real-life energy events directly in an application will therefore make it more resilient to lower energy availability and, overall, more efficient.<\/li>\n<\/ul>\n<p>This is the concept of a \u201ccarbon monkey\u201d: a process or system that triggers energy inefficiencies at random, tests how your application reacts, and measures the differential performance, which can be related to the differential carbon impact.<\/p>\n<p>We have given a lot of thought to how to measure an application\u2019s carbon efficiency, but this idea represents a change of perspective. Instead of measuring how much energy the application consumes, we inject energy events, observe how the application behaves, and then drive changes that improve its reaction to events that might make it less green. We won\u2019t get an exact measurement of carbon impact, only a differential. Over time, as other systems let us retrieve more precise energy consumption metrics, this differential can be turned into an absolute number. Meanwhile, let the carbon monkey help us reduce impact regardless of metric standardization!<\/p>\n<p>I\u2019d like to see the developer community take action and create one or more \u201ccarbon monkeys\u201d that can introduce energy-impacting events into applications, to foster resilience toward sustainability. The starting point is defining a set of incorrect assumptions about energy usage that can prevent an application from performing \u201cgreen,\u201d such as the highest energy cost\/carbon use\/region, the shortest\/longest queries, the shortest\/longest network paths, the highest compute and memory usage, and so on. An automated process (our monkey) should then introduce the conditions that break these assumptions, to verify that the application patterns are resilient enough to overcome them without failing completely. 
At the end of the run, we could compute a carbon resiliency value that helps set a standard for evaluating the differential carbon impact of an application.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Applying chaos engineering to sustainable software to find a sweet spot that allows a differential measurement of the carbon impact of an application<\/p>\n","protected":false},"author":38845,"featured_media":918,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[91,97,96,22],"tags":[74,145,73,143,63,24,144,82],"class_list":["post-913","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-applications","category-architecture","category-concepts","category-sustainable-software-engineering","tag-carbon-intensity","tag-carbon-monkey","tag-carbon-aware","tag-chaosengineering","tag-software-engineering","tag-sse","tag-sustainable-software","tag-sustainable-software-engineering"],"acf":[],"blog_post_summary":"<p>Applying chaos engineering to sustainable software to find a sweet spot that allows a differential measurement of the carbon impact of an 
application<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/posts\/913","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/users\/38845"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/comments?post=913"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/posts\/913\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/media\/918"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/media?parent=913"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/categories?post=913"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/sustainable-software\/wp-json\/wp\/v2\/tags?post=913"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}