{"id":9761,"date":"2007-02-11T08:06:43","date_gmt":"2007-02-11T08:06:43","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/bharry\/2007\/02\/11\/managing-quality-part-4-stress-testing\/"},"modified":"2018-08-14T00:34:18","modified_gmt":"2018-08-14T00:34:18","slug":"managing-quality-part-4-stress-testing","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/bharry\/managing-quality-part-4-stress-testing\/","title":{"rendered":"Managing Quality (part 4) &#8211; Stress Testing"},"content":{"rendered":"<p>The goal of our stress testing is to run an application under load for an extended period of time and capture all &#8220;failures&#8221;.&nbsp; The purpose is to uncover race conditions, long term resource leaks, and bugs that only occur as the result of unexpected sequences or combinations of operations.&nbsp; Mostly we focus on server stress testing, but some teams do some client stress testing (using automated GUI tests) to find similar problems in client logic.<\/p>\n<p>People use the term &#8220;Stress Testing&#8221; in different ways.&nbsp; The Windows team uses the term &#8220;Stress Testing&#8221; to mean running systems at resource exhaustion (memory, disk, etc.) and making sure the system handles it properly.&nbsp; There&#8217;s a 90% overlap with what we do, but a slight variation in purpose.<\/p>\n<h3>Considerations for Stress Testing<\/h3>\n<p><strong>Tests<\/strong> &#8211; Like Load Testing (described in my last post in this series), we use the VSTS for Testers product to simulate many users executing randomly selected tests.&nbsp; In fact, we use exactly the same set of tests in Stress testing as we do in Load testing.&nbsp; In Load testing, you choose the distribution of tests to match the &#8220;real world&#8221; mix as closely as you can; in Stress testing, you artificially inflate the frequency of rare and disruptive tests.&nbsp; As a result, our Stress testing test mix differs from the one we use in Load 
testing.<\/p>\n<p><strong>Measurements<\/strong> &#8211; The primary thing you are looking for in Stress testing is systemic failure.&nbsp; It&#8217;s not functional testing &#8211; you aren&#8217;t trying to verify the results of the tests.&nbsp; You don&#8217;t even care if the data was correct.&nbsp; I see people get very confused over this point.&nbsp; You have to remember that every kind of testing you do has a purpose.&nbsp; No single kind of testing tests everything.&nbsp; Let functional testing do what it does (determine correctness of operations) and let stress testing do what it does (find rarely occurring &#8220;catastrophic&#8221; problems, leaks, etc.).<\/p>\n<p>In stress testing, you monitor:<\/p>\n<ul>\n<li>Responses to look for exceptions, deadlocks, or other indications of catastrophic failures.<\/li>\n<li>Tests per second &#8211; both to convince yourself that the system is performing a reasonable workload (if it isn&#8217;t running very many tests, you probably won&#8217;t get much good data) and to watch for unexpected shapes in the graph: a decline over time indicates some kind of leak; extreme spikes, drop-outs, etc. imply some kind of resource contention or undesirable interaction between the tests.<\/li>\n<li>Key performance counters &#8211; memory, CPU utilization, etc.&nbsp; Again, you are looking for long term trends.<\/li>\n<li>The event log to look for indications that the system under test is experiencing problems you can&#8217;t observe from the responses.<\/li>\n<li>Sometimes we run our stress tests with the debugger attached so that we can break immediately when an exception happens and have a better shot at determining the cause.<\/li>\n<\/ul>\n<p><strong>Failure resolution<\/strong>&nbsp;&#8211; One thing to look out for with stress testing is bug handling.&nbsp; There is a strong tendency to resolve stress bugs as &#8220;no repro&#8221; because they don&#8217;t reproduce at will.&nbsp; You don&#8217;t 
want to let go of an occurrence until you are certain you&#8217;ve exhausted every avenue for identifying the cause.&nbsp; If you see a pattern of stress bugs being resolved &#8220;no repro&#8221;, then you have a problem.&nbsp; I&#8217;ve generally found that problem to be one of two things: developers not understanding how important it is to pursue the cause relentlessly, or not enough instrumentation to help identify the cause.&nbsp; Attaching a debugger before the run starts can sometimes help.&nbsp; Sometimes, when we find stress bugs we can&#8217;t isolate, we add instrumentation to the code aimed specifically at identifying them and include it in future runs.<\/p>\n<p><strong>Load profile<\/strong> &#8211; We pick a load (# of simulated users) that we know runs the server at about 70% utilization and run a constant load profile at that level.&nbsp; The goal is not to drive the system under test to saturation, but rather just to keep it very busy the whole time.&nbsp; Unlike Load testing, we don&#8217;t ramp the load up over time, because we are not measuring how response times increase &#8211; in Stress testing, we don&#8217;t care about the response times.<\/p>\n<p><strong>Frequency\/Duration<\/strong> &#8211; We run &#8220;short haul&#8221; and &#8220;long haul&#8221; stress testing.&nbsp; A short haul run lasts 8 hours; it starts at night and completes by morning, ready for analysis.&nbsp; We generally do short haul runs on every daily build.&nbsp; A long haul run is 120 hours (5 days).&nbsp; We generally do long haul runs on builds that have good short haul pass rates.&nbsp; We do both because short haul runs give you the quick cycle time you need to fix bugs and pick up new builds.&nbsp; But certain kinds of problems (particularly resource leaks) don&#8217;t always show up in an 8 hour run, and 120 hours helps surface them.&nbsp; In the past we&#8217;ve done some math to estimate 
how much simulated calendar time a long haul run (120 hours at high load) represents for a &#8220;typical&#8221; team.&nbsp; With a reasonable set of assumptions, it generally comes out to months, but I don&#8217;t obsess over that question.&nbsp; We&#8217;ve simply found that 120 hours is a good number and that running longer doesn&#8217;t help much.&nbsp; When I worked on the .NET Framework team, we experimented with &#8220;ultra long haul testing&#8221; (running for 5 weeks), but it didn&#8217;t yield much new information.<\/p>\n<p><strong>Development cycle<\/strong> &#8211; We generally start short haul stress testing early in the cycle.&nbsp; In my opinion, the earlier the better.&nbsp; The kinds of problems found in stress testing are hard to isolate and debug, and the closer to their introduction you find them, the better off you are.&nbsp; We generally don&#8217;t start long haul testing until around our final Beta.&nbsp; There&#8217;s no point doing it until short haul is passing at a very high rate, because until then it&#8217;s just going to find the same problems.&nbsp; Because of the shorter cycle time, anything you can find with short haul runs is better &#8211; you get to try fixes much faster.<\/p>\n<p><strong>Execution infrastructure<\/strong>&nbsp;&#8211; You don&#8217;t need particularly high-end hardware.&nbsp; You just need something that can run your tests and achieve a solid load.&nbsp; In Load testing, we run the system under test &#8220;pristinely&#8221;, using separate client machines for the load agents; in Stress testing it&#8217;s different.&nbsp; We generally run the load agent on the same physical machine as the code under test.&nbsp; You don&#8217;t care if the load on the server is affected by the test infrastructure, and combining them lets you conserve hardware and run more instances.&nbsp; Remember: let each kind of testing do what it&#8217;s designed to do.<\/p>\n<p>For short haul testing, we generally have 10-15 
&#8220;pods&#8221; (collections of machines) that run stress tests.&nbsp; For long haul testing, we generally use 3-5.&nbsp; There is a mix of topologies among the pods &#8211; single server\/dual server, domain\/workgroup, etc.&nbsp; I&#8217;ll be talking about test matrices in a future post in this series.&nbsp; You want to make sure you have plenty of redundancy in your execution infrastructure.&nbsp; There are a few reasons:<\/p>\n<ul>\n<li>The bugs you are going after are rare, and you want to clock lots of simulated hours to make sure you are finding as many as you can.<\/li>\n<li>More common failures tend to hide less common ones &#8211; so it&#8217;s not unusual for several runs to end in the same failure.&nbsp; With many machines, you have a better chance of hitting some of the less common ones.<\/li>\n<li>It&#8217;s not uncommon to have to take a pod out of rotation for a day or more for detailed investigation of a failure &#8211; you don&#8217;t want to stop all of your testing when this happens.<\/li>\n<\/ul>\n<p><strong>Misc<\/strong>&nbsp;&#8211; There is some debate about the value of running debug vs. release builds.&nbsp; The advantage of debug builds is that you get the benefits of the asserts in your code.&nbsp; The disadvantage is that the debugging code (asserts, etc.) can affect the timing and cause you to miss failures.&nbsp; I am a fan of running both debug and release builds &#8211; but opinions vary on this topic.<\/p>\n<h3>Stress Testing Reports<\/h3>\n<p>At the highest level, we track the overall stress runs build by build, noting # of runs, # of new bugs found, # of passes and tests per second.&nbsp; This gives the 10,000-foot view of how you are progressing&#8230;<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D%5B7%5D.png\"><img decoding=\"async\" style=\"border-right: 0px;border-top: 0px;border-left: 0px;border-bottom: 0px\" height=\"350\" 
src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D_thumb%5B3%5D.png\" width=\"648\" border=\"0\"><\/a> <\/p>\n<p>We have daily reports that include more information on the runs of that day&#8230;<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D%5B11%5D.png\"><img decoding=\"async\" style=\"border-right: 0px;border-top: 0px;border-left: 0px;border-bottom: 0px\" height=\"207\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D_thumb%5B5%5D.png\" width=\"751\" border=\"0\"><\/a> <\/p>\n<p>And the status of bugs that have recently been found by stress testing&#8230;<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D%5B15%5D.png\"><img decoding=\"async\" style=\"border-right: 0px;border-top: 0px;border-left: 0px;border-bottom: 0px\" height=\"393\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D_thumb%5B7%5D.png\" width=\"748\" border=\"0\"><\/a> <\/p>\n<p>Of course we also use the reports generated by VSTS for Testers that show Tests\/sec, perf counters, etc.&nbsp; I don&#8217;t have one handy at the moment (because we don&#8217;t send them around in email and I&#8217;m at home right now :)), but I&#8217;ve included a generic screenshot from MSDN&#8230;<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D%5B19%5D.png\"><img decoding=\"async\" style=\"border-right: 0px;border-top: 0px;border-left: 0px;border-bottom: 0px\" height=\"385\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/image%7B0%7D_thumb%5B9%5D.png\" width=\"450\" border=\"0\"><\/a> <\/p>\n<h3>Conclusion<\/h3>\n<p>Stress testing is an important part of testing any critical application.&nbsp; It enables you to identify and fix hard-to-reproduce bugs.&nbsp; There&#8217;s a significant 
investment in infrastructure, process, and training to get serious about it, but it pays dividends.&nbsp; Visual Studio for Testers is a good tool that can get you a great start (shameless product plug here :)).<\/p>\n<p>I hope this was useful to you.&nbsp; Until next time&#8230;<\/p>\n<p>Brian<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The goal of our stress testing is to run an application under load for an extended period of time and capture all &#8220;failures&#8221;.&nbsp; The purpose is to uncover race conditions, long term resource leaks, and bugs that only occur as the result of unexpected sequences or combinations of operations.&nbsp; Mostly we focus on server stress [&hellip;]<\/p>\n","protected":false},"author":244,"featured_media":14617,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[5],"class_list":["post-9761","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-tfs"],"acf":[],"blog_post_summary":"<p>The goal of our stress testing is to run an application under load for an extended period of time and capture all &#8220;failures&#8221;.&nbsp; The purpose is to uncover race conditions, long term resource leaks, and bugs that only occur as the result of unexpected sequences or combinations of operations.&nbsp; Mostly we focus on server stress 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/9761","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/users\/244"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/comments?post=9761"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/9761\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media\/14617"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media?parent=9761"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/categories?post=9761"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/tags?post=9761"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}