{"id":13145,"date":"2017-06-28T09:31:10","date_gmt":"2017-06-28T14:31:10","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/bharry\/?p=13145"},"modified":"2019-02-27T06:26:22","modified_gmt":"2019-02-27T06:26:22","slug":"testing-in-a-cloud-delivery-cadence","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/bharry\/testing-in-a-cloud-delivery-cadence\/","title":{"rendered":"How we approach testing VSTS to enable continuous delivery"},"content":{"rendered":"<p>I like to write, from time to time, about our experiences, successes, failures and learnings from delivering Visual Studio Team Services (VSTS), a large scale service, on a cloud delivery cadence.\u00a0 My most recent post reflected on how cool it is to be able to <a href=\"https:\/\/devblogs.microsoft.com\/bharry\/what-does-an-agiledevops-organization-look-like\/\">deliver customer fixes within a day or two<\/a>.\u00a0 And I&#8217;ve written many times about our practice of delivering all our work to production every sprint (or, in some cases, even more often).<\/p>\n<p>Usually my posts are sparked by something I see happen.\u00a0 Today I got an email about progress on our efforts towards reliable tests and it made me think about sharing it.<\/p>\n<p>When we first started the journey towards accelerated delivery we began by accelerating all the processes we&#8217;d had in place for years.\u00a0 It quickly became apparent that doing that would never get us to where we wanted to be and we had to be prepared to do things very differently.\u00a0 In the intervening 6 years or so, we have gone through many transformations &#8211; how we plan, track progress, deploy, manage feedback, monitor, architect, develop and test.\u00a0 Change has been constant over the 6 years and we&#8217;re nowhere near finished &#8211; I&#8217;m not sure we&#8217;ll ever finish.\u00a0 If we&#8217;d tried to do it all at once, 
I&#8217;m sure we would have failed.\u00a0 Taking it one or two key practices at a time has worked out well for us and allowed us to bring the team and the code base along for the ride.\nA little over 2 years ago, we realized that one of our biggest remaining impediments to our goal of &#8220;continuous delivery&#8221; was test &#8211; everything about test: our org, roles, frameworks, tests, harnesses, analysis, &#8230;<\/p>\n<p>Two years ago, we had 10&#8217;s of thousands of tests.\u00a0 They were written by &#8220;testers&#8221; to test code written by &#8220;developers&#8221;.\u00a0 While there were some advantages to this model &#8211; like clearly measurable and controllable investment in test, expertise and career growth in the testing discipline, etc. &#8211; there were also many disadvantages &#8211; lack of accountability on the developers, slow feedback cycle (introduce bug, find bug, fix bug), developers had little motivation to make their code &#8220;testable&#8221;, divergence between code architecture and test architecture made refactoring and pivoting very hard\/expensive, and more.<\/p>\n<p>A very high percentage of our tests were end-to-end functional &#8220;integration tests&#8221;.\u00a0 Often they automated UI or command line interfaces.\u00a0 This meant that they were very fragile to small\/cosmetic changes and were very slow to run.\u00a0 Because UI code isn&#8217;t really designed to be testable there were often random timing issues and the test code was littered with &#8220;Sleep(5000)&#8221; to wait for the UI to reach a steady state.\u00a0 Not only was that incredibly fragile (sometimes the UI would take a while &#8211; network hiccup or something), it also contributed greatly to the tests taking a very long time to run.<\/p>\n<p>The result of all of this is that full testing would take the better part of a day to run, many more hours to &#8220;analyze the results&#8221; to identify false failures and days or weeks to repair all the tests that 
were broken due to some legitimate change in the product.<\/p>\n<p>So, 2 years ago, we started on a path to completely redo testing.\u00a0 We combined the dev and test orgs into a consolidated &#8220;engineering&#8221; org.\u00a0 For the most part, we eliminated the distinction between people who code and people who test.\u00a0 That&#8217;s not to say every person does an identical amount of each, but every person does some of everything and is accountable for the quality of what they produce.\u00a0 We also set out to completely throw away our 10&#8217;s of thousands of tests that took 8 years to create and replace them with new tests that were done completely differently.<\/p>\n<p>We knew we needed to\u00a0reduce our reliance on\u00a0fragile, slow, expensive UI automation tests.\u00a0 We created a taxonomy to help us think about different &#8220;kinds&#8221; of tests:<\/p>\n<ul>\n<li><strong>L0<\/strong> &#8211; An L0 test is a classic Unit Test.\u00a0 It exercises an API.\u00a0 It has no dependencies on the product being installed.\u00a0 It has no state other than what&#8217;s in the test.<\/li>\n<li><strong>L1<\/strong> &#8211; An L1 is like a Unit Test, except it can rely on SQL Server being in the environment.\u00a0 Our product is very SQL Server dependent and, in my opinion, trying to mock SQL is unwise\/impractical for the depth of dependency that we have.\u00a0 Also, of course, a bunch of our code is in SQL Server stored procs and we need to test that too.<\/li>\n<li><strong>L2<\/strong> &#8211; An L2 test is written against a fully deployed TFS\/Team Services &#8220;instance&#8221; but with some key things mocked out.\u00a0 The mocking is done to simplify testing and eliminate fragility.\u00a0 The best example is that we mock out authentication so we don&#8217;t have to deal with creating test identities, managing secrets, etc.\u00a0 Some L2s are UI automation but only a smallish percentage.<\/li>\n<li><strong>L3<\/strong> &#8211; An L3 test is an 
end-to-end functional test against a production TFS\/VSTS instance.\u00a0 You might call it &#8220;Testing in Production&#8221;.\u00a0 Many of these are UI automation.\u00a0 The truth is that we&#8217;ve only recently gotten to the point that we&#8217;re ready for rolling out L3 tests and we only have a few.\u00a0 Over time, the count will grow some but it will always be a tiny fraction of L0 and L1.<\/li>\n<\/ul>\n<p>Early in the process, we created this diagram to demonstrate what we were after in the transformation.\u00a0 TRA, BTW, stood for &#8220;Tests Run Anywhere&#8221; &#8211; that&#8217;s what we called our last generation testing framework and it was an advance over the previous generation, where tests could only run in controlled lab environments (developers couldn&#8217;t run the tests on their own).\n<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Test-transformation.png\"><img decoding=\"async\" class=\"alignnone wp-image-13146\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Test-transformation.png\" alt=\"\" width=\"434\" height=\"317\" \/><\/a>\nWe&#8217;re probably 95% done with the transition now and here&#8217;s where we are today:\n<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Counts-by-type-table.png\"><img decoding=\"async\" class=\"alignnone wp-image-13165\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Counts-by-type-table.png\" alt=\"\" width=\"291\" height=\"111\" \/><\/a>\u00a0\u00a0\u00a0\u00a0 \u00a0<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Counts-by-type.png\"><img decoding=\"async\" class=\"alignnone wp-image-13155\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Counts-by-type.png\" alt=\"\" width=\"223\" height=\"246\" \/><\/a>\nWe run the L0 and L1 tests as part of every Pull Request &#8211; so every checkin gets 
that much validation.\u00a0 We then run rolling runs of L2s all day.\u00a0 A big part of that time, btw, is installing and configuring an instance to test against.\u00a0 We haven&#8217;t established a consistent practice for running L3&#8217;s, though I expect they will be run as part of every release definition to validate post-deployment.<\/p>\n<p>This is all coupled, of course, with other process changes (like feature flags and ring based deployment).\u00a0 I&#8217;m focusing on testing here but you really can&#8217;t separate them.\u00a0 We couldn&#8217;t really rely on Test in Production the way we are headed without also doing ring based deployments, for instance.\nBut this whole post was kicked off by a mail I got about test reliability &#8211; so everything above here is really context \ud83d\ude42<\/p>\n<p>The changes above go a long way to helping test reliability &#8211; but don&#8217;t solve it 100%.\u00a0 Tests are still code.\u00a0 Code has bugs.\u00a0 Tests can fail even when the code they are testing is working correctly.\u00a0 The most insidious form of these are &#8220;flakey&#8221; tests &#8211; tests that pass sometimes and fail others.\u00a0 Years ago, we used to have a bug resolution called &#8220;pass on re-run&#8221;.\u00a0 And that meant a test failed and someone went to debug it and every time they ran the test, it passed so they just resolved the bug.\u00a0 I used to rant about how bad this was.\u00a0 There&#8217;s a bug there &#8211; it might be a product bug and it might be a test bug but it&#8217;s a bug and it needs to get fixed.<\/p>\n<p>Flakey tests also erode developer confidence in the tests.\u00a0 If a test fails and you are pretty sure your changes can&#8217;t have affected it, you have an urge to ignore the failure.\u00a0 The problem is sometimes your change did break it or maybe it&#8217;s a latent bug that the team allows to perpetuate because the tests aren&#8217;t &#8220;trustworthy&#8221;.\u00a0 It&#8217;s also just 
plain inefficient to be constantly dealing with rejected runs and debugging sessions that yield nothing due to flakey tests.\u00a0 It must be the case that when a test fails, the vast majority of the time, there really is a product bug to go fix.<\/p>\n<p>Over the past many months, we&#8217;ve been instituting a formal test reliability process.\u00a0 Our test reliability runs are rolling runs that run 24&#215;7.\u00a0 A reliability run picks the latest successful CI build, runs all the tests and looks at the results.\u00a0 Any test that fails is considered flakey (because it previously passed on the same build).\u00a0 The test is disabled and a bug is filed.\u00a0 When the run completes, it again picks the most recent succeeding CI build (might be the same one if there&#8217;s not a newer one) and repeats.\u00a0 Once the bug associated with a flakey test is resolved, the test is automatically re-enabled.<\/p>\n<p>Here&#8217;s a graph of our test run reliability over the past year.\u00a0 We rolled out the new reliability system in sprint 116.\n<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Test-Reliability.png\"><img decoding=\"async\" class=\"alignnone wp-image-13175\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/Test-Reliability-1024x404.png\" alt=\"\" width=\"686\" height=\"271\" \/><\/a><\/p>\n<p>Now you might say, &#8220;well, all you&#8217;ve shown me is that you disabled a bunch of tests in sprint 116.&#8221;\u00a0 Clearly cutting a bunch of test coverage would be a bad outcome.\u00a0 The system tracks resolution time on these bugs and in the sprint 116 -&gt; 119 time period, there were 170 test reliability bugs with an average resolution time of 3.75 days &#8211; so tests weren&#8217;t disabled for too long.<\/p>\n<p>Now, as you may guess, not all the bugs the reliability system finds are test bugs &#8211; sometimes they are race conditions in the product itself.\u00a0 It&#8217;s a 
small percentage but we&#8217;ve definitely seen some.\u00a0 In my book the opportunity to fix those is even better.<\/p>\n<p>Right now, the reliability system is only rolled out for L2 tests.\u00a0 There aren&#8217;t many L3s yet and the L2s are, by their nature, much less reliable than L0&#8217;s and L1&#8217;s so we started there.<\/p>\n<p>Overall, this effort to completely redo our test system over the past 2+ years has been a massive investment.\u00a0 Every single sprint many feature teams across my team invested time in this.\u00a0 In some sprints it was most of what a feature team did.\u00a0 I&#8217;d hate to even try to calculate the total cost but we couldn&#8217;t go where we are trying to go with the business without doing it, so I know, in the long term, it was worth it.\u00a0 I have to admit that 18 months into it and lots of missed opportunity cost later, I started to agitate about when we were going to be &#8220;done&#8221; with it.\u00a0 It was a bit nerve fraying to see it continue to drag on, but it was a big investment and not every team did the work at the same time.\u00a0 And I kept reminding myself how much better it was going to be when we were done.<\/p>\n<p>While we&#8217;re still not completely done with this transition, we&#8217;re close enough that I think of us as done.\u00a0 No feature teams are reporting work on this as a significant part of their work in their sprint mails and we are starting to reap the benefits in improved quality, agility and engineer satisfaction.\u00a0 As this has been drawing to a close, we&#8217;ve already started our next big engineering investment &#8211; containerization of all of our services.\u00a0 We think the benefits here are going to be innumerable, including improving our tests &#8211; by making test deployments faster, easier to do compat testing and more.\u00a0 Once we&#8217;ve made a bit more progress on that, I&#8217;ll write something up.<\/p>\n<p>No engineering team should ever stop investing 
in improving their engineering systems.<\/p>\n<p>UPDATE: Oh and one more point I meant to make&#8230;\u00a0 As we wrap up this work, we are looking at how to ship this automation and workflow in the VSTS\/TFS product so it will be a little easier for you all to implement it than it was for us.\u00a0 I hope to get something on the <a href=\"https:\/\/www.visualstudio.com\/en-us\/articles\/news\/features-timeline\">published roadmap<\/a> before too long.\nUPDATE:\u00a0I don&#8217;t like doing updates because people don&#8217;t get notified about the change but I stumbled across some additional interesting data.\nI was reading one of our monthly service reviews and saw this piece of data about my team&#8217;s test runs:\n<span style=\"margin: 0px; font-family: 'Courier New';\"><span style=\"color: #000000;\">o<\/span><span style=\"font: 7pt 'Times New Roman'; margin: 0px;\"><span style=\"color: #000000;\">\u00a0\u00a0\u00a0\u00a0<\/span><\/span><\/span><span style=\"color: #000000; font-family: Calibri;\">All of the VSTS L0\/L1\/L2\/L3 tests are now using MSTest V2.\u00a0 They clock ~450 runs per day with each run having ~45300 tests (typical working day). 
That is on the order of 20 million test executions per day.<\/span><\/p>\n<p>Another piece of data I came across was a chart showing the migration of our &#8220;old&#8221; tests to &#8220;new&#8221; tests over more than a 2 year period (each sprint is 3 weeks).\u00a0 The data is a little &#8220;dirty&#8221;.\u00a0 There were more &#8220;old tests&#8221; discovered along the way so the initial count is actually lower than reality.\u00a0 Old tests, because they were much more heavyweight, also tended to exercise more surface area whereas &#8220;new&#8221;\u00a0tests are more focused and therefore more numerous.\u00a0 Gold is the old TRA tests and various shades of blue are the L0\/L1\/L2\/L3 tests.\n<a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/TestTrend.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-13195\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/8\/2019\/02\/TestTrend-1024x158.png\" alt=\"\" width=\"879\" height=\"136\" \/><\/a><\/p>\n<p>Thanks,\nBrian<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I like to write, from time to time, about our experiences, successes, failures and learnings from delivering Visual Studio Team Services (VSTS), a large scale service, on a cloud delivery cadence.\u00a0 My most recent post reflected on how cool it is to be able to deliver customer fixes within a day or two.\u00a0 And I&#8217;ve [&hellip;]<\/p>\n","protected":false},"author":244,"featured_media":14617,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[9],"class_list":["post-13145","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-vs-team-services"],"acf":[],"blog_post_summary":"<p>I like to write, from time to time, about our experiences, successes, failures and learnings from delivering Visual Studio 
Team Services (VSTS), a large scale service, on a cloud delivery cadence.\u00a0 My most recent post reflected on how cool it is to be able to deliver customer fixes within a day or two.\u00a0 And I&#8217;ve [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/13145","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/users\/244"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/comments?post=13145"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/posts\/13145\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media\/14617"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/media?parent=13145"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/categories?post=13145"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/bharry\/wp-json\/wp\/v2\/tags?post=13145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}