{"id":5793,"date":"2007-04-02T12:11:00","date_gmt":"2007-04-02T12:11:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/vcblog\/2007\/04\/02\/the-cost-of-a-test-defect\/"},"modified":"2019-02-18T18:54:32","modified_gmt":"2019-02-18T18:54:32","slug":"the-cost-of-a-test-defect","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/the-cost-of-a-test-defect\/","title":{"rendered":"The Cost of a Test Defect"},"content":{"rendered":"<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Hi, I&rsquo;m Marina Polishchuk, an SDET on the VC++ Compiler Front End team.<span>&nbsp; <\/span>Recently, I have been working on investigating Orcas test pass results, which involves validating and reproducing the results, researching into causes of failures, and filing bugs when appropriate.<span>&nbsp; <\/span>This is a typical activity for an SDET throughout the latter stages of our product cycle, when we most heavily focus on trying to uncover any lingering product issues in our supported features\/scenarios.<span>&nbsp; <\/span>It is a fact that the longer a bug persists in the source, the more costly it becomes: hence, finding bugs early on during Beta testing is very important so they can be fixed in a timely manner.<span>&nbsp; <\/span>However, the ability to detect product bugs rests crucially on having accurate test results in one&rsquo;s possession, which brings up the often overlooked type of bug: a test defect.<span>&nbsp; <\/span>The cost of a test defect, much like that of a product bug, depends largely on the duration of time before it is found&mdash;for example, whether it is found before or after a real product regression occurs and is not caught because the associated test case is buggy. In this blog post, I will talk about several kinds of test defects I&rsquo;ve encountered, steps to ensure a product issue is not hiding behind the failing test, and ideas on how to better detect certain types of test defects.<span>&nbsp; <\/span><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">An &ldquo;easy&rdquo; test bug to be faced with is a test that no longer works due to an intended behavior change in the compiler.<span>&nbsp; <\/span>I call this bug &ldquo;easy&rdquo; in the sense that it is straightforward to detect: &ldquo;regressed&rdquo; tests (passing tests that start to fail from one version of the compiler to the next) will be examined by a human, who will determine that the cause is benign, and the tests need updating.<span>&nbsp; <\/span>Ideally, tests are patched right away to account for the new behavior; if the affected number is large, we may allow them to fail until fixes can be applied, and associate a test bug in our tracking system to the failures so everyone on the team is aware that the failures are expected.<span>&nbsp; <\/span><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Another type of test result often seen during beta testing is a false positive: a &ldquo;failed&rdquo; result that is not indicative of incorrect compiler behavior.<span>&nbsp; <\/span>False positives may arise from incorrect machine or product configurations, dependencies on other failing tests, test commands run with insufficient user privileges, etc.<span>&nbsp; <\/span>False positives are diagnosed either by examining logs or re-running the test if logs are insufficient.<span>&nbsp; <\/span>Re-creating the original test environment is nontrivial and generally infeasible for factors such as machine stress during a full suite run or non-determinism when tests are executed concurrently on multiple processors.<span>&nbsp; <\/span>If I cannot recreate the failure, a general course of action is to wait and see whether it manifests on subsequent test runs with the same parameters, possibly expanding its logging output to ease configuration failure diagnoses.<span>&nbsp; <\/span>In my view, false positives are also benign with respect to product quality, unless of course one considers the time taken away from discovering real defects while investigating spurious errors.<span>&nbsp;&nbsp;&nbsp; <\/span><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font face=\"Calibri\" size=\"3\">Finally, a highly distressing type of test result is a false negative: a passing result from a test that exercises a bug in the compiler, but fails to report it.<span>&nbsp; <\/span>Since code coverage of the compiler is the same whether a test reports &ldquo;1&rdquo; or &ldquo;0&rdquo; at the end, and testers generally do not perform much investigating of passing results, such test failures often go undetected for a long time.<span>&nbsp; <\/span>A phenomenon I find interesting with respect to bug-finding tools at large is that they generally throw away information associated with affirmative results.<span>&nbsp; <\/span>Within our test harnessing, we avoid logging test execution details for tests that report &ldquo;pass&rdquo; by default.<span>&nbsp; <\/span>Given the current practices, this is quite reasonable: no one will investigate passing test results, so why preserve logs that no one will be looking at?<span>&nbsp; <\/span>However, when one reasons about causes of failures solely from logging information, details of how the test last passed become important.<span>&nbsp; <\/span>In particular, saving test traces regardless of result type opens up opportunities for better diagnosis of false negatives: rather than reporting isolated test results per test run, a more advanced harness might consider previous results to contextualize test results from run to run.<span>&nbsp; <\/span>Taking &ldquo;pass&rdquo; and &ldquo;fail&rdquo; results only, six new result types may be derived from considering result pairs, as follows:<\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\">\n<table class=\"MsoNormalTable\" cellspacing=\"0\" cellpadding=\"0\" border=\"1\">\n<tbody>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">Run<sub>1<\/sub><\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">Run<sub>2<\/sub><\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">Log Differences<\/p>\n<p><\/font><\/font><\/b><\/p>\n<p class=\"MsoNormal\"><b><span><\/p>\n<p><font face=\"Calibri\">&nbsp;<\/font><\/p>\n<p><\/span><\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">(i)<\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">pass<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">pass<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">same<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">(ii)<\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">pass<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">fail<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">(iii)<\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">fail<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">pass<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">(iv)<\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">fail<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">fail<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">different<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">(v)<\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">pass<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">pass<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">different<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"\" valign=\"top\" width=\"74\">\n<p class=\"MsoNormal\"><b><font size=\"3\"><font face=\"Calibri\">(vi)<\/p>\n<p><\/font><\/font><\/b><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">fail<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"80\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">fail<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<td class=\"\" valign=\"top\" width=\"119\">\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">same<\/p>\n<p><\/font><\/font><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p class=\"MsoNormal\">&nbsp;<\/p>\n<\/p>\n<p class=\"MsoNormal\"><font size=\"3\"><font face=\"Calibri\">Result types (i) and (vi) would not need to be examined, as they definitely have no associated product behavior changes (presumably, (vi) has been investigated previously).<span>&nbsp;&nbsp; <\/span>Result type (ii) is a regression that would be investigated as usual.<span>&nbsp; <\/span>Result type (iii) could be used to verify that a new passing result corresponds to a bug fix in the product and not a manifestation of a test bug.<span>&nbsp; <\/span>Result (iv) could be used to examine a test case for possible coverage of several compiler bugs: it may already be failing due to Bug #1, but may have started to fail differently due to Bug#2.<span>&nbsp; <\/span>Currently, this test case would not effectively detect Bug#2, as Bug#1 will have been associated to it as the cause for its failure, deeming it an &ldquo;expected failure&rdquo; that does not need to be re-analyzed.<span>&nbsp; <\/span>Result (v) is most notable in that it could be used to find false positives.<span>&nbsp; <\/span>For example, a few months ago, a small number of our tests were found to return &ldquo;pass&rdquo; even when the compiler was not installed.<span>&nbsp; <\/span>With either result (iii) or (v), this bug would have been caught at the first configuration failure, rather than sporadically later on.<span>&nbsp; <\/span>Result (v) may also expose logic errors: if a test prints &ldquo;Value=4&rdquo; during one run, and &ldquo;Value=2147942402&rdquo; on the next run, both passing, then its behavior may be worth double-checking.<span>&nbsp; <\/span>Of course, logging information would have to be designed with differencing in mind, handling run-specific values (such as pointer values and timing information) specially.<span>&nbsp; <\/span>If anyone has tried this type of test result analysis, I would be curious to hear about your experiences.<span>&nbsp;&nbsp; <\/span><\/font><\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font face=\"Calibri\" size=\"3\">If you have feedback, comments, or questions about the discussion in this blog, please post them here! <\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font face=\"Calibri\" size=\"3\">Thanks,<\/font><\/p>\n<p class=\"MsoNormal\">\n<p><font face=\"Calibri\" size=\"3\">&nbsp;<\/font><\/p>\n<\/p>\n<p class=\"MsoNormal\"><font face=\"Calibri\" size=\"3\">Marina Polishchuk<\/font><\/p>\n<p class=\"MsoNormal\"><font face=\"Calibri\" size=\"3\">Visual C++ Compiler Front End QA Team<\/font><\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hi, I&rsquo;m Marina Polishchuk, an SDET on the VC++ Compiler Front End team.&nbsp; Recently, I have been working on investigating Orcas test pass results, which involves validating and reproducing the results, researching into causes of failures, and filing bugs when appropriate.&nbsp; This is a typical activity for an SDET throughout the latter stages of our [&hellip;]<\/p>\n","protected":false},"author":289,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-5793","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cplusplus"],"acf":[],"blog_post_summary":"<p>Hi, I&rsquo;m Marina Polishchuk, an SDET on the VC++ Compiler Front End team.&nbsp; Recently, I have been working on investigating Orcas test pass results, which involves validating and reproducing the results, researching into causes of failures, and filing bugs when appropriate.&nbsp; This is a typical activity for an SDET throughout the latter stages of our [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/5793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/289"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=5793"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/5793\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=5793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=5793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=5793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}