Managing Quality (part 7) – Code Coverage

Brian Harry

Code coverage is a multi-edged sword 🙂  There is no one right answer for how to do it and there are many ways to misuse it.  Here I’ll talk about how to think about code coverage and then describe what we do and the reports we use to track it.

Let’s say your test pass rate is 99.9%.  Is that good?  Well, maybe.  It’s not nearly enough information to know.  Are the tests testing the primary user scenarios?  Do you have enough tests?  Are you missing testing on entire modules?  Code coverage is one tool in the toolshed to help answer the question “Is that good?”

Code coverage data is collected while your tests are run (both automated and manual) and records which blocks in your code are covered (and which are not) – there’s a small sketch of the mechanism after the list below.  It can be used in many scenarios:

  • To determine what parts of your code are not getting covered and allow you to make intelligent decisions about whether or not to do additional testing to cover that code.
  • To determine when you have too many tests – lots of tests that test basically the same code and don’t cover distinct scenarios.
  • Combined with pass rate, defect rate and code churn info, to help prioritize where to direct your testing efforts.
  • To identify “dead” code that can never be called so it can be removed.
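
To make the collection mechanism concrete, here’s a minimal sketch in Python (my illustration language here, not what our tools use) that records which lines execute during a run using sys.settrace.  Real coverage tools do essentially the same thing with lower-overhead, block-level instrumentation:

    import sys

    def make_tracer(hits):
        """Trace callback that records every (filename, line) that executes."""
        def tracer(frame, event, arg):
            if event == "line":
                hits.add((frame.f_code.co_filename, frame.f_lineno))
            return tracer
        return tracer

    def classify(n):
        if n < 0:
            return "negative"      # never runs unless a test passes a negative value
        return "non-negative"

    hits = set()
    sys.settrace(make_tracer(hits))
    classify(5)                    # exercises only the "non-negative" branch
    sys.settrace(None)
    print(len(hits), "lines executed; the 'negative' branch was never covered")

Diffing the recorded lines against the set of all executable lines is what produces the covered/not-covered report.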

How much is enough?

Code coverage is often used as a metric to determine whether sufficient testing has been done to “release” software.  The age-old question is “How much is enough?”  One size does not fit all.  Except in the most extreme cases, 100% is not the right answer.  Think about what would be involved in 100%.  You’d have to write tests to execute every single exception handler and every single branch in your code.  Your code probably includes machine generated code – for example, web service SOAP proxies that you don’t even use.  There are other examples of “dead” code that is hard or inadvisable to remove.
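
As a hypothetical illustration of why those last few percent are so expensive, consider an error handler that only fires on an I/O failure – covering it means simulating that failure:

    import json

    def load_config(path):
        try:
            with open(path) as f:
                return json.load(f)
        except OSError:
            # Reaching this line requires making open() fail (permissions,
            # missing disk, etc.).  That's possible with fault injection or
            # mocking, but every such branch adds test-writing and
            # maintenance cost for little bug-finding payoff.
            return {}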

So, if it’s not 100%, what is it?  We use a minimum bar of 70%, with the vast majority of that coming from automated tests.  We only use manual testing coverage to bridge gaps in our automated testing.  Would more be better?  Maybe.  One thing to keep in mind is that every automated test you have comes with a tax – you have to write it, run it and maintain it indefinitely (but this is the subject of a whole different blog post – so I’ll try to get to that one in the next few months).

Be careful how you use code coverage.  No amount of code coverage will tell you if you have a well tested app.  This is because there are tons of bugs that can’t be flagged by code coverage.  The problem in these cases is not that the code isn’t being covered but rather that the data variation isn’t sufficient.  Buffer overruns are a very simple example of bugs that can still exist in code that has 100% code coverage.  This fact really hit me as we were building TFS and I realized how susceptible SQL is to this problem.  Running every query/stored procedure tells you very little about the correctness of your SQL code.  SQL depends so much on the data you have in the database and the values you query for.
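
Buffer overruns are a C/C++ phenomenon, but the underlying problem – full coverage with insufficient data variation – shows up in any language.  A contrived Python example:

    def last_n(items, n):
        # Bug: when n == 0, items[-0:] is items[0:], i.e. the whole list.
        return items[-n:]

    # This one test executes 100% of last_n and passes...
    assert last_n([1, 2, 3], 2) == [2, 3]

    # ...but with different data the fully covered code is still wrong:
    # last_n([1, 2, 3], 0) returns [1, 2, 3] instead of []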

I’ve seen teams drive for exceptionally high code coverage numbers.  Like everything else in this world, the closer you want to get to “perfect”, the harder it is.  It’s much easier to go from 20%->30% than from 60%->70%, and even harder to go from 80%->90%.  Grinding out test after test to cover every block of code becomes an exercise of increasing effort and decreasing return (in terms of useful bugs found).  At some point you are better off directing that extra effort at alternative testing approaches.  For us, we’ve decided 70% is the min bar.  Making that the min bar, of course, means that our average is a bit higher than that.

The DevDiv code coverage methodology

As I mentioned before, we focus primarily on automated testing code coverage.  The reason is that we have to test on such a wide range of configurations that we must be able to rely on our automated tests to find the vast majority of problems.  We break things down at the DLL/Assembly level and our metrics are based around the percentage of components that have 70% or better code coverage.  Our primary report shows the % of components in each code coverage band, trended by build.

We classify each component by % code coverage bands and track the trends over time.  By the time we ship Orcas, the bar should be 100% green.  Our goal for Beta 1 was 75% of components over 70% code coverage.  Our goal for Beta 2 is 90% of components over 70% code coverage, and our goal for RTM is 100% of components over 70% code coverage.
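
The banding itself is simple arithmetic.  Here’s a small sketch with made-up component names and illustrative band edges (not the exact bands our report uses):

    # Hypothetical per-component coverage numbers.
    coverage = {"Client.dll": 82.0, "Server.dll": 71.5,
                "WebAccess.dll": 64.0, "BuildAgent.exe": 45.0}

    BANDS = [(70.0, ">=70%"), (50.0, "50-70%"), (0.0, "<50%")]

    def band(pct):
        # Return the label of the first band whose floor this value clears.
        return next(label for floor, label in BANDS if pct >= floor)

    for name, pct in sorted(coverage.items()):
        print(f"{name:15s} {pct:5.1f}%  {band(pct)}")

    meeting_bar = sum(1 for pct in coverage.values() if pct >= 70.0)
    print(f"{meeting_bar / len(coverage):.0%} of components meet the 70% bar")

Running the same calculation build over build is what gives you the trend line.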

We then break that down by feature team and factor in the pass rate on each run.  Obviously, if lots of tests are failing, it’s going to affect your coverage numbers.

And then we enumerate the binaries that fail the 70% min bar and identify an action plan for each.
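
Producing that worklist is just a filter over the same illustrative numbers, worst offenders first:

    # Same hypothetical per-binary numbers as above.
    coverage = {"Client.dll": 82.0, "Server.dll": 71.5,
                "WebAccess.dll": 64.0, "BuildAgent.exe": 45.0}

    for name, pct in sorted(coverage.items(), key=lambda kv: kv[1]):
        if pct < 70.0:
            print(f"{name}: {pct:.1f}% - below min bar, action plan needed")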

Summary

Code coverage is a very useful tool in helping to spot areas of your code that need additional testing.  Use it that way.  Don’t be a slave to code coverage.  Nothing replaces the value of having a person think through all of the scenarios the software supports and describing an appropriate test plan.  The code coverage data is a tool to help you refine your test plan and identify areas you may not be thinking about.

Brian
