People often talk about the percentage of code coverage they have from unit tests, whether it be actual coverage or the goal they or their project have set. Having a high coverage percentage is often seen as a quality, but code coverage is not really a metric that gives much value in itself. In this post I want to investigate the different types of code coverage that exists and address some of the problems with code coverage.
Types of Code Coverage
There are various forms of code coverage, not just the one we seem to always talk about. A short introduction to some of the existing coverage types:
Statement coverage – How many percent of statements / lines have been covered?
Branch coverage – How many percent of branches (control structures) have been covered? In the case of an if statement, has the expression evaluated to both true and false. There are variations of branch coverage that talk about more detailed decision coverage, like covering all permutations of true and false in an expression that contain stuff like AND and OR.
Path coverage – How many percent of all possible paths have been covered? This may not sound too different from the two above, but it is actually much more complicated. An example could be a function with two simple if statements (A and B) after each other. To obtain full path coverage, you would have to get all four permutations of true and false in the two if statements (A B - !A B - A !B - !A !B) whereas in statement coverage you would only have to exercise the if statements in isolation. Mind you that this is the simple case – if statements with multiple sub expressions, nested ifs and loops are much worse.
The coverage type we usually talk about is actually statement coverage – but as you can see from just looking at branch and path coverage, much is left to be desired. Coverage types like path coverage are also hard to measure, as many loops will have a near-infinite number of paths.
Another thing that is very hard to measure with code coverage is multi-threaded behavior. It is definitely not caught in our statement coverage and if getting full path coverage is hard, consider getting full path coverage with multiple threads.
One Problem - Test Quality
The problem with using code coverage on its own is that code coverage tells you nothing about the quality of the tests that cover the code. Code coverage tells you how much of your code has actually been executed, but it tells you nothing about the asserts that were made in the tests – that is, it says nothing about the correctness of the code.
The perfect example of this is writing a test with zero asserts (state or interaction). This test will produce a certain percentage of coverage, while ensuring nothing except that the code can successfully execute. While such sanity tests can be useful in some tricky cases with exceptions, this is often a worthless test - it has no quality whatsoever, it doesn't verify any intent of the programmer who wrote the code it is testing.
The thing that is easy to sell about code coverage is that is is reasonably easy to measure. It is easy to set a percentage, a goal and then try to obtain this goal. 100% code coverage is a great goal – it might be unrealistic in most situations, but it is a good thing to strive towards. But if the tests suck or the programmers who write the tests become lazy the code coverage will be nothing but misleading.
The problem with test quality is that measuring it is really really hard – at least programmatically. How can you even begin to measure how much this code matches the intent of the programmer. Usually the number of asserts that make sense for a given test match only a very low number of the values in the system. And even if we could measure test quality using a program, we might as well do away with the tests and just measure programmer intent versus the real program.
The solution, to me, is discipline and good engineering principles. Code coverage can be really valuable to a team that treat their test code like production code, sharing ownership of the code and doing regular inspections / code reviews to ensure that the test quality is high.
The Inverse Property
One of the things I like best about code coverage is it's quality as an inverse property - that is, a tool that can specifically tell me which parts of my code that I have not tested. This is a clear signal that you need to be more careful when touching this code.
This is also one of the reasons that I actually like to remove tests that either are very low quality or haven’t kept up to date with the intent of the code they are testing. The ideal solution is to rewrite the tests to match the intent of the code / desired quality, but this is not always realistic. To me, such tests are more harmful than no tests at all – they give a false sense of security and will only confuse if anyone look at them. At least if there is no tests my coverage tool can tell me there is a problem in this part of the system.
And while we are at it, it is actually amusing that some people treat test code like it is nowhere near production code and then the same people seem to think that it is blasphemy to delete even a single test. Treat your test code like your normal code, delete / rewrite it if it is obsolete or doesn’t make sense. Maintaining code that doesn’t add value to the system makes no sense.
Code coverage is a useful metric, but often not in isolation. If you use it then be aware of the implications of the way you’re using it, increasing test quality is much more valuable to me than reaching a specific coverage percentage, although it is good to have some sort of goal. Write tests to verify intent, not to increase the coverage percent – use coverage to find the intents not covered.