The Comfort of a High Number
There is something deeply reassuring about seeing 90% test coverage on a project dashboard. It suggests thoroughness. It implies that the code has been carefully verified. It makes stakeholders feel confident that the software works.
But test coverage, as a single number, is one of the most misunderstood metrics in software engineering. A high coverage percentage does not mean the code is well tested. A low percentage does not necessarily mean it is poorly tested. The number, on its own, tells you remarkably little about the quality of your test suite.
What Coverage Actually Measures
Test coverage measures which lines (or branches, or statements) of your source code are executed when your test suite runs. If a function has ten lines and your tests cause eight of them to execute, that function has 80% line coverage.
Notice what this does and does not tell you. It tells you that the code was executed. It does not tell you that the code was verified. A test that calls a function but never checks its return value still counts towards coverage. A test that exercises a code path but asserts nothing meaningful about its behaviour still counts towards coverage.
This distinction is critical. Execution is not verification.
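A minimal sketch makes the distinction concrete. The function name and the bug here are hypothetical; the point is that a test which never inspects the return value gives full coverage while catching nothing:

```typescript
// Hypothetical pricing helper with a deliberate bug:
// the discount is added instead of subtracted.
function applyDiscount(price: number, discount: number): number {
  return price + discount; // bug: should be price - discount
}

// This "test" executes every line of applyDiscount, so the function
// reports 100% line coverage. It verifies nothing, and the bug survives.
function testApplyDiscount(): void {
  applyDiscount(100, 10); // return value is never checked
}

testApplyDiscount(); // passes; coverage goes up, confidence should not
```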
How Teams Game Coverage
When coverage targets are imposed as hard requirements, teams find ways to hit the number without necessarily improving test quality. This is not always deliberate. Sometimes it is simply the path of least resistance.
Tests without meaningful assertions
The simplest way to inflate coverage is to write tests that call the code without checking the results. A test that invokes a function and asserts only that it "does not throw" exercises every line in the function but verifies almost nothing about its behaviour. Coverage goes up. Confidence should not.
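As a sketch of the pattern, with a hypothetical validator and a hand-rolled "does not throw" helper:

```typescript
// Hypothetical validator that is supposed to reject empty names.
function validateName(name: string): boolean {
  return name.length >= 0; // bug: always true, so "" is accepted
}

// The weakest possible assertion: the call completed without throwing.
function assertDoesNotThrow(fn: () => void): void {
  try {
    fn();
  } catch {
    throw new Error("unexpected throw");
  }
}

assertDoesNotThrow(() => validateName(""));      // passes
assertDoesNotThrow(() => validateName("Alice")); // passes
// Every line of validateName is covered. The bug is untouched.
```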
Testing trivial code
Another common pattern is writing extensive tests for getters, setters, constructors and other trivial code that is unlikely to contain bugs. These tests are easy to write, quick to pass and contribute to the coverage number. But they provide little value because the code they test has little opportunity to be wrong.
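A hypothetical example of the shape these tests take: a trivial data holder whose accessors are tested exhaustively while the real logic elsewhere stays untouched.

```typescript
// Trivial code: almost no room for bugs, but easy coverage points.
class User {
  constructor(private _name: string) {}
  get name(): string {
    return this._name;
  }
  set name(value: string) {
    this._name = value;
  }
}

// These tests pass trivially and lift the coverage number,
// while complex business logic elsewhere remains untested.
const u = new User("Alice");
console.assert(u.name === "Alice");
u.name = "Bob";
console.assert(u.name === "Bob");
```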
Avoiding complex scenarios
Coverage rewards breadth over depth. A test suite that touches every function once scores higher than one that deeply tests the three most critical functions. This creates an incentive to spread testing thinly across the codebase rather than concentrating effort where it matters most.
Line Coverage vs Branch Coverage
Most coverage tools report line coverage by default. But line coverage has a significant blind spot: conditional logic.
Consider a function with an if/else statement. If your test only exercises the "if" branch, line coverage may report the function as partially covered. But if the else branch is a single line, the difference between 90% and 100% line coverage might represent an entirely untested code path that handles error conditions.
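A sketch of that blind spot, using a hypothetical parser whose single-line else branch is the error path:

```typescript
// Hypothetical parser: the one-line else handles the error case.
function parsePort(input: string): number {
  const n = Number(input);
  if (Number.isInteger(n) && n > 0 && n <= 65535) {
    return n;
  } else {
    throw new RangeError(`invalid port: ${input}`); // often the untested line
  }
}

// A test suite that only exercises the happy path:
console.assert(parsePort("8080") === 8080);
// Line coverage: all but one line. Branch coverage flags the untested
// else branch, which is exactly the error-handling path.
```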
Branch coverage addresses this by measuring whether each branch of each decision point has been executed. It is a more granular metric and catches gaps that line coverage misses. But it still suffers from the same fundamental limitation: it measures execution, not verification.
A function with 100% branch coverage and no assertions is no better tested than one with 0% coverage. The code ran. Nobody checked what it did.
What Coverage Misses
Assertion quality
Coverage tools have no way to evaluate whether your assertions are meaningful. A test that asserts expect(result).toBeDefined() contributes the same coverage as one that asserts expect(result).toEqual({ id: 1, name: "Alice", status: "active" }). The first catches almost no bugs. The second catches many.
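The same point in runnable form, with a hypothetical lookup function and plain boolean checks standing in for the Jest matchers:

```typescript
// Hypothetical lookup with a bug: the status field is wrong.
function getUser(id: number): { id: number; name: string; status: string } {
  return { id, name: "Alice", status: "inactive" }; // bug: should be "active"
}

const result = getUser(1);

// Weak assertion (the toBeDefined equivalent): passes despite the bug.
console.assert(result !== undefined);

// Strong assertion (the toEqual equivalent): the full-object check
// fails, exposing the bug. Both contribute identical coverage.
const matchesExpected =
  result.id === 1 && result.name === "Alice" && result.status === "active";
console.assert(matchesExpected === false); // the bug makes the strong check fail
```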
Edge cases
Coverage measures whether a code path was executed, not whether it was tested with representative inputs. A function that processes user input might have 100% coverage from a test that passes a single, well-formed string. But what about empty strings, null values, extremely long inputs, special characters or unicode? These edge cases are where bugs hide, and coverage provides no signal about whether they have been tested.
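A sketch of how a single well-formed input can produce full coverage while an edge case still crashes. The function is hypothetical:

```typescript
// Hypothetical helper: extract an uppercased first initial.
function firstInitial(name: string): string {
  // name.trim()[0] is undefined for "" or "   ",
  // so .toUpperCase() throws a TypeError on those inputs.
  return name.trim()[0].toUpperCase();
}

// One happy-path test yields 100% line coverage:
console.assert(firstInitial("alice") === "A");

// But firstInitial("") throws at runtime, and the coverage
// number gave no hint that this input was never tried.
```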
Integration behaviour
Unit test coverage tells you that individual functions work in isolation. It tells you nothing about whether those functions work correctly together. Many of the most serious bugs occur at boundaries between components: incorrect data formats passed between functions, race conditions in concurrent code, unexpected state mutations that affect downstream behaviour.
Error handling
Error paths are frequently undertested because they are harder to trigger. Coverage might show that the happy path is thoroughly exercised while the error handling code has barely been touched. If your coverage number is 85%, the untested 15% likely includes the code that runs when things go wrong, which is precisely when correct behaviour matters most.
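One way to close that gap is to feed malformed input deliberately. A sketch, with a hypothetical config loader:

```typescript
// Hypothetical config loader: the happy path is easy to cover,
// the catch block is where untested code tends to hide.
function loadConfig(raw: string): { retries: number } {
  try {
    const parsed = JSON.parse(raw);
    return { retries: parsed.retries ?? 3 };
  } catch {
    // Error path: runs when things go wrong, which is precisely
    // when correct behaviour matters most.
    return { retries: 3 };
  }
}

// Exercise both paths explicitly:
console.assert(loadConfig('{"retries": 5}').retries === 5); // happy path
console.assert(loadConfig("not json").retries === 3);       // error path
```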
A More Honest Approach to Coverage
None of this means coverage is useless. It is a valuable signal when interpreted correctly. The key is to use it as one input among many, not as a definitive measure of test quality.
Use coverage to find gaps, not to prove quality
Coverage is most valuable as a detection tool. Low coverage on a critical module is a clear signal that more testing is needed. But high coverage on the same module is not proof that testing is sufficient. Think of coverage as a necessary but not sufficient condition.
Measure branch coverage, not just line coverage
Branch coverage gives a more accurate picture of how thoroughly conditional logic has been tested. If you are only tracking line coverage, you are missing important gaps in your test suite.
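In a Jest-based project, for instance, branch thresholds can be enforced in the configuration so a drop fails the build. A sketch (the numbers are illustrative, not a recommendation):

```javascript
// jest.config.js — a sketch, assuming a Jest-based project.
// coverageThreshold fails the test run when coverage drops below
// the bar, so conditional logic cannot silently go untested.
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 80, // track branches, not just lines
      lines: 80,
    },
  },
};
```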
Pair coverage with mutation testing
Mutation testing is a technique that modifies your source code in small ways (changing a > to >=, replacing true with false, removing a function call) and checks whether your test suite detects the change. If a mutation is introduced and all tests still pass, the test suite has a blind spot.
Mutation testing directly measures verification, not just execution. It is more computationally expensive than coverage analysis, but it provides a far more accurate picture of test quality.
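A hand-rolled sketch of the idea that tools such as Stryker automate, using the > to >= example from above (both functions are hypothetical):

```typescript
// Original predicate:
function isAdult(age: number): boolean {
  return age >= 18;
}

// Mutant: ">=" changed to ">". Only the boundary behaviour differs.
function isAdultMutant(age: number): boolean {
  return age > 18;
}

// A suite that never probes the boundary cannot tell them apart,
// so the mutant "survives" and reveals a blind spot:
console.assert(isAdult(30) === isAdultMutant(30)); // both true
console.assert(isAdult(5) === isAdultMutant(5));   // both false

// A boundary test kills the mutant:
console.assert(isAdult(18) === true);
console.assert(isAdultMutant(18) === false);
```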
Look at coverage trends, not absolutes
A coverage number in isolation is less useful than a coverage trend. Is coverage increasing or decreasing over time? Are new features being shipped with tests, or is coverage eroding with each release? The direction matters more than the position.
Focus testing effort on high-risk code
Not all code deserves the same level of testing. Payment processing, authentication, data validation and security-sensitive code warrant deep, thorough testing. Internal utility functions and display formatting can tolerate lighter coverage. Allocate testing effort based on the cost of failure, not the pursuit of a uniform percentage.
The Metrics That Matter
Coverage is one signal. Here are others worth tracking:
Test failure rate. How often do tests fail for reasons other than genuine bugs? A high false-positive rate erodes trust in the suite.
Assertion density. How many assertions does the average test contain? The raw count is only a proxy, but tests that assert more about behaviour tend to catch more bugs than tests that merely execute code.
Test execution time. A slow test suite discourages developers from running tests frequently, which reduces their effectiveness regardless of coverage.
Defect escape rate. How many bugs reach production that should have been caught by tests? This is the ultimate measure of test suite effectiveness.
Coverage Is a Starting Point
The danger of test coverage is not that it is wrong. It is that it is incomplete. It measures one dimension of test quality and ignores several others. When it becomes the sole metric by which test quality is judged, teams optimise for the number rather than for the outcome the number is supposed to represent.
Use coverage. Track it. Set reasonable baselines. But do not mistake a high number for a well-tested codebase. The question is not "how much of my code is executed by tests?" It is "how confident am I that my tests will catch the next bug before my users do?" Those are very different questions, and only one of them matters.