Why Metrics Matter
Engineering teams make decisions every day about where to invest their time. Without objective measurements, those decisions rely on gut feeling, anecdotal evidence, or whichever part of the codebase caused the most recent production incident.
Quality metrics change this. They provide a shared language for discussing codebase health and a foundation for prioritising improvements. But not all metrics are equally useful, and some are actively misleading if interpreted naively.
Here are seven metrics that, taken together, give engineering teams genuine visibility into the health of their codebases.
1. Test Coverage (and Its Limits)
Test coverage measures the percentage of your code that is exercised by automated tests. It is typically expressed as line coverage, branch coverage, or a combination of both.
What it measures: How much of your code has at least one test that runs through it.
Why it matters: Code without test coverage is code you cannot refactor with confidence. When you change untested code, you are relying on manual verification to catch regressions.
The important caveat: High coverage does not mean high quality. A codebase with 95% coverage can still have terrible tests. If the tests assert that functions return what they return (tautological tests) or if they mock everything including the thing being tested, the coverage number is meaningless. We explore this in more detail in why your test coverage number is misleading.
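To make the caveat concrete, here is a minimal TypeScript sketch of a tautological test. The function and values are illustrative, not from any particular codebase; the point is that the first check exercises every line (full coverage) yet can never fail.

```typescript
// A hypothetical pricing function and two "tests" for it.
function applyDiscount(price: number, percent: number): number {
  return price - price * (percent / 100);
}

const result = applyDiscount(100, 10);

// Tautological: compares the function's output to itself, so it passes
// even if applyDiscount is completely wrong. Coverage: 100%. Value: none.
const tautologicalPass = result === applyDiscount(100, 10);

// Meaningful: pins the expected value independently of the implementation,
// so a regression in applyDiscount actually fails the test.
const meaningfulPass = result === 90;
```

Both checks contribute identically to the coverage number, which is precisely why the number alone says nothing about test quality.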
The real value of test coverage is as a floor, not a ceiling. A team with 20% coverage knows it has significant blind spots. A team with 70% coverage has a reasonable foundation. Beyond 80%, the returns diminish rapidly, and the remaining untested code is often genuinely difficult to test (error handlers, third-party integrations, infrastructure glue).
Track coverage trends rather than absolute numbers. A codebase where coverage is steadily increasing is healthier than one where it is static or declining.
2. Cyclomatic Complexity
Cyclomatic complexity measures the number of independent paths through a function or module. Each if, each case in a switch, each for or while loop, and each logical operator (&& or ||) adds a path; an else branch on its own does not increase the count, because the decision is already counted by its if.
What it measures: How difficult a piece of code is to understand, test, and maintain.
Why it matters: Functions with high cyclomatic complexity are disproportionately likely to contain bugs. They are harder to review, harder to test exhaustively, and harder to modify without introducing regressions.
A function with a complexity of 5 has five distinct paths, which is manageable. A function with a complexity of 25 has twenty-five paths, meaning you need at least twenty-five test cases to exercise every independent path. In practice, such functions are rarely tested adequately.
Most teams set a threshold between 10 and 15. Functions exceeding this threshold should be candidates for refactoring, typically by extracting helper functions or using early returns to reduce nesting. For practical advice on keeping complexity in check, see our guide on how to measure codebase maintainability.
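The early-return refactoring looks like this in practice. Both functions below are illustrative sketches with identical behaviour; the flat version replaces three levels of nesting with guard clauses, so each condition reads at the top level.

```typescript
interface Order {
  paid: boolean;
  items: number;
  address?: string;
}

// Nested version: each condition adds a level of indentation.
function canShipNested(order: Order): boolean {
  if (order.paid) {
    if (order.items > 0) {
      if (order.address) {
        return true;
      }
    }
  }
  return false;
}

// Early-return version: same behaviour, but each guard clause
// handles one failure case and exits immediately.
function canShipFlat(order: Order): boolean {
  if (!order.paid) return false;
  if (order.items <= 0) return false;
  if (!order.address) return false;
  return true;
}
```

The cyclomatic complexity is similar in both, but the flat version is far easier to read, review, and extend, which is usually the practical goal of the refactor.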
3. Dependency Count
Dependency count is the total number of third-party packages your project depends on, including both direct and transitive dependencies.
What it measures: Your project's exposure to external code that you do not control.
Why it matters: Every dependency is a trust relationship. You are trusting the maintainer to keep the package secure, compatible, and maintained. Each dependency also increases build times, bundle sizes, and the surface area for supply-chain attacks.
The raw number matters less than the trend. A project that adds ten dependencies per month is accumulating risk faster than one that adds two. Similarly, a project with 200 dependencies where 50 are unmaintained is in worse shape than one with 300 dependencies that are all actively maintained.
Pay particular attention to transitive dependencies. Your project might directly depend on 30 packages, but those 30 packages might collectively pull in 800 transitive dependencies. Vulnerabilities anywhere in that tree affect your application. For a comprehensive look at dependency risks, read keeping your dependency supply chain healthy.
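A sketch of the direct-versus-transitive arithmetic, assuming you already have a resolved dependency graph (in a real project you would derive this from your lock file; the graph shape here is hypothetical):

```typescript
// Map from package name to the names of the packages it depends on.
type DepGraph = Record<string, string[]>;

// Walk the graph from the direct dependencies and count every
// package reachable from them, deduplicating shared dependencies.
function countDependencies(
  graph: DepGraph,
  directDeps: string[]
): { direct: number; total: number } {
  const seen = new Set<string>();
  const stack = [...directDeps];
  while (stack.length > 0) {
    const pkg = stack.pop()!;
    if (seen.has(pkg)) continue;
    seen.add(pkg);
    for (const dep of graph[pkg] ?? []) stack.push(dep);
  }
  return { direct: directDeps.length, total: seen.size };
}

// Two direct dependencies sharing a transitive one: 2 direct, 4 total.
const graph: DepGraph = { a: ["c"], b: ["c"], c: ["d"], d: [] };
const counts = countDependencies(graph, ["a", "b"]);
```

The deduplication matters: shared transitive dependencies should be counted once, which is why the total for this graph is four rather than five.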
4. Security Vulnerability Count
This metric counts known vulnerabilities in your dependencies, categorised by severity (critical, high, medium, low).
What it measures: Your exposure to publicly disclosed security issues in the packages you depend on.
Why it matters: Dependency vulnerabilities are one of the most common attack vectors in modern software. Databases like the Open Source Vulnerabilities (OSV) database and GitHub Advisory Database catalogue thousands of known issues, and attackers actively exploit them.
The key distinction is between direct and transitive vulnerabilities. A critical vulnerability in a package you import directly is more urgent than one buried four levels deep in a transitive dependency that your application might never exercise.
Track this metric continuously, not just at audit time. New vulnerabilities are disclosed daily, and a codebase that was clean last week might have critical issues today. Automated CI/CD quality checks can catch new vulnerabilities before they reach production.
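A CI gate over vulnerability counts can be as small as the sketch below. The report shape mirrors the per-severity counts that tools like npm audit emit in their JSON output, but treat that structure as an assumption and adapt it to whatever your scanner actually produces.

```typescript
// Per-severity vulnerability counts, as reported by a dependency scanner.
interface SeverityCounts {
  critical: number;
  high: number;
  moderate: number;
  low: number;
}

// Fail the build if critical or high counts exceed the allowed budget.
// Defaults are zero tolerance for both; moderate and low are reported
// but do not block, which keeps the gate actionable rather than noisy.
function shouldFailBuild(
  counts: SeverityCounts,
  maxCritical = 0,
  maxHigh = 0
): boolean {
  return counts.critical > maxCritical || counts.high > maxHigh;
}
```

Starting with a zero-tolerance threshold on critical severities only, then tightening over time, avoids the failure mode where the gate blocks every build on day one and gets disabled.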
5. Documentation Coverage
Documentation coverage measures whether key project artefacts are documented: README completeness, environment variable documentation, API documentation, and inline code comments for complex logic.
What it measures: How easy it is for a new team member (or your future self) to understand and work with the codebase.
Why it matters: Poor documentation is the silent killer of engineering velocity. When developers cannot find answers in documentation, they read source code, ask colleagues, or guess. All three are slower and more error-prone than reading clear documentation.
Specific things to measure include whether the README covers setup instructions, whether environment variables are documented with descriptions and example values, whether API endpoints have documented request and response formats, and whether complex business logic has explanatory comments. Our article on why documentation is a code quality signal explores this topic further.
Documentation that exists but is outdated can be worse than no documentation at all. The most valuable documentation metrics also check for staleness, flagging docs that reference files or configurations that no longer exist.
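One cheap staleness check is to verify that file paths mentioned in the README still exist. The sketch below is illustrative: it only catches backtick-quoted paths with a file extension, and a real checker would walk the repository rather than take a precomputed file list.

```typescript
// Flag backtick-quoted paths in a README that are missing from the
// repository's file list, e.g. a reference to a renamed config file.
function findStaleReferences(readme: string, repoFiles: Set<string>): string[] {
  // Matches backtick-quoted paths like `src/config.ts` or `docs/setup.md`.
  const pathPattern = /`([\w./-]+\.\w+)`/g;
  const stale: string[] = [];
  for (const match of readme.matchAll(pathPattern)) {
    const path = match[1];
    if (!repoFiles.has(path)) stale.push(path);
  }
  return stale;
}
```

Run against a README that mentions `src/app.ts` and a since-deleted `docs/old.md`, the checker returns only the missing path, which can then fail CI or open an issue.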
6. Circular Dependencies
A circular dependency exists when module A imports from module B, which imports from module C, which imports from module A (or any longer chain that forms a cycle).
What it measures: Architectural integrity and module boundary health.
Why it matters: Circular dependencies make code harder to reason about, harder to test in isolation, and harder to refactor. They often indicate that module boundaries are poorly defined or that responsibilities are leaking between layers.
In JavaScript and TypeScript projects, circular dependencies can also cause runtime issues. Depending on the module system and bundler, circular imports can result in undefined values at import time, leading to subtle bugs that are difficult to diagnose. For a deeper look at this problem, see understanding circular dependencies.
The ideal count is zero. Any circular dependency is worth investigating, though not all are equally problematic. A cycle between two closely related utility files is less concerning than a cycle between your data layer and your presentation layer.
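Detecting cycles is a standard depth-first search over the import graph. This is a minimal sketch, assuming you have already extracted a map from each module to the modules it imports (dedicated tools do the extraction for you):

```typescript
// Map from module name to the modules it imports.
type ModuleGraph = Record<string, string[]>;

// Returns the first import cycle found (as a path ending where it began),
// or null if the graph is acyclic.
function findCycle(graph: ModuleGraph): string[] | null {
  const visiting = new Set<string>(); // on the current DFS path
  const done = new Set<string>();     // fully explored, known cycle-free

  function dfs(node: string, path: string[]): string[] | null {
    // Revisiting a node on the current path means we closed a loop.
    if (visiting.has(node)) return [...path.slice(path.indexOf(node)), node];
    if (done.has(node)) return null;
    visiting.add(node);
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = dfs(dep, path);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = dfs(node, []);
    if (cycle) return cycle;
  }
  return null;
}
```

For a graph where a imports b, b imports c, and c imports back to a, the function reports the cycle a → b → c → a; an acyclic graph yields null, which is the target state.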
7. Code Churn
Code churn measures how frequently files are modified, particularly when the same files are changed repeatedly in a short time window.
What it measures: Which parts of your codebase are unstable or undergoing repeated rework.
Why it matters: High churn in a file often indicates one of several problems: the original implementation was incomplete, requirements are changing frequently, or the code is a "god file" that too many features depend on.
Files with both high churn and high complexity are particularly risky. They change often (increasing the chance of introducing bugs) and are complex (making bugs harder to spot). This intersection is often a strong signal for technical debt that needs managing.
Code churn is also useful for identifying change coupling, where two or more files almost always change together. Strong coupling between files in different modules suggests that the module boundaries are not well drawn.
Track churn over rolling windows (30 days is typical) rather than cumulative totals, so the metric reflects current activity rather than historical patterns.
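The rolling-window calculation is straightforward once you have change records. The record shape below is illustrative; in practice you would derive it from your version control history (for example, by parsing git log output).

```typescript
// One record per file touched per commit.
interface Change {
  file: string;
  timestamp: number; // commit time in milliseconds since the epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count modifications per file within the trailing window, ignoring
// anything older than the cutoff so the metric reflects current activity.
function churnByFile(
  changes: Change[],
  now: number,
  windowDays = 30
): Map<string, number> {
  const cutoff = now - windowDays * DAY_MS;
  const counts = new Map<string, number>();
  for (const change of changes) {
    if (change.timestamp < cutoff) continue; // outside the rolling window
    counts.set(change.file, (counts.get(change.file) ?? 0) + 1);
  }
  return counts;
}
```

Joining this map with a per-file complexity score gives the high-churn, high-complexity intersection described above, which is where refactoring effort tends to pay off fastest.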
Frequently Asked Questions
How many metrics should a team track at once?
Start with three or four that address your biggest pain points, then add more as your tooling and processes mature. Tracking all seven from day one can overwhelm a team that is new to quality measurement.
What is a good target for test coverage?
There is no universal answer, but 70% to 80% line coverage is a reasonable target for most teams. The trend matters more than the absolute number. A codebase improving from 40% to 60% is in a healthier position than one stuck at 80% with brittle tests.
Should metrics be enforced in CI or just reported?
Both, but at different stages. Start by reporting metrics to build awareness. Once the team is comfortable, enforce thresholds on critical metrics like security vulnerabilities and cyclomatic complexity in your CI/CD pipeline.
How often should teams review their metrics?
Weekly reviews of trends are ideal. A brief weekly check takes ten to fifteen minutes and catches regressions before they compound into larger problems.
Bringing It All Together
No single metric tells the full story. Test coverage without considering test quality is misleading. Dependency counts without vulnerability data miss the point. Complexity without churn data lacks context.
The real value comes from tracking all seven metrics together, over time, and watching for trends. A codebase where coverage is rising, complexity is falling, vulnerabilities are being addressed, and documentation is improving is a codebase that is getting healthier, regardless of the absolute numbers.
Start by establishing baselines. Measure where you are today across all seven domains. Then set modest improvement targets and review progress weekly. The teams that consistently measure and respond to quality metrics build better software, ship with more confidence, and spend less time fighting fires.