How to Measure Codebase Maintainability

The Challenge of Measuring Maintainability

Every developer has an intuitive sense of whether a codebase is maintainable. You open a project, read a few files, and within minutes you know whether this is going to be pleasant or painful to work with. The challenge is turning that intuition into something measurable.

Maintainability matters because it directly affects how quickly a team can deliver changes. A highly maintainable codebase lets developers understand existing code, make changes confidently, and ship without unexpected side effects. A hard-to-maintain codebase turns every change into an archaeology project, where the developer spends more time understanding the code than modifying it.

The good news is that maintainability, while subjective in its experience, is composed of concrete, measurable signals. None of these signals is sufficient on its own. Together, they build a reliable picture of how easy or difficult a codebase is to work with.

The Signals That Matter

File Size

Large files are harder to understand, navigate, and modify. A file with 2,000 lines of code requires significant mental effort just to hold its structure in your head. When multiple developers need to work on the same file, merge conflicts become frequent.

There is no universal threshold for "too large," but research and industry experience suggest that files over 300 lines start to become harder to maintain, and files over 500 lines are a strong signal of code that should be decomposed.

Measuring file size across a codebase reveals the distribution of complexity. A project where 90% of files are under 200 lines and a few are over 500 has a different maintainability profile than one where the average file is 400 lines. The distribution matters more than any single file.
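As a sketch of what this looks like in practice, here is a minimal way to summarise a distribution of line counts. The thresholds mirror the ones above; in a real tool the counts would come from walking the repository, but they are passed in directly here so the logic stays self-contained:

```javascript
// Summarise a file-size distribution from an array of per-file line counts.
// Thresholds (200/300/500) follow the rough guides discussed in the article.
function fileSizeProfile(lineCounts) {
  const total = lineCounts.length;
  const over = (limit) => lineCounts.filter((n) => n > limit).length;
  return {
    total,
    pctUnder200: Math.round((100 * (total - over(200))) / total),
    over300: over(300), // files that start to become harder to maintain
    over500: over(500), // strong signal of code that should be decomposed
  };
}
```

The output makes the distribution, rather than any single file, the unit of discussion: a team can watch `over500` trend toward zero release by release.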

Nesting Depth

Deeply nested code is hard to follow. When conditional logic goes four or five levels deep, the developer has to track multiple conditions simultaneously to understand the current execution path. This increases the likelihood of bugs and makes the code resistant to modification.

if (user) {
  if (user.isActive) {
    if (user.subscription) {
      if (user.subscription.isValid) {
        // actual logic buried here
      }
    }
  }
}

This pattern can almost always be refactored into early returns or guard clauses that keep the nesting shallow and the logic clear. Measuring maximum and average nesting depth across a codebase provides a concrete signal of maintainability.
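As a sketch, here is the nested example above rewritten with guard clauses. The `processUser` wrapper and its return values are illustrative, not a real API:

```javascript
// The same checks as the nested version, flattened with guard clauses.
// Each precondition exits early, so the real logic sits at nesting depth one.
function processUser(user) {
  if (!user) return null;
  if (!user.isActive) return null;
  if (!user.subscription) return null;
  if (!user.subscription.isValid) return null;
  return `processed:${user.id}`; // actual logic, no longer buried
}
```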

Cyclomatic Complexity

Cyclomatic complexity counts the number of independent paths through a function. A simple function with no branches has a complexity of 1. Each if, for, while, case, catch, or short-circuit logical operator (&& or ||) adds one to the count; a plain else does not, because it introduces no new decision point.

Functions with a cyclomatic complexity above 10 are generally considered difficult to understand and test. Functions above 20 are typically candidates for decomposition.

This metric is particularly useful because it correlates directly with testability. A function with a complexity of 15 needs on the order of 15 test cases to exercise all of its branches. If that feels like too many tests for a single function, the function is probably doing too much.
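A rough sketch of how the count can be automated, using keyword counting as a stand-in for a real parser. Production tools walk the syntax tree instead; this heuristic will miscount keywords inside strings and comments, but it shows the shape of the calculation:

```javascript
// Naive cyclomatic complexity estimate: 1 plus the number of branch points.
// Branch points counted: if, for, while, case, catch, &&, ||, and ternary ?.
function cyclomaticEstimate(source) {
  const branchPattern = /\b(if|for|while|case|catch)\b|&&|\|\||\?/g;
  const matches = source.match(branchPattern) || [];
  return 1 + matches.length;
}
```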

Function Length

Long functions, like long files, are harder to understand and modify. A function that spans 80 lines is doing multiple things, even if those things are related. Breaking it into smaller functions with descriptive names creates self-documenting code that is easier to read, test, and reuse.

Measuring function length across a codebase reveals how well the team practises decomposition. A project with an average function length of 15 lines is easier to maintain than one with an average of 45 lines, all else being equal.

README Presence and Quality

This is often overlooked in maintainability discussions, but documentation is a critical part of the maintenance experience. A project without a README (or with a placeholder README) forces every new contributor to understand the codebase entirely through the code itself.

A useful README answers four questions: what does this project do, how do I set it up locally, how do I run the tests, and how is the code organised. The presence of these sections can be detected automatically, and their absence is a measurable signal of maintainability risk.
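Detecting those sections can be as simple as keyword matching against the README text. A sketch, with heading keywords that are illustrative rather than exhaustive:

```javascript
// Check a README for the four questions a useful README answers.
// The keyword lists are illustrative; a real check would be fuzzier.
function readmeChecklist(readmeText) {
  const text = readmeText.toLowerCase();
  return {
    purpose: /overview|about|what/.test(text),
    setup: /setup|install|getting started/.test(text),
    tests: /test/.test(text),
    structure: /structure|organis|organiz|architecture/.test(text),
  };
}
```

Any `false` in the result is a concrete, automatically detectable maintainability gap.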

Beyond the README, the presence of inline documentation for complex functions, environment variable documentation, and architecture decision records all contribute to maintainability. Code that explains its own reasoning is code that future developers can modify with confidence.

Code Duplication

Duplicated code is a maintenance multiplier. When a bug is found in duplicated logic, it needs to be fixed in every copy. When the behaviour needs to change, every instance needs to be updated. Inevitably, some copies get missed.

Measuring duplication across a codebase is straightforward. Tools can identify blocks of code that appear multiple times, either as exact copies or as near-duplicates with minor variations. A small amount of duplication is normal. A codebase where 15% of the code is duplicated has a systemic problem.
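A minimal sketch of exact-duplicate detection: slide a fixed-size window of lines over the source and flag any window that appears more than once. Real tools also catch near-duplicates with renamed variables; this only finds literal repeats:

```javascript
// Report blocks of `windowSize` consecutive (trimmed, non-empty) lines
// that occur more than once in the source.
function findDuplicateBlocks(source, windowSize = 3) {
  const lines = source.split('\n').map((l) => l.trim()).filter(Boolean);
  const seen = new Set();
  const duplicates = new Set();
  for (let i = 0; i + windowSize <= lines.length; i++) {
    const key = lines.slice(i, i + windowSize).join('\n');
    if (seen.has(key)) duplicates.add(key);
    else seen.add(key);
  }
  return [...duplicates];
}
```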

Dependency Complexity

The number and structure of internal dependencies affect maintainability. A module that imports from 20 other modules is tightly coupled to the rest of the system. Changing anything it depends on might break it. Changing it might break its dependents.

Circular dependencies are a particularly strong negative signal. When module A imports from module B and module B imports from module A, the two modules are effectively one unit that cannot be understood or tested independently. Detecting circular dependencies and measuring the overall dependency graph complexity provides insight into how modular (or monolithic) the codebase really is.
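Cycle detection itself is a small graph problem. Here is a sketch using depth-first search over a module-to-imports map; the map is hand-built in this example, whereas real tools derive it from import statements:

```javascript
// Detect circular imports with a three-colour depth-first search.
// A "grey" node reached again while still on the DFS stack means a cycle.
function hasCycle(graph) {
  const WHITE = 0, GREY = 1, BLACK = 2;
  const state = new Map(Object.keys(graph).map((m) => [m, WHITE]));
  const visit = (node) => {
    state.set(node, GREY);
    for (const dep of graph[node] || []) {
      const s = state.get(dep);
      if (s === GREY) return true; // back edge: circular dependency
      if (s !== BLACK && visit(dep)) return true;
    }
    state.set(node, BLACK);
    return false;
  };
  return Object.keys(graph).some((m) => state.get(m) === WHITE && visit(m));
}
```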

Combining Signals Into a Maintainability Score

No single signal tells the full story. A codebase can have small files but deep nesting. It can have excellent documentation but high cyclomatic complexity. The value comes from combining these signals into a composite view.

The most effective approach weights the signals based on their relative impact and produces a score that represents the overall maintainability posture. The score alone is useful for comparisons and tracking, but the breakdown is where the actionable insight lives.
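A weighted combination can be sketched in a few lines. The signal names and weights below are purely illustrative, not a recommended scheme; the point is that the composite is a weighted average of per-signal scores:

```javascript
// Combine per-signal scores (each 0-100) into one weighted composite score.
function maintainabilityScore(signals, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [name, weight] of Object.entries(weights)) {
    total += signals[name] * weight;
    weightSum += weight;
  }
  return Math.round(total / weightSum);
}
```

Keeping the per-signal scores alongside the composite is what preserves the breakdown described below.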

When a team sees that their maintainability score is 62 out of 100, the natural question is "why?" The answer might be: file sizes are good, nesting depth is fine, but cyclomatic complexity is high in 15 functions and the README is missing setup instructions. That breakdown tells the team exactly where to invest their time.

Frequently Asked Questions

What is a good maintainability score?

There is no universal "good" score because it depends on the project's age, size, and purpose. What matters more is the trend. A score of 60 that is improving over time is healthier than a score of 75 that is declining. As a rough guide, most mature projects with active maintainability practices score between 65 and 85.

Which signal matters most for maintainability?

Cyclomatic complexity and file size tend to have the highest impact because they directly affect how long it takes a developer to understand and modify code. However, the combination of signals matters more than any individual metric.

How often should we measure maintainability?

Ideally on every pull request, with a full analysis at least weekly. Continuous measurement catches regressions early. If you can only measure periodically, weekly or fortnightly is a reasonable cadence to spot trends without generating alert fatigue.

Can maintainability be improved without a dedicated refactoring sprint?

Yes. The most sustainable approach is to improve maintainability incrementally. When a developer touches a file, they improve it slightly: extract a long function, reduce nesting, add documentation. Over time, these small improvements compound. Dedicated refactoring sprints are sometimes necessary for structural issues like circular dependencies, but most maintainability improvements can happen alongside normal feature work.

Tracking Maintainability Over Time

A single maintainability measurement is a snapshot. Tracking it over time reveals trends that are far more valuable.

A gradually declining score suggests that new code is being written without attention to the signals described above. Perhaps the team is under delivery pressure and cutting corners. Perhaps new developers are not aware of the team's conventions. The trend provides an early warning before the codebase reaches a point where changes become significantly slower.

A gradually improving score confirms that the team's investment in code quality is paying off. Refactoring efforts, code review standards, and tooling changes can all be validated by their effect on the maintainability trend.

Practical Steps

If you want to start measuring maintainability in your codebase, begin with the signals that are easiest to collect and most likely to reveal problems.

First, analyse file sizes across the project. Identify the largest files and assess whether they could reasonably be decomposed. This is often the highest-leverage starting point because large files tend to contain multiple other issues (high complexity, deep nesting, duplication).

Second, measure cyclomatic complexity for the most frequently changed files. These are the files where maintainability matters most because they are the files developers interact with daily.

Third, check for a README with meaningful content. If the project has one, verify that it covers setup, testing, and project structure. If it does not, write one. The return on investment is immediate.

Fourth, run a dependency analysis to identify circular dependencies. These are structural problems that become harder to fix over time and should be addressed early.

Finally, establish a baseline and track it. Maintainability is not a problem to solve once. It is a property to maintain continuously. The teams that measure it consistently are the teams that keep their codebases healthy as they grow.