A snapshot of your codebase quality tells you where you stand today. That is barely useful on its own. The number in isolation has no reference point. A 78 is not good or bad. You have no idea whether it is rising or falling.
What matters is the direction. Monitoring is a snapshot, then repeated snapshots over time, then a trend. The trend is the signal. The absolute number is noise.
Teams that set up health dashboards with a single current number and review it in quarterly strategy meetings get almost nothing from the exercise. Teams that track the same signals continuously and watch the movement catch regressions before they become problems.
What "monitoring" actually means
Automated measurement of a fixed set of quality signals, on a fixed cadence, producing a record that spans months.
Fixed signals. The same measurements each time. If you change what you measure, the trend loses meaning. Pick a set, stick with it, only change if the set is wrong.
Fixed cadence. Every merge to main, or every commit, or every scheduled daily run. Whichever you pick, stick to it. Irregular sampling produces noisy trends.
Recorded. Stored somewhere queryable. A CI tool that shows the current run's findings but not last month's is not monitoring; it is just CI.
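A minimal version of "recorded" can be one JSON line appended per run, in a file anything can query later. This is a sketch, not a prescribed tool; the file name and signal names are illustrative:

```python
import json
import datetime

def record_snapshot(signals: dict, path: str = "quality-history.jsonl") -> None:
    """Append one dated record per run; the file becomes the trend history."""
    entry = {"date": datetime.date.today().isoformat(), **signals}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def load_history(path: str = "quality-history.jsonl") -> list[dict]:
    """Read every recorded run back, oldest first."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# One run, one line -- the same fixed signals every time.
record_snapshot({"overall": 78, "vulns_high": 2, "test_ratio": 0.6})
```

The point is the append, not the format: a CI tool that overwrites this file with only the latest run is back to being just CI.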
Signals worth monitoring
Not every metric deserves a spot on the dashboard. A useful set:
- Overall quality score across domains. The top-line trend.
- Per-domain scores. Security, testing, architecture, performance, dependencies, accessibility, documentation.
- File size distribution. Number of files above soft and hard thresholds.
- Cyclomatic complexity. Mean and p95 across all functions.
- Test-to-source ratio. Count of test files divided by count of source files.
- Dependency count. Direct and transitive.
- Vulnerability count. By severity.
- Circular dependency count. Number of cycles in the module graph.
- Dead export count. Exports nobody imports.
Nine signals. Each one is mechanical to measure. A daily run on the main branch produces nine data points. Over a quarter that is more than 800 data points, which is enough to see trends with confidence.
Cadence
Daily is overkill for most signals but useful for vulnerability scanning (new CVEs appear continuously). Per-commit is useful for per-PR gating. Once a week is probably enough for architectural signals, which change slowly.
A pragmatic setup:
- Per-commit: vulnerability scan, lint, type check.
- Per-PR: full quality score, per-domain breakdown, gated on thresholds.
- Daily: trend data point for the dashboard.
- Weekly: team reviews trends.
The daily cadence is the monitoring backbone. Weekly is where humans engage with the data.
Dashboards
A good dashboard shows three things:
- The trend line for the headline metric over 90 days.
- Per-domain breakdown with trend indicators (up, down, flat).
- Recent changes that moved the number noticeably.
A bad dashboard shows a single current number with no history. Or twenty-five charts on one screen. Or charts with no context (no threshold lines, no colour coding, no comparison).
Pick the three-thing pattern. Put it somewhere the team actually looks.
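The up/down/flat indicators can be derived from the stored history by comparing a recent window against the one before it. The window size and the flat band here are arbitrary starting points, not recommendations:

```python
def trend(points: list[float], window: int = 7, flat_band: float = 1.0) -> str:
    """Classify a series as up, down, or flat by comparing two adjacent windows."""
    if len(points) < 2 * window:
        return "flat"  # not enough history to call a direction
    recent = sum(points[-window:]) / window
    earlier = sum(points[-2 * window:-window]) / window
    delta = recent - earlier
    if delta > flat_band:
        return "up"
    if delta < -flat_band:
        return "down"
    return "flat"
```

Averaging over windows rather than comparing single points is what keeps one noisy daily run from flipping the indicator.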
What to alert on
Most signals do not need alerts. Most signals move slowly, and a weekly review catches the trend. Alerts are for sharp signals only.
Alert on:
- New high-severity security findings (new vulnerability in a dependency, new committed secret).
- A quality score drop of more than N points in a single commit or day. Sharp drops are often regressions.
- Test-to-source ratio falling below a floor. Tests are being added more slowly than source, consistently.
- A new circular dependency. Catch it the day it is introduced.
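All four sharp-signal alerts reduce to comparisons between today's snapshot and yesterday's. A sketch, with placeholder thresholds and made-up field names to tune for your own codebase:

```python
def alerts(prev: dict, curr: dict,
           max_drop: float = 5.0, ratio_floor: float = 0.5) -> list[str]:
    """Return alert messages for sharp regressions between two daily snapshots."""
    fired = []
    if curr["vulns_high"] > prev["vulns_high"]:
        fired.append("new high-severity vulnerability")
    if prev["score"] - curr["score"] > max_drop:
        fired.append(f"score dropped {prev['score'] - curr['score']:.1f} points in one day")
    if curr["test_ratio"] < ratio_floor:
        fired.append("test-to-source ratio below floor")
    if curr["cycles"] > prev["cycles"]:
        fired.append("new circular dependency")
    return fired
```

Note that every condition is a step change or a floor crossing. There is deliberately no rule of the form "score lower than 30 days ago": slow declines belong on the dashboard, not in a page.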
Do not alert on slow, smooth declines. Those are for dashboards, not pages. An alert that fires daily becomes an alert that fires never.
Common failure modes
Dashboards nobody checks. The team sets up a Grafana board or a quality dashboard and then nobody opens it. A dashboard without a ritual is dead weight.
Alerting on noise. Every PR that drops the score by 1 point pages someone. Alert fatigue sets in. The real alerts get ignored.
Metrics with no action. The score has been dropping for three months. The team knows. Nothing happens. Monitoring is diagnostic, not curative; you still have to do the fixing.
Drift between what you measure and what you care about. You measure coverage. You care about test quality. The metric does not capture what you actually want. Pick measurable proxies and check periodically that they still track the underlying thing.
The ritual
Monitoring only works with a ritual. A weekly 15-minute team review where someone looks at the dashboard, notes what moved, and either files a ticket or flags a concern.
That is the mechanism that turns data into action. Without it, the data pile up and nobody notices when something is wrong.
If you do not have the ritual, you do not have monitoring. You have a dashboard.