CI/CD Quality Checks That Actually Work

The Problem with Most Quality Checks

Every engineering team has been there. Someone adds a quality check to the CI pipeline. For a few weeks, it works. Developers fix the issues it catches and the codebase improves. Then the failures start piling up. The check is slow, the error messages are cryptic, and people start finding workarounds. Eventually, someone adds a skip flag or the check gets moved to a non-blocking step that nobody looks at.

The check is still running; it is just no longer doing anything useful.

This pattern repeats across the industry because most quality checks are designed around tooling capabilities rather than developer behaviour. A check that is technically correct but practically unusable is worse than no check at all. It creates the illusion of quality enforcement while training developers to ignore pipeline results.

Why Checks Get Bypassed

Before designing better checks, it is worth understanding why the existing ones fail. The reasons are consistent across teams and tools.

Slow Feedback Loops

A quality check that adds eight minutes to a pipeline will be tolerated for about a week. Developers context-switch while waiting, lose focus, and start to resent the process. When a check fails after a long wait, the frustration compounds. The developer has already moved on mentally and now has to go back to a change they thought was finished.

Checks that run in under sixty seconds get attention. Checks that take several minutes get ignored.

Unclear Failure Messages

"Quality check failed" is not a useful message. Neither is a wall of linting errors sorted alphabetically by file path. When a check fails, the developer needs to know three things immediately: what failed, where it failed, and what to do about it.

The best quality checks produce output that reads like a helpful colleague pointing at a specific line and saying "this needs fixing because of this reason." The worst produce output that requires ten minutes of scrolling to find the actual problem.

All-or-Nothing Thresholds

Many teams set quality thresholds once and never adjust them. A security score threshold of 90 might be reasonable for a mature codebase but completely unachievable for a legacy project that scores 45 today. When a threshold feels unreachable, developers stop trying.

Worse, some checks treat every domain identically. A team might care deeply about security but have no accessibility requirements at all. Forcing a single pass/fail across all domains means the check fails for reasons the team does not consider relevant, which erodes trust in the entire system.

No Integration with Review Workflows

A check that reports results only in the pipeline logs is a check that nobody reads. Developers live in pull request interfaces. If the quality feedback does not appear where the code review is happening, it might as well not exist.

Designing Checks That Teams Respect

Effective quality checks share several characteristics. None of them are technically difficult. They just require thinking about developer experience as carefully as you think about the analysis itself.

Optimise for Speed

The single most important factor is speed. A fast check that catches 80% of issues is more valuable than a thorough check that catches 95% but takes five minutes to run.

There are several practical approaches. Run checks in parallel rather than sequentially. Cache dependencies aggressively. Analyse only the files changed in the pull request rather than the entire codebase. If a full scan is necessary, run it as a separate non-blocking job and keep the blocking check lightweight.
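The changed-files approach can be sketched in a few lines. This is a minimal sketch, not a complete runner: it assumes a Python codebase linted with ruff and a default branch named main, both of which are placeholders for whatever your pipeline actually uses.

```python
import subprocess

def changed_files(base_ref: str = "origin/main") -> list[str]:
    """Paths changed relative to the base branch. The three-dot diff
    counts only this branch's changes, not upstream drift."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [p for p in out.stdout.splitlines() if p]

def relevant_targets(files: list[str], suffixes: tuple = (".py",)) -> list[str]:
    """Keep only the files the blocking check should analyse."""
    return [f for f in files if f.endswith(suffixes)]

def run_blocking_check() -> bool:
    """Lint only the changed files instead of the whole repository."""
    targets = relevant_targets(changed_files())
    if not targets:
        return True  # nothing relevant changed: pass in milliseconds
    return subprocess.run(["ruff", "check", *targets]).returncode == 0
```

The early return for an empty target list matters more than it looks: a pull request that touches only documentation should clear the blocking check almost instantly.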

Some teams run a fast subset of checks on every push and the full suite on a nightly schedule. This gives developers rapid feedback during the day while still catching issues that the lightweight checks miss.

Make Failure Messages Actionable

Every failure message should answer three questions: what is wrong, where is the problem, and how should the developer fix it. File paths should be clickable. Line numbers should be precise. Suggested fixes should be specific.

Good output looks like this: "Security: hardcoded API key detected in src/config/database.ts:42. Move the value to an environment variable." Bad output looks like this: "FAIL: security threshold not met (68/70)."

When the output appears as a comment on the pull request, developers see it without leaving their review workflow. When it appears as inline annotations on the diff, they see the problem in the exact context where it matters.
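The what/where/how structure is easy to enforce in code by making each finding carry all three parts. The sketch below also emits a GitHub Actions workflow command, which is what turns a finding into an inline annotation on the diff; the Finding shape itself is illustrative.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    domain: str    # e.g. "Security"
    path: str      # where the problem is
    line: int
    problem: str   # what is wrong
    fix: str       # what to do about it

def to_message(f: Finding) -> str:
    """One finding, three answers: what, where, how."""
    return f"{f.domain}: {f.problem} in {f.path}:{f.line}. {f.fix}"

def to_github_annotation(f: Finding) -> str:
    """Workflow command that renders the finding as an inline
    annotation on the pull request diff in GitHub Actions."""
    return f"::error file={f.path},line={f.line}::{to_message(f)}"
```

Feeding in the example from above, to_message produces exactly "Security: hardcoded API key detected in src/config/database.ts:42. Move the value to an environment variable." Because the structure is mandatory, a check author cannot ship a finding without a location and a suggested fix.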

Use Per-Domain Thresholds

Different domains deserve different thresholds. Security issues in a financial application should block merges. Documentation gaps in an internal tool probably should not.

Effective quality gates let teams configure thresholds independently for each domain. A team might set security at 70, architecture at 60, and leave testing and documentation as advisory (reported but not blocking). This means the check only fails for reasons the team actually considers important.

Starting with lower thresholds and tightening them over time is far more effective than starting strict and loosening when people complain. Progressive tightening creates a culture of improvement. Starting strict and retreating creates a culture of resentment.

Distinguish Blocking from Advisory

Not every quality signal needs to block a merge. Some are better presented as information for the reviewer to consider.

A blocking check should cover areas where mistakes have clear, measurable consequences: security vulnerabilities, broken accessibility, architectural violations that would be expensive to reverse. An advisory check should cover areas where the team wants visibility but not enforcement: documentation coverage, code complexity trends, dependency update suggestions.

The distinction matters because it preserves developer autonomy for judgement calls while enforcing standards for non-negotiable concerns. When developers trust that blocking checks are genuinely important, they stop looking for workarounds.
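Per-domain thresholds and the blocking/advisory split fit naturally into one small gate. The configuration below mirrors the example from the text (security and architecture block, testing and documentation are advisory); the numbers are illustrative, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    threshold: int   # minimum acceptable score
    blocking: bool   # False = reported, but never fails the gate

DOMAINS = [
    Domain("security", 70, blocking=True),
    Domain("architecture", 60, blocking=True),
    Domain("testing", 50, blocking=False),
    Domain("documentation", 50, blocking=False),
]

def evaluate(scores: dict[str, int], domains=DOMAINS):
    """Return (gate_passed, report_lines). Advisory domains appear
    in the report but can never flip the gate to failing."""
    passed = True
    report = []
    for d in domains:
        score = scores.get(d.name)
        if score is None:
            continue  # domain not measured on this run
        label = "blocking" if d.blocking else "advisory"
        report.append(f"{d.name}: {score} (threshold {d.threshold}, {label})")
        if d.blocking and score < d.threshold:
            passed = False
    return passed, report
```

With this shape, a documentation score of 10 still shows up in the report for the reviewer to see, but only a blocking domain falling below its threshold can stop the merge.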

Show Trends, Not Just Pass/Fail

A check that says "your security score is 72, threshold is 70, passed" is useful. A check that says "your security score is 72 (up from 68 last week), threshold is 70, passed" is much more useful. Trends give developers context. They can see whether their work is improving the codebase or just maintaining the status quo.

When a check fails, showing the trend helps the developer understand whether this is a new regression or a pre-existing issue that was already close to the threshold. This changes the conversation from "the pipeline is broken" to "this change pushed us below the threshold we agreed on."
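Rendering the trend only needs the previous score alongside the current one. Where that history lives (a build artifact, a database, a commit status) is up to your pipeline; the sketch below just assumes a newest-last list of scores.

```python
def trend_line(domain: str, history: list[int], threshold: int) -> str:
    """Render a score with its direction of travel and pass/fail
    against the agreed threshold. `history` is newest-last."""
    current = history[-1]
    status = "passed" if current >= threshold else "failed"
    if len(history) < 2:
        # first run: no trend to report yet
        return f"{domain} score {current}, threshold {threshold}, {status}"
    previous = history[-2]
    if current > previous:
        direction = f"up from {previous}"
    elif current < previous:
        direction = f"down from {previous}"
    else:
        direction = f"unchanged from {previous}"
    return f"{domain} score {current} ({direction}), threshold {threshold}, {status}"
```

For the example in the text, trend_line("security", [68, 72], 70) yields "security score 72 (up from 68), threshold 70, passed". On a failure, the "down from" versus "unchanged from" wording is exactly what separates a new regression from a pre-existing issue.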

Getting Team Buy-In

The best-designed check in the world will fail if the team does not support it. Buy-in comes from three things: involvement in setting thresholds, transparency about what the check does, and a genuine commitment to adjusting the rules when they prove wrong.

Let teams propose their own thresholds based on their current scores. Publish the scoring methodology so developers can understand why a check passes or fails. Review thresholds quarterly and adjust them based on what the team has learned.

Quality checks work when they feel like a tool the team chose to use, not a constraint imposed from above.

Frequently Asked Questions

How many quality checks should a CI pipeline have?

Start with one or two checks covering the domains your team cares about most. Security is usually a good starting point. Add more as the team gets comfortable, but keep the total pipeline time under two minutes for blocking checks.

Should quality checks block merges or just report?

Both. Use blocking checks for high-impact domains like security and architecture. Use advisory checks for domains where the team wants visibility without enforcement, such as documentation or code complexity.

What if our codebase scores too low to set any threshold?

Set the initial threshold at or just below your current score. This means the check passes immediately and only fails on regressions. Tighten the threshold gradually as the team addresses existing issues.

How often should we review our quality thresholds?

Quarterly is a sensible cadence. Review whether thresholds are still appropriate, whether any domains should move from advisory to blocking, and whether the team has feedback on the check output.

Practical Implementation

Start with one or two domains that the team already cares about. Security is usually a good starting point because the consequences of ignoring it are concrete and well understood. Set the initial threshold at or just below the current score so the check passes immediately and only fails on regressions.
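The "start at the current score, tighten gradually" policy can be expressed as a simple ratchet. The margin and step values here are illustrative; the one property worth preserving is that the threshold only ever moves up.

```python
def initial_threshold(current_score: int, margin: int = 2) -> int:
    """Start the gate at or just below today's score so it passes
    immediately and only fails on regressions."""
    return max(current_score - margin, 0)

def ratchet(threshold: int, latest_score: int, step: int = 1) -> int:
    """Tighten gradually: when the codebase pulls clear of the
    threshold, raise it one step. Never loosen automatically."""
    if latest_score >= threshold + step:
        return threshold + step
    return threshold
```

A legacy project scoring 45 would start with a threshold of 43, pass on day one, and see the bar rise only as real improvements land.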

Add the check to the pull request workflow so results appear as a status check and a comment. Make sure the failure messages are clear and specific. Run the check for two weeks and gather feedback. Adjust the threshold, the output format, and the domains based on what the team reports.

Then add another domain, repeat, and tighten the thresholds as the codebase improves.

Quality enforcement is a practice, not a configuration. The teams that succeed are the ones that treat their quality checks as a living system that evolves alongside their codebase.