PR Quality Gates: A Complete Guide

What Is a PR Quality Gate?

A pull request quality gate is an automated check that evaluates the quality impact of a code change before it is merged. When a developer opens or updates a pull request, the gate analyses the change across one or more quality domains and either passes or fails based on predefined thresholds.

If the gate passes, the PR proceeds through the normal review process. If it fails, the team is alerted to a quality concern that needs to be addressed before merging.

Quality gates are not a replacement for code review. They are a complement to it. Human reviewers are good at evaluating design decisions, business logic, and naming choices. Automated gates are good at catching the things humans miss: security patterns, dependency issues, coverage regressions, and architectural drift. For more on the cost of underinvesting in review, see the hidden cost of skipping code reviews.

Why Teams Use Quality Gates

Consistency at Scale

In small teams, senior developers can review every PR and catch quality issues through experience. As teams grow, this does not scale. Quality gates ensure that the same standards apply to every change, regardless of who wrote it and who reviews it.

Catching Problems Early

A security vulnerability caught in a PR is a five-minute fix. The same vulnerability caught in production is an incident involving multiple teams, customer communications, and potentially regulatory reporting. Quality gates shift detection left, to the point where fixing problems is cheapest.

Reducing Review Burden

When reviewers know that automated checks have already verified security patterns, dependency health, and test coverage, they can focus their attention on the aspects of the code that require human judgement. This makes reviews faster and more effective.

Protecting Against Gradual Decline

Codebases rarely degrade in dramatic steps. They degrade through hundreds of small compromises, each individually reasonable. Quality gates create a floor that prevents this gradual erosion by ensuring that every change meets a minimum standard.

Setting Up Per-Domain Thresholds

The most effective quality gates evaluate multiple domains independently rather than relying on a single aggregate score. This prevents a high score in one domain from masking a low score in another. Understanding what makes a good quality gate is the first step in getting this right.

Core Domains

These are the domains where failures have the most significant impact. Most teams start here.

Security is typically the first domain teams gate on. A reasonable starting threshold is 70 out of 100. This catches committed secrets, dangerous API patterns, and dependency vulnerabilities without being so strict that routine changes are blocked. For background on the patterns being detected, see security patterns every developer should know.

Architecture protects against structural degradation. A threshold of 60 is a sensible starting point. This flags changes that introduce circular dependencies, violate module boundaries, or significantly increase coupling.

Maintainability guards against code that is difficult to read, modify, or debug. A threshold of 60 catches extreme complexity and oversized files while allowing normal development to proceed.

Supplementary Domains

These domains are valuable but often start disabled (threshold of 0) because many existing codebases would fail immediately if gates were enabled.

Testing is the most common supplementary gate. Many codebases have low test coverage, and enabling a testing gate immediately would block every PR. Teams that want to improve coverage typically start with a low threshold (say, 40) and increase it quarterly.

Performance catches heavy imports, sequential awaits that should be parallel, and other performance anti-patterns. A threshold of 50 is reasonable once the team has addressed any existing performance issues.

Dependencies evaluates dependency count, vulnerability status, and licence compliance. This is especially valuable for teams in regulated industries. A healthy approach to managing your dependency supply chain starts here.

Accessibility checks for WCAG compliance patterns. Teams building user-facing applications should enable this gate once their existing accessibility issues are resolved.

Documentation verifies that changes maintain documentation quality. This is most useful for projects with public APIs or extensive onboarding requirements.
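The per-domain setup above can be sketched as a small configuration plus an independent check per domain. The domain names and starting thresholds follow the article; the dictionary shape and function are illustrative assumptions, not a real tool's API:

```python
# Hypothetical per-domain gate configuration. A threshold of 0 means the
# gate is disabled, matching the supplementary domains above.
THRESHOLDS = {
    "security": 70,
    "architecture": 60,
    "maintainability": 60,
    "testing": 0,         # supplementary: start disabled
    "performance": 0,
    "dependencies": 0,
    "accessibility": 0,
    "documentation": 0,
}

def evaluate(scores: dict) -> list:
    """Return the domains that fail their threshold.

    Each domain is checked independently, so a high security score
    cannot mask a low architecture score.
    """
    return [
        domain
        for domain, threshold in THRESHOLDS.items()
        if threshold > 0 and scores.get(domain, 0) < threshold
    ]

failed = evaluate({"security": 82, "architecture": 55, "maintainability": 64})
# architecture (55) is below its threshold of 60; disabled domains are skipped
```

Evaluating domains as a list of independent checks, rather than averaging them into one score, is what prevents the masking problem described above.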

Blocking vs Advisory Checks

Not all quality gates need to block merging. The distinction between blocking and advisory checks is one of the most important decisions in configuring a quality gate system.

Blocking Checks

A blocking check prevents the PR from being merged until the issue is resolved. The GitHub check status is set to "failure", and branch protection rules (if configured) prevent merging.

Use blocking checks for domains where failures have severe consequences. Security is the obvious candidate. If a PR introduces a committed secret or a critical dependency vulnerability, it should not be possible to merge it without addressing the issue.

Advisory Checks

An advisory check reports its findings but does not prevent merging. The check might show as a warning or a neutral status. The information is available to the reviewer and the author, but the decision to act on it is left to the team.

Advisory checks work well for domains where context matters. A documentation score drop might be acceptable if the PR is a hotfix. A slight maintainability decrease might be intentional if the team is implementing a complex algorithm that genuinely requires nested logic.
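The blocking/advisory split maps naturally onto GitHub check conclusions: a blocking failure reports "failure" (which branch protection can enforce), while an advisory finding reports "neutral". The per-domain assignment below is one possible starting configuration, not a prescription:

```python
# Which domains block merging vs. merely advise; illustrative assignment.
BLOCKING = {"security"}
ADVISORY = {"architecture", "maintainability"}

def check_conclusion(failed_domains: list) -> str:
    """Translate failed domains into a GitHub check conclusion."""
    if any(d in BLOCKING for d in failed_domains):
        return "failure"   # branch protection prevents merging
    if any(d in ADVISORY for d in failed_domains):
        return "neutral"   # findings reported, merge still allowed
    return "success"
```

A "neutral" conclusion keeps the report visible on the PR without tripping required status checks, which is exactly the behaviour advisory gates need.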

A Practical Approach

Many teams start with blocking checks on security only, advisory checks on architecture and maintainability, and all other domains disabled. As the team builds confidence in the system and addresses existing issues, they progressively move domains from disabled to advisory to blocking.

This gradual approach avoids the common failure mode of enabling strict gates across all domains on day one, which results in every PR failing and developers quickly learning to ignore or bypass the system.

Common Pitfalls

Setting Thresholds Too High

The most common mistake is setting initial thresholds too high. If 80% of PRs fail the quality gate, developers will view the system as an obstacle rather than a safeguard. Start with thresholds that would have caught your last two or three genuine quality issues but would have passed most normal PRs.

Setting Thresholds Too Low

The opposite mistake is setting thresholds so low that the gate never fails. A gate that always passes provides no value and creates a false sense of security. Review your gate pass rate periodically; if it has not failed in three months, your thresholds might be too lenient.

Treating All Domains Equally

Security failures and documentation gaps are not equivalent risks. Weighting all domains equally, or gating all domains at the same threshold, fails to reflect the actual impact of different types of quality issues.

Ignoring False Positives

Every automated system produces false positives. If developers encounter false positives frequently and have no way to dismiss or override them, they will lose trust in the system. Provide a mechanism for dismissing findings with a reason, and review dismissed findings periodically to improve detection accuracy.
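One way to make dismissals reviewable is to record each one with a required reason, then periodically count which detection rules are dismissed most often. The record shape and rule identifiers below are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Dismissal:
    rule_id: str       # which detection rule fired (hypothetical IDs)
    pr_number: int
    dismissed_by: str
    reason: str        # required: no silent dismissals

def noisiest_rules(dismissals: list, top: int = 3) -> list:
    """Rules dismissed most often are candidates for detection tuning."""
    return Counter(d.rule_id for d in dismissals).most_common(top)
```

Ranking rules by dismissal count turns scattered individual overrides into a concrete tuning backlog.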

Not Reviewing Thresholds Over Time

A threshold that was appropriate six months ago might be too lenient or too strict today. Schedule quarterly reviews of your quality gate configuration. As your codebase improves, ratchet the thresholds upward. If a domain consistently causes false failures, investigate whether the threshold needs adjustment or the detection logic needs refinement.

Gating Without Visibility

Quality gates are most effective when developers can see the full analysis report, not just a pass or fail status. A gate that says "architecture check failed" is frustrating. A gate that says "architecture score dropped from 72 to 64 due to 3 new circular dependencies in the payments module" is actionable.

Provide detailed reports for every PR check, including the specific findings that contributed to the score and clear guidance on what needs to change.
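The difference between the frustrating and the actionable message above comes down to including the score movement and the specific findings. A minimal formatter might look like this (the function and its inputs are illustrative):

```python
def summarise(domain: str, before: int, after: int, findings: list) -> str:
    """Format a gate result as an actionable message, not a bare pass/fail."""
    delta = after - before
    detail = "; ".join(findings) if findings else "no new findings"
    return (
        f"{domain} score {'dropped' if delta < 0 else 'moved'} "
        f"from {before} to {after} ({delta:+d}): {detail}"
    )

msg = summarise(
    "architecture", 72, 64,
    ["3 new circular dependencies in the payments module"],
)
# "architecture score dropped from 72 to 64 (-8): 3 new circular
#  dependencies in the payments module"
```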

Frequently Asked Questions

How long should a PR quality gate take to run?

Under two minutes is ideal. Beyond that, developers will context-switch while waiting and the feedback loop loses its value. If a full analysis cannot fit in that window, consider running a lightweight check when the PR opens and the full analysis asynchronously.
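That two-tier approach can be sketched as a fast inline pass plus a deferred full analysis. The check names, queue, and `run_check` callback here are illustrative assumptions, not a real API:

```python
import queue

# Checks cheap enough to run synchronously on PR open; placeholder names.
FAST_CHECKS = ["secrets_scan", "dependency_audit"]

analysis_queue: queue.Queue = queue.Queue()  # consumed by an async worker

def on_pr_opened(pr_number: int, run_check) -> dict:
    """Run the fast checks inline, then queue the full analysis."""
    results = {check: run_check(check, pr_number) for check in FAST_CHECKS}
    analysis_queue.put(pr_number)  # the worker posts full results later
    return results

fast_results = on_pr_opened(42, lambda check, pr: True)
```

The PR gets its fast verdict within seconds, while the worker updates the check with full results once the deeper analysis completes.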

Should quality gates apply to all branches or just PRs targeting main?

Most teams gate PRs targeting their main or production branch. Gating PRs targeting feature branches is usually unnecessary and adds friction to work-in-progress collaboration.

What should happen when a developer disagrees with a gate failure?

Provide a clear escalation path. For advisory checks, the developer can merge with a comment explaining their reasoning. For blocking checks, a senior engineer or tech lead should be able to override with a documented justification. Review overrides regularly to identify patterns that suggest the gate needs tuning.

Can quality gates work with monorepos?

Yes, but they require more sophisticated configuration. Ideally the gate analyses only the packages or modules affected by the PR rather than the entire repository. Path-based filtering and per-package thresholds are common approaches.
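Path-based filtering can be as simple as mapping changed file paths onto package roots and gating only the packages that match. The package layout below is hypothetical:

```python
# Hypothetical monorepo package roots.
PACKAGE_ROOTS = ["packages/payments", "packages/auth", "packages/ui"]

def affected_packages(changed_files: list) -> set:
    """Return the package roots touched by a PR's changed files."""
    return {
        root
        for root in PACKAGE_ROOTS
        for path in changed_files
        if path.startswith(root + "/")
    }

pkgs = affected_packages([
    "packages/payments/src/invoice.py",
    "packages/payments/tests/test_invoice.py",
    "docs/README.md",  # outside any package: not gated
])
# only packages/payments needs analysis
```

Per-package thresholds then slot in naturally: look up each affected package's own configuration instead of applying one repository-wide standard.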

How do quality gates interact with CI/CD pipelines?

Quality gates are typically one step in a broader CI/CD pipeline. They run alongside (or after) linting, type checking, and unit tests. The quality gate evaluates higher-level concerns like architecture and security that conventional CI checks do not cover.

Implementation Checklist

If you are setting up PR quality gates for the first time, this sequence has worked well for many teams.

First, run a baseline analysis of your main branch across all domains. This tells you your starting point and helps you set realistic thresholds.

Second, enable security as a blocking check with a threshold 5 to 10 points below your current score. This ensures the gate passes on your existing code but catches regressions.

Third, enable architecture and maintainability as advisory checks with similar thresholds. Let the team get accustomed to seeing the reports without being blocked.

Fourth, review the data after four weeks. Look at how many PRs would have been blocked, what findings were reported, and whether any false positives appeared.

Fifth, adjust thresholds based on the data. Tighten thresholds that are too lenient, loosen any that are causing false failures, and consider moving advisory checks to blocking if the team is ready.

Sixth, enable additional domains as your team addresses existing issues in those areas.
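The second step of the checklist, deriving initial thresholds from the baseline, can be sketched as follows. The margin of 10 points matches the 5-to-10-point range suggested above; the function itself is illustrative:

```python
def initial_thresholds(baseline: dict, margin: int = 10) -> dict:
    """Set each domain's threshold a margin below its current main-branch
    score, so existing code passes but regressions fail the gate."""
    return {domain: max(score - margin, 0) for domain, score in baseline.items()}

thresholds = initial_thresholds({"security": 78, "architecture": 66})
# security gates at 68, architecture at 56
```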

The Long-Term View

Quality gates are not a one-time setup. They are a living system that evolves with your codebase and your team. The best implementations treat quality gate configuration as a form of engineering policy: documented, reviewed periodically, and adjusted based on evidence.

When configured well, quality gates become invisible. PRs that meet the standard pass without friction. PRs that introduce genuine problems are caught early. And over time, the codebase steadily improves because every change is held to a consistent, reasonable standard.

That is the goal: not perfection on every PR, but steady, measurable progress toward a healthier codebase.