Consider two bugs.
Bug one.
const token = req.headers.authorization;
db.query(`SELECT * FROM users WHERE token = ${token}`);
Unparameterised SQL, with user input interpolated directly into the query string. Static analysis flags this deterministically: a regex finds it, Semgrep finds it, and sqlmap can exploit it in under a minute.
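For comparison, here is a minimal sketch of the parameterised form. It assumes a driver that accepts PostgreSQL-style `$1` placeholders and a separate values array; the `Db` interface and both function names are hypothetical and stand in for whatever client the code actually uses.

```typescript
// Hypothetical minimal shape of a database client; real drivers differ.
interface Db {
  query(text: string, values: unknown[]): Promise<unknown>;
}

// Builds the query text and the bound values separately, so the token
// is never spliced into the SQL string itself.
function buildTokenLookup(token: string): { text: string; values: unknown[] } {
  return {
    text: "SELECT * FROM users WHERE token = $1",
    values: [token],
  };
}

// Passes text and values to the driver, which binds them on its side.
async function findUserByToken(db: Db, token: string): Promise<unknown> {
  const { text, values } = buildTokenLookup(token);
  return db.query(text, values);
}
```

The point is only that the SQL text is a constant: whatever the header contains, it travels as data, not as part of the statement.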
Bug two.
async function bulkUpdate(users: User[]) {
  for (const user of users) {
    await updateUser(user);
  }
  await logAudit("bulk update", users.length);
  await notifySlack(`Updated ${users.length} users`);
  await metricsClient.record("bulk_update", users.length);
}
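Almost no static-analysis tool flags this in a useful way. A lint rule such as ESLint's no-await-in-loop can point at the loop, but it cannot say whether the updates are independent or what should happen when one fails. Every line is syntactically correct. No dangerous pattern. And yet it is wrong for reasons that matter: updating users sequentially wastes time, the audit, notification and metrics calls are awaited one after another although none depends on the others, and any failure in updateUser aborts the whole batch without retry or rollback.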
To spot bug two you need to understand what the code is trying to do. Rules cannot do that. This is where AI code review earns its keep.
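One possible reworking, as a sketch rather than the fix: it assumes the updates are independent of each other, that partial success is acceptable, and that a failed audit, Slack or metrics call should not fail the batch. The `Deps` and `BulkResult` types are hypothetical, introduced only so the example is self-contained.

```typescript
interface User {
  id: string;
}

// Hypothetical dependencies, injected so the sketch runs on its own.
interface Deps {
  updateUser(user: User): Promise<void>;
  logAudit(action: string, count: number): Promise<void>;
  notifySlack(message: string): Promise<void>;
  recordMetric(name: string, value: number): Promise<void>;
}

interface BulkResult {
  succeeded: User[];
  failed: { user: User; reason: unknown }[];
}

async function bulkUpdate(users: User[], deps: Deps): Promise<BulkResult> {
  // Start all updates together; allSettled waits for every one of them
  // and does not stop at the first rejection.
  const settled = await Promise.allSettled(users.map((u) => deps.updateUser(u)));

  const result: BulkResult = { succeeded: [], failed: [] };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      result.succeeded.push(users[i]);
    } else {
      result.failed.push({ user: users[i], reason: outcome.reason });
    }
  });

  const count = result.succeeded.length;
  // The three side effects do not depend on each other, so run them together.
  // Assumption: their failures are non-fatal, so rejections are ignored here.
  await Promise.allSettled([
    deps.logAudit("bulk update", count),
    deps.notifySlack(`Updated ${count} users`),
    deps.recordMetric("bulk_update", count),
  ]);

  return result;
}
```

Whether this is right depends on exactly the context a rule cannot see. Unbounded concurrency may overload the database, and some teams would want an audit failure to be loud. The caller gets the `failed` list back and can decide whether to retry.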
What each tool is good at
Static analysis is deterministic, fast, cheap, and reliable within its scope. It catches patterns. A rule checks for something specific (like "concatenation into a SQL string") and either it fires or it does not. The same code always gets the same result. Thousands of files process in seconds.
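To make "either it fires or it does not" concrete, here is a toy rule. It is a deliberately naive, line-based regex, not how Semgrep or any real analyser works, and the rule name and `Finding` shape are made up for the example.

```typescript
interface Finding {
  line: number;
  rule: string;
}

// Toy rule: a template literal on one line that starts with a SQL verb
// and contains a ${...} interpolation.
const SQL_INTERPOLATION = /`\s*(SELECT|INSERT|UPDATE|DELETE)\b[^`]*\$\{[^`]*`/i;

// Scans source text line by line and reports every line the rule matches.
function scan(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    if (SQL_INTERPOLATION.test(text)) {
      findings.push({ line: index + 1, rule: "sql-string-interpolation" });
    }
  });
  return findings;
}
```

Run it twice on the same input and the output is identical, which is the property that makes it safe to gate on. It is also blind to anything outside its pattern, such as a query assembled across several lines.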
AI code review is probabilistic, slower, more expensive per run, and context-aware. It reads the actual code and evaluates it against general principles (is this efficient, is this idiomatic, does this do what it claims). It catches things static analysis cannot because it understands intent.
Rules catch known anti-patterns. AI catches "this is fine in theory but probably wrong in context".
Where each one fails
Static analysis fails when the pattern is contextual. A rule cannot know whether a particular eval call is safe (some are; they take data that is guaranteed to be trusted). It has to flag all of them or none. False positives build up. Developers learn to ignore them. Real issues hide in the noise.
Static analysis fails when the bug is cross-file. A function that looks fine in isolation might violate an invariant elsewhere in the codebase. Rules that chain across files exist but are rare and expensive.
Static analysis fails at semantics. Whether a variable is named well, whether a function is doing two jobs, whether the abstraction makes sense: these need a reader.
AI fails on determinism. The same code reviewed twice can produce different findings. Not wildly different, but different. This is acceptable for advisory review. It is a problem for gating decisions.
AI fails at scale-per-dollar. Reviewing 10,000 files with AI is slow and expensive. Reviewing the same files with static analysis takes seconds and costs next to nothing.
AI fails at confidence. It will sometimes flag things that are not issues ("this function is complex" when the function has to be complex). It will sometimes miss things that are. A scanner that says "X is wrong" with 100% certainty is easier to act on than one that says "X might be wrong".
The practical combination
Neither tool is sufficient alone. The pattern that works:
- Static analysis as the baseline, in CI on every commit. Fast, deterministic, blocks merges on high-severity findings. Zero AI cost.
- AI review on pull requests, producing advisory findings and commentary. Not gating. The score comes from static signals; AI adds context.
- Human review for intent and architectural decisions, which neither tool captures well.
This stack covers three distinct layers: mechanical correctness (static), contextual judgement (AI), and design judgement (human). Each is expensive in a different way. Together they cost less than any one trying to do the job alone.
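The gating policy described above can be sketched in a few lines. The types and the `decide` function are hypothetical and the severity levels are an assumption. The one thing the sketch is meant to show is that AI output never feeds the merge decision.

```typescript
type Severity = "low" | "medium" | "high";

interface StaticFinding {
  rule: string;
  severity: Severity;
}

interface AiComment {
  file: string;
  note: string;
}

interface ReviewOutcome {
  blockMerge: boolean;
  blocking: StaticFinding[];
  advisory: AiComment[];
}

// Only high-severity static findings can block a merge.
// AI comments are passed through untouched as advisory context.
function decide(staticFindings: StaticFinding[], aiComments: AiComment[]): ReviewOutcome {
  const blocking = staticFindings.filter((f) => f.severity === "high");
  return {
    blockMerge: blocking.length > 0,
    blocking,
    advisory: aiComments,
  };
}
```

Because `decide` ignores the content of `aiComments` when computing `blockMerge`, re-running the AI review cannot flip a passing build to a failing one, or the reverse.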
A common mistake
Teams sometimes try to replace static analysis with AI. "AI is smarter, let us just run AI on every PR." This sounds reasonable and is wrong.
Static analysis on const password = "hunter2" takes 0.1ms, costs nothing, and produces a deterministic finding. AI on the same code might produce the same finding or might miss it entirely depending on context and prompt. Paying 10,000x the cost for a probabilistic version of a reliable result is not progress.
The correct framing: static analysis handles the stuff where rules can be clear, AI handles the stuff where they cannot.
The thing that surprised me
When we built Implera's scoring engine, we assumed AI would be the star and static analysis would be the warm-up. The reverse turned out to be more useful.
Static analysis produces the number. It is fast, reproducible and trustworthy enough to gate on. AI produces context: what is this finding, why does it matter for this codebase, how would you fix it. That context is what makes the number actionable, but it is not what makes the number.
Neither tool alone is enough. Static analysis without AI is a pile of findings without a narrative. AI without static analysis is a narrative without a trustworthy number to anchor it.
The bottom line
Use both. Use static analysis for speed, reliability and gating decisions. Use AI for context, summarisation and the judgement calls rules cannot make. Use humans for intent and architecture.
The right tool for each layer, not a single tool pretending to do all three.