21 January 2026 · 10 min read

The Security Debt AI Is Quietly Creating

69% of organisations found security vulnerabilities in AI-generated code. PR volume doubled. Security review capacity did not. The debt is accumulating.

ai-native · engineering · security · leadership

Sixty-nine percent of organisations have found security vulnerabilities in AI-generated code. Most of them did not find those vulnerabilities before they shipped.

The pattern is consistent across teams I've worked with: AI tools doubled PR volume, security review capacity stayed flat, and the code that slipped through looked fine. It passed functional tests. It satisfied reviewers who were moving faster than before. It reached production. The exposure was already there before anyone thought to look for it.

This is not a story about AI being insecure. It is a story about AI-generated code having a specific vulnerability profile that existing review and tooling processes were not built to catch, combined with a volume increase that made the review capacity gap impossible to close with effort alone. The security debt is accumulating quietly because the code looks right. That is the part that makes it dangerous.

AI-Generated Code Has a Specific and Predictable Vulnerability Profile

AI coding tools do not know your system. They know patterns from training data. When an engineer asks Claude or Copilot to implement an authenticated endpoint, the model produces code that is plausible for an authenticated endpoint in general. It does not know your internal auth patterns, your API security conventions, or how your data classification rules should shape what that endpoint is allowed to return.

The result is a predictable vulnerability profile that is different from the one human engineers typically produce. Research from GitLab and OWASP consistently surfaces the same categories: credential exposure, insecure defaults, overly permissive access patterns, and copy-paste of patterns that work in isolation but are insecure in the specific context they are being dropped into.

Human engineers make security mistakes too, but their mistakes tend to be idiosyncratic. They misunderstand an edge case, forget to validate an input in a specific flow, or implement a business rule incorrectly. AI security failures are more systematic. The model is generating code that is correct by the patterns it was trained on and insecure by the specific conventions of your system. The same class of mistake appears across multiple PRs from multiple engineers, because they are all prompting the same model without giving it the context it would need to know better.

Consider what happens when three engineers on the same team each ask an AI tool to implement a service-to-service API call. Each one gets a slightly different implementation, but all three are likely to share the same default authentication approach, because the model has no visibility into your internal API security standard. If that standard exists and those implementations deviate from it, you now have three vulnerabilities with a common root cause, none of which a SAST tool will flag, none of which will look obviously wrong in a PR review, and all of which share the same exploit surface.
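To make the failure mode concrete, here is a minimal sketch. Every name in it is hypothetical: the first function is the kind of static-bearer default a model plausibly produces for a service-to-service call, and the second is the sort of internal convention (short-lived, HMAC-signed request tokens) the model has no way to know exists.

```python
import hashlib
import hmac
import time


# The plausible AI default: a long-lived shared secret sent as a bearer
# header. Generically "correct", and exactly the pattern a model will repeat
# across three engineers' PRs because none of them gave it more context.
def ai_default_headers(shared_secret: str) -> dict:
    return {"Authorization": f"Bearer {shared_secret}"}  # static, never rotates


# A hypothetical internal convention the model cannot know about: short-lived,
# per-request tokens that identify the calling service and expire quickly.
def convention_headers(service_key: bytes, caller: str, ttl_seconds: int = 60) -> dict:
    expires = int(time.time()) + ttl_seconds
    payload = f"{caller}:{expires}".encode()
    signature = hmac.new(service_key, payload, hashlib.sha256).hexdigest()
    return {"X-Service-Auth": f"{caller}:{expires}:{signature}"}
```

Nothing in the first function is flagged by a SAST rule, and nothing about it looks wrong in a fast review. It is only insecure relative to the convention, which is precisely the information the model never had.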

This is the part that organisations are discovering after the fact. The 69% finding vulnerabilities in AI-generated code are not seeing exotic attacks. They are seeing insecure defaults, missing authentication checks, and credential handling that would have been caught in a security-aware review. The code looked right. The reviewer was moving quickly. The combination was enough for it to get through.

Your Review Capacity Did Not Scale With Your PR Volume

Cortex's 2026 engineering benchmark found that teams with high AI adoption merged 98% more PRs per engineer, while review time per PR rose 91%. The combined picture is reviewer saturation: roughly twice the code queued for review, each PR waiting longer, and the genuine scrutiny available for any individual change dropping significantly.

Security review is not something that degrades gracefully under time pressure. A functional reviewer who is moving fast will catch logic errors and miss security implications. They will approve code that does what it is supposed to do without noticing that it does something else it should not. The 23.5% increase in production incidents that accompanied the volume increase reflects exactly this: more output, same human attention, more of what the attention should have caught getting through.

The capacity mismatch is structural. Teams that doubled their PR volume with AI did not hire additional senior engineers to review it. They did not change their review process. They assumed the existing process would scale. It did not, and it was not going to. Security review specifically requires the reviewer to think about what the code does beyond its intended function. That takes time that reviewers no longer have when volume doubles.

This is not a people problem. The engineers doing reviews are not being negligent. They are doing exactly what the process asks of them, at the volume the process now requires. The problem is that the process was designed for a different throughput and nobody updated it.

Security review specifically demands a different kind of attention than functional review. Functional review asks: does this code do what it is supposed to do? Security review asks: what else does this code do, and who else can make it do something it was not supposed to? Those are adversarial questions. They require the reviewer to think from outside the implementation, not from within it. That cognitive shift takes time, and it is the first thing that disappears when reviewers are under volume pressure. Engineers prioritise confirming that code works correctly; they are not wrong to do so. The gap is that the security question requires a separate, deliberate pass, and most review processes do not enforce one.

The Failure Modes That Existing Security Tooling Misses

Most engineering teams have SAST tools, dependency scanners, and secrets detection in their CI pipeline. These were built for human-paced code production. They were calibrated, tuned, and integrated at a time when the volume of code entering the pipeline was a fraction of what AI generates today.

The volume problem is the obvious one: the same tooling, configured for human-paced output, is now running on twice the code. The signal-to-noise ratio has degraded in many teams because the tools were not reconfigured for AI-volume output. Security alerts that were manageable at previous volume become a queue that nobody has time to process seriously.

But the subtler problem is what these tools are not designed to catch. SAST finds known patterns: SQL injection constructs, hardcoded secrets, use of deprecated cryptographic functions. It does not catch context-specific security failures. A SAST tool will not tell you that the AI-generated endpoint is returning fields your data classification rules say should be restricted to admin roles. It will not tell you that the generated service-to-service authentication pattern follows a convention that was deprecated in your architecture six months ago. It will not tell you that the insecure default the AI used is insecure specifically because of how your infrastructure is configured.

These context-specific failures are where AI-generated code is most exposed, and they are precisely what existing tooling was not built to find. The teams that have 30% higher change failure rates from AI-generated code are not seeing SAST failures. They are seeing context failures: code that is generically correct and specifically insecure.
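The distinction is easiest to see in a sketch of what a context-aware check would have to know. Unlike a generic SAST rule, the check below encodes something repo-specific: which response fields are restricted, and to which roles. The field names, roles, and the idea of a classification map here are all illustrative assumptions, not a real tool's API.

```python
# Hypothetical data classification map for one codebase: field name -> the
# role allowed to receive it. Generic SAST cannot know this; a repo-specific
# check can.
RESTRICTED_FIELDS = {"ssn": "admin", "salary": "admin", "email": "support"}


def audit_response_fields(returned_fields: set[str], caller_role: str) -> list[str]:
    """Return the restricted fields this caller's role should not receive."""
    return sorted(
        field
        for field in returned_fields
        if field in RESTRICTED_FIELDS and RESTRICTED_FIELDS[field] != caller_role
    )
```

An AI-generated endpoint that serialises the whole record passes every generic scanner; only a check carrying this kind of local knowledge catches that `ssn` is reaching a non-admin caller.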

What AI-Aware Security Review Actually Looks Like

The fix is not to slow down AI adoption. It is to build a security review process that is designed for AI-volume output and AI-specific failure modes. Teams I've worked with that have navigated this have done three things consistently.

The first is an AI-specific security review checklist embedded into the PR template. This is different from a generic security checklist. It surfaces the specific failure modes AI tools produce: hardcoded or improperly scoped credentials, insecure defaults for your specific infrastructure context, auth patterns that deviate from your internal conventions, data access that exceeds what the endpoint requires. The checklist makes the review explicit. It takes what was previously an implicit expectation (that reviewers would catch these things) and turns it into a structured step that happens for every AI-assisted PR.
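As a sketch, such a section in a PR template might look like the following. The items are illustrative; a real checklist should name the team's own conventions and the failure modes its audits have actually surfaced.

```markdown
## AI-assisted change: security review

- [ ] No hardcoded or overly scoped credentials introduced
- [ ] Library and framework defaults checked against our infrastructure, not assumed safe
- [ ] Auth follows the internal service-to-service convention, not a generic pattern
- [ ] Endpoint returns only the fields its data classification level permits
- [ ] Input validation present on every externally reachable parameter
```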

The second is automated security gates that run before human review, not alongside it. The CI pipeline should be blocking PRs with known security antipatterns before a human looks at them, not flagging them as warnings after merge. This requires tuning your existing tooling for AI-generated volume: updating rules, reducing false positives so that real signals are not lost, and making the security gate a genuine gate rather than an advisory.
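A minimal version of such a gate can be sketched as a script that scans only the added lines of a diff and returns a blocking exit status. The rules below are illustrative placeholders; a real gate would encode the team's own antipatterns and be tuned against its false-positive rate.

```python
import re
import sys

# Illustrative antipattern rules only: (name, compiled pattern).
RULES = [
    ("hardcoded secret", re.compile(r"(api_key|secret|token)\s*=\s*['\"][A-Za-z0-9]{16,}")),
    ("TLS verification disabled", re.compile(r"verify\s*=\s*False")),
    ("static bearer credential", re.compile(r"Authorization['\"]?\s*[:=].*Bearer\s+\S")),
]


def scan_diff(diff_text: str) -> list[str]:
    """Scan only the added lines of a unified diff for known antipatterns."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), 1):
        if not line.startswith("+"):  # ignore context and removed lines
            continue
        for name, pattern in RULES:
            if pattern.search(line):
                findings.append(f"line {lineno}: {name}")
    return findings


def gate(diff_text: str) -> int:
    """Exit status for CI: nonzero blocks the PR before a human reviews it."""
    findings = scan_diff(diff_text)
    for finding in findings:
        print(finding, file=sys.stderr)
    return 1 if findings else 0
```

The design point is the return value: a nonzero status that fails the pipeline, not a warning annotation that a saturated reviewer is free to scroll past.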

The third is context injection: giving the AI tool enough of your security conventions that it generates more contextually appropriate code in the first place. This means adding security conventions to your architecture documentation and CLAUDE.md equivalent files, so that when engineers prompt the model, it is working with your constraints rather than generic patterns. This does not eliminate the problem. It reduces the frequency of context-specific failures at the generation stage, which makes the review stage more tractable.
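As an illustration, a security-conventions excerpt in a CLAUDE.md or equivalent context file might read like this. Every rule below is hypothetical; the point is that the conventions are stated where the model will actually see them.

```markdown
## Security conventions (excerpt)

- Service-to-service calls MUST use the internal signed-token helper, never
  static bearer tokens.
- API responses MUST NOT include fields tagged `restricted` in the data
  classification registry unless the caller has the admin role.
- New endpoints default to deny: authentication and authorisation checks run
  before any handler logic.
```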

A fourth approach worth addressing directly: using AI to review AI-generated code for security issues. This has a specific and limited use case. AI review tools are genuinely useful for catching the pattern-level problems: known insecure function calls, hardcoded secrets, missing input validation that follows a recognisable form. This is the same category that SAST handles. AI review is an extension of automated checking, not a replacement for contextual human review.

What AI review cannot catch reliably is context-specific exposure. It does not know that your auth convention works differently from the standard pattern, or that a particular field your API is returning is restricted under your data classification policy. The human security review pass is not optional because AI review exists. It is optional only if the code being reviewed has no context-specific security requirements, which describes almost no production codebase.

None of these require a new security tool or a new headcount. They require updating an existing process that was built for a different production rate and a different class of failure mode.

The Compounding Risk of Skipping This Now

Security debt compounds differently from technical debt. A piece of technical debt slows future development. A piece of security debt sits quietly until it is exploited, and then the cost is rarely proportional to the code that caused it.

The AI security debt accumulating in most engineering organisations right now shares a specific characteristic: it is systemic. When the same class of vulnerability appears across multiple PRs from multiple engineers because they are all prompting the same model without adequate context, you do not have isolated vulnerabilities. You have a pattern. A single class of exploit that works on one AI-generated endpoint is likely to work on others generated with the same patterns.

The teams that find this first in a security audit are in a much better position than the teams that find it in an incident. The audit is uncomfortable and the remediation is expensive. The incident, when the vulnerability class is systemic, is categorically more expensive: not one endpoint to fix but many, not one data exposure to manage but a class of exposure that requires a full audit to scope.

The 69% of organisations that found vulnerabilities in AI-generated code found them. The proportion with similar vulnerabilities sitting in production, not yet found, is almost certainly higher. The gap between those two groups is not security sophistication. It is whether they looked.

The volume of AI-generated code entering production is not going to decrease. Every week of not updating the security review process is a week of additional exposure accumulating at AI pace. The debt is real, it is growing, and it is not going to surface in your SAST dashboard before it surfaces somewhere else.


I help engineering teams close the gap between "we use AI tools" and "AI actually changed how we deliver." Book a 20-minute call and I'll tell you where the leverage is.

Working on something similar?

I work with founders and engineering leaders who want to close the gap between what their technology can do and what it's actually delivering.