
Secure Code Review for AI-Generated Code: A Practical Guide

AI coding assistants are now part of the daily engineering loop in most enterprise development teams. Copilot, Cursor, Claude Code, and internal models produce a large and growing share of the code that ships to production. The vulnerability classes have not changed; the volume, the distribution, and the review signals have. For internal AppSec teams, product security teams, security engineering teams, security champions, and CISO programmes setting an AI-coding policy, this guide covers: what is different about reviewing AI-generated code; the recurring vulnerability classes that show up disproportionately; the per-pull-request review pattern; the SAST and SCA harness that scales the review; the prompt and policy guardrails that reduce upstream risk; the audit-read shape AI-assisted commits leave behind; and a phased rollout that takes a programme from ad-hoc Copilot use to a defensible AI-coding policy.

What Is Different About Reviewing AI-Generated Code

AI-generated code is not categorically less secure than human-written code. It does change the distribution of failure modes and the signals a reviewer reads. Human-written code carries the intent of the author and the recent design conversation; the reviewer can lean on context that was discussed before the commit landed. AI-generated code often arrives without that context, and the human developer who accepted the suggestion may not be able to fully reconstruct why the assistant chose a particular library, idiom, or default.

Three review-pattern shifts follow from that asymmetry. The first is behavioural verification: AI-generated code looks confident and idiomatic even when it does not solve the problem the prompt described, so the reviewer explicitly checks that the produced code does what the developer intended rather than what the assistant inferred. The second is dependency hygiene: AI assistants suggest third-party libraries with high frequency, and a meaningful share of those suggestions are outdated, deprecated, or non-existent (the package-hallucination failure mode), so the reviewer treats every new dependency as a first-class review item. The third is security-default scrutiny: AI assistants reproduce common patterns from training data, including weak cryptographic defaults, unsafe deserialization, and insecure session handling that look idiomatic, so the reviewer treats security-sensitive defaults as suspect rather than assumed-correct.

The cumulative effect is that AI-generated code review takes slightly longer per change and is materially more rigorous on the three axes above. The counter-balance is that AI-assisted developers produce more changes per unit time, so the overall review throughput shifts toward more changes per reviewer per day with each individual change requiring sharper attention on the AI-specific axes. Programmes that try to absorb AI-assisted volume without shifting the review pattern accumulate the failure modes silently.

Eight Recurring Vulnerability Classes in AI-Generated Code

Field reports across enterprise AppSec teams and published AI-assisted-coding studies converge on eight recurring failure modes. The list is not exhaustive and the relative rates vary by language, framework, and assistant tier. The shape is consistent enough to anchor the review pattern.

1. Injection (SQL, command, LDAP)

The assistant emits string concatenation rather than a parameterised query or a properly escaped command construction. Common when the prompt does not specify the framework idiom or when the assistant defaults to the most generic example pattern from training data. The review check is that every dynamic query, command, and external invocation uses the framework-native safe construction.
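The contrast can be sketched in a few lines. This uses Python's stdlib sqlite3 as a stand-in for any database driver; the adversarial input is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # adversarial input

# Unsafe: string concatenation lets the input rewrite the query itself.
unsafe_query = f"SELECT id FROM users WHERE name = '{user_input}'"

# Safe: the driver binds the value as data, never as SQL.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The unsafe query matches every row because the injected `OR '1'='1'` widens the predicate; the parameterised query matches nothing, because no user is literally named `alice' OR '1'='1`.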

2. Cross-site scripting

The assistant emits unsafe output rendering because the framework auto-escape is bypassed (raw HTML helpers, manual string concatenation into HTML, JSON responses without proper content-type handling). Common in templates and view-layer code that mixes server-rendered and client-rendered output. The review check is that every output context renders through the framework escape mechanism appropriate to that context.
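A minimal sketch of the context-appropriate escape, using Python's stdlib `html.escape` in place of whatever mechanism the framework provides:

```python
from html import escape

user_comment = '<script>alert("xss")</script>'

# Unsafe: raw interpolation puts attacker-controlled markup into the page.
unsafe_html = f"<p>{user_comment}</p>"

# Safe: escape for the HTML-body context before interpolating.
safe_html = f"<p>{escape(user_comment)}</p>"
```

The review question is not "is there an escape function somewhere" but "does this specific output context run through the escape mechanism for that context".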

3. Insecure deserialization

The assistant uses pickle in Python, Java native deserialization, or equivalent unsafe formats for what reads as a clean object parser. Common when the prompt asks for a generic data-loading function and the assistant does not infer that the data may come from untrusted input. The review check is that any deserialization of external input uses a constrained format (JSON with schema validation, MessagePack with allowed types) rather than a code-equivalent serialization.
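A sketch of the constrained-format alternative: stdlib JSON plus an explicit field-and-type allowlist. The `load_job` name and the schema are hypothetical:

```python
import json

def load_job(payload: bytes) -> dict:
    """Parse external input as JSON and validate the shape explicitly.

    A constrained format plus a schema check replaces pickle-style
    deserialization, which can execute arbitrary code on load.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Allowlist of fields and types; anything else is rejected.
    schema = {"task": str, "retries": int}
    if set(data) != set(schema):
        raise ValueError("unexpected fields")
    for field, expected in schema.items():
        if not isinstance(data[field], expected):
            raise ValueError(f"bad type for {field}")
    return data

job = load_job(b'{"task": "resize", "retries": 3}')
```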

4. Hardcoded secrets and example credentials

The assistant emits API keys, passwords, or tokens from the training set that survive into the produced code as fake-looking but real test credentials, or as placeholder strings the developer forgets to replace. Common in example configurations, integration tests, and quick-start scaffolding. The review check is a secret scanner running on every change plus a manual sweep on AI-generated configuration and bootstrap files.

5. Outdated and hallucinated dependencies

The assistant suggests package versions known to have public CVEs because they were the dominant version in the training data, or package names that do not exist on the registry (the package-hallucination failure mode). The second case is a supply-chain risk: malicious actors can register the hallucinated name and ship malware. The review check is dependency verification at pull-request time plus SCA scanning on the resulting lockfile.
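One shape the pull-request-time verification can take, sketched with a hypothetical team-maintained allowlist; `APPROVED`, the helper name, and the example packages are assumptions, not a real tool:

```python
# Hypothetical policy: every new dependency must be version-pinned and
# must appear on a registry-verified allowlist the team maintains.
APPROVED = {"requests", "cryptography", "pydantic"}  # assumed list

def check_requirements(lines):
    """Return human-readable findings for suspect requirements lines."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "==" not in line:
            findings.append(f"unpinned: {line}")
            continue
        name = line.split("==")[0].strip().lower()
        if name not in APPROVED:
            findings.append(f"not on approved list (possible hallucination): {name}")
    return findings
```

The allowlist check is what catches the hallucinated name: a package that does not exist on the registry cannot have been verified onto the list.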

6. Weak cryptographic defaults

The assistant emits MD5, SHA-1, ECB-mode AES, static IVs, or weak password hashing because they are still common in legacy code samples. The defaults look idiomatic and compile cleanly. The review check is that every cryptographic primitive is named explicitly (algorithm, mode, key length, IV handling, KDF for password hashing) and reviewed against the operative standard (NIST SP 800-131A for transition guidance, OWASP ASVS V6 for application-level cryptographic verification).
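A sketch of what "named explicitly" looks like in code, using stdlib `hashlib.scrypt` with spelled-out KDF parameters; the parameter values are illustrative, not a sizing recommendation:

```python
import hashlib
import hmac
import os

# Every primitive named where the reviewer can see it: the KDF (scrypt),
# the salt length, and the cost parameters.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str, salt=None):
    """Memory-hard password hashing with explicit, reviewable parameters."""
    salt = salt or os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    # Constant-time comparison avoids a timing side channel.
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
```

Contrast with the assistant default of `hashlib.md5(password.encode())`: the weak version also compiles cleanly, which is why the review check is on the named primitive, not on whether the code runs.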

7. Path traversal and SSRF

The assistant concatenates input into file paths or outbound URLs without normalisation, allowlisting, or scheme validation. Common in file-upload, webhook, and integration code. The review check is that every input that influences a filesystem path or an outbound network request is normalised and validated against an allowlist before use.
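Both checks can be sketched with the stdlib; the upload root and the allowed host are hypothetical:

```python
from pathlib import Path
from urllib.parse import urlparse

UPLOAD_ROOT = Path("/srv/uploads").resolve()   # hypothetical root
ALLOWED_HOSTS = {"api.partner.example"}        # hypothetical allowlist

def safe_upload_path(filename: str) -> Path:
    """Resolve the candidate path and require it to stay under the root."""
    candidate = (UPLOAD_ROOT / filename).resolve()
    if not candidate.is_relative_to(UPLOAD_ROOT):
        raise ValueError(f"path escapes upload root: {filename}")
    return candidate

def safe_outbound_url(url: str) -> str:
    """Require https and an allowlisted host before any outbound request."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"outbound URL not allowlisted: {url}")
    return url
```

The order matters: normalise first (`resolve()`), then validate, so `../` sequences and symlinks cannot slip past a prefix check on the raw string.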

8. Authorization bypass

The assistant produces handler logic that authenticates correctly and authorises incorrectly: a route that is gated on login but not on resource ownership, an admin handler that checks the user is logged in but not that the user has the admin role, a tenant-scoped query that reads from the wrong tenant. This is the hardest class to catch because the code looks idiomatic, the tests pass, and the failure surfaces only when an adversary manipulates the request. The review check is explicit authorization assertion in every handler that touches a privileged resource, paired with insecure direct object reference review on object lookups.
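A minimal sketch of the ownership assertion, with hypothetical types and an in-memory store standing in for the database:

```python
from dataclasses import dataclass

@dataclass
class User:           # hypothetical user model
    id: int
    role: str

# Hypothetical store; real code reads from the tenant-scoped database.
DOCUMENTS = {101: {"owner_id": 1, "body": "q3 plan"}}

def get_document(current_user: User, doc_id: int) -> dict:
    """Authenticated is not authorised: assert ownership on the object."""
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        raise KeyError(doc_id)
    # The assertion assistants commonly omit: resource-level ownership
    # (or an explicit role grant), not just a logged-in session.
    if doc["owner_id"] != current_user.id and current_user.role != "admin":
        raise PermissionError("caller does not own this resource")
    return doc
```

A handler that stops at "is there a session" passes every test that logs in as the owner, which is why the review reads the assertion itself rather than the test results.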

For the broader vulnerability taxonomy these classes map into, the OWASP Top 10 explainer covers the application-layer baseline and the CWE explainer covers the structured taxonomy that maps each AI-introduced finding back to a stable identifier.

The Per-Pull-Request Review Pattern

Mature AppSec programmes converge on a layered review pattern that sits on top of the standard pull-request workflow. The pattern below is the operating shape most teams settle into; the exact rules are tuned per programme.

| Review item | Standard human-written change | AI-assisted change |
| --- | --- | --- |
| Behavioural intent | Inferred from author and design context. | Verified explicitly. Reviewer reads the change against the prompt or task description and confirms the produced code matches intent. |
| New dependencies | Reviewed for license, maintenance, and known CVEs. | Reviewed for license, maintenance, known CVEs, package existence, publisher identity, and version pinning. |
| Cryptographic defaults | Reviewed against the team standard. | Treated as suspect by default. Algorithm, mode, key length, IV, and KDF named explicitly in review. |
| Authorization assertions | Reviewed against the design. | Reviewed against the design and against the actual assertion in the handler. Reviewer reads each authz check against the resource ownership model. |
| Input validation | Reviewed where input crosses a trust boundary. | Reviewed where input crosses a trust boundary, plus an explicit pass on every parameter that flows into a query, command, file path, or outbound URL. |
| Test coverage | Tests assert correctness. | Tests assert correctness and security-relevant behaviour. AI-generated tests are reviewed for the same plausible-looking-but-wrong failure mode as AI-generated code. |
| Secret scanning | Automated scanner on the diff. | Automated scanner on the diff plus a manual sweep on AI-generated configuration and bootstrap files. |
| Provenance label | Author identity in commit metadata. | Author identity plus AI-assisted label so the reviewer applies the elevated pattern. |

For the wider review checklist these items extend, the secure code review checklist covers the baseline review pattern that AI-assisted review builds on.

The SAST and SCA Harness That Scales the Review

Manual review at human pace cannot absorb the volume AI assistants produce. The answer is not to slow the developer down; the answer is to put a layered scanner harness on every pull request so the recurring failure modes surface automatically and the human reviewer can focus on the cases that need judgement.

Static application security testing (SAST)

A SAST scanner reads the source code and flags known-bad patterns: tainted input flowing into sinks, unsafe deserialization, weak cryptographic primitives, hardcoded secrets, path-traversal-shaped code, and other pattern-matchable failure modes. Tuned for the language stack, the SAST catches a meaningful share of the eight failure modes above before the reviewer reads the code. The discipline is to keep the rule set current, triage findings to keep noise low, and treat persistent SAST findings as a signal that the upstream prompt or policy needs adjustment.

Software composition analysis (SCA)

An SCA scanner reads the manifest and the lockfile, identifies third-party packages, and flags known CVEs against each version. SCA is the harness that catches the outdated-dependency and (partially) the package-hallucination failure modes. The discipline is to require SCA on every change, to fail the build on net-new high-severity introductions rather than on the entire backlog at once, and to require a manual approval path for any new dependency.

Secret scanning

A secret scanner reads the diff and flags strings that look like API keys, passwords, tokens, and certificates. The harness catches the hardcoded secrets failure mode and is independent of whether the secret is real or an example. The discipline is to fail the build on detection rather than merge with a warning, and to pair with credential rotation any time a real secret slipped through.
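A toy version of the diff sweep, with two illustrative patterns; production scanners ship hundreds of tuned rules, and the AWS key used in the test below is the publicly documented example key, not a live credential:

```python
import re

# Two illustrative secret patterns; real rule sets are far larger.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{16,}['\"]"
    ),
}

def scan_diff(diff: str) -> list:
    """Return the names of every secret pattern that matches the diff."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(diff)]
```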

Reachability layered on SCA

Reachability analysis ranks SCA findings by whether the vulnerable code path is invokable from the application, demoting the long tail of unreachable dependency vulnerabilities to a tracked exception class. For AI-generated code where the dependency suggestion volume is high, the reachability filter prevents the reviewer from drowning in unreachable-CVE noise. The reachability analysis explainer covers the technique in operational detail.

Policy gate at merge

The harness output funnels into a merge-time policy: net-new high-severity SAST findings block, net-new known-exploited (KEV) dependencies block, new hardcoded secrets block, and lower-severity findings annotate the pull request without blocking. The gate is calibrated to the team's throughput; the failure modes are a gate so loose it never blocks (so the harness degenerates into an audit-after-the-fact tool) and a gate so tight it blocks every merge (so the team disables it).
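The gate reduces to a small decision function; the finding shape (`type`, `severity`, `kev`, `net_new` fields) is hypothetical:

```python
def merge_gate(findings):
    """Return (allowed, blocking_ids): block only on net-new blocking
    classes; everything else annotates the pull request."""
    blocking = [
        f["id"]
        for f in findings
        if f.get("net_new")
        and (
            (f["type"] == "sast" and f.get("severity") == "high")
            or (f["type"] == "sca" and f.get("kev"))
            or f["type"] == "secret"
        )
    ]
    return not blocking, blocking
```

Keying the gate on `net_new` is the calibration point: it blocks regressions without forcing the team to burn down the entire historical backlog before any merge can land.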

For the wider scanner-stack composition this harness sits inside, the ASPM explainer covers how SAST and SCA output consolidate into a single posture record across multiple tools and many teams.

Prompt and Policy Guardrails Upstream of Review

Review catches what makes it into the pull request. Upstream guardrails reduce what reaches the review layer. Mature programmes pair the review pattern with a small policy stack that shapes how AI assistants are used in the first place.

Approved-tools list

A short list of the AI assistants permitted for production code, the deployment tier (enterprise vs consumer) approved for each, and the boundary cases. Programmes that leave the list implicit accumulate a long tail of personal tools whose data-handling posture is unknown.

Data-handling clause

Specifies what code, secrets, customer data, and proprietary information may be shared with each approved assistant, aligned with the assistant vendor data-use policy and the data classification of the project. The clause is binary at the secret line: no production credentials, customer data, or regulated data into any AI assistant under any circumstances.

Prompt hygiene

Lightweight guidance that improves the quality of the AI output: name the framework explicitly, name the security constraints (parameterised queries, escape on output, no dynamic eval, allowlist on outbound URLs), name the error-handling expectations. Programmes that codify a small prompt template for security-sensitive code reduce the AI failure rate on the first pass rather than relying entirely on the review layer.
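One way to codify the template, as a string the team fills per task; the field names and constraint wording are illustrative, not a standard:

```python
# Hypothetical prompt template for security-sensitive code tasks.
SECURE_PROMPT_TEMPLATE = """\
Framework: {framework}
Task: {task}
Security constraints:
- parameterised queries only; never build SQL by string concatenation
- escape all output in the rendering context
- no dynamic eval/exec
- outbound requests only to: {allowed_hosts}
Error handling: raise on invalid input; do not swallow exceptions silently.
"""

prompt = SECURE_PROMPT_TEMPLATE.format(
    framework="Django 5",
    task="add a webhook receiver for payment events",
    allowed_hosts="api.payments.example",
)
```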

Pull-request labelling

AI-assisted commits are flagged in the pull-request description, the commit trailer, or both. The label is the trigger for the elevated review pattern. Programmes that rely on reviewer guesswork to identify AI-assisted changes apply the elevated pattern inconsistently, which converges on applying it nowhere.
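If the label lives in a commit trailer, CI can route on it with a few lines; the `AI-Assisted:` trailer name is an assumption, not an established convention:

```python
def is_ai_assisted(commit_message: str) -> bool:
    """Detect a hypothetical 'AI-Assisted: yes' git trailer so CI can
    route the change into the elevated review pattern."""
    # Git trailers live in the last paragraph of the commit message.
    trailers = commit_message.strip().rsplit("\n\n", 1)[-1]
    return any(
        line.lower().startswith("ai-assisted:")
        and line.split(":", 1)[1].strip().lower() == "yes"
        for line in trailers.splitlines()
    )
```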

Code ownership

The human developer who accepts an AI suggestion is the owner of the resulting code. Ownership does not transfer to the assistant or to the prompt. The clause matters because it prevents the blame-shifting-to-the-tool failure mode and aligns the incentive: the developer reviews the AI output before accepting it, because the consequence lands on the developer's record.

Retention of prompt and completion record

For the subset of changes where audit-read durability matters (regulated code, security-sensitive components, framework-bound features), the prompt and completion record is retained alongside the change as evidence the change was reviewed under the elevated pattern. The retention scope is narrow on purpose; retaining every prompt is operationally infeasible and often legally complicated.

The OWASP Top 10 for LLM Applications covers the security risks of building with LLMs in your own product. This guide is a different surface: the security risks of using AI assistants to build any code, including non-LLM products. Programmes operating both surfaces benefit from reading the two guides together.

How the Audit Read Shapes AI-Assisted Coding Evidence

Auditors and assessors reading AI-assisted coding evidence look at four lenses: signal coverage, decision durability, framework alignment, and provenance. The first three apply to any code-review evidence; the fourth is the AI-specific addition.

  • Signal coverage: SAST, SCA, and secret scanning ran on the change, with the scanner identity, version, and scan date recorded.
  • Decision durability: if a finding was accepted, deferred, or fixed, the decision can be reconstructed from the record alone, including the owner, the basis, the expiry, and the re-evaluation trigger.
  • Framework alignment: the change maps to operative AppSec controls. NIST SSDF practice PW.4 (well-secured software), ISO 27001 Annex A 8.28 (secure coding), OWASP ASVS V1 (architecture, design, and threat modelling), SOC 2 CC8.1 (change management), PCI DSS Requirement 6.2 (bespoke and custom software development) all expect a documented review trail.
  • Provenance: AI-assisted changes are labelled, the assistant tier is recorded at the project level, and the human owner of the change is identified. Auditors are not asking for the prompt verbatim; they are asking that the elevated review pattern was applied where it should have been.

For the framework-mapping detail across the operative AppSec controls, the OWASP ASVS framework page covers the verification standard that secure code review programmes anchor on, and the NIST SSDF implementation guide covers the practice-by-practice expectations for secure software development.

A Phased Rollout for an AI-Coding Policy

The rollout below takes an internal AppSec, product security, security engineering, or CISO programme from ad-hoc Copilot use to a defensible AI-coding policy over four to six quarters. Operating value lands at each phase rather than only at the end.

Phase 1: Inventory current usage

Catalogue which AI assistants are in use, on which tiers, by which teams, with what data-handling posture. Map the gap between what is in use and what would be acceptable. The output is a one-page picture of current state that subsequent phases work against.

Phase 2: Approved-tools list and data clause

Publish the approved-tools list, the deployment-tier requirement, and the data-handling clause. Communicate to engineering. Measure conformance over a quarter. The output is a default policy whose violation is visible rather than a blanket prohibition that drives shadow-IT use.

Phase 3: Pull-request labelling and elevated review

Roll out the AI-assisted-commit label and the elevated review pattern in one or two pilot teams. Train reviewers on the eight recurring failure modes and the per-pull-request review pattern. Measure review time, defect escape rate, and reviewer feedback over a quarter. Tune the pattern. The output is a validated review shape that other teams can adopt.

Phase 4: SAST, SCA, and secret scanning harness

Wire the scanner harness into every pull request across the pilot teams. Calibrate the merge-time policy gate so it blocks net-new high-severity findings without blocking every merge. Track the signal-to-noise ratio and tune. The output is a harness that absorbs the recurring AI failure modes automatically.

Phase 5: Security champions and feedback loop

Embed security champions in product teams who carry the AppSec read into engineering daily work and surface AI-specific patterns to the central team. Use the feedback loop to refine the prompt template, the policy stack, and the SAST rule set. The output is a programme that learns from its own findings rather than accumulating them.

Phase 6: Audit-read and steady state

Settle into the steady-state cadence: scanner harness running on every change, elevated review pattern applied to AI-labelled commits, retention of prompt-and-completion records for the audit-relevant subset, annual review of the policy stack against the evolving assistant landscape. Run an internal audit dry-run against the AI-assisted coding evidence; the gaps that surface are the next quarter of operating work.

Where AI-Generated Code Review Sits in the Wider Operating Model

AI-generated code review is one workflow inside a wider internal security organisation. It sits next to the daily AppSec triage function, the engineering-side product security function, the security engineering team building the build-and-scan platform, and the CISO's policy stack.

For the daily operator function, SecPortal for AppSec teams covers the find-track-fix-verify shape that AI-assisted change review feeds into. For product security teams shipping software with a defensible posture record, SecPortal for product security teams covers the producer-side discipline. For the security engineering team building the harness, SecPortal for security engineering teams covers the platform-side reading path. For the CISO sponsoring the AI-coding policy, SecPortal for CISOs covers how the consolidated record rolls up into leadership reporting.

Pair the programme with adjacent operating reading. The SAST vs SCA explainer covers the harness composition. The threat modelling guide covers the upstream design discipline AI-assisted features still need. The SDLC vulnerability handoff use case covers the routing pattern that gets findings from the scanner harness onto the right engineering owner.

Run AI-Assisted Code Review on a Single Operating Record

The scanner output, the manual review finding, the lifecycle state, the exception decision, the framework mapping, and the engineering owner all need to live on the same record so the AppSec triage queue, the leadership dashboard, and the audit read collapse into one query rather than spreading across three or four scanner consoles.

SecPortal is built around a single engagement record:

  • Code scanning via Semgrep SAST and dependency analysis running against connected repositories at pull-request cadence.
  • Repository connections via GitHub, GitLab, and Bitbucket OAuth that wire the build-side ingestion.
  • Findings management with CVSS calibration, lifecycle tracking, and over three hundred finding templates that absorb both scanner output and manual review findings.
  • Continuous monitoring for the recurring scan cadence.
  • The activity log for the timestamped chain of state changes that produces the audit-read trail.
  • Compliance tracking with ISO 27001, SOC 2, PCI DSS, and NIST framework mappings.
  • AI report generation for the leadership read of the AppSec posture record.

SecPortal does not generate or review code itself; it does not inspect the content of an AI prompt, does not ship a Copilot-style assistant, and does not replace the human reviewer who reads the change. It is the operating-record platform that the review programme runs against, so the SAST, SCA, manual review, and exception decisions all land on a single backlog rather than in four scanner consoles. Programmes evaluating dedicated ASPM or AppSec consolidation platforms should benchmark coverage of their specific scanner stack against SecPortal and against the named alternatives.

Scope and Limitations

This guide describes the operating shape of secure code review for AI-generated code as it is consumed in mainstream enterprise programmes. The assistant landscape evolves rapidly: model behaviour, vendor data-use policies, enterprise-tier features, and the empirical failure-rate distributions across languages and frameworks all shift between releases. Specific failure-class rates, named-assistant behaviour, and the precision-versus-recall properties of any specific SAST rule should be verified against current vendor documentation and against benchmark exercises on the team's own codebase.

AI-assisted coding is a lasting shift in how production software gets written, not a temporary phase. Programmes that adopt the elevated review pattern, the scanner harness, the policy stack, and the audit-read trail land somewhere close to the security posture they had before assistants entered the loop, with higher developer throughput on top. Programmes that skip the controls in the name of speed accumulate the failure modes silently and pay the cost on the audit, on the incident, or on the customer-disclosed defect.

Run AI-assisted code review on SecPortal

Stand up the operating record in under two minutes. Free plan available, no credit card required.