OWASP Top 10 for LLM Applications (2025): A Practical Guide
The OWASP Top 10 for LLM Applications is the working risk catalogue every AppSec, product security, and AI/ML team now reads when shipping a feature backed by a large language model. It is short, vendor-neutral, maintained by practitioners, and easy to map onto a real engineering programme. This guide walks each entry from LLM01 Prompt Injection through LLM10 Unbounded Consumption with the threat model, where to test it, the controls that actually work, and the audit evidence to keep alongside the operating record. It then covers how the LLM Top 10 composes with the OWASP web Top 10, the OWASP API Top 10, ASVS, MITRE ATLAS, NIST AI RMF, and ISO/IEC 42001, and ends with a four-week rollout for an internal LLM security programme.
What the OWASP Top 10 for LLM Applications Is
The OWASP Top 10 for LLM Applications is an open, community-maintained list of the ten most critical security risks for applications that use large language models. It is published by the OWASP GenAI Security Project, was first released in 2023, and is revised on a regular cadence as the application patterns around LLMs evolve. The list is deliberately scoped to the application layer. It does not attempt to enumerate the internal safety properties of foundation models. It enumerates the risks that engineering and security teams have to design for when they integrate a model into a product.
The 2025 iteration of the list reads: LLM01 Prompt Injection, LLM02 Sensitive Information Disclosure, LLM03 Supply Chain, LLM04 Data and Model Poisoning, LLM05 Improper Output Handling, LLM06 Excessive Agency, LLM07 System Prompt Leakage, LLM08 Vector and Embedding Weaknesses, LLM09 Misinformation, and LLM10 Unbounded Consumption. The order reflects observed prevalence and impact in real engagements rather than a strict severity ranking, and the descriptions evolve as the project absorbs more field data. Read the canonical text on the OWASP project site for current wording before taking the descriptions in this guide as final.
Use the LLM Top 10 the way good AppSec programmes use the OWASP web Top 10: as a shared vocabulary for threat modelling, a checklist for design review, a structure for penetration testing, and a starting taxonomy for the per-finding records inside the vulnerability programme. The value is in the operational discipline, not the list itself.
LLM01: Prompt Injection
Prompt injection is the LLM Top 10 entry that absorbs most attention because it is the entry point for almost every other exploit. The attacker submits text the model treats as a higher-priority instruction than the system prompt the developer wrote. The model ignores its guardrails, executes the attacker instruction, and emits output the surrounding application trusts. The vulnerability exists because LLMs do not enforce a hard separation between data and instructions inside the context window.
The two operational variants matter. Direct injection comes from the user typing into a chat or hitting an API endpoint. Indirect injection arrives in a document, web page, email, ticket, image alt text, or tool output the model later reads. Indirect injection is harder to defend against because the payload crosses a tenant or user boundary: a malicious document uploaded by user A can hijack the assistant when it later runs for user B. The deeper write-up on the prompt injection vulnerability page covers payload construction, defence patterns, and CWE-1427 mapping in detail.
Test every model-facing input with direct injection variants. Test every retrieval source and every tool return path with indirect injection variants. The defence is layered: minimise the trust the application places in raw model output, enforce output schemas, separate the model context for retrieved untrusted data, scope tool authorisations narrowly, and require explicit human approval for any high-impact action.
LLM02: Sensitive Information Disclosure
Sensitive information disclosure covers the case where the LLM emits content it should not: training data fragments, system prompt contents, retrieved documents from another tenant, PII from a logged-in user's context, or business data from a connected source. The mechanism is varied. The model may memorise rare strings during training or fine-tuning. The retrieval layer may pull a document the calling user is not authorised to read. The system prompt may itself contain secrets. The model may infer and disclose attributes from data the application thought was anonymised.
The defence has both a design layer and a runtime layer. The design layer prevents secrets from entering the model in the first place: do not place credentials in the system prompt, do not fine-tune on raw PII, do not load cross-tenant documents into a single retrieval index without per-document ACL enforcement. The runtime layer scrubs sensitive output: filter responses against a deny list of sensitive identifiers, redact patterns that match credentials, and log requests with privacy controls aligned to the organisation's data-handling policy. Tie the operational evidence into the same vulnerability programme that handles classical sensitive data exposure findings.
LLM03: Supply Chain
Supply chain risk for LLM applications covers the integrity of every input that shapes the model: the base model weights, the fine-tuned weights, the embedding models, the vector index contents, the third-party plugins or tools, and the LLM-adjacent libraries inside the application. A compromised model checkpoint can ship a backdoor that triggers on a specific prompt. A poisoned embedding index can warp retrieval. A malicious plugin can read or alter tool output the model depends on.
Treat LLM supply chain risk as a generalisation of classical vulnerable dependencies and apply the same engineering discipline that the rest of the supply chain programme uses. Pin model checkpoints to a verified hash. Track model provenance the way the engineering programme tracks software provenance under the SLSA framework and the SBOM guide. Vet the third-party plugins or tools the model can call. Quarantine new embedding sources before they are mixed into the production retrieval index.
LLM04: Data and Model Poisoning
Data and model poisoning is the deliberate manipulation of training, fine-tuning, or retrieval data to shape model behaviour. An attacker who can write to the corpus the model later reads (a public knowledge base, a user-contributed document store, a scraped web index, a customer support log used for fine-tuning) can plant text that biases the model in ways that are hard to detect once the model is shipped. Targeted poisoning can install behavioural triggers. Untargeted poisoning can degrade quality and shift the response distribution.
Defences cluster around data hygiene and provenance. Validate the source of every corpus the model consumes. Apply integrity controls to fine-tuning pipelines so an unreviewed dataset cannot reach a production model. Maintain a documented training data lineage that the AppSec function can audit. Quarantine new retrieval sources, run them through content scanners, and grant write access to retrieval indexes only to the specific service identities that need it. The same operational discipline that governs hardcoded secrets in code (commit-time scanning, signed artefacts, access controls on what gets pushed) applies to training data lineage.
LLM05: Improper Output Handling
Improper output handling is the LLM Top 10 entry that catches AppSec teams off guard most often. The model returns text the surrounding application then trusts as if it were sanitised. That output is rendered as HTML, passed to a SQL query, sent to a shell, written into a markdown document with active content, or used to construct a URL fetched server-side. The classical web vulnerability classes re-emerge through a new entry point.
Treat every model output as untrusted user input. Render LLM-emitted HTML through an output encoder, the same way the web programme already handles cross-site scripting. Validate parameters extracted from LLM output before they reach a query layer to avoid SQL injection. Refuse to fetch URLs constructed from LLM output without explicit allow-list checks to avoid server-side request forgery. Refuse to execute commands constructed from LLM output to avoid command injection. The discipline is uncomplicated: model output crosses a trust boundary on its way back into the application, and every trust boundary requires the standard output-handling controls.
LLM06: Excessive Agency
Excessive agency is the entry that turns a contained prompt injection into a real-world action. The model has tools attached. The tools authorise actions: read a document, send an email, charge a card, change a configuration, run a query, deploy code. Each tool grant is a delegated authority. When the grants exceed the trust the model deserves, an attacker who controls the prompt controls the tool calls, and the impact of the injection scales to the impact of the highest-permission tool the model can reach.
Defences are operational. Apply least privilege to every tool grant. Constrain tool parameters with schemas the application validates rather than trusting the model to format them correctly. Separate high-impact tools behind explicit human approval steps. Log every tool invocation with the prompt that triggered it, the parameters the model produced, and the outcome of the call, so the post-incident reconstruction is straightforward. The activity discipline that internal teams already apply to classical broken access control findings extends naturally to model-driven action surfaces.
LLM07: System Prompt Leakage
System prompt leakage is the disclosure of the instructions, formatting rules, or operational secrets a developer placed inside the model's system prompt. Some leakage is unavoidable: a sufficiently persistent attacker can almost always coax a chat-style model into echoing parts of the system prompt. The risk is in what the system prompt contains. If the prompt holds API keys, internal endpoint URLs, customer identifiers, or business rules that the application later trusts, the leakage degrades from embarrassment into a security finding.
Treat the system prompt the way mature engineering programmes treat configuration files. Do not place credentials, tokens, or unrotatable secrets in the prompt. Do not place authoritative business rules in the prompt expecting the model to enforce them; enforce them in the application. Pair the engineering discipline with the same information disclosure triage workflow that the vulnerability programme already runs.
LLM08: Vector and Embedding Weaknesses
Vector and embedding weaknesses cover the security properties of the retrieval layer that most production LLM applications now depend on. Retrieval-augmented generation (RAG) gives the model access to a corpus through embeddings. The corpus is the new attack surface. Cross-tenant retrieval that does not enforce per-document ACLs leaks data. Embedding inversion can reconstruct sensitive source text from stored vectors. Embedding poisoning can warp retrieval to favour attacker-controlled documents. Embedding denial-of-service can degrade retrieval quality at scale.
Defences look like a generalisation of classical access-control discipline. Enforce per-document authorisation on every retrieval call rather than treating the index as a single global corpus. Encrypt vector stores at rest and in transit. Apply integrity controls to ingestion so unreviewed documents cannot reach the production index. Log every retrieval call with the requesting identity, the query, and the documents returned, so a cross-tenant disclosure incident is reconstructible.
LLM09: Misinformation
Misinformation in the LLM Top 10 covers the application-layer consequences of model outputs that are confidently wrong. The model produces a hallucinated reference, a fabricated identifier, a fictional patch, an incorrect dosage, or a non-existent legal citation. The surrounding application or the human using it then acts on the output. The risk is not abstract. There are documented cases across legal, medical, security tooling, and customer support contexts where hallucinated output led to real-world harm.
The defences are partly technical and partly about product design. Constrain the model to answers it can ground in retrieval. Surface uncertainty in the user interface rather than hiding it. Require citation back to the retrieval source when the answer is consequential. Run an evaluation harness against a representative test set and track regression on hallucination rate the way the rest of the engineering programme tracks defect-density metrics. For consequential decisions, keep a human in the loop and log both the model output and the human override.
LLM10: Unbounded Consumption
Unbounded consumption is the LLM Top 10's name for the resource and cost exhaustion class. An attacker submits prompts that cause unboundedly large responses, recursive tool invocations, repeated retrieval calls, or expensive token usage. The result is degraded availability for legitimate users, an unbounded cloud bill, exhausted rate-limit headroom on the model provider, or a model-reasoning loop that hangs the application. The same family contains classical missing rate limiting and denial of service findings, recast around model-specific resource units.
Defences are unsurprising. Enforce per-user and per-tenant quotas on token consumption, request rate, tool invocation count, and retrieval call count. Cap maximum response length. Cap recursion depth and tool call chains. Monitor cost-per-request distribution for outliers and treat sudden cost spikes as a security signal rather than a finance signal. The same scanner result triage discipline that internal teams use for classical resource-abuse findings applies cleanly here.
How the LLM Top 10 Composes With Other Frameworks
The LLM Top 10 does not stand alone. An LLM-backed feature inherits every applicable risk from the adjacent OWASP lists and frameworks, and the AppSec programme that handles LLM features needs all the relevant catalogues active in the same threat model.
The OWASP framework hub covers the wider OWASP ecosystem the LLM Top 10 sits within. The OWASP ASVS verification standard is where most engineering teams operationalise their AppSec controls; the LLM Top 10 sits naturally on top of an ASVS programme as the model-specific risk catalogue. The OWASP API Security Top 10 is essential reading because most LLM applications are accessed through APIs that inherit every API-side vulnerability class. The wider OWASP web Top 10 still applies to the surrounding HTTP surface.
Above OWASP, the relevant AI-specific frameworks are the NIST AI Risk Management Framework (AI RMF 1.0 and the GenAI Profile published as NIST AI 600-1), ISO/IEC 42001 for AI management systems, and MITRE ATLAS for adversarial tactics and techniques against ML systems. NIST AI RMF gives leadership a risk-management vocabulary with the four functions (GOVERN, MAP, MEASURE, MANAGE) and the seven trustworthy AI characteristics, ISO 42001 gives the GRC function a certifiable management-system anchor, and MITRE ATLAS gives the offensive team a tactics catalogue. The LLM Top 10 reads cleanly inside any of the three; AI RMF is the framework most enterprise procurement and audit cycles read against.
A Four-Week Rollout for an Internal LLM Security Programme
For an internal AppSec function adopting the LLM Top 10 as its working catalogue for the first time, a four-week rollout is realistic without disrupting in-flight engineering work.
Week 1: Inventory. List every product feature that touches an LLM. Capture the model in use, the deployment shape (provider API, self-hosted), the retrieval sources, the tools the model can call, and the user surface (chat, summarisation, agent). The inventory is the precondition for everything else.
Week 2: Threat model and policy. Run a structured threat model against each LLM feature in the inventory using the LLM Top 10 as the threat catalogue. Output a short written policy that names which LLM Top 10 entries the programme treats as in-scope, the design principles each entry imposes, the required engineering controls, and the testing expectations. The policy is the document an internal review or external auditor will read first.
Week 3: Testing. Schedule an LLM-aware security test against the highest-impact feature. Use the threat model output to drive the test plan. Cover prompt injection (direct and indirect), output-handling regressions, agency boundary tests, and unbounded-consumption stress. Tie each finding back to the relevant LLM Top 10 entry. The companion threat modelling guide covers the upstream design step that makes the testing layer cheaper.
Week 4: Operating record. Cut over the per-finding records into the same vulnerability programme the rest of the AppSec function already runs. Tag each finding with the LLM Top 10 entry, the affected feature, the asset criticality, and the framework mapping. Run the same review and reporting cadence the wider programme uses. The continued operating discipline is the same as for any other AppSec finding family; the catalogue is the only new piece.
Where the LLM Top 10 Sits Inside the Wider Internal Programme
LLM application security is one classification layer inside a wider internal security organisation. It sits next to the engineering-side AppSec function, the daily operational discipline of the vulnerability management team, the GRC owner's evidence cadence, and the leadership reporting cadence the CISO produces. Different audiences read the same LLM Top 10 evidence differently.
For the AppSec function that owns the secure-coding curriculum and the application security review process, SecPortal for AppSec teams covers how LLM Top 10 distribution feeds the architecture conversation. For the product security function that owns the per-release security posture across LLM and non-LLM features, SecPortal for product security teams covers how LLM-specific findings roll up alongside classical AppSec findings. For the vulnerability management function that runs the find-track-fix-verify queue, SecPortal for vulnerability management teams covers the per-finding lifecycle. For the CISO sponsoring the programme, SecPortal for CISOs covers how LLM-risk outcomes roll into leadership reporting. For the GRC owner translating LLM controls into evidence, SecPortal for GRC and compliance teams covers the audit-side discipline.
Pair the LLM-side programme with adjacent enterprise reading. The vulnerability prioritisation framework guide covers the multi-signal prioritisation theory the LLM-tagged findings flow into. The SAST vs SCA code scanning guide covers the classical code-side tooling that complements the LLM-specific testing. The secure code review for AI-generated code guide covers the parallel discipline of reviewing code that AI assistants produce (a different surface from building LLM features). The security champions program guide covers the engineering-embedded operating model that makes LLM-aware design review sustainable. The CWE guide covers the cross-cutting weakness taxonomy that LLM Top 10 entries map into (LLM01 maps to CWE-1427, LLM05 commonly maps into CWE-79, CWE-89, CWE-78, and CWE-918 depending on the downstream sink).
Capturing Defensible LLM Top 10 Audit Evidence
The audit conversation about LLM application security reduces to a manageable evidence set. Build the set as a side effect of doing the work, and the audit collapses into a query rather than a multi-team scramble.
The minimum evidence set has six artefacts. The first is the LLM feature inventory described in week one of the rollout, kept current as features ship or retire. The second is the dated policy that names the in-scope LLM Top 10 entries, the engineering controls per entry, and the testing expectations. The third is the threat model output for each LLM feature, refreshed when the feature's context window, retrieval surface, or tool surface changes materially. The fourth is the per-finding record carrying the LLM Top 10 tag, the affected feature, the severity vector, the owner, the lifecycle (detected, prioritised, assigned, remediated, retested, closed), and the evidence. The fifth is the framework mapping (NIST AI RMF, ISO/IEC 42001, OWASP ASVS, ISO 27001, SOC 2) so the evidence pack is portable across audits. The sixth is the testing record from each LLM-aware engagement, including direct and indirect injection coverage, output-handling regressions, agency-boundary tests, and unbounded-consumption stress.
SecPortal's findings management feature tracks each finding with a CVSS 3.1 vector, owner, evidence, and remediation status, and supports structured fields and tags so the per-finding LLM Top 10 entry can be carried alongside the severity vector. The engagement management feature keeps each LLM-aware test as a first-class engagement record with scope, methodology, timeline, and the attached findings. The code scanning feature ingests SAST and SCA output from connected repositories, which catches the classical code-side issues that often surround LLM-feature code. The activity log keeps the timestamped chain of state changes by user across findings, engagements, scans, documents, comments, and team changes, with plan retention of 30, 90, or 365 days. The compliance tracking feature maps findings and controls to ISO 27001, SOC 2, Cyber Essentials, PCI DSS, and NIST and exports the evidence pack as CSV. None of those features run an LLM-specific risk engine. What the platform provides is one record on which the LLM Top 10 tag, the severity vector, the owner, the lifecycle, and the framework mapping all live so the audit query reads from the same source the operator runs from.
Run an LLM-Aware AppSec Programme on a Single Record
LLM application security is mostly an engineering discipline problem in disguise. The risk catalogue is public, the testing patterns are documented, and the framework integration is well-understood. What stops most programmes from getting clean LLM Top 10 evidence is that the per-finding tags, the lifecycle audit trail, the cross-feature distribution view, the framework mapping, and the leadership read all sit on different records, so producing the evidence pack means reconciling four or five sources at audit time. SecPortal is built around a single engagement record: findings management with CVSS 3.1 calibration and structured fields for the LLM Top 10 entry, engagement management for the per-test record, repository connections for the code-side ingestion that surrounds LLM-feature code, the activity log for the timestamped chain of state changes across findings, engagements, scans, and team changes, compliance tracking with framework mappings and CSV export, and AI-powered report generation when leadership wants the executive summary.
None of these features assign an LLM Top 10 entry for you: the mapping is yours to make at finding creation time. What the platform does is keep the LLM Top 10 value, the lifecycle, the evidence, the framework mapping, and the cross-finding distribution view on the same record so the audit conversation collapses into a query rather than a multi-team scramble.
Scope and Limitations
This guide describes the operating shape of the OWASP Top 10 for LLM Applications as it is consumed by internal AppSec, product security, and AI/ML security teams. The OWASP project continues to evolve: revisions to entry wording, ordering, and the underlying examples should be tracked against the current release on the OWASP GenAI Security Project site. Specific framework references (NIST AI RMF, NIST AI 600-1, ISO/IEC 42001, MITRE ATLAS) should be read against current publications.
The LLM Top 10 is a risk catalogue, not a scoring system, and it does not replace an existing AppSec programme. Programmes that adopt the LLM Top 10 as a complement to existing AppSec, vulnerability management, and supply chain disciplines see durable operating value. Programmes that treat the LLM Top 10 as a substitute for the wider engineering programme tend to discover the gap when a classical web-side issue surfaces inside an LLM-backed feature.
Run LLM-aware AppSec on SecPortal
Stand up the engagement record in under two minutes. Free plan available, no credit card required.