Vulnerability

Misinformation in LLM Applications
detect, understand, remediate

Misinformation (OWASP LLM09:2025) is the application-layer class where the model produces confidently wrong output that surrounding systems or human users act on. The damage rarely sits in the model itself. It sits in the downstream decision, ticket, contract, code commit, configuration change, refund, prescription, attestation, or customer message that inherited the wrong fact through a feature engineered without retrieval grounding, citation enforcement, schema constraint, factuality check, or a representative evaluation harness.

No credit card required. Free plan available forever.

Severity

Medium

CWE ID

CWE-1039

OWASP Top 10

LLM09:2025 - Misinformation

CVSS 3.1 Score

7.4

What is misinformation in LLM applications?

Misinformation in LLM applications is the vulnerability class where the model produces output that is confidently wrong and the surrounding application, or the human reading the answer, then acts on that output. The wrongness can be a hallucinated citation, a fabricated case number, a non-existent function name, an invented API path, a misquoted policy, an incorrect dosage, a synthesised CVE identifier, a fictional legal precedent, or a confidently asserted statistic that has no source behind it. The 2025 OWASP Top 10 for Large Language Model Applications lists the class as LLM09:2025 Misinformation and treats it as an application-layer security risk because the downstream system or user trusted the answer.

The class is distinct from the data-confidentiality classes that sit beside it. Where sensitive information disclosure in LLM applications is the wrong content reaching the user, misinformation is the right shape of content with the wrong facts. Where system prompt leakage leaks developer instructions, misinformation invents content the developer never wrote. Where excessive agency in LLM applications amplifies the consequence of any model decision into a real-world action, misinformation is the upstream defect that turns a tool-call argument into a fabricated identifier and a downstream record into a poisoned write. Where improper output handling in LLM applications treats trusted-looking output as already safe in a downstream sink, misinformation is the parallel defect on the truthfulness axis.

For internal AppSec, product security, AI engineering, ML platform, and security engineering teams, misinformation is rarely caught by classic input-validation reviews. The model produced grammatical, plausible, confidently formatted text. The defect lives in the truthfulness layer the engineering programme typically did not test. Documented harms include hallucinated legal citations in court filings, fabricated medical recommendations, security tools that invented CVE identifiers in remediation guidance, customer service agents that promised refunds outside policy, code assistants that hallucinated dependency names later registered by attackers (a vector for LLM supply chain vulnerabilities), and search products that cited sources that never existed.

The defence shape is partly technical (retrieval grounding, citation enforcement, schema-constrained generation, calibrated uncertainty exposure, and post-generation factuality checks) and partly product-design (surfacing uncertainty in the user interface, requiring source citation for consequential answers, keeping humans in the loop for irreversible actions, and running an evaluation harness against a representative test set with regression tracking). SecPortal records misinformation findings against the AI feature, captures the prompt, the asserted-but-false output, and the ground-truth source as evidence, generates the writeup, and tracks the fix through retest the same way the rest of the security backlog moves.

The misinformation surface

Hallucinated identifiers and references

The model emits a CVE identifier, a case number, a legal citation, a function signature, a configuration flag, a vendor product name, a regulator decision number, or a research paper title that does not exist. The downstream reader treats the identifier as authoritative because the surrounding sentence reads correctly. The fabricated identifier then propagates into tickets, code, remediation reports, audit responses, and customer communications.

Fabricated source citations

The model attributes a claim to a real publication, agency, person, or document that does not contain the claim. The citation looks legitimate at a glance because the source exists. A reader who clicks through finds nothing relevant or finds the source contradicts the claim. The class includes invented URLs that look canonical but were never published.

Plausible-but-wrong instructions and policies

The model invents a configuration step, a CLI flag, a policy clause, a contract term, a settlement number, an opt-out URL, a regulatory deadline, or a process the organisation never actually adopted. The reader follows the instructions, makes the configuration change, signs the contract, accepts the policy, or reports the deadline to the customer, and the wrong content propagates downstream.

Hallucinated package and API names

A code-assistant feature recommends a dependency that does not exist on the package registry, an API endpoint the vendor never shipped, an SDK method that the library does not implement, or a config key the framework does not recognise. The class pairs with package-confusion supply-chain attacks, where attackers later register the hallucinated name with a malicious payload.

Confidently misread retrieval

The retrieval layer surfaces a real document, the model paraphrases it, and the paraphrase introduces a factual error the source never asserted. The error survives because the citation back to the source convinces the reader that the answer is grounded. The retrieval was correct; the grounding step failed silently.

Numerical and arithmetic hallucination

The model invents a percentage, a count, a calculation, a benchmark result, a price, a ratio, or a statistic that has no source. The error is hard to detect because numbers read as facts. Decisions taken on the basis of the invented figure (budget approvals, risk weighting, executive reporting, regulatory filings) inherit the defect.

Overconfident refusal misinformation

The model asserts that an action is impossible, a feature is unavailable, a policy forbids the request, or a regulation requires a specific posture, when none of the four is true. The user accepts the assertion and abandons a legitimate action. The class disproportionately affects customer-facing support agents and internal compliance assistants.

Domain-shifted hallucination across versions

The model was trained on documentation, APIs, regulations, or framework versions that have since changed. The answer reflects the older state with confidence, the user reads the answer as current, and a real change (a deprecated endpoint, a revoked control, an updated framework clause, a recalled product, a withdrawn standard) goes unaccounted for in the downstream decision.

How it goes wrong

1

No retrieval grounding for consequential answers

The model is asked a question whose correct answer cannot be inferred from training alone (a customer-specific number, a policy clause, a CVE record, a contract term), but the application calls the model directly with no retrieval step. The model fills the gap with a confident guess. The architecture has no way to refuse the question because nothing distinguishes a groundable question from a non-groundable one.

2

Citations not enforced in the rendering layer

The prompt asks the model to cite sources. The model sometimes does and sometimes does not, and the renderer accepts either output. Hallucinated answers slip through because the citation requirement was advisory rather than structural. A renderer that refuses to display uncited claims for consequential answer types catches the defect at the boundary.

3

No post-generation factuality check

The application returns the model output directly to the user with no verification step against the retrieval source. A simple downstream check (extract every factual claim, look up the source, compare) would catch many hallucinations, but the engineering programme never wired one in, so the user becomes the verifier of last resort.

4

Uncertainty hidden in the user interface

The model emitted a low-confidence answer, but the surface presents the answer without uncertainty signals. The user reads a hedged response as a confident assertion because the design treated confidence as noise. Surfacing a confidence band, a low-certainty badge, or a phrase like "the source did not directly answer this" lets the user calibrate trust.

5

Schema-free generation in structured contexts

The model returns JSON, SQL, configuration, or tool arguments as free text. The downstream parser accepts whatever shape the model returned. Field-level constraints (an identifier must exist in a lookup table, a date must fall in a range, a price must round to the cent) are absent. Hallucinated values pass through to the database, the tool call, or the contract.

6

Eval harness never built or never run

The team launched the feature without a representative test set and a measurement of hallucination rate. Regressions on model upgrade, prompt edit, retrieval change, fine-tune update, or context-window resize are invisible until a customer reports a wrong answer. Evaluation is often the most under-invested control in production AI features.

7

No human in the loop for irreversible actions

A code assistant writes to source, a billing assistant issues a refund, a customer service agent updates an account, a compliance assistant files an attestation, or a security agent closes a finding. The action is hard or impossible to reverse, but the model is allowed to take it without human review. The misinformation finding lands on the operating record after the action has already executed.

8

Prompt design rewards confident tone

The prompt instructs the model to "answer authoritatively", "do not hedge", "respond confidently", or "give a direct answer". The instruction degrades the model's native uncertainty signalling. Hallucinations now arrive with high confidence by construction. The fix is a prompt design that permits and rewards "I am not sure" for the answer types where that response is correct.

9

Model upgraded without re-running the eval

The provider released a new model, the team switched the feature over, and the eval was not re-run. Hallucination rates regressed on the answer types the team cares about. The defect ships under the assumption that newer models are always better. The eval has to be a release gate, not a launch artefact.

Common causes

Treating LLM output as deterministic text

The team mentally models the LLM call the way it models a function call: same input gives same output, output is correct unless input is malformed. The architecture inherits the assumption. The truth is the model is a probabilistic generator whose output distribution shifts with temperature, model version, prompt edits, retrieval payloads, and context. Every reliability control has to live outside the model call.

Retrieval mistaken for guarantee of grounding

A RAG pipeline is added and the team assumes the model now answers from sources. In practice, the model freely mixes retrieved content with parametric knowledge, fills gaps with confident invention, and ignores low-quality retrieval. A retrieval step without enforced grounding and citation is not a guarantee of factuality.

Confident-tone prompt instructions

The product team asks for a confident voice because hedging reads as weak. The prompt rewards confidence in language, the model produces confident language for both correct and incorrect answers, and the user loses the ability to calibrate trust on confidence alone. The brand voice goal collides with the reliability goal and the reliability goal loses by default.

No representative evaluation set

The team has no curated set of (prompt, expected behaviour) pairs that represents the answer types the feature is meant to handle, with ground-truth labels and edge-case coverage. Without the set, hallucination rate cannot be measured, regressions cannot be detected, and the only signal is a customer complaint after a wrong answer reaches production.

Logging captures the request but not the verification

Application logs record what the model returned and which user received it. The logs do not record whether a citation was checked, whether the cited source contained the claim, whether the user accepted or corrected the answer, or whether a downstream action executed against a hallucinated argument. Post-incident reconstruction defaults to grep against unstructured text.

Schema-free pipes between model and downstream systems

The model returns natural-language text into a downstream parser that accepts whatever arrives. Identifier fields, numerical fields, status codes, and tool arguments are not validated against a registry, a range, or a lookup. The hallucination passes the parser because the parser was permissive by design.

How to detect it

Automated detection

  • SecPortal code scanning runs against connected GitHub, GitLab, and Bitbucket repositories and flags LLM call sites that emit text into a security-sensitive downstream sink (code generation, contract drafting, ticket creation, financial calculation, regulatory filing) without a paired verification step, a citation requirement, or a schema constraint on the structured fields
  • Code scanning also flags prompt-construction sites that instruct the model to answer confidently, suppress hedging, or refuse to admit uncertainty for answer types the application cannot independently verify, since the confident-tone instruction is correlated with downstream hallucination harm
  • Authenticated scanning drives the LLM-backed endpoint with a curated misinformation corpus (questions designed to elicit fabricated identifiers, fake citations, hallucinated package names, invented configuration steps, false numerical answers, overconfident refusal, version-shifted documentation) under a real session, and records every response whose factual claims fail a verification check against the ground-truth source
  • External scanning discovers public agent surfaces, public chat dashboards, debug routes, and public starter templates that may expose the feature to unauthenticated probing for misinformation, so a hallucination on a public surface lands as a finding before an external researcher writes the post
  • Continuous monitoring re-runs the misinformation probe on the configured cadence so a model upgrade, a prompt edit, a retrieval-pipeline change, a fine-tune update, or a context-window resize that regresses hallucination rate shows up against the baseline rather than waiting for the next pentest cycle or the next customer complaint
  • Bulk finding import accepts CSV output from third-party LLM evaluation frameworks (Ragas, DeepEval, OpenAI evals, internal harnesses) so the engineering programme can land per-test-case hallucination findings on the same workspace where the rest of the security backlog lives, with the test-case identifier, the asserted answer, the ground truth, and the divergence captured on the finding record

Manual testing

  • Build a representative test set of prompts that reflect the answer types the feature is meant to handle, with ground-truth labels, citation requirements, and edge-case coverage including questions whose correct answer is "I do not know"
  • Run the test set against the production prompt, model, retrieval, and tool registrations and measure the hallucination rate, the unsupported-citation rate, the unjustified-refusal rate, and the calibration of model-stated confidence against verified correctness
  • Probe for fabricated identifiers (CVE IDs, case numbers, function names, package names, regulatory references) with questions that have a known empty result, and record any response that invents an identifier rather than admitting absence
  • Probe for unsupported citations by asking questions whose answer requires a source the retrieval cannot supply, and verify whether the model fabricates a citation or correctly declines to cite
  • Probe for numerical hallucination by asking for percentages, counts, prices, ratios, or statistics that have no canonical source in the retrieval, and verify whether the model invents a figure or correctly states the absence
  • Repeat the full test set on every model upgrade, prompt edit, retrieval change, fine-tune update, and tool registration change as a release gate, and treat regression on hallucination rate as a release-blocking defect
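The rates named in the list above can be computed from a table of graded test cases. The following is a minimal sketch under stated assumptions: the `EvalResult` record and its field names are hypothetical, and calibration is reduced to a single overconfidence number (mean stated confidence minus measured accuracy) rather than a full calibration curve.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One graded test case; every field name here is an illustrative assumption."""
    answered: bool             # False when the model declined to answer
    should_answer: bool        # False when the correct behaviour is "I do not know"
    correct: bool              # graded against the ground-truth label
    citation_supported: bool   # the cited source actually contains the claim
    stated_confidence: float   # model-stated confidence in [0, 1]


def ratio(count: float, total: int) -> float:
    """Return count / total, or 0.0 for an empty denominator."""
    return count / total if total else 0.0


def summarise(results: list[EvalResult]) -> dict[str, float]:
    """Compute the four measurements over a graded test set."""
    answered = [r for r in results if r.answered]
    answerable = [r for r in results if r.should_answer]
    accuracy = ratio(sum(r.correct for r in answered), len(answered))
    mean_confidence = ratio(sum(r.stated_confidence for r in answered), len(answered))
    return {
        "hallucination_rate": ratio(sum(not r.correct for r in answered), len(answered)),
        "unsupported_citation_rate": ratio(
            sum(not r.citation_supported for r in answered), len(answered)
        ),
        "unjustified_refusal_rate": ratio(
            sum(not r.answered for r in answerable), len(answerable)
        ),
        # Positive values mean the model sounds more confident than it is accurate.
        "overconfidence": mean_confidence - accuracy,
    }
```

Recording the summary per answer type, rather than once for the whole set, keeps a regression on one consequential type from being averaged away by the others.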

How to fix it

Constrain answers to retrieval-grounded questions where the cost of wrong answers is high

For consequential answer types (citations, identifiers, policy statements, numerical claims, regulatory references), require the model to ground the answer in retrieved content the application controls. If the retrieval returns no useful source, the application returns a refusal that names the absence, not a model guess. The architecture has to encode which answer types are groundable and what to do when grounding fails.
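One way to encode the rule is a gate in front of the model call. This is a sketch, not a reference implementation: the `retrieve` and `generate` callables, the `Passage` record, the answer-type names, and the 0.5 relevance threshold are all hypothetical placeholders for whatever the application actually uses.

```python
from dataclasses import dataclass
from typing import Callable

# Answer types that must be grounded; the names are illustrative assumptions.
GROUNDED_TYPES = {"citation", "identifier", "policy", "numeric", "regulatory"}


@dataclass
class Passage:
    source_id: str
    text: str
    score: float  # retrieval relevance in [0, 1]


def answer(
    question: str,
    answer_type: str,
    retrieve: Callable[[str], list[Passage]],
    generate: Callable[[str, list[Passage]], str],
    min_score: float = 0.5,
) -> dict:
    """Refuse, rather than guess, when a groundable answer type has no usable source."""
    if answer_type not in GROUNDED_TYPES:
        return {"status": "answered", "text": generate(question, []), "sources": []}

    passages = [p for p in retrieve(question) if p.score >= min_score]
    if not passages:
        # Name the absence instead of letting the model fill the gap.
        return {
            "status": "refused",
            "text": "No source in the knowledge base answers this question.",
            "sources": [],
        }
    return {
        "status": "answered",
        "text": generate(question, passages),
        "sources": [p.source_id for p in passages],
    }
```

The refusal is produced by the application, not the model, so it cannot be talked out of by a prompt edit.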

Enforce citation in the rendering layer

When the feature emits a claim that requires support, the renderer refuses to display the claim unless the model returned a citation that resolves to a real source. The check runs after generation and before display. A claim without a resolvable citation surfaces as a structured refusal the user understands rather than an uncited assertion the user trusts.
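A structural version of this check might look like the sketch below. It assumes the model output has already been split into claims that each carry an optional citation identifier; the `Claim` record, the `sources` mapping, and the withheld-message wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    citation_id: Optional[str]  # None when the model supplied no citation


WITHHELD = "[Withheld: no verifiable source was available for this statement.]"


def render(claims: list[Claim], sources: dict[str, str]) -> list[str]:
    """Display a claim only when its citation resolves to a known source."""
    lines = []
    for claim in claims:
        if claim.citation_id is not None and claim.citation_id in sources:
            lines.append(f"{claim.text} [{sources[claim.citation_id]}]")
        else:
            # Missing or unresolvable citation: show a structured refusal instead.
            lines.append(WITHHELD)
    return lines
```

Resolution alone shows that the cited source exists, not that it supports the claim; that comparison belongs to the factuality check described in the next step.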

Run a post-generation factuality check on consequential answers

Extract every factual claim the answer makes, resolve each against the retrieval source or an external verification, and surface the divergence as a finding before the answer reaches the user. The check is cheaper than the harm of a confidently wrong answer reaching a downstream decision, and it pairs naturally with the citation-enforcement step.
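Claim extraction in general is a hard problem, so the sketch below narrows it to one tractable slice: every number in the answer must also appear in the source text. The regular expression and the function name are illustrative assumptions, and exact string matching will miss reformatted figures (for example "14" versus "fourteen").

```python
import re

# Integers, decimals, and percentages such as "30", "4.5", or "100%".
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(answer: str, source: str) -> list[str]:
    """Return the numbers asserted in the answer that the source never states."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(answer) if n not in source_numbers]


# Usage: a non-empty result blocks the answer or raises a finding.
flagged = unsupported_numbers(
    "Refunds are accepted within 30 days at 100% of the price.",
    "Refunds are accepted within 14 days at 100% of the purchase price.",
)
```

Here `flagged` contains only "30", the one figure the source does not support.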

Surface uncertainty in the user interface

Where the model stated low confidence, where the retrieval returned weak support, where the citation did not directly answer the question, the rendering surface communicates that uncertainty. A confidence band, a low-certainty badge, an "I am not sure" phrase, or a "the source did not directly answer this" line lets the user calibrate trust on the response shape rather than on the model voice.
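The three conditions in the paragraph above can be mapped to a user-facing notice with a small function. The inputs, the 0.6 threshold, and the notice wording are assumptions for illustration; how confidence and retrieval scores are obtained depends on the model and retrieval stack in use.

```python
from typing import Optional


def uncertainty_notice(
    model_confidence: float,
    retrieval_score: float,
    source_answers_directly: bool,
    threshold: float = 0.6,
) -> Optional[str]:
    """Return the notice to display beside the answer, or None when none is needed."""
    if not source_answers_directly:
        return "The source did not directly answer this."
    if retrieval_score < threshold:
        return "Low certainty: the supporting sources are weak."
    if model_confidence < threshold:
        return "I am not sure about this answer."
    return None
```

Checking the source signals before the model-stated confidence reflects the point of the paragraph: trust should be calibrated on the evidence, not on how the model sounds.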

Constrain structured generation with a schema and a registry

When the model returns structured output (JSON, SQL, tool arguments, identifiers, prices, dates), the downstream parser validates every field against a schema and every identifier against a lookup. The hallucinated CVE ID, the invented function name, the impossible date, and the out-of-range price all fail validation rather than passing through to the database, the tool call, or the contract.
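A minimal validator for one structured output might look like this. The field names (`cve_id`, `fixed_on`, `price`), the allowed date range, and the in-memory `KNOWN_CVES` set are hypothetical; a real deployment would query an authoritative registry rather than a hard-coded set.

```python
import json
from datetime import date
from decimal import Decimal, InvalidOperation

# Stand-in for a real registry lookup; CVE-2021-44228 (Log4Shell) is a real identifier.
KNOWN_CVES = {"CVE-2021-44228"}


def validate(raw: str, today: date) -> list[str]:
    """Return validation errors; an empty list means the output may proceed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    errors = []
    cve_id = data.get("cve_id")
    if not isinstance(cve_id, str) or cve_id not in KNOWN_CVES:
        errors.append("cve_id is not in the registry")

    try:
        fixed_on = date.fromisoformat(str(data.get("fixed_on")))
        if not date(2000, 1, 1) <= fixed_on <= today:
            errors.append("fixed_on is outside the allowed range")
    except ValueError:
        errors.append("fixed_on is not a valid ISO date")

    try:
        price = Decimal(str(data.get("price")))
        # A price with more than two decimal places changes when quantized.
        if price < 0 or price != price.quantize(Decimal("0.01")):
            errors.append("price is negative or not rounded to the cent")
    except InvalidOperation:
        errors.append("price is not a number")
    return errors
```

Passing `today` in as an argument keeps the range check deterministic under test.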

Keep a human in the loop for irreversible or consequential actions

Code that touches source, billing decisions, customer account changes, compliance attestations, regulatory filings, and security finding closures all sit behind a human approval that reads both the model output and the verification result. The human approval is part of the operating discipline, not an optional ergonomic step.
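The approval step can be sketched as a queue that holds irreversible actions until a named reviewer releases them. The action names, the `ProposedAction` record, and the `ActionGate` class are hypothetical; execution itself is reduced to appending to a list.

```python
from dataclasses import dataclass, field

# Hypothetical action names for the irreversible operations listed above.
IRREVERSIBLE = {
    "write_source", "issue_refund", "update_account", "file_attestation", "close_finding",
}


@dataclass
class ProposedAction:
    name: str
    arguments: dict
    verification_passed: bool  # shown to the reviewer alongside the model output


@dataclass
class ActionGate:
    pending: list = field(default_factory=list)
    executed: list = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        """Queue irreversible actions for review; let everything else through."""
        if action.name in IRREVERSIBLE:
            self.pending.append(action)
            return "pending_approval"
        self.executed.append((action, None))
        return "executed"

    def approve(self, action: ProposedAction, reviewer: str) -> None:
        """Execute a queued action and record who approved it."""
        if not reviewer:
            raise ValueError("a named reviewer is required")
        self.pending.remove(action)
        self.executed.append((action, reviewer))
```

Storing the reviewer with the executed action keeps the approval on the operating record rather than in a chat thread.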

Build a representative evaluation harness and treat it as a release gate

Build a curated set of (prompt, expected behaviour) pairs that represents the answer types the feature handles, with ground-truth labels and edge-case coverage. Re-run the harness on every model upgrade, prompt edit, retrieval change, fine-tune update, and tool registration change. Block release on regression. The eval has to be a continuously maintained release gate, not a launch artefact.
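A release gate of this kind can be very small. The sketch below is illustrative only: the two cases are invented, `model` is any callable that maps a prompt to a response, and substring matching stands in for a real grader.

```python
from typing import Callable

# Illustrative (prompt, expected behaviour) pairs; a real set needs ground-truth review.
CASES = [
    ("How many days is the refund window?", "14 days"),
    ("Which CVE covers the widget parser bug?", "i do not know"),
]


def hallucination_rate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output does not contain the expected behaviour.

    Substring matching is a crude grader, used here only to keep the sketch short.
    """
    wrong = sum(expected.lower() not in model(prompt).lower() for prompt, expected in cases)
    return wrong / len(cases)


def release_allowed(current: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Block the release when the rate regresses past the recorded baseline."""
    return current <= baseline + tolerance
```

Running `release_allowed` in CI against the stored baseline makes a regression fail the build instead of surfacing as a customer complaint.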

Design prompts that permit uncertainty

The prompt rewards the model for saying "I am not sure", "the source did not address this", or "I cannot verify this from the available context" when the answer cannot be grounded. The instruction restores the model's native uncertainty signalling. The brand-voice goal of sounding confident has to bend to the reliability goal for consequential answer types.
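Both halves of this control can be sketched in a few lines: a lint that flags confident-tone instructions in a prompt, and a template that explicitly permits abstention. The phrase lists and the template wording are assumptions for illustration, not a vetted prompt.

```python
# Instructions that push the model towards confident tone (examples; extend per product).
CONFIDENT_TONE = (
    "answer authoritatively",
    "do not hedge",
    "respond confidently",
    "give a direct answer",
)

# Abstentions the prompt explicitly permits.
PERMITTED_ABSTENTIONS = (
    "I am not sure",
    "the source did not address this",
    "I cannot verify this from the available context",
)


def confident_tone_findings(prompt: str) -> list[str]:
    """Return the confident-tone instructions found in a prompt."""
    lowered = prompt.lower()
    return [phrase for phrase in CONFIDENT_TONE if phrase in lowered]


def build_system_prompt(task: str) -> str:
    """Compose a system prompt that allows the model to abstain."""
    options = "; ".join(f'"{a}"' for a in PERMITTED_ABSTENTIONS)
    return (
        f"{task}\n"
        "Answer only from the provided context. "
        f"If the context does not support an answer, reply with one of: {options}."
    )
```

A phrase list only catches known wording, so it complements the evaluation harness rather than replacing it.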

Pair each LLM call to a known answer-type taxonomy

Each call is tagged with an answer-type (factual lookup, summary, generation, classification, decision support, code suggestion). The defence stack the answer type requires (retrieval grounding, citation enforcement, schema constraint, factuality check, uncertainty surfacing, human approval) is wired by type. The architecture stops treating every model call as one undifferentiated text generation.
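The taxonomy can be expressed as an enumeration plus a lookup from answer type to required controls. The six types come from the paragraph above; which controls each type needs is an assumption made for illustration and should be set by the team that owns the feature.

```python
from enum import Enum


class AnswerType(Enum):
    FACTUAL_LOOKUP = "factual_lookup"
    SUMMARY = "summary"
    GENERATION = "generation"
    CLASSIFICATION = "classification"
    DECISION_SUPPORT = "decision_support"
    CODE_SUGGESTION = "code_suggestion"


# Illustrative mapping only; not a prescribed assignment of controls to types.
REQUIRED_CONTROLS = {
    AnswerType.FACTUAL_LOOKUP: {"retrieval_grounding", "citation_enforcement", "factuality_check"},
    AnswerType.SUMMARY: {"retrieval_grounding", "uncertainty_surfacing"},
    AnswerType.GENERATION: {"uncertainty_surfacing"},
    AnswerType.CLASSIFICATION: {"schema_constraint"},
    AnswerType.DECISION_SUPPORT: {"retrieval_grounding", "citation_enforcement", "human_approval"},
    AnswerType.CODE_SUGGESTION: {"schema_constraint", "human_approval"},
}


def missing_controls(answer_type: AnswerType, wired: set[str]) -> set[str]:
    """Controls the answer type requires that the call site has not wired."""
    return REQUIRED_CONTROLS[answer_type] - wired
```

A call site whose `missing_controls` result is non-empty is a candidate finding before any wrong answer has been observed.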

Log the verification result alongside the request and the response

Every call records the request, the model response, the citation check result, the factuality check result, the user acceptance signal, and the downstream action that did or did not execute. The post-incident reconstruction reads against a structured operating record rather than an unstructured log dump, and the eval can re-derive hallucination rate over time from the same record.
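A structured record of this kind could be written as one JSON line per call. The `CallRecord` fields mirror the items listed above, but the field names and the "passed" / "failed" / "skipped" vocabulary are assumptions, not a SecPortal schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CallRecord:
    request: str
    response: str
    citation_check: str                # "passed", "failed", or "skipped"
    factuality_check: str              # "passed", "failed", or "skipped"
    user_accepted: Optional[bool]      # None when no signal was captured
    downstream_action: Optional[str]   # None when the answer triggered no action
    action_executed: bool


def to_log_line(record: CallRecord) -> str:
    """Serialise one call as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def failed_factuality_rate(log_lines: list[str]) -> float:
    """Re-derive the share of checked calls that failed the factuality check."""
    checked = [r for r in map(json.loads, log_lines) if r["factuality_check"] != "skipped"]
    failed = sum(r["factuality_check"] == "failed" for r in checked)
    return failed / len(checked) if checked else 0.0
```

Because the verification results are fields rather than free text, the same log supports both incident reconstruction and trend measurement.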

Re-run the misinformation regression probe on every prompt, model, retrieval, and tool change

A model upgrade, a prompt edit, a retrieval-pipeline change, a fine-tune update, or a context-window resize can re-open a previously closed misinformation finding. Treat the probe as a first-class CI gate alongside unit and integration tests, and keep the canary prompts in the test suite where the engineering team sees the regression at commit time.

What this looks like in SecPortal

Finding with the prompt, the wrong answer, and the ground truth

The finding captures the prompt the user submitted, the confidently wrong output the model returned, the ground-truth source the answer should have grounded against, and the downstream action (or attempted action) the wrong answer triggered. AppSec, product security, AI engineering, and ML platform read the same record the engineering team uses to reproduce the defect.

Code scanning across prompt and verification call sites

Code scanning runs against connected GitHub, GitLab, and Bitbucket repositories. Findings surface at LLM call sites that emit to a consequential downstream sink without a paired verification step, at prompt-construction sites that suppress hedging, and at structured-output sites that pipe model text into a parser without schema validation. The remediation lands at the construction site, not at a perimeter filter.

Authenticated scanning with the misinformation probe

Authenticated scanning drives the LLM-backed endpoint with a curated misinformation corpus under a real session. Fabricated-identifier probes, fake-citation probes, hallucinated-package probes, invented-configuration probes, numerical-hallucination probes, and overconfident-refusal probes all execute, and the finding ties each response to the divergence between the asserted answer and the verified ground truth.

External scanning across exposed AI surfaces

External scanning enumerates public agent endpoints, public chat dashboards, debug routes, and public starter templates that may expose the feature to unauthenticated probing for misinformation. The finding ties the hallucination on the public surface back to the access path the team has to gate.

Continuous monitoring against eval regression

Continuous monitoring re-runs the misinformation probe on the configured cadence. A model upgrade, a prompt edit, a retrieval-pipeline change, a fine-tune update, or a context-window resize that regresses hallucination rate shows up against the baseline rather than waiting for a customer complaint, with the changed hallucination rate, the affected answer types, and the trigger captured on the finding record.

Bulk import from external eval frameworks

Bulk finding import accepts CSV output from Ragas, DeepEval, OpenAI evals, and internal harnesses so the engineering programme can land per-test-case hallucination findings on the same workspace where the rest of the security backlog lives. Each finding carries the test-case identifier, the asserted answer, the ground truth, the answer-type tag, and the regression baseline.

Retest after the remediation ships

Once the fix deploys (the retrieval-grounding requirement, the citation-enforcement step, the post-generation factuality check, the schema constraint on structured fields, the uncertainty-surfacing UI change, the prompt edit that permits "I am not sure", the eval added as a release gate), a targeted retest replays the original misinformation probe against the new construction and records the post-fix response on the finding. The finding closes against the evidence rather than against a developer assertion.

AI-assisted writeups with explicit honest scope

AI reports generate the writeup, the executive summary, and the developer-facing reproduction steps from the finding record. The narrative stays within the verified evidence (the prompt, the asserted answer, the ground truth, the answer-type tag, the eval harness identifier) and does not invent guardrails, factuality services, or runtime tooling the product does not have.

Document management for the eval harness record

Document management stores the representative test set, the ground-truth labels, the answer-type taxonomy, the citation policy, the schema definitions, the uncertainty-surfacing rules, and the human-in-the-loop policy. Each artefact attaches to the finding so the auditor reads the operating record the engineering programme actually runs against.

Compliance tracking pairs the fix to control evidence

Compliance tracking maps misinformation findings to the controls that read against them: ISO/IEC 42001 AI management system; ISO 27001 A.5.34 privacy and protection of PII and A.8.16 monitoring activities; SOC 2 CC4.1 monitoring and CC7.2 system monitoring; NIST AI RMF Measure 2 and Manage 4 categories; NIST SSDF PW.5, PW.7, and PW.8; and OWASP LLM Top 10 LLM09.

What SecPortal does not do

SecPortal is the operating record where misinformation findings, the prompts the user submitted, the confidently wrong outputs the model returned, the ground-truth sources the answers should have grounded against, and the downstream actions the wrong answers triggered land alongside the rest of the security backlog. The product does not act as an LLM evaluation harness in production, does not host a managed hallucination-detection service, does not provide a retrieval-grounding proxy between the application and the model, does not run a managed factuality-check API, does not maintain ground-truth corpora for your domain, and does not act as an AI gateway intercepting prompts between the application and the LLM provider.

SecPortal does not connect to Jira, ServiceNow, Slack, SIEM, SOAR, identity providers (Okta, Entra), or external ticketing systems through packaged integrations. The discipline is the engineering practice on top of the operating record: AppSec, product security, AI engineering, ML platform, and security engineering teams design the retrieval-grounding architecture, enforce citation in the rendering layer, write the post-generation factuality check, surface uncertainty in the user interface, constrain structured generation with schemas and registries, keep humans in the loop for consequential actions, design prompts that permit uncertainty, build the representative evaluation harness, and re-run the regression probe on every model, prompt, retrieval, fine-tune, and tool change.

Related tools and reading

Vulnerability

Sensitive information disclosure in LLM applications (LLM02)

The confidentiality peer. Where LLM02 is the wrong content reaching the user, LLM09 is the right shape with the wrong facts. The two findings often pair on the same RAG pipeline because the retrieval, the grounding, and the citation discipline shape both classes.

Vulnerability

Prompt injection (LLM01)

An adversarial cause of misinformation. An attacker rewrites the model's instructions to emit a confidently wrong answer the user trusts. The eval harness has to include adversarially induced hallucinations, not only honest-user failure modes.

Vulnerability

Indirect prompt injection via RAG

A retrieval-side cause of misinformation. A poisoned source in the retrieval corpus instructs the model to assert a false fact. Grounding and citation enforcement do not catch the defect if the source itself is hostile.

Vulnerability

Data and model poisoning (LLM04)

A training-side cause of misinformation. Poisoned training, fine-tuning, or embedding data shifts the model's baseline factuality. The eval harness regression is the only signal that the new model variant has degraded on the answer types the feature handles.

Vulnerability

Improper output handling (LLM05)

The downstream-sink parallel. LLM05 is the sink that treats trusted-looking output as already safe; LLM09 is the parallel defect on the truthfulness axis. The two findings often share remediation through schema constraints and post-generation checks.

Vulnerability

Excessive agency (LLM06)

The amplifier. When the agent can act on its own output, a hallucinated identifier becomes a tool call with a fabricated argument and the downstream record inherits the defect. Keeping humans in the loop is one of the few controls that breaks the amplification.

Vulnerability

LLM supply chain (LLM03)

The package-confusion adjacency. Hallucinated package names that attackers later register turn a code-assistant misinformation finding into a supply-chain incident. The schema-and-registry constraint on structured generation is the same defence in both classes.

Vulnerability

Vector and embedding weaknesses (LLM08)

The retrieval-layer adjacency. Weak retrieval surfaces poor sources, poor sources weaken grounding, weak grounding raises hallucination rate. The two findings often share the same retrieval pipeline as the root cause.

Blog

OWASP Top 10 for LLM applications explained

The full 2025 LLM Top 10 reading in operating context, including how LLM09 Misinformation sits beside the nine other classes and how the defences compose into one engineering programme.

Blog

Secure code review for AI-generated code

The code-review playbook for the upstream half of AI application security. Hallucinated package names, invented function signatures, and fabricated configuration keys all surface at review time when the reviewer is reading for misinformation, not only for vulnerability classes.

Framework

NIST AI Risk Management Framework

The Measure and Manage functions read directly against misinformation evidence: hallucination rates by answer type, regression on model upgrades, calibration of stated confidence against verified correctness, and post-incident reconstruction of wrong answers that reached production.

Framework

ISO/IEC 42001 AI management system

The control objectives covering AI system lifecycle, performance monitoring, information handling, and impact assessment pair directly to misinformation remediation evidence: representative evaluation, regression baselines, citation and grounding policies, and human-in-the-loop discipline.

Framework

OWASP and the LLM Top 10

The OWASP hub including the 2025 LLM Top 10 list where LLM09 Misinformation sits alongside LLM02 Sensitive Information Disclosure, LLM05 Improper Output Handling, and LLM06 Excessive Agency.

For

SecPortal for AppSec teams

The day-to-day workspace where AppSec engineers run the misinformation probe, the eval-harness regression check, and the remediation track for every LLM feature shipping in the product.

For

SecPortal for product security teams

The workspace where product security teams own the AI feature security posture across releases, with eval harnesses, citation-enforcement checks, and misinformation probes wired into the release process.

Feature

Code scanning

Semgrep-backed SAST and SCA across connected GitHub, GitLab, and Bitbucket repositories. Findings surface at LLM call sites that emit to consequential downstream sinks without paired verification, at prompt-construction sites that suppress hedging, and at structured-output sites that bypass schema validation.

Compliance impact

Track LLM09 misinformation findings against every AI feature

SecPortal records LLM09 findings against the AI feature, attaches the prompt, the asserted-but-false output, the ground-truth source, and the answer-type tag as evidence, generates AI-assisted writeups, accepts bulk import from external eval frameworks, and tracks the fix through retest.
