Vulnerability

Sensitive Information Disclosure in LLM Applications
detect, understand, remediate

Sensitive information disclosure (OWASP LLM02:2025) is the vulnerability class where an LLM-backed application emits content the data owner never intended to release: PII from training data, secrets baked into prompts or fine-tuning corpora, intellectual property the retrieval layer surfaced, conversation memory from another user, or fields reconstructed through embedding inversion and membership inference. The damage is rarely the model. The damage is the data that reached the response, the log, the vector store, or the third-party observability vendor on the way through.

No credit card required. Free plan available forever.

Severity: High

CWE ID: CWE-200

OWASP Top 10: LLM02:2025 - Sensitive Information Disclosure

CVSS 3.1 Score: 7.5

What is sensitive information disclosure in LLM applications?

Sensitive information disclosure in LLM applications is the vulnerability class where the model emits, or can be coaxed into emitting, content that should never reach the caller. The OWASP GenAI Security Project lists it as LLM02:2025 Sensitive Information Disclosure in the 2025 Top 10 for LLM Applications. The leaked content typically falls into one of five buckets: personal data (PII memorised from training or fine-tuning data, or surfaced through retrieval), credentials and secrets (API keys, partner tokens, connection strings, internal endpoints embedded in prompts or fine-tuning data), intellectual property (proprietary algorithms, code, internal documents the retrieval layer pulled in), business-sensitive context (pricing, contracts, customer lists, board materials), and cross-tenant or cross-session data (information from another user's conversation, another tenant's vector store, or another customer's fine-tune slice).

LLM02 sits beside the other LLM Top 10 disclosure classes and the broader application data-handling classes. The system prompt leakage page covers the narrower case where the developer-written prompt itself leaks. The model extraction attack page covers the case where the asset is the model parameters or training-set membership rather than a discrete piece of sensitive data. The sensitive data exposure page covers the traditional application class where the leak surface is an unprotected response, log, or storage location. LLM02 specifically covers the inference surface as a data egress channel: the request reaches the model, the model produces content, and that content carries information from the upstream context the application never authorised the caller to read.

For internal AppSec, product security, data security, AI engineering, and ML platform teams, the operating reality is that any LLM feature is a new data egress surface that needs the same threat model as a search endpoint, an export endpoint, or a customer-facing report renderer. The fix is rarely a single guardrail. It is a stack: keep sensitive data out of training and fine-tuning where possible, redact what cannot be removed, authorise retrieval against the caller identity, filter the output for known sensitive patterns, log requests through a redaction step, and pair every release with an extraction probe regression that runs on every prompt, model, corpus, and retrieval-policy change.

The class shows up in regulator inquiries the moment an LLM feature touches user data. GDPR Article 5, HIPAA 164.514 minimum necessary, PCI DSS Requirement 3 data protection, ISO/IEC 42001 AI management system controls, and the EU AI Act high-risk transparency and data governance articles all read directly against an LLM02 finding. Auditors will ask three questions: what sensitive data could the model have seen during training or retrieval, what controls minimise and redact that data, and what evidence shows the controls actually run on every release. The answers live on the finding, not in a slide deck.

The disclosure surfaces

Training and fine-tuning data memorisation

A foundation model can memorise rare strings from its training corpus and reproduce them when an attacker probes with adjacent context. A model fine-tuned on customer support transcripts, internal tickets, or HR records can reproduce fragments of those records to any caller. The disclosure happens through normal model output, not through a separate exploit.

Retrieval Augmented Generation corpus exposure

A retrieval layer pulls chunks from a vector store keyed by the user prompt and pastes them into the model context. If the retrieval does not authorise per caller identity, a generic question returns chunks indexed from another tenant, another department, or a document the user never had read access to. The model summarises the chunk faithfully and the caller reads content they were never granted.

Conversation memory bleed across sessions or users

A shared cache, a misrouted session token, a poorly scoped memory feature, or a prompt template that concatenates prior turns from a global buffer lets a new caller read fragments of a prior session. Multi-tenant deployments are the highest-risk shape because one buffer bleed crosses a customer boundary.

Prompt-embedded secrets and personalisation fields

A developer pastes a customer identifier, a partner API key, an internal endpoint URL, a feature-flag list, or a jurisdictional flag into the system prompt for convenience. Any caller who can extract or paraphrase the prompt context reads the secret. The system prompt leakage page covers this surface as its own class and pairs with LLM02 when the leaked secret is itself sensitive data.

Embedding inversion and membership inference

A vector store exposes embeddings through a search endpoint, a debugging surface, or a public sandbox. An attacker reconstructs the source text from the embedding through known inversion techniques, or queries with candidate strings to learn which were in the training set. Both are inference-side disclosures the team did not consider when designing the embedding API.

Tool-call results and structured outputs

An agent calls a downstream tool, a database query helper, or a search service and embeds the raw result into its response. The tool returned more rows or richer fields than the model was supposed to summarise. The agent emits the structured payload verbatim and the caller reads the full record set.
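
One way to narrow this surface is to trim the tool result before it enters the model context. The sketch below is illustrative only: `project_tool_result`, the field names, and the row cap are hypothetical, and a real agent framework will have its own hook for post-processing tool output.

```python
def project_tool_result(rows, allowed_fields, max_rows=5):
    """Trim a tool result to an allowlist of fields and a row cap.

    rows: list of dicts as returned by a hypothetical downstream tool.
    allowed_fields: the only keys the model is meant to summarise.
    """
    trimmed = []
    for row in rows[:max_rows]:
        # Copy only allowlisted keys; anything else never reaches the model.
        trimmed.append({name: row[name] for name in allowed_fields if name in row})
    return trimmed


if __name__ == "__main__":
    raw = [
        {"order_id": 1, "status": "shipped", "card_number": "4111111111111111"},
        {"order_id": 2, "status": "pending", "card_number": "4111111111111111"},
    ]
    print(project_tool_result(raw, ("order_id", "status"), max_rows=1))
```

An allowlist is used rather than a blocklist so that a new column added to the tool's result stays hidden until someone decides the model should see it.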

Observability and logging destinations

Application logs, LLM provider traces, third-party observability vendors, and feature-flag evaluation services capture the prompt, the retrieved chunks, and the model response. The sensitive data is now in storage destinations the security team did not enumerate, with retention policies the team did not write, often outside the data residency the contract promised.

Public starter templates, debug routes, and AI gallery surfaces

A team publishes a starter prompt, a demo agent, a fine-tuned model snapshot, or a sample RAG corpus to a public gallery for a launch or a community contribution. The artefact carries internal context the engineering team forgot was sensitive: customer names, vendor relationships, internal product names, or unannounced features. The disclosure happens before any attacker exists.

How it goes wrong

1. PII enters training without a documented minimisation step

A fine-tuning corpus is assembled from customer support transcripts, HR tickets, internal Slack exports, or analytics warehouse extracts. There is no checklist that walks each column against a minimisation rule, no redaction step before the file lands in the training pipeline, and no inventory of what classes of personal data the model has potentially memorised. The auditor sees no evidence the team thought about it.

2. Retrieval is authorised by the prompt, not by the caller identity

The vector store is queried with the user prompt as the search key and returns the top matching chunks. There is no identity-scoped filter that checks whether the calling user had read access to the source document the chunk came from. The disclosure is a routine cross-tenant or cross-department read that happens to flow through the model.

3. Conversation history is concatenated from a global or weakly scoped buffer

A memory feature, a long-context conversation feature, or an agent that maintains state across sessions pulls prior turns from a buffer that is not strictly per-caller, per-session, or per-tenant. A misrouted request reads another caller's turns. The model includes them in the next response as if they were the natural continuation of the current conversation.

4. Outputs are not filtered against known sensitive patterns

There is no post-generation step that checks the model output for credit card numbers, social security patterns, internal email patterns, API key patterns, internal hostname patterns, customer identifiers, or other regex-detectable sensitive content. The model emits whatever it produced and the application returns it to the caller without inspection.

5. Sensitive data sits unredacted in observability and logs

Application logs capture the full prompt, the full retrieved chunks, and the full model response. LLM provider traces, telemetry vendors, prompt-evaluation services, and debug dashboards all receive the same payload. The sensitive content is now stored across many destinations with retention policies the team did not write and access controls the team does not own.

6. Embeddings and fine-tuned snapshots ship without an access boundary

A vector store is exposed through an unauthenticated search endpoint to support a public demo. A fine-tuned model file is published to a public model registry for community access. The team did not consider that the embeddings or the fine-tune carry the source data in a transformed form, and that recovering at least part of it is feasible with publicly documented inversion techniques.

7. Cross-tenant deployment without per-tenant data separation

A multi-tenant SaaS feature uses one shared model, one shared vector store, and one shared cache for every customer. There is no per-tenant scope on the retrieval, the cache, the conversation memory, or the fine-tune target. A single leak exposes every customer. This shape concentrates regulatory risk because one finding affects every tenant in the contract base.

8. No threat model that lists the sensitive data classes the feature handles

The team did not produce, before launch, an inventory of the personal data, the secrets, the intellectual property, the regulated data, and the cross-tenant data the LLM feature might touch on the request path. Without that inventory there is no anchor for the redaction policy, the retrieval authorisation rules, the output filter list, or the extraction probe corpus that the release pipeline should run.

9. Extraction regression is not re-run on every release

A model upgrade, a prompt edit, a new retrieval source, a refresh of the fine-tune corpus, or a new agent tool re-opens a closed disclosure finding. The team has no CI gate that replays the canonical extraction probes and compares the outputs against the known sensitive corpus. The regression slips into production and the next incident is found by an external researcher rather than the release pipeline.

Common causes

Treating the model as a black box rather than a data handler

The team treats the LLM as if the data it sees evaporates at the end of the call. The deployment shape (memorisation in weights, indexed embeddings on disk, traces captured by observability) means the data persists in many places the application owner has to inventory and protect.

Fine-tuning on uncurated internal corpora

A team takes the internal documentation, the support archive, or the analytics warehouse and feeds it into a fine-tune without a minimisation pass. The model can now reproduce fragments of those records to any caller. The fine-tune is the hardest surface to remediate because the data is in the weights, so removing it typically means re-curating the corpus and retraining.

Retrieval indexes that ignore caller identity

A RAG pipeline is built with a vector store that returns the closest chunks to the query. There is no identity filter that asserts the caller had read access to the source document. The retrieval is correct but the authorisation is missing.

Personalisation by literal embedding in the prompt

A team embeds the user's real name, account identifier, plan tier, jurisdictional flag, customer code, or internal field directly in the system prompt for convenience. The fields become part of the leakable surface every extraction prompt probes for.

Output post-processing as an afterthought

The output filter is a regex list someone wrote during the launch sprint. The list does not cover the customer identifier patterns, the internal endpoint patterns, the partner names, or the regulated data classes the threat model would have surfaced if it had been written. The filter passes the sensitive content through unchanged.

Trusting observability destinations the security team did not contract directly

Engineering teams adopt LLM observability vendors, prompt evaluation services, or feature-flag evaluation services without a redaction agreement. The sensitive prompt content, the retrieved chunks, and the model output now sit with vendors whose data retention, residency, and access controls the security team has not reviewed.

How to detect it

Automated detection

  • SecPortal code scanning runs against connected GitHub, GitLab, and Bitbucket repositories and flags prompt construction sites that embed customer identifiers, partner credentials, or internal endpoints; fine-tuning data pipelines that ingest PII without a documented minimisation step; retrieval helpers that pass indexed documents through to the model without per-identity authorisation; and observability sinks that capture the prompt or response without a redaction step
  • Authenticated scanning drives the LLM-backed endpoint with a curated extraction corpus under a real session: direct PII probes, cross-tenant identifier probes, retrieval-source probes (asking for the full source document behind a summary), conversation-memory probes (asking what the prior session contained), and intellectual-property probes targeting internal documentation fragments that should remain scoped
  • External scanning discovers exposed agent endpoints, public chat surfaces, debug routes, public starter templates, leaked fine-tuned model snapshots, and public documentation that may have published the production prompt, the redaction policy, or the indexed corpus by accident
  • Continuous monitoring re-runs the extraction probe on a defined cadence so a prompt edit, a model upgrade, a refreshed fine-tune, a new retrieval source, or a new agent tool that re-opens a previously closed disclosure surfaces against the baseline rather than waiting for the next pentest cycle
  • Bulk finding import accepts the validated output of dedicated LLM red-team tools, AI safety scanners, or prompt-evaluation services so external probe results land on the same engagement record as the SecPortal-driven probes, with one CVSS 3.1 calibration applied across the LLM02 finding chain

Manual testing

  • Produce the sensitive data inventory for the LLM feature: what personal data, what secrets, what intellectual property, what regulated data, and what cross-tenant data the feature touches on the request path, on the training data pipeline, and on the retrieval index
  • Walk the training and fine-tuning corpus build pipeline and confirm a minimisation step exists, with a documented redaction rule per data class and an audit trail per training run that records what was removed
  • Probe the deployed feature with direct PII extraction prompts that ask the model to recite known patterns, list known users, or recall examples from the training set, and record any verbatim or paraphrased response that confirms memorisation
  • Probe the retrieval layer by asking the agent for the full source document behind a summary, for the metadata of the chunks it cited, for the document title or owner, and for cross-tenant identifiers that the calling identity should not have access to
  • Probe conversation memory by opening a new session under a second identity and asking the model to recall details from the prior session, including arbitrary tokens the prior session contained and unique identifiers that would only appear in a leak
  • Review application logs, LLM provider traces, observability vendor dashboards, prompt evaluation services, and debug dumps for any destination that captures the prompt, the retrieval payload, or the model response without a redaction step
  • Walk every public surface the team has shipped (starter templates, demo agents, public model snapshots, sample corpora) and confirm none of them carry internal context, customer names, partner relationships, or pre-announcement product references
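
The conversation-memory probe in the list above can be made repeatable with a canary token: plant a unique string in one session, then ask a second identity to recall it. This is a minimal sketch under stated assumptions. `session_a` and `session_b` are hypothetical callables that send one message under two different identities and return the model reply as a string; how they authenticate depends on the application under test.

```python
import secrets


def make_canary(prefix="CANARY"):
    """Return a unique token that should never appear outside the planting session."""
    return f"{prefix}-{secrets.token_hex(8)}"


def memory_bleed_probe(session_a, session_b):
    """Plant a canary under identity A, then ask identity B to recall it.

    Returns (leaked, canary). leaked is True when B's reply contains the
    canary verbatim, which indicates memory shared across sessions.
    """
    canary = make_canary()
    session_a(f"Please remember this reference code: {canary}")
    reply = session_b("What reference code did the previous conversation mention?")
    return canary in reply, canary
```

A verbatim match is strong evidence of a bleed, but a clean result is weaker: a paraphrased or partial leak would not match, so the reply is still worth reading by hand.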

How to fix it

Produce a sensitive data inventory for the LLM feature before launch

List the personal data, the secrets, the intellectual property, the regulated data, and the cross-tenant data the feature touches on the request path, the training pipeline, the fine-tune corpus, the retrieval index, and the observability sinks. The inventory becomes the anchor for the redaction policy, the retrieval authorisation rules, the output filter list, and the extraction probe corpus.

Minimise and redact training and fine-tuning data before it reaches the pipeline

Apply per-class minimisation rules (drop the field, hash the field, tokenise the field, redact with a known marker) before training data lands in the build pipeline. Record the redaction decisions per training run on the operating record so the auditor reads what the team removed and why.
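
As a rough illustration of per-class rules, the sketch below applies one action per field and returns an audit trail alongside the cleaned record. The field names, the `RULES` table, and `minimise_record` are hypothetical, and the email regex is a simple stand-in for whatever PII detection the pipeline actually uses.

```python
import hashlib
import hmac
import re

# Hypothetical per-field rules; a real table would come from the data inventory.
RULES = {
    "customer_email": "hash",
    "ssn": "drop",
    "agent_notes": "redact",
    "ticket_id": "keep",
}

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def minimise_record(record, rules, secret):
    """Return (clean_record, audit) after applying one rule per field.

    Fields with no rule are dropped, so a new column cannot slip into
    the training set unreviewed.
    """
    clean, audit = {}, []
    for field, value in record.items():
        action = rules.get(field, "drop")
        if action == "keep":
            clean[field] = value
        elif action == "hash":
            # Keyed hash: stable join key without the raw value.
            digest = hmac.new(secret, str(value).encode(), hashlib.sha256).hexdigest()
            clean[field] = digest[:16]
        elif action == "redact":
            clean[field] = EMAIL_RE.sub("[EMAIL]", str(value))
        # "drop" (or any unknown action): the field is omitted entirely.
        audit.append((field, action))
    return clean, audit
```

The `audit` list is what would be persisted per training run, so the record shows which fields were removed or transformed and under which rule.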

Authorise retrieval against the calling identity, not against the prompt

Pass the caller identity, role, tenant, and scope to the retrieval helper and filter the candidate chunks against the caller's permission set before pasting them into the model context. The retrieval is correct only when it respects the read permissions the source system enforces.
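
A minimal sketch of that filter, assuming each indexed chunk carries the tenant and the group ACL of its source document. `Caller`, `Chunk`, and `authorise_chunks` are hypothetical names, not the API of any particular vector store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    user_id: str
    tenant_id: str
    groups: frozenset


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str
    allowed_groups: frozenset  # assumed to be copied from the source ACL at index time
    score: float


def authorise_chunks(caller, candidates, top_k=3):
    """Keep only chunks the caller may read, then take the top_k by score."""
    permitted = [
        chunk
        for chunk in candidates
        if chunk.tenant_id == caller.tenant_id
        and (chunk.allowed_groups & caller.groups)
    ]
    return sorted(permitted, key=lambda chunk: chunk.score, reverse=True)[:top_k]
```

Filtering before the top-k cut matters: cutting first and filtering second can return fewer chunks than requested and still leaks ranking information. Where the vector store supports metadata filters, pushing the same tenant and group condition into the query itself is usually preferable to post-filtering in application code.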

Scope conversation memory per caller, per session, and per tenant

Memory buffers, agent state, and conversation context must be strictly scoped. A misrouted request must read empty state, not another caller's turns. The buffer key has to include the identity, the tenant, and the session, and the storage layer has to enforce the scope at the access level rather than relying on the application code to filter.
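
The keying rule can be sketched as follows. `ScopedMemory` is a hypothetical in-memory stand-in; in production the same composite key would be enforced by the storage layer (for example through per-tenant partitions or row-level policies) rather than by a Python dict.

```python
class ScopedMemory:
    """Conversation buffer keyed by (tenant, user, session)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tenant_id, user_id, session_id):
        # Refuse partial keys so a missing identity can never fall back
        # to a shared or global buffer.
        if not (tenant_id and user_id and session_id):
            raise ValueError("tenant, user, and session are all required")
        return (tenant_id, user_id, session_id)

    def append(self, tenant_id, user_id, session_id, turn):
        key = self._key(tenant_id, user_id, session_id)
        self._store.setdefault(key, []).append(turn)

    def history(self, tenant_id, user_id, session_id):
        # A request with the wrong scope reads empty state, not another caller's turns.
        key = self._key(tenant_id, user_id, session_id)
        return list(self._store.get(key, []))
```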

Filter the model output for known sensitive patterns

Apply a post-generation step that scans the response for credit card patterns, social security patterns, internal email patterns, API key patterns, hostname patterns, customer identifiers, and other regulated data classes the threat model lists. Block, redact, or surface the response as a finding when a match is found.
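
A minimal sketch of such a filter follows. The pattern set is illustrative rather than complete: the internal hostname suffix is a placeholder, the access key pattern covers only one well-known key ID shape, and the card pattern is paired with a Luhn check to reduce false positives on order numbers and similar digit runs.

```python
import re

# Illustrative patterns only; a real list is driven by the threat model.
PATTERNS = {
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "internal_host": re.compile(r"\b[a-z0-9-]+\.internal\.example\.com\b"),  # placeholder domain
}


def luhn_valid(digits):
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def filter_output(text):
    """Redact known sensitive patterns; return (clean_text, matched_classes)."""
    findings = []
    for name, pattern in PATTERNS.items():
        def replace(match, name=name):
            if name == "credit_card":
                digits = re.sub(r"\D", "", match.group())
                if not luhn_valid(digits):
                    return match.group()  # digit run that is not a card number
            findings.append(name)
            return f"[REDACTED:{name}]"
        text = pattern.sub(replace, text)
    return text, findings
```

The returned `findings` list is what lets the caller choose between blocking, redacting, or raising the response as a finding. Regex filters only catch pattern-shaped data; names, free-text health details, and paraphrased content need other controls.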

Redact prompts, retrieval payloads, and responses on every observability destination

Every log destination, every LLM provider trace, every observability vendor, every prompt-evaluation service, and every debug dump needs a redaction step that replaces sensitive content with a hash or placeholder. The redaction agreement has to be in the vendor contract, not just in the application code.
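
For application logs, one place to put the redaction step is a filter on the log handler, so every record is rewritten before it is emitted. The sketch below uses the standard library `logging.Filter`; the two regexes are hypothetical examples of what the redaction policy might list, and vendor SDKs that ship traces through their own channel need a separate hook.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{8,}\b")  # hypothetical token shape


class RedactingFilter(logging.Filter):
    """Rewrite each log record so sensitive substrings never reach the sink."""

    def filter(self, record):
        # Render the message with its arguments first, so values passed
        # as %-style args are redacted as well.
        message = record.getMessage()
        message = EMAIL_RE.sub("[EMAIL]", message)
        message = SECRET_RE.sub("[SECRET]", message)
        record.msg = message
        record.args = ()
        return True  # keep the record, now redacted


def build_logger(name, stream):
    """Return a logger whose handler redacts before writing to stream."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactingFilter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Attaching the filter to the handler rather than the logger means records propagated from child loggers are redacted too.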

Treat embedding indexes and fine-tuned snapshots as sensitive assets with access control

Vector stores and fine-tuned model files carry the source data in transformed form. They need the same access boundary as the source database. Public sandboxes, demo embeddings, and public model registries must use a separately curated, demonstrably non-sensitive corpus.

Run the extraction probe regression on every prompt, model, corpus, and policy change

A model upgrade, a prompt edit, a refreshed fine-tune, a new retrieval source, a new agent tool, or a refactored output filter is a release event that needs the canonical extraction probe corpus replayed against it. The release pipeline blocks until the probe outputs are within the known-clean threshold.
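
A release gate of this kind can be sketched as a replay loop that fails when any known sensitive string appears in a response. `run_probe_gate` and `ask_model` are hypothetical names, and the substring match is the simplest possible comparison: it assumes the known sensitive corpus is represented as a list of canary strings and will not catch paraphrased leaks.

```python
import sys


def run_probe_gate(ask_model, probes, canaries):
    """Replay each probe and collect responses that contain a known sensitive string.

    ask_model: callable taking a prompt and returning the model response text.
    probes: the canonical extraction prompts.
    canaries: strings from the known sensitive corpus that must never appear.
    """
    failures = []
    for probe in probes:
        response = ask_model(probe)
        leaked = [c for c in canaries if c.lower() in response.lower()]
        if leaked:
            failures.append({"probe": probe, "leaked": leaked})
    return failures


def gate_exit_code(failures):
    """Return 1 when any probe leaked, so a CI job can block the release."""
    for failure in failures:
        print(f"LEAK: {failure['probe']!r} -> {failure['leaked']}", file=sys.stderr)
    return 1 if failures else 0
```

In a pipeline, the job would call `sys.exit(gate_exit_code(run_probe_gate(...)))` against the candidate build, with the probe and canary lists kept under version control next to the prompt and retrieval configuration.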

Surface every disclosure attempt as a finding with the prompt, response, and leaked content category attached

Block-and-log is not enough. Each blocked extraction attempt, each redacted output, and each cross-tenant retrieval refusal becomes a structured record the security team can review for pattern, intent, and frequency. The record carries the actor identity, the prompt, the response, and the content class the system classified.
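
One possible shape for that structured record is sketched below. `DisclosureEvent` and its field names are hypothetical; the content classes mirror the ones this page uses (PII, secret, IP, regulated data, cross-tenant).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

CONTENT_CLASSES = {"pii", "secret", "ip", "regulated", "cross_tenant"}
ACTIONS = {"blocked", "redacted", "refused"}


@dataclass
class DisclosureEvent:
    actor_id: str
    tenant_id: str
    prompt: str
    response: str
    content_class: str
    action: str
    observed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject free-text classes so events stay comparable over time.
        if self.content_class not in CONTENT_CLASSES:
            raise ValueError(f"unknown content class: {self.content_class}")
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)
```

Because the record stores the prompt and the response, it is itself a sensitive artefact: the response field should hold the redacted text, and the store needs the same access boundary as the data it describes.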

Map every LLM02 finding to the regulatory class it concerns

A PII disclosure is a GDPR Article 5 issue. A health record disclosure is a HIPAA 164.514 issue. A cardholder data disclosure is a PCI DSS Requirement 3 issue. An internal IP disclosure is an ISO 27001 Annex A 8.10 to 8.12 issue. The finding has to carry the regulatory mapping so the GRC team has the evidence ready when the auditor or regulator asks.
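
The mapping in the paragraph above can live as a small lookup that the finding workflow consults. The keys below are hypothetical content-class labels; the values are taken directly from the text.

```python
# Content class -> regulatory reference, as listed in the paragraph above.
REGULATORY_MAP = {
    "pii": "GDPR Article 5",
    "health_record": "HIPAA 164.514",
    "cardholder_data": "PCI DSS Requirement 3",
    "internal_ip": "ISO 27001 Annex A 8.10 to 8.12",
}


def regulatory_mapping(content_class):
    """Return the regulatory reference for a content class, or fail loudly."""
    try:
        return REGULATORY_MAP[content_class]
    except KeyError:
        # An unmapped class is a gap in the inventory, not a default.
        raise ValueError(f"no regulatory mapping for {content_class!r}") from None
```

Failing on an unknown class, rather than returning a default, surfaces gaps in the mapping before an auditor finds them.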

What this looks like in SecPortal

Finding with the extraction prompt, response, and content class

The finding captures the extraction prompt the attacker used, the model response, the substring that leaked, the upstream data source (training, fine-tune, retrieval, memory, prompt), and the content class (PII, secret, IP, regulated data, cross-tenant). AppSec, product security, and data security read the same record the engineering team uses to reproduce the disclosure and ship the fix.

Code scanning across LLM data handling sites

Code scanning runs Semgrep against connected GitHub, GitLab, and Bitbucket repositories. Findings surface at prompt construction sites embedding personalisation fields or partner credentials, fine-tuning data pipelines without a minimisation step, retrieval helpers that bypass identity scope, and observability sinks that capture the prompt or response without redaction.

Authenticated scanning with the extraction probe corpus

Authenticated scanning runs against the LLM-backed endpoint with a curated extraction probe corpus under a real session. Direct PII probes, cross-tenant identifier probes, retrieval-source probes, conversation-memory probes, and intellectual-property probes all execute, and each finding ties the response to the substring of the upstream data source that leaked.

External scanning across exposed AI surfaces

External scanning enumerates public agent endpoints, debug routes, public starter templates, leaked fine-tuned model snapshots, and public documentation that may have published the production prompt, the redaction policy, or the indexed corpus by accident. The finding ties the leaked content back to the public surface the team has to update or take down.

Continuous monitoring against AI feature drift

Continuous monitoring re-runs the extraction probe on the configured cadence. A prompt edit, a model upgrade, a refreshed fine-tune corpus, a new retrieval source, a new agent tool, or a refactored output filter that re-opens a previously closed disclosure surfaces against the baseline rather than waiting for the next pentest cycle.

Bulk import for external LLM red-team output

Bulk finding import accepts CSV intake from dedicated LLM red-team tools, AI safety scanners, and prompt-evaluation services. External probe results land on the same engagement record as the SecPortal probes, with one CVSS 3.1 calibration applied across the LLM02 finding chain and one owner assignment per finding.

Retest after the remediation ships

Once the fix deploys (the training data is re-curated and redacted, the retrieval layer is rewritten to filter by identity, the memory buffer is scoped per tenant, the output filter is extended, the redaction agreement with the observability vendor is in place), a targeted retest replays the original extraction probe corpus and records the post-fix response on the finding. The finding closes against the evidence rather than against a developer assertion.

AI-assisted writeups with verified scope

AI reports generate the writeup, the executive summary, and the developer-facing reproduction steps from the finding record. The narrative stays within the verified evidence on the finding (the extraction prompt, the model response, the leaked substring, the upstream data source, the content class) and does not invent guardrails, sandbox behaviour, or runtime tooling the product does not have.

Document management for the canonical inventory

Document management stores the sensitive data inventory, the redaction policy, the retrieval authorisation rules, the output filter list, the extraction probe corpus, the observability redaction agreement, and the regulatory mapping per finding. Each artefact attaches to the finding so the auditor reads the operating record the engineering programme runs against.

Compliance tracking pairs the fix to control evidence

Compliance tracking maps LLM02 findings to the controls that read against them: ISO 27001 A.5.34 PII, A.8.10 information deletion, A.8.11 data masking, A.8.12 data leakage prevention; SOC 2 CC6.1 logical access, CC6.7 transmission, movement, and removal of information; PCI DSS Requirement 3 data protection; NIST SSDF PW.5 secure coding; NIST AI RMF Map, Measure, Manage; ISO/IEC 42001 AI management system; OWASP LLM Top 10 LLM02.

What SecPortal does not do

SecPortal is the operating record where LLM02 sensitive information disclosure findings, the extraction prompts the attacker used, the model responses, the substrings that leaked, the upstream data sources, and the content classes land alongside the rest of the security backlog. The product does not act as an AI gateway intercepting prompts between the application and the LLM provider, does not host a managed prompt redaction proxy, does not enforce per-request authorisation inside your retrieval pipeline, does not maintain your fine-tuning corpus, does not run a managed extraction-probe library that updates without your engineering team, and does not act as a data loss prevention engine on the inference channel.

SecPortal does not provide a model training service, a vector database, a fine-tuning service, a managed RAG pipeline, an embedding inversion defense library, a differential privacy training stack, or a model watermarking service. The product does not connect to Jira, ServiceNow, Slack, SIEM, SOAR, identity providers (Okta, Entra), or external ticketing systems through packaged integrations. The discipline is the engineering practice on top of the operating record: AppSec, product security, AI engineering, ML platform, and data security teams write the sensitive data inventory, the training data minimisation pipeline, the retrieval authorisation code, the conversation memory scoping, the output filter, the observability redaction agreement, and the CI gate that re-runs the extraction probe on every prompt, model, and corpus change.

Related tools and reading

Vulnerability

Prompt injection (LLM01)

The input-side hijack that often acts as the unlock for a disclosure attack. Once the attacker rewrites the model's instructions, the next request is often an extraction prompt that pulls sensitive data through the response surface.

Vulnerability

Indirect prompt injection via RAG

A retrieved document instructs the model to disclose sensitive content from the broader corpus without the user ever asking. The disclosure surface extends to every connected corpus once the indirect path is in play.

Vulnerability

System prompt leakage (LLM07)

The narrower case where the developer prompt leaks. When the prompt embeds personalisation fields, customer identifiers, or secrets, the prompt leakage finding and the LLM02 finding pair on the same evidence pack.

Vulnerability

Improper output handling (LLM05)

The downstream sink. When the model emits sensitive content, every renderer, query builder, fetcher, and tool call that consumes the response inherits the data and writes it to a fresh destination.

Vulnerability

Model extraction attack

The wider family covering model stealing, membership inference, and model inversion. Where LLM02 lands discrete sensitive data, model extraction lands the model itself or training-set membership through the same inference surface.

Vulnerability

Data and model poisoning (LLM04)

The upstream counterpart: an attacker plants content in the training, fine-tune, or retrieval corpus that conditions the model to leak on demand. The poisoning finding feeds into the disclosure finding when the planted material is the exfiltration target itself.

Vulnerability

Misinformation in LLM applications (LLM09)

The truthfulness peer. Where LLM02 is the wrong content reaching the user, LLM09 is the right shape of content with the wrong facts. The two findings often share the same RAG pipeline, the same citation discipline, and the same eval harness as the operating record.

Vulnerability

Sensitive data exposure

The traditional application class. LLM02 extends the discipline to the inference channel and adds the training corpus, the fine-tune corpus, the retrieval index, and the embedding store as new disclosure surfaces.

Vulnerability

Hardcoded secrets

The same problem in a new container. A secret in a prompt or a fine-tune corpus is a secret in source. Code scanning catches both shapes at commit time so the disclosure never reaches the inference surface.

Blog

OWASP Top 10 for LLM applications explained

The 2025 LLM Top 10 read in operating context, with LLM02 Sensitive Information Disclosure framed alongside LLM01 Prompt Injection, LLM05 Improper Output Handling, LLM06 Excessive Agency, and LLM07 System Prompt Leakage.

Blog

Secure code review for AI-generated code

The code review playbook for the upstream half of AI application security: prompt templates, retrieval handlers, training data pipelines, output filters, and the observability redaction agreement.

Framework

NIST AI Risk Management Framework

The Map, Measure, and Manage functions read directly against sensitive data inventories, training data minimisation evidence, retrieval authorisation rules, and the extraction probe corpus the engineering programme runs.

Framework

ISO/IEC 42001 AI management system

The AI management system standard. Information handling, training data governance, fine-tune accountability, and accountability for AI feature outputs all read against the LLM02 finding evidence pack.

Framework

GDPR for LLM features

Article 5 lawfulness and minimisation, Article 25 data protection by design, Article 32 security of processing, Article 35 DPIA. Every LLM02 finding that touches personal data has a direct GDPR mapping the data protection officer reads.

Framework

OWASP and the LLM Top 10

The OWASP hub including the 2025 LLM Top 10 list where LLM02 Sensitive Information Disclosure sits alongside the rest of the AI application risk catalogue.

For

SecPortal for AppSec teams

The day-to-day workspace where AppSec engineers run the sensitive data inventory, the retrieval authorisation review, the extraction probe, and the remediation track for every LLM feature shipping in the product.

For

SecPortal for product security teams

The workspace where product security owns the AI feature security posture across releases, with the training data minimisation policy, the retrieval scoping, the output filter, and the extraction probe regression wired into the release process.

For

SecPortal for data security teams

The workspace for the data security function that owns the personal data inventory, the regulated data classification, and the redaction policy across the LLM training, fine-tune, retrieval, and observability surfaces.

Feature

Code scanning

Semgrep-backed SAST and SCA across connected GitHub, GitLab, and Bitbucket repositories. Findings surface at the prompt construction site, the fine-tuning pipeline, the retrieval authorisation gap, and the observability redaction gap.

Compliance impact

Track LLM02 sensitive information disclosure findings end to end

SecPortal records LLM02 findings against the AI feature, attaches the extraction prompt, the model response, the leaked content category, and the upstream data source as evidence, generates AI-assisted writeups, and tracks the fix through retest. Start for free.