CVE-2025-66516 · OWASP A01: Broken Access Control (SSRF) · CWE-611: XML External Entity · High

Apache Tika — unauthenticated SSRF / XXE

A crafted document parsed by Tika made the server fetch attacker-chosen URLs and read local files.

What happened

Apache Tika extracts text from documents server-side. A 2025 flaw let an unauthenticated attacker submit a document whose XML declared an external entity, so the parser would fetch internal URLs (SSRF, for example the cloud metadata endpoint) or read local files (XXE). Because Tika usually runs inside ingestion pipelines, a single uploaded file could give the attacker a path into the internal network.
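To make the mechanism concrete, the sketch below parses a document with an unhardened JDK parser and records which external resources the parser asks for, without fetching anything. It is an illustration only, not Tika's code: the class name `EntityFetchRecorder`, the method `recordFetches`, and the URL `http://internal.example/placeholder` are hypothetical, and the custom `EntityResolver` returns an empty stream so no network request is made.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class EntityFetchRecorder {
    // Hypothetical document: the entity points at a placeholder internal URL.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE doc [<!ENTITY x SYSTEM \"http://internal.example/placeholder\">]>"
        + "<doc>&x;</doc>";

    // Parses xml with a default (unhardened) factory and returns every external
    // system ID the parser asked to load. The resolver answers with an empty
    // stream, so nothing is actually fetched.
    static List<String> recordFetches(String xml) {
        List<String> requested = new ArrayList<>();
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            DocumentBuilder b = f.newDocumentBuilder();
            b.setEntityResolver((publicId, systemId) -> {
                requested.add(systemId);
                return new InputSource(new StringReader(""));
            });
            b.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return requested;
    }

    public static void main(String[] args) {
        // Prints the list of resources the document tried to pull in.
        System.out.println(recordFetches(SAMPLE));
    }
}
```

Without the recording resolver, a default parser would try to load that URL itself, which is the server-side request the attacker is after.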

The code

✕ Vulnerable · SSRF / XXE
// XML parser with external entities left ON
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder b = f.newDocumentBuilder();
b.parse(userUpload);   // <!ENTITY x SYSTEM "http://169.254.169.254/..."> resolves
✓ Fixed · SSRF / XXE
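// Harden the factory before any untrusted document is parsed
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
f.setFeature("http://xml.org/sax/features/external-general-entities", false);
f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
f.setXIncludeAware(false);
f.setExpandEntityReferences(false);
DocumentBuilder b = f.newDocumentBuilder();
b.parse(userUpload);   // a DOCTYPE is now rejected instead of resolving entities
// + allowlist egress so SSRF can't reach 169.254.169.254 / internal IPs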
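The egress allowlist mentioned in the last comment can be sketched at the application layer as follows. This is a minimal sketch under stated assumptions, not Benteng's or Tika's implementation: `EgressGuard`, `isInternal`, `isAllowed`, and the `allowedHosts` parameter are hypothetical names, and the check relies only on the JDK's `InetAddress` classification methods. It fails closed on anything it cannot parse or resolve.

```java
import java.net.InetAddress;
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class EgressGuard {
    // True for address ranges an ingestion host should never fetch from.
    static boolean isInternal(InetAddress a) {
        return a.isLoopbackAddress()      // 127.0.0.0/8, ::1
            || a.isLinkLocalAddress()     // 169.254.0.0/16 (cloud metadata), fe80::/10
            || a.isSiteLocalAddress()     // 10/8, 172.16/12, 192.168/16
            || a.isAnyLocalAddress()      // 0.0.0.0, ::
            || a.isMulticastAddress();
    }

    // Allows a URL only if it is http(s), its host is on the allowlist,
    // and none of the addresses the host resolves to is internal.
    static boolean isAllowed(String url, Set<String> allowedHosts) {
        try {
            URI uri = URI.create(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) return false;
            if (!scheme.equalsIgnoreCase("https") && !scheme.equalsIgnoreCase("http")) return false;
            if (!allowedHosts.contains(host.toLowerCase(Locale.ROOT))) return false;
            for (InetAddress a : InetAddress.getAllByName(host)) {
                if (isInternal(a)) return false;
            }
            return true;
        } catch (Exception e) {
            return false; // unparsable URL or unresolvable host: fail closed
        }
    }

    public static void main(String[] args) {
        Set<String> allow = Set.of("203.0.113.10");
        System.out.println(isAllowed("https://203.0.113.10/doc", allow));
        System.out.println(isAllowed("http://169.254.169.254/", allow));
    }
}
```

A check like this does not cover every case: `isSiteLocalAddress` does not classify IPv6 unique-local addresses (fc00::/7), and a hostname can resolve differently between the check and the actual connection. Network-level egress rules on the host running the parser remain the stronger control.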
→ Detect this class of vulnerability with Scan a site (Benteng's own fetch is SSRF-guarded)

Educational case study. The "vulnerable" snippet is a minimal teaching example, not a working exploit. Benteng · a Palu Gada tool.