- What is XML?
"Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable." - Wikipedia
XML is mainly used to store and transport data in a structured way. XML data is often described as self-describing: the structure of the data is embedded in the data itself, so it can be processed by a parser and still be read by a human. An example of an XML file is :

    <?xml version="1.0" encoding="UTF-8"?>
    <office>
      <employee>
        <name>John Snow</name>
        <age>30</age>
        <salary>5000</salary>
      </employee>
      <employee>
        <name>Bruce Wayne</name>
        <age>45</age>
        <salary>10000</salary>
      </employee>
    </office>

The above is a typical example of an XML file, which contains the details of employees.
- The first line is the XML declaration.
- `<office>` is the root element, `<employee>` is the child element, and `<name>`, `<age>` and `<salary>` are the sub-child elements, which hold the values.
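As a rough illustration of the hierarchy described above, the sketch below loads the same office document with PHP's SimpleXML extension and walks from the root element down to the sub-child elements. It assumes SimpleXML is enabled, and the variable names are chosen only for this example.

```php
<?php
// The office document from the example above, kept in a nowdoc string.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<office>
  <employee>
    <name>John Snow</name>
    <age>30</age>
    <salary>5000</salary>
  </employee>
  <employee>
    <name>Bruce Wayne</name>
    <age>45</age>
    <salary>10000</salary>
  </employee>
</office>
XML;

$office = simplexml_load_string($xml);

// The object returned by SimpleXML represents the root element.
$rootName = $office->getName();

// Each <employee> is a child of the root; <name> is one of its sub-children.
$names = [];
foreach ($office->employee as $employee) {
    $names[] = (string) $employee->name;
}

echo $rootName, ': ', implode(', ', $names), PHP_EOL;
// prints "office: John Snow, Bruce Wayne"
```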
The XML structure looks like this :

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

An XML file is made up of elements (tags), which can hold different types of values. Element names are case sensitive, so the opening and closing tags must match exactly. An XML document also has exactly one root element; in the example above, `<office>` is the root element.
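The two rules about case sensitivity and the single root element can be checked with a parser. The following is a minimal sketch, assuming PHP with the SimpleXML extension; `isWellFormed` is a hypothetical helper written for this example, not a built-in function.

```php
<?php
// Collect parse errors internally instead of printing PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true if libxml can parse the string as XML.
function isWellFormed(string $xml): bool
{
    libxml_clear_errors();
    return simplexml_load_string($xml) !== false;
}

// Matching tags and a single root element.
$ok = isWellFormed('<office><employee>John Snow</employee></office>');

// <Employee> and </employee> differ in case, so they do not match.
$caseMismatch = isWellFormed('<office><Employee>John Snow</employee></office>');

// Two top-level elements, so there is no single root.
$twoRoots = isWellFormed('<office></office><office></office>');

var_dump($ok, $caseMismatch, $twoRoots);
// bool(true), bool(false), bool(false)
```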
XML Attributes :
An attribute contains a value related to a particular element or tag. For example :

    <?xml version="1.0" encoding="UTF-8"?>
    <office>
      <employee id="001">
        <name>John Snow</name>
        <age>30</age>
        <salary>5000</salary>
      </employee>
      <employee id="002">
        <name>Bruce Wayne</name>
        <age>45</age>
        <salary>10000</salary>
      </employee>
    </office>

In the example above, the employee element has an attribute 'id' with a value. Note that an attribute value must always be quoted, with either single or double quotes.
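For illustration, this is one way the 'id' attribute could be read from PHP. It is a sketch that assumes the SimpleXML extension, where attributes are accessed with array syntax on the element; the document is a shortened version of the example above.

```php
<?php
// Shortened version of the office document, keeping only id and name.
$xml = <<<'XML'
<office>
  <employee id="001"><name>John Snow</name></employee>
  <employee id="002"><name>Bruce Wayne</name></employee>
</office>
XML;

$office = simplexml_load_string($xml);

$lines = [];
foreach ($office->employee as $employee) {
    // $employee['id'] reads the attribute, $employee->name reads the child element.
    $lines[] = (string) $employee['id'] . ' : ' . (string) $employee->name;
}

echo implode(PHP_EOL, $lines), PHP_EOL;
// 001 : John Snow
// 002 : Bruce Wayne
```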
XML DTD :
An XML DTD ( Document Type Definition ) is used to check the validity of the structure of an XML document against certain criteria. In other words, the DTD defines the structure and the legal elements and attributes of an XML document. It acts as a validation template that sets out the rules a particular XML document must follow. An example of a DTD is as follows :

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE book [
      <!ELEMENT book (name,author,price)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
      <!ELEMENT price (#PCDATA)>
    ]>
    <book>
      <name>The Art Of War</name>
      <author>Sun Tzu</author>
      <price>350</price>
    </book>

The above XML document contains a DTD, which defines the structure of the document, where :
- `<!DOCTYPE book [...]>` : denotes that the root element must be book.
- `<!ELEMENT book (name,author,price)>` : denotes that name, author and price must be child elements of the root element book.
- `<!ELEMENT name (#PCDATA)>` : denotes that the name element must be of type PCDATA, where PCDATA means Parsed Character Data. The same applies to the next two declarations, for author and price.
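To see the DTD act as a validation template, the sketch below checks two documents against the book DTD from the example. It assumes PHP's DOM extension, whose `DOMDocument::validate()` method validates a loaded document against its DTD; `isValid` is a hypothetical helper name.

```php
<?php
// Keep validation errors out of the output; only the boolean result is used.
libxml_use_internal_errors(true);

// The DTD from the example above, as an internal subset.
$dtd = <<<'DTD'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
  <!ELEMENT book (name,author,price)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
DTD;

// Hypothetical helper: parse the document, then validate it against its DTD.
function isValid(string $xml): bool
{
    $dom = new DOMDocument();
    return $dom->loadXML($xml) && $dom->validate();
}

// All three required children are present, in the declared order.
$complete = isValid($dtd . '<book><name>The Art Of War</name><author>Sun Tzu</author><price>350</price></book>');

// The price element is missing, so the document breaks the DTD.
$missingPrice = isValid($dtd . '<book><name>The Art Of War</name><author>Sun Tzu</author></book>');

var_dump($complete, $missingPrice);
// bool(true), bool(false)
```

Note that the second document is still well-formed XML; it only fails the extra structural rules that the DTD adds.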
There are two types of DTD :
Internal DTD :
When the DTD is defined inside the XML file itself, it is called an internal DTD. The example we saw above is an internal DTD. Let's see another example of an internal DTD.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE office [
      <!ELEMENT office (employee+)>
      <!ELEMENT employee (name, age, salary)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT age (#PCDATA)>
      <!ELEMENT salary (#PCDATA)>
    ]>
    <office>
      <employee>
        <name>John Snow</name>
        <age>30</age>
        <salary>5000</salary>
      </employee>
      <employee>
        <name>Bruce Wayne</name>
        <age>45</age>
        <salary>10000</salary>
      </employee>
    </office>

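In this DTD, `employee+` means "one or more employee elements", and `(name, age, salary)` fixes the order of the children. The sketch below, which again assumes PHP's DOM extension and a hypothetical `isValid` helper, shows how documents that break either rule fail validation.

```php
<?php
libxml_use_internal_errors(true); // suppress validation warnings

// The office DTD from the example above.
$dtd = <<<'DTD'
<!DOCTYPE office [
  <!ELEMENT office (employee+)>
  <!ELEMENT employee (name, age, salary)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
  <!ELEMENT salary (#PCDATA)>
]>
DTD;

// Hypothetical helper: parse the document, then validate it against its DTD.
function isValid(string $xml): bool
{
    $dom = new DOMDocument();
    return $dom->loadXML($xml) && $dom->validate();
}

// One employee with children in the declared order.
$oneEmployee = isValid($dtd . '<office><employee><name>John Snow</name><age>30</age><salary>5000</salary></employee></office>');

// No employee at all, although the DTD requires at least one.
$noEmployee = isValid($dtd . '<office></office>');

// age appears before name, which does not match (name, age, salary).
$wrongOrder = isValid($dtd . '<office><employee><age>30</age><name>John Snow</name><salary>5000</salary></employee></office>');

var_dump($oneEmployee, $noEmployee, $wrongOrder);
// bool(true), bool(false), bool(false)
```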
External DTD :
When the DTD is defined in a separate file, it is called an external DTD. For example :
File : xml.dtd

    <!ELEMENT office (employee+)>
    <!ELEMENT employee (name, age, salary)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT age (#PCDATA)>
    <!ELEMENT salary (#PCDATA)>

File : doc.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE office SYSTEM "http://example.com/files/xml.dtd">
    <office>
      <employee>
        <name>John Snow</name>
        <age>30</age>
        <salary>5000</salary>
      </employee>
      <employee>
        <name>Bruce Wayne</name>
        <age>45</age>
        <salary>10000</salary>
      </employee>
    </office>

In the doc.xml file, the URI of the DTD file is declared after the SYSTEM keyword.
XML Entities :
Entities are placeholders in XML, used to define shortcuts for special characters or strings. An entity is declared as :
<!ENTITY entity-name "entity-value">
Example : <!ENTITY name "John Snow">
In an XML document, an entity reference starts with '&' and ends with ';', with the name of the entity between them. For example : <employee>&name;</employee>
Example :

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE employee [
      <!ENTITY name "John Snow">
    ]>
    <employee>&name;</employee>

In the example above, the parser puts the string "John Snow" in the place of &name;. This is an example of an internal entity.
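The substitution can be observed from code. This is a minimal sketch, assuming PHP's DOM extension; the `LIBXML_NOENT` flag asks libxml to replace entity references with their values while parsing.

```php
<?php
// The internal-entity example from above.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employee [
  <!ENTITY name "John Snow">
]>
<employee>&name;</employee>
XML;

$dom = new DOMDocument();
// LIBXML_NOENT: substitute entity references during parsing.
$dom->loadXML($xml, LIBXML_NOENT);

// The text of <employee> is now the entity's value, not the reference.
$employee = $dom->documentElement->textContent;

echo $employee, PHP_EOL;
// prints "John Snow"
```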
With an external entity, the entity declarations are saved in an external DTD file, for example xml.dtd, which is pulled into the XML file through a parameter entity declared as follows :
<!ENTITY % entity-name SYSTEM "URI/URL">
Note the '%' sign before the entity name; it must be separated from the name by a space. The entity is then referenced as %entity-name; inside the DTD declaration of the XML file. For example :

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE rootElem [
      <!ENTITY % dtd SYSTEM "http://example.com/files/xml.dtd">
      %dtd;
    ]>

Now the entities declared in that file can be used within the XML code. Look at the example below :
xml.dtd file :

    <!ENTITY name "John Snow">

doc.xml file :

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE employee [
      <!ENTITY % dtd SYSTEM "http://example.com/files/xml.dtd">
      %dtd;
    ]>
    <employee>&name;</employee>

Here the XML parser first fetches the xml.dtd file from the external web server, which is triggered by line 3. Line 4, "%dtd;", then includes the content of xml.dtd in the XML file, after which the &name; entity can be used. Note that this particular functionality is used in various XML attacks, which we are going to look at next.
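The table below lists the predefined entities in XML, together with the equivalent character references :
Entity-name | Character | Decimal reference | Hexadecimal reference |
---|---|---|---|
quot | " | `&#34;` | `&#x22;` |
amp | & | `&#38;` | `&#x26;` |
apos | ' | `&#39;` | `&#x27;` |
lt | < | `&#60;` | `&#x3C;` |
gt | > | `&#62;` | `&#x3E;` |

XML predefines only these five entities. The percent sign (%) has no predefined entity name; it can be written with the character reference `&#37;` or `&#x25;`.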
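As a quick check of the references listed above, the sketch below lets a parser decode them and then escapes a string in the other direction. It assumes PHP with SimpleXML; `htmlspecialchars()` with the `ENT_XML1` flag is used here as one possible way to escape text for XML.

```php
<?php
// A document that uses the predefined entities and two character references.
$xml = '<t>&lt;b&gt; &amp; &quot;x&quot; &apos;y&apos; &#37; &#x25;</t>';

// Casting the element to string returns its decoded text content.
$decoded = (string) simplexml_load_string($xml);
echo $decoded, PHP_EOL;
// prints: <b> & "x" 'y' % %

// The reverse direction: escape special characters before placing text in XML.
$escaped = htmlspecialchars('<b> & "x"', ENT_XML1 | ENT_QUOTES, 'UTF-8');
echo $escaped, PHP_EOL;
// prints: &lt;b&gt; &amp; &quot;x&quot;
```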
Writing A simple XML Parser in PHP :
The code of XML parser is as follows :

    <html>
    <head>
      <title>XML Parser</title>
    </head>
    <body>
      <h2>Enter XML Code Here :</h2>
      <form action="" method="post">
        <textarea name="xml" rows="10" cols="50"></textarea><br/>
        <input type="submit" value="Submit"/>
      </form>
      <?php
      if (!empty($_POST["xml"])) {
          // Deliberately permissive: allow libxml to load external entities.
          // This function is deprecated as of PHP 8.0.
          libxml_disable_entity_loader(false);
          $xmlfile = $_POST["xml"];
          $dom = new DOMDocument();
          // LIBXML_NOENT substitutes entities, LIBXML_DTDLOAD loads the external DTD.
          $dom->loadXML($xmlfile, LIBXML_NOENT | LIBXML_DTDLOAD);
          $book = simplexml_import_dom($dom);
          // Read the three child elements of <book>.
          $name = $book->name;
          $author = $book->author;
          $price = $book->price;
          echo "Book Details<br/><br/>Name : $name<br/>Author : $author<br/>Price : $price/-";
      }
      ?>
    </body>
    </html>

In the code above, libxml_disable_entity_loader(false); is used to enable external entity loading. The code parses XML with the structure below :

    <?xml version="1.0" encoding="UTF-8"?>
    <book>
      <name>The Art Of War</name>
      <author>Sun Tzu</author>
      <price>350</price>
    </book>

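The parsing step of the page above can also be tried from the command line, without the HTML form. The following is only a sketch: `parseBook` is a hypothetical helper that repeats the same `loadXML` and `simplexml_import_dom` calls, and it leaves out `libxml_disable_entity_loader()` because that function is deprecated as of PHP 8.0 and neither sample document loads anything external.

```php
<?php
// Hypothetical helper: the same parsing steps as the page above, minus the form.
function parseBook(string $xml): array
{
    $dom = new DOMDocument();
    // Same flags as the page: substitute entities and allow DTD loading.
    $dom->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD);
    $book = simplexml_import_dom($dom);

    return [
        'name'   => (string) $book->name,
        'author' => (string) $book->author,
        'price'  => (string) $book->price,
    ];
}

// The plain book document.
$plain = '<book><name>The Art Of War</name><author>Sun Tzu</author><price>350</price></book>';

// A variant that supplies name and author through internal entities.
$withEntities = '<!DOCTYPE book [<!ENTITY nm "The Art Of War"> <!ENTITY aut "Sun Tzu">]>'
    . '<book><name>&nm;</name><author>&aut;</author><price>350</price></book>';

$a = parseBook($plain);
$b = parseBook($withEntities);

printf("%s by %s, price %s/-\n", $a['name'], $a['author'], $a['price']);
// prints "The Art Of War by Sun Tzu, price 350/-"
var_dump($a === $b);
// bool(true): both documents yield the same details
```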
XML Document with internal entity

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE book [
      <!ENTITY nm "The Art Of War">
      <!ENTITY aut "Sun Tzu">
    ]>
    <book>
      <name>&nm;</name>
      <author>&aut;</author>
      <price>350</price>
    </book>

XML Document with External entity

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE book [
      <!ENTITY % dtd SYSTEM "http://serveraddress/xml.dtd">
      %dtd;
    ]>
    <book>
      <name>&nm;</name>
      <author>&aut;</author>
      <price>&pr;</price>
    </book>

xml.dtd file

    <!ENTITY nm "The Art Of War">
    <!ENTITY aut "Sun Tzu">
    <!ENTITY pr "350">

The code used above can be downloaded from here : Github_link
Conclusion :
These are some of the basics of XML and its entities (both internal and external). In the next post we will look at different attack methods and exploitation techniques against XML.
XML Attacks Part 2 : XXE (Xml eXternal Entity ) Attack
XML Attacks Part 3 : Denial Of Service Attacks
XML Attacks Part 4 : Out Of Bound Attacks
Visit the link for more tutorials about Web Security : http://www.sec-art.net/p/web-security.html