
XML basics

[Last Updated: Jul 31, 2018]

XML stands for Extensible Markup Language.


How is it Markup?

It is markup because it describes named pieces of metadata, called nodes or tags. Tags, each with zero or more attributes, are written in a hierarchical textual structure that can easily be processed by any programming language.

Tags are words bracketed by the < and > characters, and attributes are strings of the form name="value" that appear inside tags.

<myDoc>
<author>Joe</author>
<date>2016-07-01</date>
......
</myDoc>

The example above contains the author of a document and the date of its last update.

Instead of expressing the information as sentences or in some other presentation format, XML describes it in a more 'granular' form. That means the discrete parts of the information can be used or presented in whatever way a receiver or client wants.
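
As a rough illustration of that granularity, the following Python sketch parses a document like the one above with the standard library's xml.etree.ElementTree module and picks out individual values. The elided part of the example is left out, and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from above, without the elided part.
xml_text = """<myDoc>
<author>Joe</author>
<date>2016-07-01</date>
</myDoc>"""

root = ET.fromstring(xml_text)

# Each piece of information can be picked out on its own.
author = root.find("author").text
date = root.find("date").text

# The receiver decides how to present the pieces.
summary = f"Last updated by {author} on {date}"
print(summary)
```

Another client could read the same two elements and present them quite differently, for example as columns in a table.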


How is it Extensible?

By default no standard tags exist. We have to define our own tags to describe the intended document. In that sense it's extensible, because we extend the idea of a well formed structure to describe the data we are interested in.



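XML is a W3C standard

The version of the W3C specification in general use is XML 1.0.

XML 1.0 already supports Unicode, so non-ASCII characters alone are not a reason to use version 1.1. XML 1.1 mainly covers edge cases, such as certain control characters and additional line-end characters, and is rarely needed.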


Well Formed XML Document

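An XML document is well formed if it satisfies all of the following:

  • It has exactly one root element.
  • Every element has a closing tag (or is written as an empty-element tag such as <br/>).
  • Tags are case sensitive, so the opening and closing tags match exactly.
  • Elements are properly nested.
  • Attribute values are quoted.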
  • No attribute may appear more than once within the same element.
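
One way to see these rules in action is to hand small documents to a parser and check whether it accepts them. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One document that follows the rules, then one violation per rule.
samples = {
    "well formed": '<doc id="1"><a><b/></a></doc>',
    "two root elements": "<a/><b/>",
    "missing closing tag": "<doc><a></doc>",
    "tag case mismatch": "<doc></Doc>",
    "improper nesting": "<doc><a><b></a></b></doc>",
    "unquoted attribute": "<doc id=1/>",
    "duplicate attribute": '<doc id="1" id="2"/>',
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the first sample is accepted. This check covers well-formedness only; it says nothing about whether a document is valid against a DTD or schema.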

Valid XML Document

A "valid" XML document must be well formed and must also conform to one of:

  • DTD - Document Type Definition OR
  • XML Schema - An XML-based alternative to DTD (recommended)

XML declaration

An XML document should ideally begin with an XML declaration, which specifies the version of XML being used and, optionally, the character encoding. The declaration itself is optional in XML 1.0.

<?xml version="1.0" encoding="UTF-8"?>
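
The encoding part of the declaration tells a parser how to decode the bytes of the document. The Python sketch below assumes a document stored as ISO-8859-1 rather than UTF-8; the sample bytes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Bytes as they might be read from a file; \xe9 is "é" in ISO-8859-1.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><author>Jos\xe9</author>'

# The parser reads the declaration and decodes the content accordingly.
root = ET.fromstring(raw)
author = root.text
print(author)
```

The parsed text contains the accented character even though the document declares version 1.0.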

Comments in XML document

Example:

<!--  all characters are allowed in comment except for two or more consecutive dashes -->

CDATA sections

They are used to escape blocks of text containing characters which would otherwise be recognized as markup.

Example

<description>
  <![CDATA[ a record or some event or thing where date created < 2010 ]]>
</description>
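
To illustrate the effect, this Python sketch parses a shortened version of the text above three ways: inside a CDATA section, with the < character escaped as &lt;, and with no escaping at all. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

with_cdata = "<description><![CDATA[ date created < 2010 ]]></description>"
with_entity = "<description> date created &lt; 2010 </description>"
unescaped = "<description> date created < 2010 </description>"

# Both escaping styles give the application the same text.
text_cdata = ET.fromstring(with_cdata).text
text_entity = ET.fromstring(with_entity).text
print(repr(text_cdata))

# Without escaping, the bare < is read as the start of a tag.
try:
    ET.fromstring(unescaped)
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print(unescaped_ok)
```

CDATA is mostly a convenience for the author of the document; the application receives plain text either way.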

XML is platform independent

XML defines a platform-independent data format.

It's a way to write portable documents which can be transported between different types of machines or stored somewhere, e.g. in databases.


XML vs HTML

HTML combines data with presentation logic. XML typically carries only data; it is not designed to display data, although some XML-based languages can be rendered, e.g. XHTML (an alternative to HTML that has to be well formed per XML rules). Also, browsers tolerate HTML that is not well formed and still render it, whereas an XML document has to be well formed.

XML, generally speaking, has pieces of data and leaves the interpretation of the data to the application that uses it. In other words, XML defines only the structure of the document and does not define any of the presentation semantics of that document.


Where is XML used?

Exchanging, sharing, and storing data between different software systems or different machine architectures.

It is consumed by a parser or software program, locally or remotely, at runtime. The program applies some business or programming logic based on the tagged data and attributes the document contains. In that sense it acts like a macro.
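
A minimal sketch of that idea in Python follows. The tasks document, its action attribute, and the run_task function are all hypothetical, invented only to show a program choosing its logic from tags and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; the element and attribute names are made up.
xml_text = """<tasks>
  <task action="greet" name="Joe"/>
  <task action="add" x="2" y="3"/>
</tasks>"""

def run_task(task):
    """Pick the logic to run based on the task's attributes."""
    action = task.get("action")
    if action == "greet":
        return f"Hello, {task.get('name')}"
    if action == "add":
        return str(int(task.get("x")) + int(task.get("y")))
    raise ValueError(f"unknown action: {action}")

root = ET.fromstring(xml_text)
outputs = [run_task(task) for task in root.iter("task")]
print(outputs)
```

The XML itself contains no logic; the consuming program decides what each tag and attribute means.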


A brief history of XML

XML is based on SGML (Standard Generalized Markup Language), which was developed in the early 1980s, prior to the rise of the Internet, and became an ISO standard in 1986. SGML has been widely used for large documentation projects. The designers of XML took the best parts of SGML, used their experience as a guide, and produced a technology that is just as powerful as SGML but much simpler and easier to use. W3C activities regarding XML started in the mid 1990s. XML 1.0 became a W3C Recommendation in 1998.
