XML Information Set

[Updated: May 25, 2017, Created: Jul 7, 2016]

The contents of an XML document can be described using an abstract data set known as the XML Information Set, or XML infoset.

In other words, an XML infoset is an abstract set of concepts, such as elements and attributes, that can be used to describe an XML document.


Infoset is a W3C standard

The XML Information Set is published as a W3C Recommendation; see the W3C specifications.


Why do we need an infoset?

It is helpful to explain or understand an XML document in terms of its infoset. The XML Information Set specification provides a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.

Every well-formed XML document that also satisfies the namespace constraints has an information set. The document does not need to be valid to have an infoset.
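
The distinction between well-formed and valid can be illustrated with a small sketch. It assumes Python's standard xml.etree.ElementTree module, whose parser checks well-formedness but does not validate against a DTD. The has_infoset helper and the two sample strings are hypothetical names chosen for illustration. Parsing is only a proxy here: a parser that accepts the text is reporting that the document is well-formed, which is the precondition for having an infoset.

```python
import xml.etree.ElementTree as ET

# Well-formed. No DTD or schema is involved, so validity is never checked.
well_formed = "<note><to>Ann</to><body>Hello</body></note>"

# Not well-formed: <to> is closed by a mismatched </body> tag.
malformed = "<note><to>Ann</body></note>"


def has_infoset(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(has_infoset(well_formed))
print(has_infoset(malformed))
```

The first document is accepted even though nothing declares which elements it may contain, while the second is rejected before any question of validity arises.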


What does an infoset consist of?

An information set can contain the following eleven types of information items:

  1. The document information item: represents the entire XML document.

  2. Element information items: one for each element in the document, e.g. <myTag>....</myTag>

  3. Attribute information items: one for each attribute of an element, e.g. <myTag someAttribute="attributeValue">....</myTag>

  4. Processing instruction information items: intended to carry instructions to the application. A processing instruction is enclosed within <? and ?>, e.g.
    <?xml-stylesheet type="text/xsl" href="style.xsl"?>
    <?php echo $someStr;?>

    Note that the XML declaration at the beginning of an XML document (shown below) is not a processing instruction, even though it has similar syntax:
    <?xml version="1.0" encoding="UTF-8" ?>

  5. Unexpanded Entity Reference Information Items: a placeholder by which an XML processor can indicate that it has not expanded an external parsed entity.
    XML allows a document to reference external entities, which are stored outside the document itself.
    A parser may not read every external entity, either because it was configured not to or because it does not implement that feature. In that case it uses this item to record that an entity that would normally have been parsed was left unprocessed.

  6. Character Information Items: one for each data character that appears in the document, whether literally, as a character reference, or within a CDATA section. Each item has a character code property holding the character's ISO 10646 code.

  7. Comment Information Items: Each XML comment, except for those appearing in the DTD (which are not represented).

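  8. The Document Type Declaration Information Item: present only if the document has a document type declaration (DOCTYPE). It records properties such as the system and public identifiers of the external DTD subset. Entities and notations are exposed as properties of the document information item, not of this item.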

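  9. Unparsed Entity Information Items: one for each unparsed general entity declared in the DTD. An unparsed entity refers to non-XML data, such as an image, that the XML processor does not parse; it is associated with a notation.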

  10. Notation Information Items: one for each notation declared in the DTD. Notations identify the format of unparsed entities.

  11. Namespace Information Items: XML namespaces provide a way to avoid element name conflicts.

    Each element in the document has a namespace information item for each namespace that is in scope for that element.
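
Several of the item types listed above can be made visible with a short sketch. It assumes Python's standard xml.dom.minidom module; the DOM is a separate model and only approximates the infoset, so the KIND mapping below is a rough correspondence, not something the specification defines. SAMPLE, KIND, and walk are hypothetical names chosen for illustration. One difference worth noting: the DOM groups adjacent characters into a single text node, whereas the infoset has one character information item per character.

```python
from xml.dom import Node, XMLNS_NAMESPACE, minidom

# Hypothetical sample document, reusing the element and attribute names above.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<!-- a comment -->
<myTag someAttribute="attributeValue" xmlns:ex="http://example.com/ns">
  <ex:child>text</ex:child>
</myTag>"""

# Rough mapping from DOM node types to the infoset item they approximate.
KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.TEXT_NODE: "characters",
}


def walk(node, found):
    """Append a (kind, name) pair for node, its attributes, and its descendants."""
    found.append((KIND.get(node.nodeType, "other"), node.nodeName))
    if node.nodeType == Node.ELEMENT_NODE:
        for attr in node.attributes.values():
            # The DOM stores xmlns declarations as attributes; the infoset
            # keeps them apart from ordinary attributes.
            is_ns = attr.namespaceURI == XMLNS_NAMESPACE
            kind = "namespace declaration" if is_ns else "attribute"
            found.append((kind, attr.name))
    for child in node.childNodes:
        walk(child, found)
    return found


records = walk(minidom.parseString(SAMPLE), [])
for kind, name in records:
    print(f"{kind}: {name}")
```

The xml-stylesheet processing instruction shows up in the output, but the XML declaration does not, which matches the note under item 4.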


See Also