Some notes about XML

XML document

XML declaration

<?xml version='1.0' encoding='ASCII' standalone='yes' ?>

  • version sets the XML version (1.0);
  • encoding specifies the character encoding of the document (defaults to UTF-8);
  • standalone tells whether the document depends on external markup declarations (such as an external DTD).

The XML declaration is not required, but if it is present it must be at the very start of the document: nothing, not even whitespace, may come before it.
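A quick check of the placement rule, as a minimal Python sketch using only the standard library. The sample documents are made up for illustration, and UTF-8 is used instead of ASCII to stay with an encoding every parser knows.

```python
import xml.etree.ElementTree as ET

# Declaration at the very start of the document: accepted.
good = b"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><root/>"
root = ET.fromstring(good)

# A newline before the declaration makes the document not well-formed.
bad = b"\n" + good
try:
    ET.fromstring(bad)
    declaration_error = None
except ET.ParseError as exc:
    declaration_error = exc

print(root.tag)
print(declaration_error)
```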

Comments

<!-- comment -->

Processing instructions

<?target parameters ?>

Processing instructions pass information to the application that processes the document. No whitespace is allowed between <? and the target.
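A minimal sketch of how an application can read a processing instruction, here with Python's xml.dom.minidom. The xml-stylesheet target and its parameters are only an example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?><root/>'
)

# The processing instruction is the first node of the document,
# before the root element.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

print(pi.target)  # name right after "<?"
print(pi.data)    # everything after the target, as one opaque string
```

The parameters are not parsed any further: it is up to the application to interpret the data string.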

Tags

<tag> ... </tag>

<tag/>

  • exactly one root element exists;
  • a parent/children relationship (one to many) can exist between the tags;
  • can contain mixed content (text and other tags).

<tag attribute = "value"> ... </tag>

Tags can have attributes; values must be quoted.

<![CDATA[ raw text ]]>

Can contain CDATA sections for raw content.
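The pieces above (attributes, mixed content, empty tags, CDATA) can be seen together in a small Python sketch based on xml.etree.ElementTree. The note document is invented for the example.

```python
import xml.etree.ElementTree as ET

doc = (
    '<note id="n1">Hello <b>world</b>!'
    "<code><![CDATA[a < b && c]]></code><empty/></note>"
)
root = ET.fromstring(doc)

# Attributes are exposed as a dictionary.
attributes = root.attrib

# Mixed content: text before the first child is .text,
# text following a child is that child's .tail.
before_b = root.text
inside_b = root[0].text
after_b = root[0].tail

# CDATA content arrives as plain text, with < and & left untouched.
raw = root.find("code").text

# An empty tag has neither text nor children.
empty = root.find("empty")
```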

Document Type Definition

DTD lists all elements, attributes and entities the document uses and the contexts in which they are used.

Can be included in the prolog of the XML document as an external resource:

<!DOCTYPE <element> SYSTEM "<url of the DTD>">

or inline:

<!DOCTYPE person [
<!ELEMENT ... >
<!ELEMENT ...>
...
]>

DTD inclusion must be after the XML declaration and before the root element.

Elements

<!ELEMENT <name> (content)>

Available content types

  • #PCDATA

    Parsed character data.

  • Child element

    Another element or an instance of the same element.

  • Sequence

    A list of two or more comma separated children. Children can have modification suffixes (?, *, + with regexp like meaning) and choices are supported (| meaning OR). Parentheses () are supported as well.

    The list is ordered.

  • Mixed content

    <!ELEMENT <name> (#PCDATA|child 1|...|child n)*>

    Text and child elements are mixed.

    #PCDATA must come first in the mixed content declaration. child 1, …, child n are the elements and each element must have its own definition within the DTD. The * operator must follow the mixed content declaration if child elements are included. The #PCDATA and child element declarations must be separated by the | operator.

  • Nothing (empty)

    <!ELEMENT <name> EMPTY>

    An empty element has no content; it can only carry attributes.

  • Anything

    <!ELEMENT <name> ANY>
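Since sequences, choices and suffixes have a regexp like meaning, a content model can be turned into a regular expression over the names of the children. The following Python code is only a sketch of that idea, not a validator: the helper names and the book/title/author elements are hypothetical, and #PCDATA, EMPTY and ANY are not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate a DTD content model of element names into a regex."""
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    parts = []
    for token in tokens:
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")
        elif token in (")", "|", "?", "*", "+"):
            parts.append(token)  # same meaning as in a regex
        else:
            # every child name is matched followed by one space
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the ordered list of child names against the content model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None


model = "(title, author+, (summary | abstract)?)"
good = ET.fromstring("<book><title/><author/><author/><abstract/></book>")
wrong_order = ET.fromstring("<book><author/><title/></book>")

print(children_match(good, model))
print(children_match(wrong_order, model))
```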

Attributes

<!ATTLIST <element> <attribute> <type> <default>>

Multiple attributes can be declared in a single ATTLIST.

Type

  • CDATA

    Any string of text.

  • NMTOKEN

    XML name token.

  • NMTOKENS

    A space separated list of one or more NMTOKEN.

  • Enumeration

    A | separated list of all possible values for the attribute.

  • ID

    An XML name unique within the XML document.

  • IDREF

    An XML name that refers to the ID attribute of another element.

  • IDREFS

    A space separated list of one or more IDREF.

  • ENTITY

    The name of an unparsed entity declared elsewhere in the DTD.

  • ENTITIES

    A space separated list of one or more ENTITY.

  • NOTATION

    The name of a notation declared in the document DTD.

Default

  • #IMPLIED

    Optional attribute.

  • #REQUIRED

    Required attribute.

  • #FIXED

    Constant and immutable.

  • <literal>

    Default value given as a quoted string.
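Defaults declared in the internal DTD are visible even without validation. The sketch below relies on the expat based parser behind Python's xml.etree.ElementTree, which reports defaulted and #FIXED attributes as if they were written in the document; other parsers may behave differently. The list/item elements and their attributes are made up.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE list [
<!ELEMENT list (item*)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item
  status (new|done) "new"
  unit   CDATA      #FIXED "pcs"
  note   CDATA      #IMPLIED>
]>
<list><item>a</item><item status="done">b</item></list>"""

root = ET.fromstring(doc)

# status falls back to its default, unit always has the fixed value,
# note is optional and simply absent when not written.
first = root[0].attrib
second = root[1].attrib
```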

Entities

<!ENTITY <name> (value)>

Defined in the DTD, is referenced as: &<entity name>; (the trailing semicolon is required). The value will be replaced only in the XML document.
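A minimal Python sketch of entity replacement with xml.etree.ElementTree. The author entity is hypothetical and is declared in the internal DTD; a reference to an undeclared entity is reported as an error.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
<!ENTITY author "Jane Doe">
]>
<note>Written by &author;.</note>"""

# The parser replaces the reference with the declared value.
root = ET.fromstring(doc)
text = root.text

# Referencing an entity that was never declared is an error.
try:
    ET.fromstring("<note>&missing;</note>")
    undefined_error = None
except ET.ParseError as exc:
    undefined_error = exc
```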

<!ENTITY <name> SYSTEM (URL)>

Included and parsed from external URL.

Non parsed entities

<!ENTITY <name> SYSTEM (URL) NDATA (notation)>

NDATA specifies a notation defined elsewhere in the DTD:

<!NOTATION <notation> SYSTEM (identifier)>

These entities are included using their name (without & and ;) as the value of an attribute of type ENTITY or ENTITIES.

Parameter entities

<!ENTITY % <name> (value)>

Referenced as: %<name>; (the trailing semicolon is required). The value will be replaced only in the DTD. Can be redefined (but the internal DTD definition takes precedence).

Can be included from an external DTD:

<!ENTITY % <name> SYSTEM (URL)>

Namespaces

Namespaces distinguish between elements with the same name but different meanings, by associating each element name with a URI. Since the URI is just a formal identifier, it doesn't need to point to an actual resource.

Prefixes

xmlns:<prefix>="<URI>"

Used because URIs usually contain characters that are not allowed in XML names. Usually declared in the root element for convenience. The DTD, if used, must declare elements along with their prefix.

Usage: <<prefix>:element ... >

Default namespace

Defined by attaching an xmlns attribute with no prefix to the top element; all unprefixed descendants will be part of that namespace.
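A small Python sketch of how prefixes and the default namespace show up after parsing. xml.etree.ElementTree expands every name to {URI}local-name; the example.com URIs and the library/book/shelf names are invented.

```python
import xml.etree.ElementTree as ET

doc = (
    '<lib:library xmlns:lib="http://example.com/library" '
    'xmlns="http://example.com/default">'
    '<lib:book/><shelf id="s1"/></lib:library>'
)
root = ET.fromstring(doc)

# Prefixes used in queries are local to this mapping: only the URIs
# have to match the ones in the document.
ns = {"lib": "http://example.com/library", "d": "http://example.com/default"}
books = root.findall("lib:book", ns)
shelf = root.find("d:shelf", ns)

print(root.tag)
print(shelf.tag)
print(shelf.attrib)
```

Note that the unprefixed id attribute is not put in the default namespace: the default applies to element names only.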

Internationalization

<?xml version="1.0" encoding="<encoding>" standalone="yes"?>

The encoding declaration tells in which character set the document is written.
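To see the encoding declaration at work, this Python sketch serializes a document as ISO-8859-1 bytes and lets xml.etree.ElementTree decode it based on the declaration. The city element is just sample data.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="ISO-8859-1"?><city>K\u00f6ln</city>'

# In ISO-8859-1 the letter o with diaeresis is the single byte 0xF6.
raw = text.encode("iso-8859-1")
has_latin1_byte = b"\xf6" in raw

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(raw)
city = root.text
```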

Text declaration

<?xml version="1.0" encoding="<encoding>"?>

Used at the start of external parsed entities. In a text declaration version is optional, encoding is required and standalone is not allowed.

xml:lang

Used at element level, specifies the language (not the encoding) of the content of the element; in a valid document it must be declared as an attribute in the DTD.
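The value of xml:lang applies to the element and to its descendants until another xml:lang overrides it. A minimal Python sketch of that lookup follows; effective_lang is a hypothetical helper and the sample document is invented.

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace.
LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = '<doc xml:lang="en"><p>Hello</p><p xml:lang="it">Ciao</p></doc>'
root = ET.fromstring(doc)

# ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def effective_lang(element):
    """Return the nearest xml:lang value, looking at the element first."""
    while element is not None:
        if LANG in element.attrib:
            return element.attrib[LANG]
        element = parent_of.get(element)
    return None


print(effective_lang(root[0]))
print(effective_lang(root[1]))
```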

XPath

Location paths

Identify a set of nodes in the document.

Root node

<xsl:template match="/">

Selects the root node of the document.

Child element

<xsl:template match="[name]">

Selects all child elements of the context node with the specified [name].

Attribute

<xsl:value-of select="@[attribute]"/>

Selects the [attribute] value of the current node.

Comment

<xsl:template match="comment()">

Selects all comment node children of the context node. Each comment is a separate node.

Text

<xsl:template match="text()">

Selects all text node children of the context node. A text node contains the longest possible run of contiguous text.

Processing-instructions

<xsl:template match="processing-instruction([target])">

Selects all processing instruction children of the context node. target is optional.

Wildcards

* | [namespace]:*

Matches any element node regardless of name. Does not match attributes, processing-instructions, text or comment nodes.

node()

Matches any node regardless of name and type.

@*

Matches all attribute nodes.

Multiple matches

Combine location paths using | (union of the selected nodes).

Compound location paths

  • Forward slash /

    Creates a location path in a filesystem like fashion. If the path starts with / it is absolute, otherwise it is relative to the context node.

  • Double forward slash //

    Selects the context node and all its descendants.

  • Double period ..

    Selects the parent of the current node.

  • Single period .

    Selects the context node.
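The compound paths above can be tried outside XSLT as well. This Python sketch uses xml.etree.ElementTree, which implements only a subset of XPath (for example, it rejects absolute paths starting with / on an element), so all paths are relative to the root element. The library document is made up.

```python
import xml.etree.ElementTree as ET

doc = (
    "<library>"
    "<shelf id='s1'><book><title>A</title></book>"
    "<book><title>B</title></book></shelf>"
    "<shelf id='s2'><book><title>C</title></book></shelf>"
    "</library>"
)
root = ET.fromstring(doc)

# / : step by step from the context node (here the root element)
by_steps = [t.text for t in root.findall("shelf/book/title")]

# // : every title among the descendants of the context node
anywhere = [t.text for t in root.findall(".//title")]

# .. : parents of all book elements (each parent is returned once)
parents = [s.get("id") for s in root.findall(".//book/..")]

# . : the context node itself
context = root.findall(".")
```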

Predicates

< ... /.../node[@attribute='value']/.../>

Selects the node(s) whose attribute matches 'value'. Supports the usual set of relational operators.
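A minimal Python sketch of predicates, again with xml.etree.ElementTree. Only a few predicate forms are supported there (attribute tests and positions are among them), so this is an illustration of the syntax rather than of full XPath. The library document is invented.

```python
import xml.etree.ElementTree as ET

doc = (
    "<library>"
    "<shelf id='s1'><book><title>A</title></book>"
    "<book><title>B</title></book></shelf>"
    "<shelf id='s2'><book><title>C</title></book></shelf>"
    "</library>"
)
root = ET.fromstring(doc)

# Attribute predicate: titles of the books on the shelf with id s2.
on_s2 = [t.text for t in root.findall(".//shelf[@id='s2']/book/title")]

# Position predicate: the second book of each shelf that has one.
second = [t.text for t in root.findall("shelf/book[2]/title")]

# Existence predicate: shelves that carry an id attribute at all.
with_id = [s.get("id") for s in root.findall("shelf[@id]")]
```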

Unabbreviated location paths

Allow a more fine grained selection of nodes:

[axis]::[node]

axis can be:

  • self
  • ancestor
  • following-sibling
  • preceding-sibling
  • following
  • preceding
  • namespace
  • descendant
  • ancestor-or-self

General expressions

Return specific values.

  • Numbers

    Basic arithmetic operators: +, -, *, div, mod (division is written div because / is the path separator)

  • Strings

    Comparison operators: =, !=

  • Booleans

    Created with true(), false() and not() functions or created by comparison of other objects.

  • Functions

    Operate on node sets.

X-Links

Simple link

< ...
xlink:type = "simple"
xlink:href = <URI>
>

Defines a one way connection between the document and another resource. The URI need not be a URL.

Semantic

xlink:title

Describes the remote resource.

xlink:role

Contains a URI that indicates the meaning of the link.

Behaviour

xlink:show

In which context the resource should be displayed (new, replace, embed, other, none).

xlink:actuate

When the link should be followed (onLoad, onRequest, other, none).
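XLink attributes are ordinary namespaced attributes, so an application can read them with any namespace aware parser. A minimal Python sketch, where the cite element, the example.com URL and the xlink_attr helper are hypothetical:

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

doc = (
    '<cite xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" '
    'xlink:href="http://example.com/book" xlink:title="The book" '
    'xlink:show="new" xlink:actuate="onRequest">a book</cite>'
)
link = ET.fromstring(doc)


def xlink_attr(element, name):
    """Read an attribute from the XLink namespace (None if absent)."""
    return element.get("{%s}%s" % (XLINK, name))


print(xlink_attr(link, "type"))
print(xlink_attr(link, "href"))
print(xlink_attr(link, "show"), xlink_attr(link, "actuate"))
```

What to do with show and actuate is left to the application: the parser only hands over the values.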

Extended links

< ... xlink:type = "extended">

A collection of resources and a collection of paths between them.

Locators

< ...
xlink:type = "locator"
xlink:href = <URI>
>

Locates a particular resource and provides additional semantic attributes: xlink:label, xlink:title, xlink:role

Arcs

< ...
xlink:type = "arc"
xlink:from = <xlink:label>
xlink:to = <xlink:label>
>

Represent a path between two resources: xlink:from and xlink:to are the endpoints, and each value matches the xlink:label associated with one of the locators of the extended link.

Provide extended attributes:

  • title: human readable arc description;
  • arcrole: an absolute URI identifying the nature of the arc.
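How arcs connect locators through labels can be sketched in Python as follows. The course/page/next element names, the file names and the q helper are made up; since several locators may share a label, one arc can expand to more than one path.

```python
import xml.etree.ElementTree as ET


def q(name):
    """Qualified name of an attribute in the XLink namespace."""
    return "{http://www.w3.org/1999/xlink}" + name


doc = """<course xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="extended">
  <page xlink:type="locator" xlink:href="intro.xml" xlink:label="intro"/>
  <page xlink:type="locator" xlink:href="syntax.xml" xlink:label="syntax"/>
  <next xlink:type="arc" xlink:from="intro" xlink:to="syntax"/>
</course>"""
link = ET.fromstring(doc)

# label -> list of resources carrying that label
resources = {}
for child in link:
    if child.get(q("type")) == "locator":
        resources.setdefault(child.get(q("label")), []).append(
            child.get(q("href"))
        )

# every arc becomes one (source, destination) pair per matching resource
paths = [
    (source, destination)
    for child in link
    if child.get(q("type")) == "arc"
    for source in resources.get(child.get(q("from")), [])
    for destination in resources.get(child.get(q("to")), [])
]
print(paths)
```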

Local resources

< ...
xlink:type = "resource"
xlink:label = <xlink:label>
>

Represents a resource contained inside the extended link; xlink:label matches the label of the locator in the extended link.

Titles

<... xlink:type = "title">

Provide a title to the whole extended link.

X-Links & DTDs

All XLink attributes used must be declared in the DTD if the document is to be valid.

Base URL

xml:base = "<URL>"

Defines a base URL for relative URIs. The URL can itself be relative, in which case it is resolved against the base URL of the containing element or entity.
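A minimal Python sketch of nested xml:base resolution, using urljoin from the standard library. The href attribute, the example.com URLs and the resolved_hrefs helper are assumptions of the example; the empty starting base stands in for the URL of the document itself.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

BASE = "{http://www.w3.org/XML/1998/namespace}base"

doc = (
    '<doc xml:base="http://example.com/docs/">'
    '<chapter xml:base="chapter1/"><img href="fig.png"/></chapter>'
    '<img href="cover.png"/>'
    "</doc>"
)
root = ET.fromstring(doc)


def resolved_hrefs(element, base=""):
    """Yield every href resolved against the xml:base in scope."""
    # A relative xml:base is resolved against the inherited base.
    base = urljoin(base, element.get(BASE, ""))
    if "href" in element.attrib:
        yield urljoin(base, element.attrib["href"])
    for child in element:
        yield from resolved_hrefs(child, base)


urls = list(resolved_hrefs(root))
print(urls)
```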

XPointers

Identify a location inside an XML document.

XPointers in URLs

xpointer(<XPath expression>)

Several xpointer parts can follow one another: each later part is a backup location, used when the previous ones fail. An XPointer does not necessarily refer to a single element.

XPointers in Links

Point only to XML documents and can be used in internal links as well.

Shorthand pointers

...#<id>

<id> is an attribute declared to have an ID type in the document DTD.

Child sequences

xpointer(/child::*[position() = 1]/child::*[position() = 2]/child::*[position() = 3])

Selects the third child of the second child of the root element of the document.
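The same walk can be written by hand, which makes the meaning of the three steps explicit. This is only a Python sketch over xml.etree.ElementTree, not an XPointer implementation; follow_child_sequence and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def follow_child_sequence(root, steps):
    """Follow 1-based child positions; the first step selects the root."""
    if not steps or steps[0] != 1:
        return None  # a document has exactly one root element
    node = root
    for position in steps[1:]:
        if position < 1 or position > len(node):
            return None  # no child element at that position
        node = node[position - 1]
    return node


doc = "<a><b/><c><d/><e/><f>target</f></c></a>"
root = ET.fromstring(doc)

# root element, then its second child, then the third child of that
found = follow_child_sequence(root, [1, 2, 3])
missing = follow_child_sequence(root, [1, 1, 1])
```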

Namespaces

xmlns(...) [xmlns(...) ...] xpointer(...)

Namespace prefixes to bind must be specified before the XPointer part; the parts are separated by whitespace. XPointer handles namespaces on its own: the prefixes declared in the document are not used.

Points

Zero dimensional locations inside a node.

xpointer(start-point(//<node>))

Identifies the first point inside a node (after the > character of the node's start tag).

xpointer(end-point(//<node>))

Identifies the last point inside a node (before the < character of the node's end tag).

Ranges

A span of parsed characters between two points.

range()

Takes an XPointer expression that returns a location set and returns a range exactly covering the location (tags included, one range for each location in the set).

range-inside()

Takes an XPointer expression that returns a location set and returns a range exactly covering the location (tags excluded, one range for each location in the set).

range-to()

Evaluated with respect to a context node, takes one location and returns one or more ranges. start-point is the starting point of the context node, end-point is the ending point of the argument.

string-range()

Operates on the text of the document (tags are stripped); takes an XPointer expression identifying locations and a substring to match against the text of the location. Returns one range for each non-overlapping match exactly covering the matched string.
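The matching rule of string-range() can be approximated in a few lines of Python. This sketch returns plain character offsets into the tag-stripped text instead of real XPointer ranges; string_ranges and the sample paragraph are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def string_ranges(element, substring):
    """Offsets (start, end) of non-overlapping matches in the text."""
    # itertext() walks all text content in document order, without tags.
    text = "".join(element.itertext())
    return [
        (match.start(), match.end())
        for match in re.finditer(re.escape(substring), text)
    ]


p = ET.fromstring("<p>An ever<b>green</b> tree, evergreen.</p>")

# The first match spans the <b> tag boundary: tags are ignored.
ranges = string_ranges(p, "evergreen")

# Matches never overlap: "aa" is found twice in "aaaaa", not four times.
repeated = string_ranges(ET.fromstring("<p>aa<i>aa</i>a</p>"), "aa")
```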

Relative XPointers

here()

Refers to the node containing the XPointer or the element containing the node if the node is a text node.

origin()

Refers to the node from which the user started traversal.

XInclude

Include element

<xi:include>...</xi:include>

Attributes

href

Points to the document to include. By default the document is parsed as XML (parse="xml"), so it must be well-formed.

parse="text"

Allows inclusion of a plain text document (thus not well-formed).

encoding="<encoding>"

Specifies the encoding of the included document when parse="text" - defaults to UTF-8.

accept="<MIME type>"

Specifies the accepted MIME type of the document.

accept-language="<lang>"

Specifies the accepted language for the document.

xpointer="<xpointer>"

Allowed only when parsing XML documents (parse="xml"); indicates which part of the document referenced by the href attribute should be included (if href is absent, it refers to the current document).
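Python ships a small XInclude processor in xml.etree.ElementInclude, which is enough to try href and parse="text". In this sketch the documents to include live in a dictionary instead of on disk, so the loader function and the file names are assumptions of the example; xpointer and the other attributes are not exercised.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-in for the file system: href -> content.
FILES = {
    "chapter.xml": "<chapter>One</chapter>",
    "license.txt": "MIT",
}


def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", plain text otherwise."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


doc = (
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    '<note><xi:include href="license.txt" parse="text"/></note>'
    "</book>"
)
root = ET.fromstring(doc)

# Replaces every xi:include element in place.
ElementInclude.include(root, loader=loader)

print(ET.tostring(root, encoding="unicode"))
```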

Fallback

<xi:fallback>...</xi:fallback>

Alternate content if the document can't be loaded (only one fallback is allowed).


© Alessandro Dotti Contra