relax-ng message

Subject: Updated tutorial

From: James Clark <jjc@jclark.com>

To: RELAX NG Mailing List <relax-ng@lists.oasis-open.org>

Date: Fri, 01 Jun 2001 14:27:25 +0700

I've updated the tutorial to match the decisions we took at yesterday's tutorial. James

Title: RELAX NG Tutorial

RELAX NG Tutorial

Working Draft 1 June 2001

This version: Working Draft: 1 June 2001

Editor: James Clark <jjc@jclark.com>

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Abstract

RELAX NG is a simple schema language for XML, based on RELAX and TREX. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document.

Status of this Document

This is a working draft constructed by the editor. It is not an official committee work product and may not reflect the consensus opinion of the committee. Comments on this document may be sent to relax-ng-comment@lists.oasis-open.org.

1 Getting started

2 Choice

3 Attributes

4 Named patterns

5 Datatyping

6 Enumerations

7 Lists

8 Interleaving

9 Modularity

9.1 Referencing external patterns
9.2 Merging grammars

10 Namespaces

10.1 Using the namespace attribute
10.2 Qualified names

11 Name classes

12 Cross references

13 Annotations

14 Nested grammars

15 Non-restrictions

16 Non-features

17 Differences from TREX

1. Getting started

Consider a simple XML representation of an email address book:

<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>

The DTD would be as follows:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>

A RELAX NG pattern for this could be written as follows:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

If the addressBook is required to be non-empty, then we can use oneOrMore instead of zeroOrMore:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9">
  <oneOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </oneOrMore>
</element>

Now let's change it to allow each card to have an optional note element.

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
	<element name="note">
	  <text/>
	</element>
      </optional>
    </element>
  </zeroOrMore>
</element>

Note that the text pattern matches arbitrary text, including empty text. Note also that whitespace separating tags is ignored when matching against a pattern.
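
As an illustration (this instance is not part of the original examples), the following document would match the pattern above. The first card has an empty note element, which the text pattern accepts; the second card omits the optional note; and the line breaks and indentation between tags are ignored:

<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
    <note></note>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>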

All the elements specifying the pattern must be namespace qualified by the namespace URI:

http://relaxng.org/ns/structure/0.9

The examples above use a default namespace declaration xmlns="http://relaxng.org/ns/structure/0.9" for this. A namespace prefix is equally acceptable:

<rng:element name="addressBook" xmlns:rng="http://relaxng.org/ns/structure/0.9">
  <rng:zeroOrMore>
    <rng:element name="card">
      <rng:element name="name">
        <rng:text/>
      </rng:element>
      <rng:element name="email">
        <rng:text/>
      </rng:element>
    </rng:element>
  </rng:zeroOrMore>
</rng:element>

For the remainder of this document, the default namespace declaration will be left out of examples.

2. Choice

Now suppose we want to allow the name to be broken down into a givenName and a familyName, allowing an addressBook like this:

<addressBook>
  <card>
    <givenName>John</givenName>
    <familyName>Smith</familyName>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>

We can use the following pattern:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <group>
          <element name="givenName">
            <text/>
          </element>
          <element name="familyName">
            <text/>
          </element>
        </group>
      </choice>
      <element name="email">
        <text/>
      </element>
      <optional>
	<element name="note">
	  <text/>
	</element>
      </optional>
    </element>
  </zeroOrMore>
</element>

This corresponds to the following DTD:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card ((name | (givenName, familyName)), email, note?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT givenName (#PCDATA)>
<!ELEMENT familyName (#PCDATA)>
<!ELEMENT note (#PCDATA)>
]>

3. Attributes

Suppose we want the card element to have attributes rather than child elements. The DTD might look like this:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED>
]>

Just change each element pattern to an attribute pattern:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>

In XML, the order of attributes is traditionally not significant. RELAX NG follows this tradition. The above pattern would match both

<card name="John Smith" email="js@example.com"/>

and

<card email="js@example.com" name="John Smith"/>

In contrast, the order of elements is significant. The pattern

<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
</element>

would not match:

<card><email>js@example.com</email><name>John Smith</name></card>

Note that an attribute element by itself indicates a required attribute, just as an element element by itself indicates a required element. To specify an optional attribute, use optional just as with element:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
      <optional>
        <attribute name="note">
          <text/>
        </attribute>
      </optional>
    </element>
  </zeroOrMore>
</element>

The group and choice patterns can be applied to attribute elements in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <attribute name="name">
          <text/>
        </attribute>
        <group>
          <attribute name="givenName">
            <text/>
          </attribute>
          <attribute name="familyName">
            <text/>
          </attribute>
        </group>
      </choice>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>

There are no restrictions on how element elements and attribute elements can be combined. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
	<element name="name">
	  <text/>
	</element>
	<attribute name="name">
	  <text/>
	</attribute>
      </choice>
      <choice>
	<element name="email">
	  <text/>
	</element>
	<attribute name="email">
	  <text/>
	</attribute>
      </choice>
    </element>
  </zeroOrMore>
</element>

As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:

<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>

However, it would not match

<card><email>js@example.com</email><name>John Smith</name></card>

because the pattern for card requires any email child element to follow any name child element.

There is one difference between attribute and element patterns: <text/> is the default for the content of an attribute pattern, whereas an element pattern is not allowed to be empty. For example,

<attribute name="email"/>

is short for

<attribute name="email">
  <text/>
</attribute>

It might seem natural that

<element name="x"/>

would match an x element with no attributes and no content. However, this would make the meaning of empty content inconsistent between the element pattern and the attribute pattern, so RELAX NG does not allow the element pattern to be empty. A pattern that matches an element with no attributes and no children must use <empty/> explicitly:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="prefersHTML">
          <empty/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>

4. Named patterns

For a non-trivial RELAX NG pattern, it is often convenient to be able to give names to parts of the pattern. Instead of

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
	<text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

we can write

<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <ref name="cardContent"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>

</grammar>

A grammar element has a single start child element, and zero or more define child elements. The start and define elements contain patterns. These patterns can contain ref elements that refer to patterns defined by any of the define elements in that grammar element. A grammar pattern is matched by matching the pattern contained in the start element.

We can use the grammar element to write patterns in a style similar to DTDs:

<grammar>

  <start>
    <ref name="AddressBook"/>
  </start>

  <define name="AddressBook">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="Card"/>
      </zeroOrMore>
    </element>
  </define>

  <define name="Card">
    <element name="card">
      <ref name="Name"/>
      <ref name="Email"/>
    </element>
  </define>

  <define name="Name">
    <element name="name">
      <text/>
    </element>
  </define>

  <define name="Email">
    <element name="email">
      <text/>
    </element>
  </define>

</grammar>

Recursive references are allowed. For example:

<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
      <element name="span">
        <optional>
          <attribute name="style"/>
        </optional>
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>

However, recursive references must be within an element. Thus, the following is not allowed:

<define name="inline">
  <choice>
    <text/>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
    <element name="span">
      <optional>
	<attribute name="style"/>
      </optional>
      <ref name="inline"/>
    </element>
  </choice>
  <optional>
    <ref name="inline"/>
  </optional>
</define>

A start element may also have a name attribute. This is a shorthand for a define with that name together with a start element referencing that definition. For example

<grammar>
  <start name="inline">
    <zeroOrMore>
      <choice>
	<text/>
	<element name="bold">
	  <ref name="inline"/>
	</element>
      </choice>
    </zeroOrMore>
  </start>
</grammar>

is short for

<grammar>
  <start>
    <ref name="inline"/>
  </start>
  <define name="inline">
    <zeroOrMore>
      <choice>
	<text/>
	<element name="bold">
	  <ref name="inline"/>
	</element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>

5. Datatyping

RELAX NG allows patterns to reference externally-defined datatypes, such as those defined by W3C XML Schema Part 2. RELAX NG implementations may differ in what datatypes they support. You must use datatypes that are supported by the implementation you plan to use.

The data pattern matches a string that represents a value of a named datatype. The datatypeNamespace attribute contains a URI identifying the collection of datatypes being used. The datatype collection defined by W3C XML Schema Part 2 would be identified by the URI http://www.w3.org/2001/XMLSchema-datatypes. The type attribute specifies the name of the datatype in the collection identified by the datatypeNamespace attribute. For example, if a RELAX NG implementation supported the built-in datatypes of W3C XML Schema Part 2, you could use:

<element name="number">
  <data type="integer" datatypeNamespace="http://www.w3.org/2001/XMLSchema-datatypes"/>
</element>

It is inconvenient to specify the datatypeNamespace attribute on every data element, so RELAX NG allows the datatypeNamespace attribute to be inherited. The datatypeNamespace attribute can be specified on any RELAX NG element. If a data element does not have a datatypeNamespace attribute, it will use the value from the closest ancestor that has a datatypeNamespace attribute. Typically, the datatypeNamespace attribute is specified on the root element of the RELAX NG pattern. For example:

<element name="point" datatypeNamespace="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="x">
    <data type="double"/>
  </element>
  <element name="y">
    <data type="double"/>
  </element>
</element>

If the children of an element or an attribute match a data pattern, then the complete content of the element or attribute must match that data pattern. It is not permitted to have a pattern which allows part of the content to match a data pattern, and another part to match another pattern. For example, the following pattern is not allowed:

<element name="bad">
  <data type="int"/>
  <element name="note">
    <text/>
  </element>
</element>

However, this would be fine:

<element name="ok">
  <data type="int"/>
  <attribute name="note">
    <text/>
  </attribute>
</element>

Note that this restriction does not apply to the text pattern.
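
As a sketch of the contrast (the element name mixedOk is made up for illustration), the following would be allowed, because text, unlike data, may be combined with element patterns in the same content:

<element name="mixedOk">
  <text/>
  <element name="note">
    <text/>
  </element>
</element>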

Datatypes may have parameters. For example, a string datatype may have a parameter controlling the length of the string. The parameters applicable to any particular datatype are determined by the datatyping vocabulary. Parameters are specified by adding one or more param elements as children of the data element. For example, the following constrains the email element to contain a string at most 127 characters long:

<element name="email">
  <data type="string">
    <param name="maxLength">127</param>
  </data>
</element>
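
More than one param element can be given. The following sketch assumes an implementation that supports the W3C XML Schema Part 2 datatypes and their minInclusive and maxInclusive parameters; the element name month is hypothetical. It constrains the content to an integer from 1 to 12:

<element name="month" datatypeNamespace="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="integer">
    <param name="minInclusive">1</param>
    <param name="maxInclusive">12</param>
  </data>
</element>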

6. Enumerations

Many markup vocabularies have attributes whose value is constrained to be one of a set of specified values. The value pattern matches a string that has a specified value. For example,

<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </attribute>
</element>

allows the preferredFormat attribute to have the value html or text. This corresponds to the DTD

<!DOCTYPE card [
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED
  preferredFormat (html|text) #REQUIRED>
]>

The value pattern is not restricted to attribute values. For example, the following is allowed:

<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
  <element name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </element>
</element>

The prohibition against a data pattern's matching only part of the content of an element also applies to value patterns.

By default, the value pattern will consider the string in the pattern to match the string in the document if the two strings are the same after the whitespace in both strings is normalized. Whitespace normalization strips leading and trailing white-space characters, and collapses sequences of one or more white-space characters to a single space character. This corresponds to the behaviour of an XML parser for an attribute that is declared as other than CDATA. Thus the above pattern will match any of

<card name="John Smith" email="js@example.com" preferredFormat="html"/>
<card name="John Smith" email="js@example.com" preferredFormat="  html  "/>

The way that the value pattern compares the pattern string with the document string can be controlled by specifying a type attribute and optionally a datatypeNamespace attribute, which identify a datatype in the same way as for the data pattern. The pattern string matches the document string if they both represent the same value of the specified datatype. Thus, whereas the data pattern matches an arbitrary value of a datatype, the value pattern matches a specific value of a datatype.
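
As a sketch, assuming an implementation that supports the integer datatype of W3C XML Schema Part 2 (the element name version is hypothetical), the pattern

<element name="version" datatypeNamespace="http://www.w3.org/2001/XMLSchema-datatypes">
  <value type="integer">1</value>
</element>

would be expected to match any of

<version>1</version>
<version>01</version>
<version>+1</version>

since all three strings represent the same integer value, but not

<version>1.0</version>

since 1.0 is not a legal representation of an integer in that datatype collection.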

If there is no ancestor element with a datatypeNamespace attribute, the datatype namespace defaults to the RELAX NG namespace. This provides two datatypes, string and token. The datatype token corresponds to the default comparison behavior of the value pattern. The datatype string compares strings without any normalization (other than that performed by XML). For example,

<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value type="string">html</value>
      <value type="string">text</value>
    </choice>
  </attribute>
</element>

will not match

<card name="John Smith" email="js@example.com" preferredFormat="  html  "/>

7. Lists

The list pattern matches a whitespace-separated sequence of tokens; it contains a pattern that the sequence of individual tokens must match. The list pattern splits a string into a list of strings, and then matches the resulting list of strings against the pattern inside the list pattern.

For example, suppose we want to have a vector element that contains two floating point numbers separated by whitespace. We could use list as follows:

<element name="vector">
  <list>
    <data type="float"/>
    <data type="float"/>
  </list>
</element>

Or suppose we want the vector element to contain a list of one or more floating point numbers separated by whitespace:

<element name="vector">
  <list>
    <oneOrMore>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>

Or suppose we want a path element containing an even number of floating point numbers:

<element name="path">
  <list>
    <oneOrMore>
      <data type="double"/>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>
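
To illustrate the path pattern (these instances are not part of the original examples, and they assume the double datatype of W3C XML Schema Part 2), the following would match, since it splits into four tokens that pair up as two repetitions of the group:

<path>0 0 1.5 2.5</path>

whereas the following would not, since the third token has no partner:

<path>0 0 1.5</path>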

The list pattern must not contain element or attribute patterns.
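
Patterns other than data can appear inside list. As a sketch (the element name flags and its token values are hypothetical), a list can be combined with choice and value to allow a whitespace-separated sequence of enumerated tokens:

<element name="flags">
  <list>
    <zeroOrMore>
      <choice>
        <value>bold</value>
        <value>italic</value>
        <value>underline</value>
      </choice>
    </zeroOrMore>
  </list>
</element>

This would match

<flags>bold italic</flags>

but not

<flags>bold heavy</flags>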

8. Interleaving

The interleave pattern allows child elements to occur in any order. For example, the following would allow the card element to contain the name and email elements in any order:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <interleave>
	<element name="name">
	  <text/>
	</element>
	<element name="email">
	  <text/>
	</element>
      </interleave>
    </element>
  </zeroOrMore>
</element>

The pattern is called interleave because of how it works with patterns that match more than one element. Suppose we want to write a pattern for the HTML head element which requires exactly one title element, at most one base element and zero or more style, script, link and meta elements and suppose we are writing a grammar pattern that has one definition for each element. Then we could define the pattern for head as follows:

<define name="head">
  <element name="head">
    <interleave>
      <ref name="title"/>
      <optional>
        <ref name="base"/>
      </optional>
      <zeroOrMore>
        <ref name="style"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="script"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="link"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="meta"/>
      </zeroOrMore>
    </interleave>
  </element>
</define>

Suppose we had a head element that contained a meta element, followed by a title element, followed by a meta element. This would match the pattern because it is an interleaving of a sequence of two meta elements, which match the child pattern

      <zeroOrMore>
        <ref name="meta"/>
      </zeroOrMore>

and a sequence of one title element, which matches the child pattern

      <ref name="title"/>

The semantics of the interleave pattern are that a sequence of elements matches an interleave pattern if it is an interleaving of sequences that match the child patterns of the interleave pattern. Note that this is different from the & connector in SGML, where A* & B matches the sequence of elements A A B or the sequence of elements B A A but not the sequence of elements A B A. With interleave, all three sequences match.
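
To make the head case concrete, here is an instance of the kind described above. The definitions of title and meta are not given in this tutorial, so this sketch assumes that title contains text and that meta is an empty element with name and content attributes:

<head>
  <meta name="author" content="John Smith"/>
  <title>Address book</title>
  <meta name="keywords" content="address, email"/>
</head>

The two meta elements, taken in order, match the zeroOrMore pattern for meta; the title element matches the reference to title; and the remaining child patterns of the interleave (the optional base and the zeroOrMore patterns for style, script and link) each match the empty sequence.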

One special case of interleave is very common: interleaving <text/> with a pattern p represents a pattern that matches what p matches but also allows characters to occur as children. The mixed element is a shorthand for this.

<mixed> p </mixed>

is short for

<interleave> <text/> p </interleave>
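
As a sketch (the element names p and em are chosen for illustration), a paragraph that allows text freely mixed with em elements could be written as:

<element name="p">
  <mixed>
    <zeroOrMore>
      <element name="em">
        <text/>
      </element>
    </zeroOrMore>
  </mixed>
</element>

This would match:

<p>Send mail to <em>John</em> or <em>Fred</em> today.</p>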

9. Modularity

9.1. Referencing external patterns

The externalRef pattern can be used to reference a pattern defined in a separate file. The externalRef element has a required href attribute that specifies the URL of a file containing the pattern. The externalRef matches if the pattern contained in the specified URL matches. Suppose, for example, that you have a RELAX NG pattern, stored in inline.rng, that matches HTML inline content:

<grammar>
  <start name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="code">
          <ref name="inline"/>
        </element>
        <element name="em">
          <ref name="inline"/>
        </element>
        <!-- etc -->
      </choice>
    </zeroOrMore>
  </start>
</grammar>

Then we could allow the note element to contain inline HTML markup by using externalRef as follows:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
	<element name="note">
	  <externalRef href="inline.rng"/>
	</element>
      </optional>
    </element>
  </zeroOrMore>
</element>

For another example, suppose you have two RELAX NG patterns stored in files pattern1.rng and pattern2.rng. Then the following is a pattern that will match anything matched by either of those patterns:

<choice>
  <externalRef href="pattern1.rng"/>
  <externalRef href="pattern2.rng"/>
</choice>

9.2. Merging grammars

The include element allows grammars to be merged together. A grammar pattern may have include elements as children. An include element has a required href attribute that specifies the URL of a file containing a grammar pattern. The referenced grammar pattern will be merged with the grammar pattern containing the include element.

@@@ Add example

Normally, duplicate definitions (two definitions with the same name) result in an error. However, define elements may be put inside the include element to indicate that they are to replace definitions in the included grammar pattern.

Suppose the file addressBook.rng contains the following grammar pattern:

<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <element name="name">
	    <text/>
	  </element>
	  <element name="email">
	    <text/>
	  </element>
          <ref name="card.local"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.local">
    <empty/>
  </define>

</grammar>

Another pattern could customize addressBook.rng as follows:

<grammar>

  <include href="addressBook.rng">

    <define name="card.local">
      <optional>
	<element name="note">
	  <text/>
	</element>
      </optional>
    </define>

  </include>

</grammar>

This would be equivalent to:

<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <element name="name">
	    <text/>
	  </element>
	  <element name="email">
	    <text/>
	  </element>
          <ref name="card.local"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.local">
    <optional>
      <element name="note">
	<text/>
      </element>
    </optional>
  </define>

</grammar>

which is equivalent to

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
	<text/>
      </element>
      <element name="email">
	<text/>
      </element>
      <optional>
	<element name="note">
	  <text/>
	</element>
      </optional>
    </element>
  </zeroOrMore>
</element>

It is also possible to combine together duplicate definitions from separate files by adding a combine attribute to the define elements. The combine attribute specifies how the definitions should be combined; it may have the value choice or interleave. For example, we could have written our customization as:

<grammar>

  <include href="addressBook.rng"/>

  <define name="card.local" combine="choice">
    <!-- no optional element needed this time -->
    <element name="note">
      <text/>
    </element>
  </define>

</grammar>

This would be equivalent to:

<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <element name="name">
	    <text/>
	  </element>
	  <element name="email">
	    <text/>
	  </element>
          <ref name="card.local"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.local">
    <choice>
      <empty/>
      <element name="note">
        <text/>
      </element>
    </choice>
  </define>

</grammar>

This has the same meaning as before, since an optional pattern is equivalent to a choice between the pattern and empty.

We could also have used combine="interleave" here:

<grammar>

  <include href="addressBook.rng"/>

  <define name="card.local" combine="interleave">
    <optional>
      <element name="note">
	<text/>
      </element>
    </optional>
  </define>

</grammar>

This would be equivalent to:

<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <element name="name">
	    <text/>
	  </element>
	  <element name="email">
	    <text/>
	  </element>
          <ref name="card.local"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.local">
    <interleave>
      <empty/>
      <optional>
        <element name="note">
	  <text/>
        </element>
      </optional>
    </interleave>
  </define>

</grammar>

This has the same meaning as before, since adding an empty pattern to the content of an interleave pattern does not make any difference to what the interleave pattern matches.

@@@ Add example of combine="interleave" with attributes.
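
One possible sketch of such a case follows. It assumes a hypothetical variant of the address book grammar, stored in a file here called addressBookAttrs.rng, in which the card element uses attributes collected in a definition named card.attlist:

<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.attlist">
    <attribute name="name"/>
    <attribute name="email"/>
  </define>

</grammar>

A customization could then add an optional note attribute:

<grammar>

  <include href="addressBookAttrs.rng"/>

  <define name="card.attlist" combine="interleave">
    <optional>
      <attribute name="note"/>
    </optional>
  </define>

</grammar>

Under the combining rules described above, card.attlist would become an interleave of the two definitions, so a card would require name and email attributes and allow an optional note attribute. Since the relative order of attributes is never significant, interleaving is a natural way to combine attribute lists.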

The notAllowed pattern never matches anything. Just as adding empty to a group makes no difference, so adding notAllowed to a choice makes no difference. It is typically used in a definition that is referenced in a choice element to allow an including pattern to specify additional choices. For example, suppose a RELAX NG pattern inline.rng provides a pattern for inline content, which allows bold and italic elements arbitrarily nested:

<grammar>

  <start name="inline">
    <zeroOrMore>
      <choice>
	<text/>
	<element name="bold">
	  <ref name="inline"/>
	</element>
	<element name="italic">
	  <ref name="inline"/>
	</element>
        <ref name="local.inline"/>
      </choice>
    </zeroOrMore>
  </start>

  <define name="local.inline">
    <notAllowed/>
  </define>

</grammar>

Another RELAX NG pattern could use inline.rng and add code and em to the set of inline elements as follows:

<grammar>

  <include href="inline.rng">

    <define name="local.inline">
      <choice>
	<element name="code">
          <ref name="inline"/>
	</element>
	<element name="em">
          <ref name="inline"/>
	</element>
      </choice>
    </define>

  </include>  

  <start>
    <element name="doc">
      <zeroOrMore>
	<element name="p">
	  <ref name="inline"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

</grammar>

We could instead have used combine="choice". In this case, inline.rng would need to separate out the choices as a separate definition:

<grammar>

  <start name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </start>

  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
	<ref name="inline"/>
      </element>
      <element name="italic">
	<ref name="inline"/>
      </element>
    </choice>
  </define>

</grammar>

and the customization would add to those choices:

<grammar>

  <include href="inline.rng"/>

  <start>
    <element name="doc">
      <zeroOrMore>
	<element name="p">
	  <ref name="inline"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
  
</grammar>

10. Namespaces

RELAX NG is namespace-aware. Thus, it considers an element or attribute to have both a local name and a namespace URI which together constitute the name of that element or attribute.

10.1. Using the namespace attribute

The element pattern uses a namespace attribute to specify the namespace URI of the elements that it matches. For example

<element name="foo" namespace="http://www.example.com">
  <empty/>
</element>

would match any of

<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
<example:foo xmlns:example="http://www.example.com"/>

but not any of

<foo/>
<e:foo xmlns:e="http://WWW.EXAMPLE.COM"/>
<example:foo xmlns:example="http://www.example.net"/>

A value of an empty string for the namespace attribute indicates a null or absent namespace URI (just as with the xmlns attribute). Thus, the pattern

<element name="foo" namespace="">
  <empty/>
</element>

matches any of

<foo xmlns=""/>
<foo/>

but not any of

<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>

It is tedious and error-prone to specify the namespace attribute on every element, so RELAX NG allows it to be defaulted. If an element pattern does not specify a namespace attribute, then it defaults to the value of the namespace attribute of the nearest ancestor that has a namespace attribute, or the empty string if there is no such ancestor. Thus

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" namespace="">
  <zeroOrMore>
    <element name="card" namespace="">
      <element name="name" namespace="">
        <text/>
      </element>
      <element name="email" namespace="">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

and

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card" namespace="http://www.example.com">
      <element name="name" namespace="http://www.example.com">
        <text/>
      </element>
      <element name="email" namespace="http://www.example.com">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

The attribute pattern also takes a namespace attribute. However, it defaults differently, because the XML Namespaces Recommendation does not apply the default namespace to attributes. If a namespace attribute is not specified on the attribute pattern, then it defaults to the empty string. Thus

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card" namespace="http://www.example.com">
      <attribute name="name" namespace=""/>
      <attribute name="email" namespace=""/>
    </element>
  </zeroOrMore>
</element>

and so will match

<addressBook xmlns="http://www.example.com">
  <card name="John Smith" email="js@example.com"/>
</addressBook>

<example:addressBook xmlns:example="http://www.example.com">
  <example:card name="John Smith" email="js@example.com"/>
</example:addressBook>

but not

<example:addressBook xmlns:example="http://www.example.com">
  <example:card example:name="John Smith" example:email="js@example.com"/>
</example:addressBook>

To match this last example, the attribute patterns must specify global="true":

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <attribute name="name" global="true"/>
      <attribute name="email" global="true"/>
    </element>
  </zeroOrMore>
</element>

This is equivalent to:

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card" namespace="http://www.example.com">
      <attribute name="name" namespace="http://www.example.com"/>
      <attribute name="email" namespace="http://www.example.com"/>
    </element>
  </zeroOrMore>
</element>

Thus, specifying global="true" on an attribute pattern makes the namespace attribute default in the same way that it does on an element pattern.

The namespace attribute is allowed on any element in a RELAX NG pattern. The global attribute is allowed only on an attribute pattern.
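
Because the namespace attribute may appear on any element, one plausible arrangement is to state it once on the grammar element. The following sketch reuses the address book example; under the defaulting rules described above, the addressBook and card element patterns should pick up http://www.example.com from the grammar element, while the attribute patterns, which do not specify global="true", remain in no namespace:

```xml
<grammar namespace="http://www.example.com">

  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <attribute name="name"/>
          <attribute name="email"/>
        </element>
      </zeroOrMore>
    </element>
  </start>

</grammar>
```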

10.2. Qualified names

When a pattern matches elements and attributes from multiple namespaces, using the namespace attribute would require repeating namespace URIs in different places in the pattern. This is error-prone and hard to maintain, so RELAX NG also allows the element and attribute patterns to use a prefix in the value of the name attribute to specify the namespace URI. In this case, the prefix specifies the namespace URI to which that prefix is bound by the namespace declarations in scope on the element or attribute pattern. Thus

<element name="e:addressBook" xmlns:e="http://www.example.com">
  <zeroOrMore>
    <element name="e:card">
      <element name="e:name">
        <text/>
      </element>
      <element name="e:email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" namespace="http://www.example.com">
  <zeroOrMore>
    <element name="card" namespace="http://www.example.com">
      <element name="name" namespace="http://www.example.com">
        <text/>
      </element>
      <element name="email" namespace="http://www.example.com">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

If a prefix is specified in the value of the name attribute of an element or attribute pattern, then that prefix determines the namespace URI of the elements or attributes that will be matched by that pattern, regardless of the value of any namespace attribute.
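
A small sketch of this rule follows; the URI http://www.example.org/other is invented for the illustration. The prefix e wins over the namespace attribute, so the pattern should match a card element in the namespace http://www.example.com and not one in http://www.example.org/other:

```xml
<element name="e:card"
         namespace="http://www.example.org/other"
         xmlns:e="http://www.example.com">
  <empty/>
</element>
```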

Note that the XML default namespace (as specified by the xmlns attribute) is not used in determining the namespace URI of elements and attributes that element and attribute patterns match.
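
For instance, in the following sketch the xmlns declaration only identifies the pattern elements as belonging to RELAX NG. Since there is no namespace attribute and no prefix in the name, the pattern should match a foo element with no namespace URI (such as <foo/>), and not a foo element in the namespace http://relaxng.org/ns/structure/0.9:

```xml
<element name="foo" xmlns="http://relaxng.org/ns/structure/0.9">
  <empty/>
</element>
```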

11. Name classes

Normally, the name of the element to be matched by an element element is specified by a name attribute. An element element can instead start with an element specifying a name-class. In this case, the element pattern will only match an element if the name of the element is a member of the name-class. The simplest name-class is anyName, which any name at all is a member of, regardless of its local name and its namespace URI. For example, the following pattern matches any well-formed XML document:

<grammar>

  <start name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
	<choice>
	  <attribute>
	    <anyName/>
	  </attribute>
	  <text/>
	  <ref name="anyElement"/>
	</choice>
      </zeroOrMore>
    </element>
  </start>

</grammar>

The namespaceName name-class contains any name with the namespace URI specified by the namespace attribute, which defaults in the same way as the namespace attribute on the element pattern.
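
As a minimal sketch, the following pattern uses namespaceName with an explicit namespace attribute. It should match an element with any local name, provided that its namespace URI is http://www.example.com and its content is text only:

```xml
<element>
  <namespaceName namespace="http://www.example.com"/>
  <text/>
</element>
```

Under this reading, <e:note xmlns:e="http://www.example.com">hello</e:note> would match, whereas <note>hello</note> would not, because it has no namespace URI.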

The choice name-class matches any name that is a member of any of its child name-classes.

The not name-class contains any name that is not a member of the child name-class.

For example

<element name="card" namespace="http://www.example.com">
  <zeroOrMore>
    <attribute>
      <not>
        <choice>
          <namespaceName/>
          <namespaceName namespace=""/>
        </choice>
      </not>
    </attribute>
  </zeroOrMore>
  <text/>
</element>

would allow the card element to have any number of namespace-qualified attributes provided that they were qualified with a namespace other than that of the card element.

Note that an attribute pattern matches a single attribute even if it has a name-class that contains multiple names. To match zero or more attributes, the zeroOrMore element must be used.

The difference name-class contains any name that is a member of the first child name-class, but not a member of any of the following name-classes. The not name-class is, in fact, a shorthand for difference:

<not> name-class </not>

is short for

<difference> <anyName/> name-class </difference>

The name name-class contains a single name. The content of the name element specifies the name in the same way as the name attribute of the element pattern. The namespace attribute specifies the namespace URI in the same way as the element pattern.
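
As an illustrative sketch (the element names bold and b are chosen only for this example), the name and choice name-classes can be combined so that a single element pattern accepts either of two names:

```xml
<element>
  <choice>
    <name>bold</name>
    <name>b</name>
  </choice>
  <text/>
</element>
```

This should match both <bold>text</bold> and <b>text</b>, where neither element has a namespace URI.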

Some schema languages have a concept of lax validation, where an element or attribute is validated against a definition only if there is one. We can implement this concept in RELAX NG with name-classes that use difference and name. Suppose, for example, we wanted to allow an element to have any attribute with a qualified name, but we still wanted to ensure that if there was an xml:space attribute, it had the value default or preserve. It wouldn't work to use:

<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName/>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>

because an xml:space attribute with a value other than default or preserve would match

    <attribute>
      <anyName/>
    </attribute>

even though it did not match

    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>

The solution is to use name together with difference:

<element name="example">
  <zeroOrMore>
    <attribute>
      <difference>
        <anyName/>
        <name>xml:space</name>
      </difference>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>
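
To make the effect concrete, here is a sketch of instances that the corrected pattern should accept. The e:note attribute and its namespace are invented for the illustration:

```xml
<example xml:space="preserve" e:note="checked" xmlns:e="http://www.example.com"/>
<example e:note="checked" xmlns:e="http://www.example.com"/>
<example xml:space="default"/>
```

By contrast, <example xml:space="collapse"/> should be rejected: the xml:space name is excluded from the difference name-class, so the only pattern that can match the attribute is the one restricting its value to default or preserve.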

Note that the define element cannot contain a name-class; it can only contain a pattern.

12. Cross references

RELAX NG generalizes the ID/IDREF feature of XML. A data pattern may have either a key or a keyRef attribute. A data pattern with a key attribute behaves like an XML ID; a data pattern with a keyRef attribute behaves like an XML IDREF. Whereas XML has a single symbol-space of IDs and IDREFs, RELAX NG has an unlimited number of named symbol-spaces. The value of the key or keyRef attribute is an unprefixed name identifying the symbol-space. An element or attribute that matches a data pattern with a key attribute is called a key; an element or attribute that matches a data pattern with a keyRef attribute is called a key-reference. A document is invalid if it has two distinct keys in the same symbol-space with the same value; it is also invalid if it contains a key-reference that does not have a corresponding key in the same symbol-space in the same document with the same value.

Whereas in XML IDs and IDREFs must be names, in RELAX NG keys and key-references may have any datatype; whether an element or attribute is a key or key-reference is orthogonal to its datatype. The values of keys and key-references are compared using the datatype specified by the data pattern. All data patterns sharing the same symbol-space must specify the same value for the type attribute.

For example, suppose a document contains termref elements referencing defined terms:

<element name="termref">
  <data type="token" keyRef="term"/>
</element>

For each such defined term, there is a corresponding dt, dd pair in a glossary element:

<element name="glossary">
  <zeroOrMore>
    <element name="dt">
      <data type="token" key="term"/>
    </element>
    <element name="dd">
      <text/>
    </element>
  </zeroOrMore>
</element>

The above example uses the built-in token datatype introduced in the Enumerations section.
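
As a sketch of how these two patterns interact, consider the following instance. The enclosing doc and p elements are hypothetical wrappers invented for this example; only termref, glossary, dt and dd come from the patterns above:

```xml
<doc>
  <p>A <termref>pattern</termref> describes a class of documents.</p>
  <glossary>
    <dt>pattern</dt>
    <dd>A description of permitted structure and content.</dd>
  </glossary>
</doc>
```

Here the termref is a key-reference in the term symbol-space and the dt is the corresponding key. The document would become invalid if a second dt with the content pattern were added (two keys with the same value), or if a termref contained a value for which the glossary has no dt.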

It must be possible to determine for any element or attribute whether it is a key or key reference and, if so, the symbol space of the key or key reference, by examining just the name of the element or attribute and the names of the ancestors of that element or attribute. For example, it is not permitted to have the pattern:

<element name="bad">
  <choice>
    <data type="string" key="x"/>
    <data type="string" key="y"/>
  </choice>
</element>

13. Annotations

If a RELAX NG element has an attribute or child element with a namespace URI other than the RELAX NG namespace, then that attribute or element is ignored. Thus, you can add annotations to RELAX NG patterns simply by using an attribute or element in a separate namespace:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9" xmlns:a="http://www.example.com/annotation">
  <zeroOrMore>
    <element name="card">
      <a:documentation>Information about a single email address.</a:documentation>
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
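
An annotation can equally be an attribute. In the following sketch, the a:status attribute is invented for the illustration; because it is in a namespace other than the RELAX NG namespace, it should be ignored when the pattern is interpreted:

```xml
<element name="card" a:status="draft"
         xmlns="http://relaxng.org/ns/structure/0.9"
         xmlns:a="http://www.example.com/annotation">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
</element>
```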

14. Nested grammars

There is no prohibition against nesting grammar patterns. A ref pattern refers to a definition from the nearest grammar ancestor. There is also a parentRef element that escapes out of the current grammar and references a definition from the parent of the current grammar.

Imagine the problem of writing a pattern for tables. The pattern for tables only cares about the structure of tables; it doesn't care about what goes inside a table cell. First, we create a RELAX NG pattern table.rng as follows:

<grammar>

<define name="cell.content">
  <notAllowed/>
</define>

<start>
  <element name="table">
    <oneOrMore>
      <element name="tr">
        <oneOrMore>
	  <element name="td">
	    <ref name="cell.content"/>
	  </element>
        </oneOrMore>
      </element>
    </oneOrMore>
  </element>
</start>

</grammar>

Patterns that include table.rng must redefine cell.content. By using a nested grammar pattern containing a parentRef pattern, the including pattern can redefine cell.content to be a pattern defined in the including pattern's grammar, thus effectively importing a pattern from the parent grammar into the child grammar:

<grammar>

<start>
  <element name="doc">
    <zeroOrMore>
      <choice>
	<element name="p">
	  <ref name="inline"/>
	</element>
	<grammar>
	  <include href="table.rng">
	    <define name="cell.content">
	      <parentRef name="inline"/>
	    </define>
          </include>
	</grammar>
      </choice>
    </zeroOrMore>
  </element>
</start>

<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>

</grammar>

Of course, in a trivial case like this, there is no advantage in nesting the grammars: we could simply have included table.rng within the outer grammar element. However, when the included grammar has many definitions, nesting it avoids the possibility of name conflicts between the including grammar and the included grammar.
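
For orientation, the following instance (invented for this illustration) is the kind of document the including pattern above should accept. The content of each td is governed by the inline definition of the outer grammar, reached through parentRef:

```xml
<doc>
  <p>Some <em>emphasized</em> text.</p>
  <table>
    <tr>
      <td>A cell with <em>inline</em> content</td>
      <td>Another cell</td>
    </tr>
  </table>
</doc>
```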

15. Non-restrictions

RELAX NG does not require patterns to be "deterministic" or "unambiguous".

Suppose we wanted to write the email address book in HTML, but use class attributes to specify the structure.

<element name="html">
  <element name="head">
    <element name="title">
      <text/>
    </element>
  </element>
  <element name="body">
    <element name="table">
      <attribute name="class">
        <value>addressBook</value>
      </attribute>
      <oneOrMore>
        <element name="tr">
	  <attribute name="class">
	    <value>card</value>
	  </attribute>
          <element name="td">
	    <attribute name="class">
	      <value>name</value>
	    </attribute>
            <interleave>
              <text/>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>givenName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>familyName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
            </interleave>
          </element>
          <element name="td">
	    <attribute name="class">
	      <value>email</value>
	    </attribute>
            <text/>
          </element>
        </element>
      </oneOrMore>
    </element>
  </element>
</element>

This would match an XML document such as:

<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <span class="familyName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>

but not

<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <!-- Note the incorrect class attribute -->
          <span class="givenName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>
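
A smaller sketch of the same freedom, using the invented element names x, a, b and c: written as a DTD content model, ((a, b) | (a, c)) is non-deterministic, because on seeing an a element a parser cannot tell which branch it is in without looking ahead. The equivalent RELAX NG pattern is nevertheless permitted:

```xml
<element name="x">
  <choice>
    <group>
      <element name="a"><empty/></element>
      <element name="b"><empty/></element>
    </group>
    <group>
      <element name="a"><empty/></element>
      <element name="c"><empty/></element>
    </group>
  </choice>
</element>
```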

16. Non-features

The role of RELAX NG is simply to specify a class of documents, not to assist in interpretation of the documents belonging to the class. It does not change the infoset of the document. In particular, RELAX NG

does not allow defaults for attributes to be specified
does not allow entities to be specified
does not allow notations to be specified
does not specify whether white-space is significant

Also RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern.

17. Differences from TREX

the concur pattern has been removed
the string pattern has been replaced by the value pattern
the anyString pattern has been renamed to text
the namespace URI is different
pattern elements must be namespace qualified
anonymous datatypes have been removed
the data pattern can have parameters specified by param child elements
the list pattern has been added for matching whitespace-separated lists of tokens
the data pattern can have a key or keyRef attribute
the replace and group values for the combine attribute have been removed
an include element in a grammar may contain define elements that replace included definitions
an include element occurring as a pattern has been renamed to externalRef; an include element is now allowed only as a child of the grammar element
the parent attribute on the ref element has been replaced by a new parentRef element
the ns attribute has been renamed to namespace
the nsName element has been renamed to namespaceName
the type attribute of the data element is an unqualified name; the data element uses the datatypeNamespace attribute rather than the ns attribute to identify the namespace of the datatype

<?xml version="1.0" encoding="iso-8859-1"?> <article status="Working Draft"> <articleinfo> <releaseinfo>$Id: tutorial.xml,v 1.25 2001/06/01 07:23:37 jjc Exp $</releaseinfo> <title>RELAX NG Tutorial</title> <authorgroup> <editor> <firstname>James</firstname><surname>Clark</surname> <affiliation> <address><email>jjc@jclark.com</email></address> </affiliation> </editor> </authorgroup> <pubdate>1 June 2001</pubdate> <releaseinfo role="meta"> $Id: tutorial.xml,v 1.25 2001/06/01 07:23:37 jjc Exp $ </releaseinfo> <copyright><year>2001</year><holder>OASIS</holder></copyright> <legalnotice> <para>Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.</para> <para>This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. 
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.</para> <para>The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.</para> <para>This document and the information contained herein is provided on an <quote>AS IS</quote> basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.</para> </legalnotice> <legalnotice role="status"><title>Status of this Document</title> <para>This is a working draft constructed by the editor. It is not an official committee work product and may not reflect the consensus opinion of the committee. Comments on this document may be sent to <ulink url="mailto:relax-ng-comment@lists.oasis-open.org" >relax-ng-comment@lists.oasis-open.org</ulink>.</para> </legalnotice> <abstract> <para>RELAX NG is a simple schema language for XML, based on <ulink url="http://www.xml.gr.jp/relax/">RELAX</ulink> and <ulink url="http://www.thaiopensource.com/trex/">TREX</ulink>. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document. 
</para> </abstract> <revhistory> <revision> <revnumber>Working Draft</revnumber> <date>1 June 2001</date> </revision> </revhistory> </articleinfo> <section> <title>Getting started</title> <para>Consider a simple XML representation of an email address book:</para> <programlisting><![CDATA[<addressBook> <card> <name>John Smith</name> <email>js@example.com</email> </card> <card> <name>Fred Bloggs</name> <email>fb@example.net</email> </card> </addressBook>]]></programlisting> <para>The DTD would be as follows:</para> <programlisting><![CDATA[<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card (name, email)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> ]>]]></programlisting> <para>A RELAX NG pattern for this could be written as follows:</para> <programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>If the <literal>addressBook</literal> is required to be non-empty, then we can use <literal>oneOrMore</literal> instead of <literal>zeroOrMore</literal>:</para> <programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9"> <oneOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </oneOrMore> </element>]]></programlisting> <para>Now let's change it to allow each <literal>card</literal> to have an optional <literal>note</literal> element.</para> <programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>]]></programlisting> <para>Note that the <literal>text</literal> pattern 
matches arbitrary text, including empty text. Note also that whitespace separating tags is ignored when matching against a pattern.</para> <para>All the elements specifying the pattern must be namespace qualified by the namespace URI:</para> <programlisting>http://relaxng.org/ns/structure/0.9</programlisting> <para>The examples above use a default namespace declaration <literal>xmlns="http://relaxng.org/ns/structure/0.9"</literal> for this. A namespace prefix is equally acceptable:</para> <programlisting><![CDATA[<rng:element name="addressBook" xmlns:rng="http://relaxng.org/ns/structure/0.9"> <rng:zeroOrMore> <rng:element name="card"> <rng:element name="name"> <rng:text/> </rng:element> <rng:element name="email"> <rng:text/> </rng:element> </rng:element> </rng:zeroOrMore> </rng:element>]]></programlisting> <para>For the remainder of this document, the default namespace declaration will be left out of examples.</para> </section> <section> <title>Choice</title> <para>Now suppose we want to allow the <literal>name</literal> to be broken down into a <literal>givenName</literal> and a <literal>familyName</literal>, allowing an <literal>addressBook</literal> like this:</para> <programlisting><![CDATA[<addressBook> <card> <givenName>John</givenName> <familyName>Smith</familyName> <email>js@example.com</email> </card> <card> <name>Fred Bloggs</name> <email>fb@example.net</email> </card> </addressBook>]]></programlisting> <para>We can use the following pattern:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <element name="name"> <text/> </element> <group> <element name="givenName"> <text/> </element> <element name="familyName"> <text/> </element> </group> </choice> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>]]></programlisting> <para>This corresponds to the following DTD:</para> 
<programlisting><![CDATA[<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card ((name | (givenName, familyName)), email, note?)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT givenName (#PCDATA)> <!ELEMENT familyName (#PCDATA)> <!ELEMENT note (#PCDATA)> ]>]]></programlisting> </section> <section> <title>Attributes</title> <para>Suppose we want the <literal>card</literal> element to have attributes rather than child elements. The DTD might look like this</para> <programlisting><![CDATA[<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card EMPTY> <!ATTLIST card name CDATA #REQUIRED email CDATA #REQUIRED> ]>]]></programlisting> <para>Just change each <literal>element</literal> pattern to an <literal>attribute</literal> pattern:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <attribute name="name"> <text/> </attribute> <attribute name="email"> <text/> </attribute> </element> </zeroOrMore> </element>]]></programlisting> <para>In XML, the order of attributes is traditionally not significant. RELAX NG follows this tradition. The above pattern would match both</para> <programlisting><![CDATA[<card name="John Smith" email="js@example.com"/>]]></programlisting> <para>and</para> <programlisting><![CDATA[<card email="js@example.com" name="John Smith"/>]]></programlisting> <para>In contrast, the order of elements is significant. The pattern</para> <programlisting><![CDATA[<element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element>]]></programlisting> <para>would <emphasis role="strong">not</emphasis> match:</para> <programlisting><![CDATA[<card><email>js@example.com</email><name>John Smith</name></card>]]></programlisting> <para>Note that an <literal>attribute</literal> element by itself indicates a required attribute, just as an <literal>element</literal> element by itself indicates a required element. 
To specify an optional attribute, use <literal>optional</literal> just as with <literal>element</literal>:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <attribute name="name"> <text/> </attribute> <attribute name="email"> <text/> </attribute> <optional> <attribute name="note"> <text/> </attribute> </optional> </element> </zeroOrMore> </element>]]></programlisting> <para>The <literal>group</literal> and <literal>choice</literal> patterns can be applied to <literal>attribute</literal> elements in the same way they are applied to <literal>element</literal> patterns. For example, if we wanted to allow either a <literal>name</literal> attribute or both a <literal>givenName</literal> and a <literal>familyName</literal> attribute, we can specify this in the same way that we would if we were using elements:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <attribute name="name"> <text/> </attribute> <group> <attribute name="givenName"> <text/> </attribute> <attribute name="familyName"> <text/> </attribute> </group> </choice> <attribute name="email"> <text/> </attribute> </element> </zeroOrMore> </element>]]></programlisting> <para>There are no restrictions on how <literal>element</literal> elements and <literal>attribute</literal> elements can be combined. 
For example, the following pattern would allow a choice of elements and attributes independently for both the <literal>name</literal> and the <literal>email</literal> part of a <literal>card</literal>:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <element name="name"> <text/> </element> <attribute name="name"> <text/> </attribute> </choice> <choice> <element name="email"> <text/> </element> <attribute name="email"> <text/> </attribute> </choice> </element> </zeroOrMore> </element>]]></programlisting> <para>As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:</para> <programlisting><![CDATA[<card name="John Smith" email="js@example.com"/> <card email="js@example.com" name="John Smith"/> <card email="js@example.com"><name>John Smith</name></card> <card name="John Smith"><email>js@example.com</email></card> <card><name>John Smith</name><email>js@example.com</email></card>]]></programlisting> <para>However, it would not match</para> <programlisting><![CDATA[<card><email>js@example.com</email><name>John Smith</name></card>]]></programlisting> <para>because the pattern for <literal>card</literal> requires any <literal>email</literal> child element to follow any <literal>name</literal> child element.</para> <para>There is one difference between <literal>attribute</literal> and <literal>element</literal> patterns: <literal><text/></literal> is the default for the content of an <literal>attribute</literal> pattern, whereas an <literal>element</literal> pattern is not allowed to be empty. 
For example,</para> <programlisting><![CDATA[<attribute name="email"/>]]></programlisting> <para>is short for</para> <programlisting><![CDATA[<attribute name="email"> <text/> </attribute>]]></programlisting> <para>It might seem natural that</para> <programlisting><![CDATA[<element name="x"/>]]></programlisting> <para>matched an <literal>x</literal> element with no attributes and no content. However, this would make the meaning of empty content inconsistent between the <literal>element</literal> pattern and the <literal>attribute</literal> pattern, so RELAX NG does not allow the <literal>element</literal> pattern to be empty. A pattern that matches an element with no attributes and no children must use <literal><empty/></literal> explicitly:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="prefersHTML"> <empty/> </element> </optional> </element> </zeroOrMore> </element>]]></programlisting> </section> <section> <title>Named patterns</title> <para>For a non-trivial RELAX NG pattern, it is often convenient to be able to give names to parts of the pattern. Instead of</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>we can write</para> <programlisting><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <ref name="cardContent"/> </element> </zeroOrMore> </element> </start> <define name="cardContent"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </define> </grammar>]]></programlisting> <para>A <literal>grammar</literal> element has a single <literal>start</literal> child element, and zero or more <literal>define</literal> child elements. 
The <literal>start</literal> and <literal>define</literal> elements contain patterns. These patterns can contain <literal>ref</literal> elements that refer to patterns defined by any of the <literal>define</literal> elements in that <literal>grammar</literal> element. A <literal>grammar</literal> pattern is matched by matching the pattern contained in the <literal>start</literal> element.</para> <para>We can use the <literal>grammar</literal> element to write patterns in a style similar to DTDs:</para> <programlisting><![CDATA[<grammar> <start> <ref name="AddressBook"/> </start> <define name="AddressBook"> <element name="addressBook"> <zeroOrMore> <ref name="Card"/> </zeroOrMore> </element> </define> <define name="Card"> <element name="card"> <ref name="Name"/> <ref name="Email"/> </element> </define> <define name="Name"> <element name="name"> <text/> </element> </define> <define name="Email"> <element name="email"> <text/> </element> </define> </grammar>]]></programlisting> <para>Recursive references are allowed. For example</para> <programlisting><![CDATA[<define name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <element name="span"> <optional> <attribute name="style"/> </optional> <ref name="inline"/> </element> </choice> </zeroOrMore> </define>]]></programlisting> <para>However, recursive references must be within an <literal>element</literal>. 
Thus, the following is <emphasis role="strong">not</emphasis> allowed:</para> <programlisting><![CDATA[<define name="inline"> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <element name="span"> <optional> <attribute name="style"/> </optional> <ref name="inline"/> </element> </choice> <optional> <ref name="inline"/> </optional> </define>]]></programlisting> <para>A <literal>start</literal> element may also have a <literal>name</literal> attribute. This is a shorthand for a <literal>define</literal> with that <literal>name</literal> together with a <literal>start</literal> element referencing that definition. For example,</para> <programlisting><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> </choice> </zeroOrMore> </start> </grammar>]]></programlisting> <para>is short for</para> <programlisting><![CDATA[<grammar> <start> <ref name="inline"/> </start> <define name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> </choice> </zeroOrMore> </define> </grammar>]]></programlisting> </section> <section> <title>Datatyping</title> <para>RELAX NG allows patterns to reference externally defined datatypes, such as those defined by W3C XML Schema Part 2. RELAX NG implementations may differ in what datatypes they support. You must use datatypes that are supported by the implementation you plan to use.</para> <para>The <literal>data</literal> pattern matches a string that represents a value of a named datatype. The <literal>datatypeNamespace</literal> attribute contains a URI identifying the collection of datatypes being used. The datatype collection defined by W3C XML Schema Part 2 would be identified by the URI <literal>http://www.w3.org/2001/XMLSchema-datatypes</literal>. 
The <literal>type</literal> attribute specifies the name of the datatype in the collection identified by the <literal>datatypeNamespace</literal> attribute. For example, if a RELAX NG implementation supported the built-in datatypes of W3C XML Schema Part 2, you could use:</para> <programlisting><![CDATA[<element name="number"> <data type="integer" datatypeNamespace="http://www.w3.org/2001/XMLSchema-datatypes"/> </element>]]></programlisting> <para>It is inconvenient to specify the <literal>datatypeNamespace</literal> attribute on every <literal>data</literal> element, so RELAX NG allows the <literal>datatypeNamespace</literal> attribute to be inherited. The <literal>datatypeNamespace</literal> attribute can be specified on any RELAX NG element. If a <literal>data</literal> element does not have a <literal>datatypeNamespace</literal> attribute, it will use the value from the closest ancestor that has a <literal>datatypeNamespace</literal> attribute. Typically, the <literal>datatypeNamespace</literal> attribute is specified on the root element of the RELAX NG pattern. For example:</para> <programlisting><![CDATA[<element name="point" datatypeNamespace="http://www.w3.org/2001/XMLSchema-datatypes"> <element name="x"> <data type="double"/> </element> <element name="y"> <data type="double"/> </element> </element>]]></programlisting> <para>If the children of an element or an attribute match a <literal>data</literal> pattern, then the complete content of the element or attribute must match that <literal>data</literal> pattern. It is not permitted to have a pattern that allows part of the content to match a <literal>data</literal> pattern, and another part to match another pattern. 
For example, the following pattern is <emphasis role="strong">not</emphasis> allowed:</para> <programlisting><![CDATA[<element name="bad"> <data type="int"/> <element name="note"> <text/> </element> </element>]]></programlisting> <para>However, this would be fine:</para> <programlisting><![CDATA[<element name="ok"> <data type="int"/> <attribute name="note"> <text/> </attribute> </element>]]></programlisting> <para>Note that this restriction does not apply to the <literal>text</literal> pattern.</para> <para>Datatypes may have parameters. For example, a string datatype may have a parameter controlling the length of the string. The parameters applicable to any particular datatype are determined by the datatyping vocabulary. Parameters are specified by adding one or more <literal>param</literal> elements as children of the <literal>data</literal> element. For example, the following constrains the <literal>email</literal> element to contain a string at most 127 characters long:</para> <programlisting><![CDATA[<element name="email"> <data type="string"> <param name="maxLength">127</param> </data> </element>]]></programlisting> </section> <section> <title>Enumerations</title> <para>Many markup vocabularies have attributes whose value is constrained to be one of a set of specified values. The <literal>value</literal> pattern matches a string that has a specified value. For example,</para> <programlisting><![CDATA[<element name="card"> <attribute name="name"/> <attribute name="email"/> <attribute name="preferredFormat"> <choice> <value>html</value> <value>text</value> </choice> </attribute> </element>]]></programlisting> <para>allows the <literal>preferredFormat</literal> attribute to have the value <literal>html</literal> or <literal>text</literal>. 
This corresponds to the DTD</para> <programlisting><![CDATA[<!DOCTYPE card [ <!ELEMENT card EMPTY> <!ATTLIST card name CDATA #REQUIRED email CDATA #REQUIRED preferredFormat (html|text) #REQUIRED> ]>]]></programlisting> <para>The <literal>value</literal> pattern is not restricted to attribute values. For example, the following is allowed:</para> <programlisting><![CDATA[<element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <element name="preferredFormat"> <choice> <value>html</value> <value>text</value> </choice> </element> </element>]]></programlisting> <para>The prohibition against a <literal>data</literal> pattern's matching only part of the content of an element also applies to <literal>value</literal> patterns.</para> <para>By default, the <literal>value</literal> pattern will consider the string in the pattern to match the string in the document if the two strings are the same after the whitespace in both strings is normalized. Whitespace normalization strips leading and trailing whitespace characters, and collapses sequences of one or more whitespace characters to a single space character. This corresponds to the behaviour of an XML parser for an attribute that is declared as other than CDATA. Thus the attribute-based pattern above will match either of</para> <programlisting><![CDATA[<card name="John Smith" email="js@example.com" preferredFormat="html"/> <card name="John Smith" email="js@example.com" preferredFormat=" html "/>]]></programlisting> <para>The way that the <literal>value</literal> pattern compares the pattern string with the document string can be controlled by specifying a <literal>type</literal> attribute and optionally a <literal>datatypeNamespace</literal> attribute, which identify a datatype in the same way as for the <literal>data</literal> pattern. The pattern string matches the document string if they both represent the same value of the specified datatype. 
Thus, whereas the <literal>data</literal> pattern matches an arbitrary value of a datatype, the <literal>value</literal> pattern matches a specific value of a datatype.</para> <para>If there is no ancestor element with a <literal>datatypeNamespace</literal> attribute, the datatype namespace defaults to the RELAX NG namespace. This provides two datatypes, <literal>string</literal> and <literal>token</literal>. The datatype <literal>token</literal> corresponds to the default comparison behavior of the <literal>value</literal> pattern. The datatype <literal>string</literal> compares strings without any normalization (other than that performed by XML). For example,</para> <programlisting><![CDATA[<element name="card"> <attribute name="name"/> <attribute name="email"/> <attribute name="preferredFormat"> <choice> <value type="string">html</value> <value type="string">text</value> </choice> </attribute> </element>]]></programlisting> <para>will <emphasis role="strong">not</emphasis> match</para> <programlisting><![CDATA[<card name="John Smith" email="js@example.com" preferredFormat=" html "/>]]></programlisting> </section> <section> <title>Lists</title> <para>The <literal>list</literal> pattern matches a whitespace-separated sequence of tokens; it contains a pattern that the sequence of individual tokens must match. The <literal>list</literal> pattern splits a string into a list of strings, and then matches the resulting list of strings against the pattern inside the <literal>list</literal> pattern.</para> <para>For example, suppose we want to have a <literal>vector</literal> element that contains two floating point numbers separated by whitespace. 
We could use <literal>list</literal> as follows:</para> <programlisting><![CDATA[<element name="vector"> <list> <data type="float"/> <data type="float"/> </list> </element>]]></programlisting> <para>Or suppose we want the <literal>vector</literal> element to contain a list of one or more floating point numbers separated by whitespace:</para> <programlisting><![CDATA[<element name="vector"> <list> <oneOrMore> <data type="double"/> </oneOrMore> </list> </element>]]></programlisting> <para>Or suppose we want a <literal>path</literal> element containing an even number of floating point numbers:</para> <programlisting><![CDATA[<element name="path"> <list> <oneOrMore> <data type="double"/> <data type="double"/> </oneOrMore> </list> </element>]]></programlisting> <para>The <literal>list</literal> pattern must not contain <literal>element</literal> or <literal>attribute</literal> patterns.</para> </section> <section> <title>Interleaving</title> <para>The <literal>interleave</literal> pattern allows child elements to occur in any order. For example, the following would allow the <literal>card</literal> element to contain the <literal>name</literal> and <literal>email</literal> elements in any order:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <interleave> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </interleave> </element> </zeroOrMore> </element>]]></programlisting> <para>The pattern is called <literal>interleave</literal> because of how it works with patterns that match more than one element. 
Suppose we want to write a pattern for the HTML <literal>head</literal> element which requires exactly one <literal>title</literal> element, at most one <literal>base</literal> element and zero or more <literal>style</literal>, <literal>script</literal>, <literal>link</literal> and <literal>meta</literal> elements and suppose we are writing a <literal>grammar</literal> pattern that has one definition for each element. Then we could define the pattern for <literal>head</literal> as follows:</para> <programlisting><![CDATA[<define name="head"> <element name="head"> <interleave> <ref name="title"/> <optional> <ref name="base"/> </optional> <zeroOrMore> <ref name="style"/> </zeroOrMore> <zeroOrMore> <ref name="script"/> </zeroOrMore> <zeroOrMore> <ref name="link"/> </zeroOrMore> <zeroOrMore> <ref name="meta"/> </zeroOrMore> </interleave> </element> </define>]]></programlisting> <para>Suppose we had a <literal>head</literal> element that contained a <literal>meta</literal> element, followed by a <literal>title</literal> element, followed by a <literal>meta</literal> element. This would match the pattern because it is an interleaving of a sequence of two <literal>meta</literal> elements, which match the child pattern</para> <programlisting><![CDATA[ <zeroOrMore> <ref name="meta"/> </zeroOrMore>]]></programlisting> <para>and a sequence of one <literal>title</literal> element, which matches the child pattern</para> <programlisting><![CDATA[ <ref name="title"/>]]></programlisting> <para>The semantics of the <literal>interleave</literal> pattern are that a sequence of elements matches an <literal>interleave</literal> pattern if it is an interleaving of sequences that match the child patterns of the <literal>interleave</literal> pattern. 
Note that this is different from the <literal>&</literal> connector in SGML, where <literal>A* & B</literal> matches the sequence of elements <literal>A A B</literal> or the sequence of elements <literal>B A A</literal> but not the sequence of elements <literal>A B A</literal>; the corresponding <literal>interleave</literal> pattern matches all three.</para> <para>One special case of <literal>interleave</literal> is very common: interleaving <literal><text/></literal> with a pattern <replaceable>p</replaceable> represents a pattern that matches what <replaceable>p</replaceable> matches but also allows characters to occur as children. The <literal>mixed</literal> element is a shorthand for this.</para> <programlisting><![CDATA[<mixed> ]]><replaceable>p</replaceable><![CDATA[ </mixed>]]></programlisting> <para>is short for</para> <programlisting><![CDATA[<interleave> <text/> ]]><replaceable>p</replaceable><![CDATA[ </interleave>]]></programlisting> </section> <section> <title>Modularity</title> <section> <title>Referencing external patterns</title> <para>The <literal>externalRef</literal> pattern can be used to reference a pattern defined in a separate file. The <literal>externalRef</literal> element has a required <literal>href</literal> attribute that specifies the URL of a file containing the pattern. The <literal>externalRef</literal> pattern matches if the pattern contained in the file at the specified URL matches. 
Suppose, for example, you have a RELAX NG pattern that matches HTML inline content stored in <literal>inline.rng</literal>:</para> <programlisting><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </zeroOrMore> </start> </grammar>]]></programlisting> <para>Then we could allow the <literal>note</literal> element to contain inline HTML markup by using <literal>externalRef</literal> as follows:</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <externalRef href="inline.rng"/> </element> </optional> </element> </zeroOrMore> </element>]]></programlisting> <para>For another example, suppose you have two RELAX NG patterns stored in files <literal>pattern1.rng</literal> and <literal>pattern2.rng</literal>. Then the following is a pattern that will match anything matched by either of those patterns:</para> <programlisting><![CDATA[<choice> <externalRef href="pattern1.rng"/> <externalRef href="pattern2.rng"/> </choice>]]></programlisting> </section> <section> <title>Merging grammars</title> <para>The <literal>include</literal> element allows grammars to be merged together. A <literal>grammar</literal> pattern may have <literal>include</literal> elements as children. An <literal>include</literal> element has a required <literal>href</literal> attribute that specifies the URL of a file containing a <literal>grammar</literal> pattern. The referenced <literal>grammar</literal> pattern will be merged with the <literal>grammar</literal> pattern containing the <literal>include</literal> element.</para> <para>@@@ Add example</para> <para>Normally, duplicate definitions (two definitions with the same name) result in an error. 
However, <literal>define</literal> elements may be put inside the <literal>include</literal> element to indicate that they are to replace definitions in the included <literal>grammar</literal> pattern.</para> <para>Suppose the file <literal>addressBook.rng</literal> contains the following grammar pattern:</para> <programlisting><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <empty/> </define> </grammar>]]></programlisting> <para>Another pattern could customize <literal>addressBook.rng</literal> as follows:</para> <programlisting><![CDATA[<grammar> <include href="addressBook.rng"> <define name="card.local"> <optional> <element name="note"> <text/> </element> </optional> </define> </include> </grammar>]]></programlisting> <para>This would be equivalent to:</para> <programlisting><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <optional> <element name="note"> <text/> </element> </optional> </define> </grammar>]]></programlisting> <para>which is equivalent to</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>]]></programlisting> <para>It is also possible to combine together duplicate definitions from separate files by adding a <literal>combine</literal> attribute to the <literal>define</literal> elements. 
The <literal>combine</literal> attribute specifies how the definitions should be combined; it may have the value <literal>choice</literal> or <literal>interleave</literal>. For example, we could have written our customization as:</para> <programlisting><![CDATA[<grammar> <include href="addressBook.rng"/> <define name="card.local" combine="choice"> <element name="note"> <text/> </element> </define> </grammar>]]></programlisting> <para>This would be equivalent to:</para> <programlisting><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <choice> <empty/> <element name="note"> <text/> </element> </choice> </define> </grammar>]]></programlisting> <para>This has the same meaning as before, since an optional pattern is equivalent to a choice between the pattern and empty.</para> <para>We could also have used <literal>combine="interleave"</literal> here:</para> <programlisting><![CDATA[<grammar> <include href="addressBook.rng"/> <define name="card.local" combine="interleave"> <optional> <element name="note"> <text/> </element> </optional> </define> </grammar>]]></programlisting> <para>This would be equivalent to:</para> <programlisting><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <interleave> <empty/> <optional> <element name="note"> <text/> </element> </optional> </interleave> </define> </grammar>]]></programlisting> <para>This has the same meaning as before, since adding an <literal>empty</literal> pattern to the content of an <literal>interleave</literal> pattern does not make any difference to what the 
<literal>interleave</literal> pattern matches.</para> <para>@@@ Add example of combine="interleave" with attributes.</para> <para>The <literal>notAllowed</literal> pattern never matches anything. Just as adding <literal>empty</literal> to a <literal>group</literal> makes no difference, so adding <literal>notAllowed</literal> to a <literal>choice</literal> makes no difference. It is typically used in a definition that is referenced in a <literal>choice</literal> element to allow an including pattern to specify additional choices. For example, suppose a RELAX NG pattern <literal>inline.rng</literal> provides a pattern for inline content, which allows <literal>bold</literal> and <literal>italic</literal> elements arbitrarily nested:</para> <programlisting><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <ref name="local.inline"/> </choice> </zeroOrMore> </start> <define name="local.inline"> <notAllowed/> </define> </grammar>]]></programlisting> <para>Another RELAX NG pattern could use <literal>inline.rng</literal> and add <literal>code</literal> and <literal>em</literal> to the set of inline elements as follows:</para> <programlisting><![CDATA[<grammar> <include href="inline.rng"> <define name="local.inline"> <choice> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </define> </include> <start> <element name="doc"> <zeroOrMore> <element name="p"> <ref name="inline"/> </element> </zeroOrMore> </element> </start> </grammar>]]></programlisting> <para>We could instead have used <literal>combine="choice"</literal>. 
In this case, <literal>inline.rng</literal> would need to separate out the choices as a separate definition:</para> <programlisting><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <ref name="inline.class"/> </zeroOrMore> </start> <define name="inline.class"> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> </choice> </define> </grammar>]]></programlisting> <para>and the customization would add to those choices:</para> <programlisting><![CDATA[<grammar> <include href="inline.rng"/> <start> <element name="doc"> <zeroOrMore> <element name="p"> <ref name="inline"/> </element> </zeroOrMore> </element> </start> <define name="inline.class" combine="choice"> <choice> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </define> </grammar>]]></programlisting> </section> </section> <section> <title>Namespaces</title> <para>RELAX NG is namespace-aware. Thus, it considers an element or attribute to have both a local name and a namespace URI which together constitute the name of that element or attribute.</para> <section> <title>Using the <literal>namespace</literal> attribute</title> <para>The <literal>element</literal> pattern uses a <literal>namespace</literal> attribute to specify the namespace URI of the elements that it matches. 
For example</para> <programlisting><![CDATA[<element name="foo" namespace="http://www.example.com"> <empty/> </element>]]></programlisting> <para>would match any of</para> <programlisting><![CDATA[<foo xmlns="http://www.example.com"/> <e:foo xmlns:e="http://www.example.com"/> <example:foo xmlns:example="http://www.example.com"/>]]></programlisting> <para>but not any of</para> <programlisting><![CDATA[<foo/> <e:foo xmlns:e="http://WWW.EXAMPLE.COM"/> <example:foo xmlns:example="http://www.example.net"/>]]></programlisting> <para>A value of an empty string for the <literal>namespace</literal> attribute indicates a null or absent namespace URI (just as with the <literal>xmlns</literal> attribute). Thus, the pattern</para> <programlisting><![CDATA[<element name="foo" namespace=""> <empty/> </element>]]></programlisting> <para>matches any of</para> <programlisting><![CDATA[<foo xmlns=""/> <foo/>]]></programlisting> <para>but not any of</para> <programlisting><![CDATA[<foo xmlns="http://www.example.com"/> <e:foo xmlns:e="http://www.example.com"/>]]></programlisting> <para>It is tedious and error-prone to specify the <literal>namespace</literal> attribute on every <literal>element</literal>, so RELAX NG allows it to be defaulted. If an <literal>element</literal> pattern does not specify a <literal>namespace</literal> attribute, then it defaults to the value of the <literal>namespace</literal> attribute of the nearest ancestor that has a <literal>namespace</literal> attribute, or the empty string if there is no such ancestor. 
Thus</para> <programlisting><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>is equivalent to</para> <programlisting><![CDATA[<element name="addressBook" namespace=""> <zeroOrMore> <element name="card" namespace=""> <element name="name" namespace=""> <text/> </element> <element name="email" namespace=""> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>and</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>is equivalent to</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card" namespace="http://www.example.com"> <element name="name" namespace="http://www.example.com"> <text/> </element> <element name="email" namespace="http://www.example.com"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>The <literal>attribute</literal> pattern also takes a <literal>namespace</literal> attribute. However, there is a difference in how it defaults. This is because the XML Namespaces Recommendation does not apply the default namespace to attributes. If a <literal>namespace</literal> attribute is not specified on the <literal>attribute</literal> pattern, then it defaults to the empty string. 
Thus</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card"> <attribute name="name"/> <attribute name="email"/> </element> </zeroOrMore> </element>]]></programlisting> <para>is equivalent to</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card" namespace="http://www.example.com"> <attribute name="name" namespace=""/> <attribute name="email" namespace=""/> </element> </zeroOrMore> </element>]]></programlisting> <para>and so will match</para> <programlisting><![CDATA[<addressBook xmlns="http://www.example.com"> <card name="John Smith" email="js@example.com"/> </addressBook>]]></programlisting> <para>or</para> <programlisting><![CDATA[<example:addressBook xmlns:example="http://www.example.com"> <example:card name="John Smith" email="js@example.com"/> </example:addressBook>]]></programlisting> <para>but not</para> <programlisting><![CDATA[<example:addressBook xmlns:example="http://www.example.com"> <example:card example:name="John Smith" example:email="js@example.com"/> </example:addressBook>]]></programlisting> <para>To match this last example, the <literal>attribute</literal> patterns must specify <literal>global="true"</literal>:</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card"> <attribute name="name" global="true"/> <attribute name="email" global="true"/> </element> </zeroOrMore> </element>]]></programlisting> <para>This is equivalent to:</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card" namespace="http://www.example.com"> <attribute name="name" namespace="http://www.example.com"/> <attribute name="email" namespace="http://www.example.com"/> </element> </zeroOrMore> </element>]]></programlisting> <para>Thus, specifying <literal>global="true"</literal> on an 
<literal>attribute</literal> pattern makes the <literal>namespace</literal> attribute default in the same way that it does on an <literal>element</literal> pattern.</para> <para>The <literal>namespace</literal> attribute is allowed on any element in a RELAX NG pattern. The <literal>global</literal> attribute is allowed only on an <literal>attribute</literal> pattern.</para> </section> <section> <title>Qualified names</title> <para>When a pattern matches elements and attributes from multiple namespaces, using the <literal>namespace</literal> attribute would require repeating namespace URIs in different places in the pattern. This is error-prone and hard to maintain, so RELAX NG also allows the <literal>element</literal> and <literal>attribute</literal> patterns to use a prefix in the value of the <literal>name</literal> attribute to specify the namespace URI. In this case, the prefix specifies the namespace URI to which that prefix is bound by the namespace declarations in scope on the <literal>element</literal> or <literal>attribute</literal> pattern. 
Thus</para> <programlisting><![CDATA[<element name="e:addressBook" xmlns:e="http://www.example.com"> <zeroOrMore> <element name="e:card"> <element name="e:name"> <text/> </element> <element name="e:email"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>is equivalent to</para> <programlisting><![CDATA[<element name="addressBook" namespace="http://www.example.com"> <zeroOrMore> <element name="card" namespace="http://www.example.com"> <element name="name" namespace="http://www.example.com"> <text/> </element> <element name="email" namespace="http://www.example.com"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> <para>If a prefix is specified in the value of the <literal>name</literal> attribute of an <literal>element</literal> or <literal>attribute</literal> pattern, then that prefix determines the namespace URI of the elements or attributes that will be matched by that pattern, regardless of the value of any <literal>namespace</literal> attribute.</para> <para>Note that the XML default namespace (as specified by the <literal>xmlns</literal> attribute) is not used in determining the namespace URI of elements and attributes that <literal>element</literal> and <literal>attribute</literal> patterns match.</para> </section> </section> <section> <title>Name classes</title> <para>Normally, the name of the element to be matched by an <literal>element</literal> element is specified by a <literal>name</literal> attribute. An <literal>element</literal> element can instead start with an element specifying a <emphasis>name-class</emphasis>. In this case, the <literal>element</literal> pattern will only match an element if the name of the element is a member of the name-class. The simplest name-class is <literal>anyName</literal>, which any name at all is a member of, regardless of its local name and its namespace URI. 
For example, the following pattern matches any well-formed XML document:</para> <programlisting><![CDATA[<grammar> <start name="anyElement"> <element> <anyName/> <zeroOrMore> <choice> <attribute> <anyName/> </attribute> <text/> <ref name="anyElement"/> </choice> </zeroOrMore> </element> </start> </grammar>]]></programlisting> <para>The <literal>namespaceName</literal> name-class contains any name with the namespace URI specified by the <literal>namespace</literal> attribute, which defaults in the same way as the <literal>namespace</literal> attribute on the <literal>element</literal> pattern.</para> <para>The <literal>choice</literal> name-class matches any name that is a member of any of its child name-classes.</para> <para>The <literal>not</literal> name-class contains any name that is not a member of the child name-class.</para> <para>For example</para> <programlisting><![CDATA[<element name="card" namespace="http://www.example.com"> <zeroOrMore> <attribute> <not> <choice> <namespaceName/> <namespaceName namespace=""/> </choice> </not> </attribute> </zeroOrMore> <text/> </element>]]></programlisting> <para>would allow the <literal>card</literal> element to have any number of namespace-qualified attributes provided that they were qualified with a namespace other than that of the <literal>card</literal> element.</para> <para>Note that an <literal>attribute</literal> pattern matches a single attribute even if it has a name-class that contains multiple names. To match zero or more attributes, the <literal>zeroOrMore</literal> element must be used.</para> <para>The <literal>difference</literal> name-class contains any name that is a member of the first child name-class, but not a member of any of the following name-classes. 
The <literal>not</literal> name-class is, in fact, a shorthand for <literal>difference</literal>:</para> <programlisting><![CDATA[<not> ]]><replaceable>name-class</replaceable><![CDATA[ </not>]]></programlisting> <para>is short for</para> <programlisting><![CDATA[<difference> <anyName/> ]]><replaceable>name-class</replaceable><![CDATA[ </difference>]]></programlisting> <para>The <literal>name</literal> name-class contains a single name. The content of the <literal>name</literal> element specifies the name in the same way as the <literal>name</literal> attribute of the <literal>element</literal> pattern. The <literal>namespace</literal> attribute specifies the namespace URI in the same way as on the <literal>element</literal> pattern.</para> <para>Some schema languages have a concept of <emphasis>lax</emphasis> validation, where an element or attribute is validated against a definition only if there is one. We can implement this concept in RELAX NG with name classes that use <literal>difference</literal> and <literal>name</literal>. Suppose, for example, we wanted to allow an element to have any attribute with a qualified name, but we still wanted to ensure that if there was an <literal>xml:space</literal> attribute, it had the value <literal>default</literal> or <literal>preserve</literal>. 
It wouldn't work to use:</para> <programlisting><![CDATA[<element name="example"> <zeroOrMore> <attribute> <anyName/> </attribute> </zeroOrMore> <optional> <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute> </optional> </element>]]></programlisting> <para>because an <literal>xml:space</literal> attribute with a value other than <literal>default</literal> or <literal>preserve</literal> would match</para> <programlisting><![CDATA[ <attribute> <anyName/> </attribute>]]></programlisting> <para>even though it did not match</para> <programlisting><![CDATA[ <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute>]]></programlisting> <para>The solution is to use <literal>name</literal> together with <literal>difference</literal>:</para> <programlisting><![CDATA[<element name="example"> <zeroOrMore> <attribute> <difference> <anyName/> <name>xml:space</name> </difference> </attribute> </zeroOrMore> <optional> <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute> </optional> </element>]]></programlisting> <para>Note that the <literal>define</literal> element cannot contain a name-class; it can only contain a pattern.</para> </section> <section> <title>Cross references</title> <para>RELAX NG generalizes the ID/IDREF feature of XML. A <literal>data</literal> pattern may have a <literal>key</literal> or a <literal>keyRef</literal> attribute. A <literal>data</literal> pattern with a <literal>key</literal> attribute behaves like an XML ID; a <literal>data</literal> pattern with a <literal>keyRef</literal> attribute behaves like an XML IDREF. Whereas XML has a single symbol-space of IDs and IDREFs, RELAX NG has an unlimited number of named symbol-spaces. The value of the <literal>key</literal> or <literal>keyRef</literal> attribute is an unprefixed name identifying the symbol-space. 
An element or attribute that matches a <literal>data</literal> pattern with a <literal>key</literal> attribute is called a <emphasis>key</emphasis>; an element or attribute that matches a <literal>data</literal> pattern with a <literal>keyRef</literal> attribute is called a <emphasis>key-reference</emphasis>. A document is invalid if it has two distinct keys in the same symbol-space with the same value; it is also invalid if it contains a key-reference that does not have a corresponding key in the same symbol-space in the same document with the same value.</para> <para>Whereas in XML, IDs and IDREFs must be names, in RELAX NG keys and key-references may have any datatype; whether an element or attribute is a key or key-reference is orthogonal to its datatype. The values of keys and key-references are compared using the datatype specified by the <literal>data</literal> pattern. All <literal>data</literal> patterns sharing the same symbol-space must specify the same value for the <literal>type</literal> attribute.</para> <para>For example, suppose a document contains <literal>termref</literal> elements referencing defined terms:</para> <programlisting><![CDATA[<element name="termref"> <data type="token" keyRef="term"/> </element>]]></programlisting> <para>For each such defined term, there is a corresponding <literal>dt</literal>, <literal>dd</literal> pair in a <literal>glossary</literal> element:</para> <programlisting><![CDATA[<element name="glossary"> <zeroOrMore> <element name="dt"> <data type="token" key="term"/> </element> <element name="dd"> <text/> </element> </zeroOrMore> </element>]]></programlisting> <para>The above example uses the built-in <literal>token</literal> datatype introduced in the <ulink url="#Enumerations">Enumerations</ulink> section.</para> <para>It must be possible to determine for any element or attribute whether it is a key or key-reference and, if so, the symbol-space of the key or key-reference, by examining just the name of the element or 
attribute and the names of the ancestors of that element or attribute. For example, the following pattern is not permitted, because the name of the <literal>bad</literal> element alone does not determine whether its content is a key in symbol-space <literal>x</literal> or in symbol-space <literal>y</literal>:</para> <programlisting><![CDATA[<element name="bad"> <choice> <data type="string" key="x"/> <data type="string" key="y"/> </choice> </element>]]></programlisting> </section> <section> <title>Annotations</title> <para>If a RELAX NG element has an attribute or child element with a namespace URI other than the RELAX NG namespace, then that attribute or element is ignored. Thus, you can add annotations to RELAX NG patterns simply by using an attribute or element in a separate namespace:</para> <programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9" xmlns:a="http://www.example.com/annotation"> <zeroOrMore> <element name="card"> <a:documentation>Information about a single email address.</a:documentation> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></programlisting> </section> <section> <title>Nested grammars</title> <para>There is no prohibition against nesting grammar patterns. A <literal>ref</literal> pattern refers to a definition from the nearest <literal>grammar</literal> ancestor. There is also a <literal>parentRef</literal> element that escapes out of the current grammar and references a definition from the parent of the current grammar.</para> <para>Imagine the problem of writing a pattern for tables. The pattern for tables only cares about the structure of tables; it doesn't care about what goes inside a table cell. 
First, we create a RELAX NG pattern <literal>table.rng</literal> as follows:</para> <programlisting><![CDATA[<grammar> <define name="cell.content"> <notAllowed/> </define> <start> <element name="table"> <oneOrMore> <element name="tr"> <oneOrMore> <element name="td"> <ref name="cell.content"/> </element> </oneOrMore> </element> </oneOrMore> </element> </start> </grammar>]]></programlisting> <para>Patterns that include <literal>table.rng</literal> must redefine <literal>cell.content</literal>. By using a nested <literal>grammar</literal> pattern containing a <literal>parentRef</literal> pattern, the including pattern can redefine <literal>cell.content</literal> to be a pattern defined in the including pattern's grammar, thus effectively importing a pattern from the parent grammar into the child grammar:</para> <programlisting><![CDATA[<grammar> <start> <element name="doc"> <zeroOrMore> <choice> <element name="p"> <ref name="inline"/> </element> <grammar> <include href="table.rng"> <define name="cell.content"> <parentRef name="inline"/> </define> </include> </grammar> </choice> </zeroOrMore> </element> </start> <define name="inline"> <zeroOrMore> <choice> <text/> <element name="em"> <ref name="inline"/> </element> </choice> </zeroOrMore> </define> </grammar>]]></programlisting> <para>Of course, in a trivial case like this, there is no advantage in nesting the grammars: we could simply have included <literal>table.rng</literal> within the outer <literal>grammar</literal> element. 
However, when the included grammar has many definitions, nesting it avoids the possibility of name conflicts between the including grammar and the included grammar.</para> </section> <section> <title>Non-restrictions</title> <para>RELAX NG does not require patterns to be "deterministic" or "unambiguous".</para> <para>Suppose we wanted to write the email address book in HTML, but use class attributes to specify the structure.</para> <programlisting><![CDATA[<element name="html"> <element name="head"> <element name="title"> <text/> </element> </element> <element name="body"> <element name="table"> <attribute name="class"> <value>addressBook</value> </attribute> <oneOrMore> <element name="tr"> <attribute name="class"> <value>card</value> </attribute> <element name="td"> <attribute name="class"> <value>name</value> </attribute> <interleave> <text/> <optional> <element name="span"> <attribute name="class"> <value>givenName</value> </attribute> <text/> </element> </optional> <optional> <element name="span"> <attribute name="class"> <value>familyName</value> </attribute> <text/> </element> </optional> </interleave> </element> <element name="td"> <attribute name="class"> <value>email</value> </attribute> <text/> </element> </element> </oneOrMore> </element> </element> </element>]]></programlisting> <para>This would match an XML document such as:</para> <programlisting><![CDATA[<html> <head> <title>Example Address Book</title> </head> <body> <table class="addressBook"> <tr class="card"> <td class="name"> <span class="givenName">John</span> <span class="familyName">Smith</span> </td> <td class="email">js@example.com</td> </tr> </table> </body> </html>]]></programlisting> <para>but not</para> <programlisting><![CDATA[<html> <head> <title>Example Address Book</title> </head> <body> <table class="addressBook"> <tr class="card"> <td class="name"> <span class="givenName">John</span>  <span class="givenName">Smith</span> </td> <td class="email">js@example.com</td> </tr> </table> 
</body> </html>]]></programlisting> </section> <section> <title>Non-features</title> <para>The role of RELAX NG is simply to specify a class of documents, not to assist in interpretation of the documents belonging to the class. It does not change the infoset of the document. In particular, RELAX NG</para> <itemizedlist> <listitem><para>does not allow defaults for attributes to be specified</para></listitem> <listitem><para>does not allow entities to be specified</para></listitem> <listitem><para>does not allow notations to be specified</para></listitem> <listitem><para>does not specify whether white-space is significant</para></listitem> </itemizedlist> <para>Also RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern.</para> </section> <section> <title>Differences from TREX</title> <orderedlist> <listitem><para>the <literal>concur</literal> pattern has been removed</para></listitem> <listitem><para>the <literal>string</literal> pattern has been replaced by the <literal>value</literal> pattern</para></listitem> <listitem><para>the <literal>anyString</literal> pattern has been renamed to <literal>text</literal></para></listitem> <listitem><para>the namespace URI is different</para></listitem> <listitem><para>pattern elements must be namespace qualified</para></listitem> <listitem><para>anonymous datatypes have been removed</para></listitem> <listitem><para>the <literal>data</literal> pattern can have parameters specified by <literal>param</literal> child elements</para></listitem> <listitem><para>the <literal>list</literal> pattern has been added for matching whitespace-separated lists of tokens</para></listitem> <listitem><para>the <literal>data</literal> pattern can have a <literal>key</literal> or <literal>keyRef</literal> attribute</para></listitem> <listitem><para>the <literal>replace</literal> and <literal>group</literal> values for the <literal>combine</literal> attribute have been removed</para></listitem> <listitem><para>an 
<literal>include</literal> element in a grammar may contain <literal>define</literal> elements that replace included definitions</para></listitem> <listitem><para>an <literal>include</literal> element occurring as a pattern has been renamed to <literal>externalRef</literal>; an <literal>include</literal> element is now allowed only as a child of the <literal>grammar</literal> element</para></listitem> <listitem><para>the <literal>parent</literal> attribute on the <literal>ref</literal> element has been replaced by a new <literal>parentRef</literal> element</para></listitem> <listitem><para>the <literal>ns</literal> attribute has been renamed to <literal>namespace</literal></para></listitem> <listitem><para>the <literal>nsName</literal> element has been renamed to <literal>namespaceName</literal></para></listitem> <listitem><para>the <literal>type</literal> attribute of the <literal>data</literal> element is an unqualified name; the <literal>data</literal> element uses the <literal>datatypeNamespace</literal> attribute rather than the <literal>ns</literal> attribute to identify the namespace of the datatype</para></listitem> </orderedlist> </section> </article>

