Subject: Tutorial
Attached is a RELAX NG tutorial based on my TREX tutorial. Send comments to the list, or, if they're just typos, to me.
James
Title: RELAX NG tutorial
Copyright © 2001 OASIS
RELAX NG is a simple schema language for XML, based on RELAX and TREX. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document.
Consider a simple XML representation of an email address book:
<addressBook> <card> <name>John Smith</name> <email>js@example.com</email> </card> <card> <name>Fred Bloggs</name> <email>fb@example.net</email> </card> </addressBook>
The DTD would be as follows:
<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card (name, email)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> ]>
A RELAX NG pattern for this could be written as follows:
<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>
If the addressBook is required to be non-empty, then we can use oneOrMore instead of zeroOrMore:
<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1"> <oneOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </oneOrMore> </element>
Now let's change it to allow each card
to have an
optional note
element.
<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>
Note that the text
pattern matches arbitrary text,
including empty text. Note also that whitespace separating tags is
ignored when matching against a pattern.
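As a small illustration of both points, the following instance should match the pattern above. It is only a sketch: the indentation and the empty note element are chosen for this example. The whitespace between tags is ignored, and the empty note is matched by the text pattern.

```xml
<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
    <note></note>
  </card>
</addressBook>
```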
All the elements specifying the pattern must be namespace qualified by the namespace URI:
http://relaxng.org/main/ns/0.1
The examples above use a default namespace declaration
xmlns="http://relaxng.org/main/ns/0.1"
for this. A
namespace prefix is equally acceptable:
<rng:element name="addressBook" xmlns:rng="http://relaxng.org/main/ns/0.1"> <rng:zeroOrMore> <rng:element name="card"> <rng:element name="name"> <rng:text/> </rng:element> <rng:element name="email"> <rng:text/> </rng:element> </rng:element> </rng:zeroOrMore> </rng:element>
For the remainder of this document, the default namespace declaration will be left out of examples.
Now suppose we want to allow the name to be broken down into a givenName and a familyName, allowing an addressBook like this:
<addressBook> <card> <givenName>John</givenName> <familyName>Smith</familyName> <email>js@example.com</email> </card> <card> <name>Fred Bloggs</name> <email>fb@example.net</email> </card> </addressBook>
We can use the following pattern:
<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <element name="name"> <text/> </element> <group> <element name="givenName"> <text/> </element> <element name="familyName"> <text/> </element> </group> </choice> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>
This corresponds to the following DTD:
<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card ((name | (givenName, familyName)), email, note?)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT givenName (#PCDATA)> <!ELEMENT familyName (#PCDATA)> <!ELEMENT note (#PCDATA)> ]>
Suppose we want the card
element to have attributes
rather than child elements. The DTD might look like this:
<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card EMPTY> <!ATTLIST card name CDATA #REQUIRED email CDATA #REQUIRED> ]>
Just change each element
pattern to an
attribute
pattern:
<element name="addressBook"> <zeroOrMore> <element name="card"> <attribute name="name"> <text/> </attribute> <attribute name="email"> <text/> </attribute> </element> </zeroOrMore> </element>
In XML, the order of attributes is traditionally not significant. RELAX NG follows this tradition. The above pattern would match both
<card name="John Smith" email="js@example.com"/>
and
<card email="js@example.com" name="John Smith"/>
In contrast, the order of elements is significant. The pattern
<element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element>
would not match:
<card><email>js@example.com</email><name>John Smith</name></card>
Note that an attribute
element by itself indicates a
required attribute, just as an element
element by itself
indicates a required element. To specify an optional attribute, use
optional
just as with element:
<element name="addressBook"> <zeroOrMore> <element name="card"> <attribute name="name"> <text/> </attribute> <attribute name="email"> <text/> </attribute> <optional> <attribute name="note"> <text/> </attribute> </optional> </element> </zeroOrMore> </element>
The group
and choice
patterns can be
applied to attribute
elements in the same way they are
applied to element
patterns. For example, if we wanted
to allow either a name
attribute or both a
givenName
and a familyName
attribute, we can
specify this in the same way that we would if we were using
elements:
<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <attribute name="name"> <text/> </attribute> <group> <attribute name="givenName"> <text/> </attribute> <attribute name="familyName"> <text/> </attribute> </group> </choice> <attribute name="email"> <text/> </attribute> </element> </zeroOrMore> </element>
There are no restrictions on how element
elements and
attribute
elements can be combined. For example, the
following pattern would allow a choice of elements and attributes
independently for both the name
and the
email
part of a card:
<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <element name="name"> <text/> </element> <attribute name="name"> <text/> </attribute> </choice> <choice> <element name="email"> <text/> </element> <attribute name="email"> <text/> </attribute> </choice> </element> </zeroOrMore> </element>
As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:
<card name="John Smith" email="js@example.com"/> <card email="js@example.com" name="John Smith"/> <card email="js@example.com"><name>John Smith</name></card> <card name="John Smith"><email>js@example.com</email></card> <card><name>John Smith</name><email>js@example.com</email></card>
However, it would not match
<card><email>js@example.com</email><name>John Smith</name></card>
because the pattern for card
requires any
email
child element to follow any name
child
element.
There is one difference between attribute
and
element
patterns: <text/>
is the default for the content of an attribute
pattern,
whereas an element
pattern is not allowed to be
empty. For example,
<attribute name="email"/>
is short for
<attribute name="email"> <text/> </attribute>
It might seem natural that
<element name="x"/>
matched an x
element with no attributes and no
content. However, this would make the meaning of empty content
inconsistent between the element
pattern and the
attribute
pattern, so RELAX NG does not allow the
element
pattern to be empty. A pattern that matches an
element with no attributes and no children must use
<empty/>
explicitly:
<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="prefersHTML"> <empty/> </element> </optional> </element> </zeroOrMore> </element>
For a non-trivial RELAX NG pattern, it is often convenient to be able to give names to parts of the pattern. Instead of
<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>
we can write
<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <ref name="cardContent"/> </element> </zeroOrMore> </element> </start> <define name="cardContent"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </define> </grammar>
A grammar
element has a single start
child element, and zero or more define
child elements.
The start
and define
elements contain
patterns. These patterns can contain ref
elements that
refer to patterns defined by any of the define
elements
in that grammar
element. A grammar
pattern
is matched by matching the pattern contained in the start
element.
We can use the grammar
element to write patterns in a
style similar to DTDs:
<grammar> <start> <ref name="AddressBook"/> </start> <define name="AddressBook"> <element name="addressBook"> <zeroOrMore> <ref name="Card"/> </zeroOrMore> </element> </define> <define name="Card"> <element name="card"> <ref name="Name"/> <ref name="Email"/> </element> </define> <define name="Name"> <element name="name"> <text/> </element> </define> <define name="Email"> <element name="email"> <text/> </element> </define> </grammar>
Recursive references are allowed. For example:
<define name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <element name="span"> <optional> <attribute name="style"/> </optional> <ref name="inline"/> </element> </choice> </zeroOrMore> </define>
However, recursive references must be within an element. Thus, the following is not allowed:
<define name="inline"> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <element name="span"> <optional> <attribute name="style"/> </optional> <ref name="inline"/> </element> </choice> <optional> <ref name="inline"/> </optional> </define>
A start
element may also have a name
attribute. This is a shorthand for a define
with that
name
together with a start
element
referencing that definition. For example
<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> </choice> </zeroOrMore> </start> </grammar>
is short for
<grammar> <start> <ref name="inline"/> </start> <define name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> </choice> </zeroOrMore> </define> </grammar>
RELAX NG does not have any system of datatypes built in. Rather it expects to partner with a datatyping vocabulary, such as Part 2 of the W3C's XML Schema language. RELAX NG implementations may differ in the datatyping vocabularies they support. You must pick a datatyping vocabulary that is supported by the implementation you plan to use.
The data
pattern matches a string that represents a
value of a named datatype. The type
attribute contains
the qualified name of the datatype. For example, if a RELAX NG
implementation supported the built-in datatypes of the W3C's XML
Schema Language, you could use:
<element name="number" xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"> <data type="xsd:integer"/> </element>
The data
can use an ns
attribute to
specify explicitly the namespace URI of the datatype, instead of using
a prefix within the value of the type
attribute.
<element name="number"> <data type="integer" ns="http://www.w3.org/2001/XMLSchema-datatypes"/> </element>
If the children of an element or an attribute match a
data
pattern, then the complete content of the element or
attribute must match that data
pattern. It is not
permitted to have a pattern which allows part of the content to match
a data
pattern, and another part to match another
pattern. For example, the following pattern is not
allowed:
<element name="bad"> <data type="xsd:int"/> <element name="note"> <text/> </element> </element>
However, this would be fine:
<element name="ok"> <data type="xsd:int"/> <attribute name="note"> <text/> </attribute> </element>
Note that this restriction does not apply to the
text
pattern.
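To make the contrast concrete, here is a sketch of a pattern that combines text with a child element. The element name alsoOk is hypothetical, and the reading given here is an assumption based on the statement above: since the restriction does not apply to text, the pattern should be allowed, and it would presumably match text followed by a note child.

```xml
<element name="alsoOk">
  <text/>
  <element name="note">
    <text/>
  </element>
</element>
```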
Datatypes may have parameters. For example, a string datatype may
have a parameter controlling the length of the string. The parameters
applicable to any particular datatype are determined by the datatyping
vocabulary. Parameters are specified by adding one or more
param
elements as children of the data
element. For example, the following constrains the email
element to contain a string at most 127 characters long:
<element name="email"> <data type="xsd:string"> <param name="maxLength">127</param> </data> </element>
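Since more than one param may be given, a second sketch may help. It assumes an implementation that supports the W3C XML Schema datatypes and their minInclusive and maxInclusive facets; the age element is hypothetical and not part of the address book examples.

```xml
<element name="age">
  <data type="xsd:integer">
    <param name="minInclusive">0</param>
    <param name="maxInclusive">150</param>
  </data>
</element>
```

Under those assumptions, the content of age must be an integer between 0 and 150 inclusive.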
Many markup vocabularies have attributes whose value is constrained
to be one of a set of specified values. The value
pattern
matches a string that has a specified value. For example,
<element name="card"> <attribute name="name"/> <attribute name="email"/> <attribute name="preferredFormat"> <choice> <value>html</value> <value>text</value> </choice> </attribute> </element>
allows the preferredFormat
attribute to have the value
html
or text. This corresponds to the
DTD
<!DOCTYPE card [ <!ELEMENT card EMPTY> <!ATTLIST card name CDATA #REQUIRED email CDATA #REQUIRED preferredFormat (html|text) #REQUIRED> ]>
The value
pattern is not restricted to attribute
values. For example, the following is allowed:
<element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <element name="preferredFormat"> <choice> <value>html</value> <value>text</value> </choice> </element> </element>
The prohibition against a data
pattern's matching
only part of the content of an element also applies to
value
patterns.
By default, the value
pattern will consider the string
in the pattern to match the string in the document if the two strings
are the same after the whitespace in both strings is normalized.
Whitespace normalization strips leading and trailing white-space
characters, and collapses sequences of one or more white-space
characters to a single space character. This corresponds to the
behaviour of an XML parser for an attribute that is declared as other
than CDATA. Thus the above pattern will match any of
<card name="John Smith" email="js@example.com" preferredFormat="html"/>
<card name="John Smith" email="js@example.com" preferredFormat=" html "/>
The way that the value
pattern compares the pattern
string with the document string can be controlled by specifying a
type
attribute specifying a datatype. The
type
attribute contains a qualified name identifying the
datatype. The pattern string matches the document string if they both
represent the same value of the specified datatype. Thus, whereas the
data
pattern matches an arbitrary value of a datatype,
the value
pattern matches a specific value of a
datatype.
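A sketch of what this means in practice, assuming an implementation that supports the W3C XML Schema datatypes (the count element is hypothetical):

```xml
<element name="count" xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes">
  <value type="xsd:integer">1</value>
</element>
```

Because the comparison is between integer values rather than strings, this should match any of the following, which all represent the integer 1:

```xml
<count>1</count>
<count>01</count>
<count>+1</count>
```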
RELAX NG provides two builtin datatypes that are useful with the
value
pattern. These datatypes are specified by using an
unprefixed name as the value of the type
attribute. The
two builtin datatypes are string
and token.
The builtin datatype token
corresponds to the default
comparison behavior of the value
pattern. The builtin
datatype string
compares strings without any
normalization (other than that performed by XML). For example,
<element name="card"> <attribute name="name"/> <attribute name="email"/> <attribute name="preferredFormat"> <choice> <value type="string">html</value> <value type="string">text</value> </choice> </attribute> </element>
will not match
<card name="John Smith" email="js@example.com" preferredFormat=" html "/>
The oneOrMoreTokens
and zeroOrMoreTokens
patterns match a whitespace-separated sequence of tokens; they each
contain a pattern that the individual tokens must match. For example,
the extension-element-prefixes
attribute in XSLT contains
a whitespace-separated list of zero or more namespace prefixes, where
each namespace prefix is either an NCName or the special value
#default:
<attribute name="extension-element-prefixes"> <zeroOrMoreTokens> <choice> <data type="xsd:NCName"/> <value>#default</value> </choice> </zeroOrMoreTokens> </attribute>
The oneOrMoreTokens
and zeroOrMoreTokens
patterns must not contain element
or
attribute
patterns.
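For comparison, a sketch using oneOrMoreTokens. The sizes attribute is hypothetical, and the xsd prefix is assumed to be bound to the W3C XML Schema datatypes namespace as in the earlier data examples.

```xml
<attribute name="sizes" xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes">
  <oneOrMoreTokens>
    <data type="xsd:integer"/>
  </oneOrMoreTokens>
</attribute>
```

This would match sizes="10 12 14" or sizes="10", but not sizes="", since at least one token is required.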
The interleave
pattern allows child elements to occur
in any order. For example, the following would allow the
card
element to contain the name
and
email
elements in any order:
<element name="addressBook"> <zeroOrMore> <element name="card"> <interleave> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </interleave> </element> </zeroOrMore> </element>
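For instance, the following address book should match this pattern, even though the two cards order their children differently:

```xml
<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <email>fb@example.net</email>
    <name>Fred Bloggs</name>
  </card>
</addressBook>
```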
The pattern is called interleave
because of how it
works with patterns that match more than one element. Suppose we want
to write a pattern for the HTML head
element which
requires exactly one title
element, at most one
base
element and zero or more style, script, link and meta elements,
and suppose we are writing a grammar
pattern that has one
definition for each element. Then we could define the pattern for
head
as follows:
<define name="head"> <element name="head"> <interleave> <ref name="title"/> <optional> <ref name="base"/> </optional> <zeroOrMore> <ref name="style"/> </zeroOrMore> <zeroOrMore> <ref name="script"/> </zeroOrMore> <zeroOrMore> <ref name="link"/> </zeroOrMore> <zeroOrMore> <ref name="meta"/> </zeroOrMore> </interleave> </element> </define>
Suppose we had a head
element that contained a
meta
element, followed by a title
element,
followed by a meta
element. This would match the pattern
because it is an interleaving of a sequence of two meta
elements, which match the child pattern
<zeroOrMore> <ref name="meta"/> </zeroOrMore>
and a sequence of one title
element, which matches
the child pattern
<ref name="title"/>
The semantics of the interleave
pattern are that a
sequence of elements matches an interleave
pattern if it
is an interleaving of sequences that match the child patterns of the
interleave
pattern. Note that this is different from the & connector in SGML, where A* & B matches the sequence of elements A A B or the sequence of elements B A A but not the sequence of elements A B A.
One special case of interleave
is very common:
interleaving <text/>
with a pattern
p represents a pattern that matches what p
matches but also allows characters to occur as children. The
mixed
element is a shorthand for this.
<mixed> p </mixed>
is short for
<interleave> <text/> p </interleave>
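As a sketch of how mixed might be used, the following pattern (the em child is chosen only for illustration) lets a note element contain text with any number of em elements scattered through it:

```xml
<element name="note">
  <mixed>
    <zeroOrMore>
      <element name="em">
        <text/>
      </element>
    </zeroOrMore>
  </mixed>
</element>
```

It should match, for example:

```xml
<note>Call <em>before</em> noon, not <em>after</em>.</note>
```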
The include
element can be used to allow a pattern to
be divided amongst multiple files. The include
element
has a required href
attribute that specifies the URL of a
file to be included in place of the include
element.
The include
element can be used as a pattern. In this
case, it will match if the pattern contained in the specified URL
matches. Suppose for example, you have a RELAX NG pattern that matches
HTML inline content stored in inline.rng:
<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> <!-- etc --> </choice> </zeroOrMore> </start> </grammar>
Then we could allow the note
element to contain
inline HTML markup by using include
as follows:
<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <include href="inline.rng"/> </element> </optional> </element> </zeroOrMore> </element>
For another example, suppose you have two RELAX NG patterns stored in files pattern1.rng and pattern2.rng. Then the following is a pattern that will match anything matched by either of those patterns:
<choice> <include href="pattern1.rng"/> <include href="pattern2.rng"/> </choice>
The include
element is also allowed as a child of a
grammar
pattern. In this case the specified URL must
contain a grammar
pattern, and the included
grammar
will be merged with the including
grammar.
Normally, duplicate definitions (two definitions with the same
name) result in an error. However, define
elements may
be put inside the include
element to indicate that they
are to replace definitions in the included grammar
pattern.
Suppose the file addressBook.rng
contains the
following grammar pattern:
<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <empty/> </define> </grammar>
Another pattern could customize addressBook.rng
as
follows:
<grammar> <include href="addressBook.rng"> <define name="card.local"> <optional> <element name="note"> <text/> </element> </optional> </define> </include> </grammar>
This would be equivalent to:
<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <optional> <element name="note"> <text/> </element> </optional> </define> </grammar>
which is equivalent to
<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>
It is also possible to combine together duplicate definitions from
separate files by adding a combine
attribute to the
define
elements. The combine
attribute
specifies how the definitions should be combined; it may have the
value choice
or interleave. For example, we
could have written our customization as:
<grammar> <include href="addressBook.rng"/> <define name="card.local" combine="choice"> <!-- no optional element needed this time --> <element name="note"> <text/> </element> </define> </grammar>
This would be equivalent to:
<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <choice> <empty/> <element name="note"> <text/> </element> </choice> </define> </grammar>
This has the same meaning as before, since an optional pattern is equivalent to a choice between the pattern and empty.
We could also have used combine="interleave"
here:
<grammar> <include href="addressBook.rng"/> <define name="card.local" combine="interleave"> <optional> <element name="note"> <text/> </element> </optional> </define> </grammar>
This would be equivalent to:
<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <interleave> <empty/> <optional> <element name="note"> <text/> </element> </optional> </interleave> </define> </grammar>
This has the same meaning as before, since adding an
empty
pattern to the content of an interleave
pattern does not make any difference to what the
interleave
pattern matches.
@@@ Add example of combine="interleave" with attributes.
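One way such an example might look is sketched below. The card.attlist definition and this attribute-based variant of addressBook.rng are hypothetical and are not the files shown earlier. Suppose the included grammar collects the attributes of card in a named definition:

```xml
<!-- addressBook.rng, hypothetical attribute-based variant -->
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist">
    <attribute name="name"/>
    <attribute name="email"/>
  </define>
</grammar>
```

A customization could then add an optional note attribute:

```xml
<grammar>
  <include href="addressBook.rng"/>
  <define name="card.attlist" combine="interleave">
    <optional>
      <attribute name="note"/>
    </optional>
  </define>
</grammar>
```

Because the relative order of attributes is not significant, interleaving the two definitions simply adds the note attribute to those already allowed on card.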
The notAllowed
pattern never matches anything. Just
as adding empty
to a group
makes no
difference, so adding notAllowed
to a choice
makes no difference. It is typically used in a definition that is
referenced in a choice
element to allow an including
pattern to specify additional choices. For example, suppose a RELAX NG
pattern inline.rng
provides a pattern for inline
content, which allows bold
and italic
elements arbitrarily nested:
<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <ref name="local.inline"/> </choice> </zeroOrMore> </start> <define name="local.inline"> <notAllowed/> </define> </grammar>
Another RELAX NG pattern could use inline.rng
and add
code
and em
to the set of inline elements as
follows:
<grammar> <include href="inline.rng"> <define name="local.inline"> <choice> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </define> </include> <start> <element name="doc"> <zeroOrMore> <element name="p"> <ref name="inline"/> </element> </zeroOrMore> </element> </start> </grammar>
We could instead have used combine="choice". In this
case, inline.rng
would need to separate out the choices
as a separate definition:
<grammar> <start name="inline"> <zeroOrMore> <ref name="inline.class"/> </zeroOrMore> </start> <define name="inline.class"> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> </choice> </define> </grammar>
and the customization would add to those choices:
<grammar> <include href="inline.rng"/> <start> <element name="doc"> <zeroOrMore> <element name="p"> <ref name="inline"/> </element> </zeroOrMore> </element> </start> <define name="inline.class" combine="choice"> <choice> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </define> </grammar>
RELAX NG is namespace-aware. Thus, it considers an element or attribute to have both a local name and a namespace URI which together constitute the name of that element or attribute.
ns attribute
The element
pattern uses an ns
attribute
to specify the namespace URI of the elements that it matches. For
example
<element name="foo" ns="http://www.example.com"> <empty/> </element>
would match any of
<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
<example:foo xmlns:example="http://www.example.com"/>
but not any of
<foo/>
<e:foo xmlns:e="http://WWW.EXAMPLE.COM"/>
<example:foo xmlns:example="http://www.example.net"/>
A value of an empty string for the ns
attribute
indicates a null or absent namespace URI (just as with the
xmlns
attribute). Thus, the pattern
<element name="foo" ns=""> <empty/> </element>
matches any of
<foo xmlns=""/>
<foo/>
but not any of
<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
It is tedious and error-prone to specify the ns
attribute on every element, so RELAX NG allows it to be
defaulted. If an element
pattern does not specify an
ns
attribute, then it defaults to the value of the
ns
attribute of the nearest ancestor that has an
ns
attribute, or the empty string if there is no such
ancestor. Thus
<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>
is equivalent to
<element name="addressBook" ns=""> <zeroOrMore> <element name="card" ns=""> <element name="name" ns=""> <text/> </element> <element name="email" ns=""> <text/> </element> </element> </zeroOrMore> </element>
and
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>
is equivalent to
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <element name="name" ns="http://www.example.com"> <text/> </element> <element name="email" ns="http://www.example.com"> <text/> </element> </element> </zeroOrMore> </element>
The attribute
pattern also takes an ns
attribute. However, there is a difference in how it defaults. This
is because of the fact that the XML Namespaces Recommendation does not
apply the default namespace to attributes. If an ns
attribute is not specified on the attribute
pattern, then
it defaults to the empty string. Thus
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card"> <attribute name="name"/> <attribute name="email"/> </element> </zeroOrMore> </element>
is equivalent to
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <attribute name="name" ns=""/> <attribute name="email" ns=""/> </element> </zeroOrMore> </element>
and so will match
<addressBook xmlns="http://www.example.com"> <card name="John Smith" email="js@example.com"/> </addressBook>
or
<example:addressBook xmlns:example="http://www.example.com"> <example:card name="John Smith" email="js@example.com"/> </example:addressBook>
but not
<example:addressBook xmlns:example="http://www.example.com"> <example:card example:name="John Smith" example:email="js@example.com"/> </example:addressBook>
To match this last example, the attribute
patterns
must specify global="true":
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card"> <attribute name="name" global="true"/> <attribute name="email" global="true"/> </element> </zeroOrMore> </element>
This is equivalent to:
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <attribute name="name" ns="http://www.example.com"/> <attribute name="email" ns="http://www.example.com"/> </element> </zeroOrMore> </element>
Thus, specifying global="true"
on an
attribute
pattern makes the ns
attribute
default in the same way that it does on an element
pattern.
The ns
attribute is allowed on any element in a RELAX NG
pattern. The global
attribute is allowed only on an
attribute
pattern.
When a pattern matches elements and attributes from multiple
namespaces, using the ns
attribute would require
repeating namespace URIs in different places in the pattern. This is
error-prone and hard to maintain, so RELAX NG also allows the
element
and attribute
patterns to use a
prefix in the value of the name
attribute to specify the
namespace URI. In this case, the prefix specifies the namespace URI to
which that prefix is bound by the namespace declarations in scope on
the element
or attribute
pattern. Thus
<element name="e:addressBook" xmlns:e="http://www.example.com"> <zeroOrMore> <element name="e:card"> <element name="e:name"> <text/> </element> <element name="e:email"> <text/> </element> </element> </zeroOrMore> </element>
is equivalent to
<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <element name="name" ns="http://www.example.com"> <text/> </element> <element name="email" ns="http://www.example.com"> <text/> </element> </element> </zeroOrMore> </element>
If a prefix is specified in the value of the name
attribute of an element
or attribute
pattern, then that prefix determines the namespace URI of the elements
or attributes that will be matched by that pattern, regardless of
the value of any ns
attribute.
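A short sketch of this rule, with a deliberately conflicting ns attribute added only for illustration:

```xml
<element name="e:card" ns="http://www.example.net" xmlns:e="http://www.example.com">
  <empty/>
</element>
```

Here the prefix e wins, so the pattern should match

```xml
<card xmlns="http://www.example.com"/>
```

but not

```xml
<card xmlns="http://www.example.net"/>
```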
Note that the XML default namespace (as specified by the
xmlns
attribute) is not used in determining the namespace
URI of elements and attributes that element
and
attribute
patterns match.
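For example, in the following sketch the default namespace declaration only identifies element and empty as RELAX NG elements. It has no effect on what the pattern matches:

```xml
<element name="foo" xmlns="http://relaxng.org/main/ns/0.1">
  <empty/>
</element>
```

With no ns attribute and no prefix, the namespace URI defaults to the empty string, so this should match

```xml
<foo/>
```

but not

```xml
<foo xmlns="http://relaxng.org/main/ns/0.1"/>
```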
Normally, the name of the element to be matched by an
element
element is specified by a name
attribute. An element
element can instead start with an
element specifying a name-class. In this case, the
element
pattern will only match an element if the name of
the element is a member of the name-class. The simplest name-class is
anyName, which any name at all is a member of, regardless
of its local name and its namespace URI. For example, the following
pattern matches any well-formed XML document:
<grammar> <start name="anyElement"> <element> <anyName/> <zeroOrMore> <choice> <attribute> <anyName/> </attribute> <text/> <ref name="anyElement"/> </choice> </zeroOrMore> </element> </start> </grammar>
The nsName
name-class contains any name with the
namespace URI specified by the ns
attribute, which
defaults in the same way as the ns
attribute on the
element
pattern.
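For instance, a card element could be allowed to carry any number of attributes from one particular namespace. This is only a sketch; the extension namespace URI is a hypothetical one chosen for the example:

```xml
<element name="card" ns="http://www.example.com">
  <zeroOrMore>
    <attribute>
      <!-- any attribute name whose namespace URI is the (hypothetical) extension namespace -->
      <nsName ns="http://www.example.com/extension"/>
    </attribute>
  </zeroOrMore>
  <text/>
</element>
```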
The choice
name-class matches any name that is a
member of any of its child name-classes.
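As an illustration, the following sketch accepts an element from either of two namespaces, whatever its local name. Both URIs are placeholders invented for the example:

```xml
<element>
  <choice>
    <!-- a name matches if it belongs to either child name-class -->
    <nsName ns="http://www.example.com/v1"/>
    <nsName ns="http://www.example.com/v2"/>
  </choice>
  <text/>
</element>
```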
The not
name-class contains any name that is not
a member of the child name-class.
For example
<element name="card" ns="http://www.example.com"> <zeroOrMore> <attribute> <not> <choice> <nsName/> <nsName ns=""/> </choice> </not> </attribute> </zeroOrMore> <text/> </element>
would allow the card
element to have any number of
namespace-qualified attributes provided that they were qualified with
a namespace other than that of the card
element.
Note that an attribute
pattern matches a single
attribute even if it has a name-class that contains multiple names.
To match zero or more attributes, the zeroOrMore
element
must be used.
The difference
name-class contains any name that is a
member of the first child name-class, but not a member of any of the
following name-classes. The not
name-class is, in
fact, a shorthand for difference
:
<not> name-class </not>
is short for
<difference> <anyName/> name-class </difference>
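Since difference excludes the members of every name-class after the first, the card pattern shown earlier, which wraps a choice in not, could presumably also be written with a single difference. This is a sketch based on the description above, not a form taken from a specification:

```xml
<element name="card" ns="http://www.example.com">
  <zeroOrMore>
    <attribute>
      <difference>
        <anyName/>
        <!-- excluded: names in the namespace inherited from card -->
        <nsName/>
        <!-- excluded: names with no namespace URI -->
        <nsName ns=""/>
      </difference>
    </attribute>
  </zeroOrMore>
  <text/>
</element>
```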
The name
name-class contains a single name. The
content of the name
element specifies the name in the
same way as the name
attribute of the
element
pattern. The ns
attribute specifies
the namespace URI in the same way as the element
pattern.
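In other words, the name attribute can be seen as a compact way of writing a name name-class. As a sketch, the pattern

```xml
<element>
  <!-- name-class containing the single name card in the given namespace -->
  <name ns="http://www.example.com">card</name>
  <text/>
</element>
```

should match the same elements as an element pattern with name="card" and ns="http://www.example.com" containing just a text pattern.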
Some schema languages have a concept of lax validation,
where an element or attribute is validated against a definition only
if there is one. We can implement this concept in RELAX NG with
name-classes that use difference
and name
.
Suppose, for example, we wanted to allow an element to have any
attribute with a qualified name, but we still wanted to ensure that if
there was an xml:space
attribute, it had the value
default
or preserve
. It wouldn't work to
use:
<element name="example"> <zeroOrMore> <attribute> <anyName/> </attribute> </zeroOrMore> <optional> <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute> </optional> </element>
because an xml:space
attribute with a value
other than default
or preserve
would match
<attribute> <anyName/> </attribute>
even though it did not match
<attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute>
The solution is to use name
together with
difference
:
<element name="example"> <zeroOrMore> <attribute> <difference> <anyName/> <name>xml:space</name> </difference> </attribute> </zeroOrMore> <optional> <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute> </optional> </element>
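To make the effect concrete, here are two instance fragments checked by hand against the pattern above. The x prefix, its namespace URI and the flag attribute are invented for the illustration:

```xml
<!-- should match: x:flag is accepted by the difference name-class,
     and xml:space has one of the two permitted values -->
<example xml:space="preserve"
         xmlns:x="http://www.example.com/x"
         x:flag="1"/>

<!-- should not match: xml:space is excluded from the difference name-class,
     and compact is not a permitted value for the xml:space attribute pattern -->
<example xml:space="compact"/>
```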
Note that the define
element cannot contain a
name-class; it can only contain a pattern.
RELAX NG generalizes the ID/IDREF feature of XML. A
data
pattern may have either a key
or a
keyRef
attribute. A data
pattern with a
key
attribute behaves like an XML ID; a data
pattern with a keyRef
attribute behaves like an XML
IDREF. Whereas XML has a single symbol-space of IDs and IDREFs, RELAX
NG has an unlimited number of named symbol-spaces. The value of the
key
or keyRef
is an unprefixed name
identifying the symbol-space. An element or attribute that matches a
data
pattern with a key
attribute is called
a key; an element or attribute that matches a data
pattern with a keyRef
attribute is called a
key-reference. A document is invalid if it has two distinct
keys in the same symbol-space with the same value; it is also invalid if
it contains a key-reference that does not have a corresponding key in
the same symbol-space in the same document with the same value.
Whereas in XML IDs and IDREFs must be names, in RELAX NG keys and
key-references may have any datatype; whether an element or attribute
is a key or key-reference is orthogonal to its datatype. The values of
keys and key-references are compared using the datatype specified by
the data
pattern. All data
patterns sharing
the same symbol-space must specify the same value for the
type
attribute.
For example, suppose a document contains termref
elements referencing defined terms:
<element name="termref"> <data type="token" keyRef="term"/> </element>
For each such defined term, there is a corresponding
dt
, dd
pair in a glossary
element:
<element name="glossary"> <zeroOrMore> <element name="dt"> <data type="token" key="term"/> </element> <element name="dd"> <text/> </element> </zeroOrMore> </element>
The above example uses the builtin token
datatype
introduced in the Enumerations
section.
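The following fragments of a hypothetical instance document sketch how these two patterns interact. The glossary text is made up, and the verdicts in the comments follow from the rules described above rather than from a run of any validator:

```xml
<!-- defines the key schema in the symbol-space term -->
<glossary>
  <dt>schema</dt>
  <dd>A description of a class of documents.</dd>
</glossary>

<!-- valid key-reference: the token datatype normalizes whitespace,
     so this value equals the key schema -->
<termref> schema </termref>

<!-- invalid key-reference: no dt in the document has the value pattern -->
<termref>pattern</termref>
```

A second dt element with the content schema would also make the document invalid, because it would create two distinct keys with the same value in the term symbol-space.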
It must be possible to determine for any element or attribute whether it is a key or key-reference and, if so, the symbol-space of the key or key-reference, by examining just the name of the element or attribute and the names of the ancestors of that element or attribute. For example, it is not permitted to have the pattern:
<element name="bad"> <choice> <data type="string" key="x"/> <data type="string" key="y"/> </choice> </element>
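The pattern above fails this test because the name bad alone cannot tell us whether the content is a key in symbol-space x or in symbol-space y. A version that should satisfy the rule lets the element name decide the symbol-space. This is a sketch; the element names xKey and yKey are hypothetical:

```xml
<choice>
  <!-- an xKey element is always a key in symbol-space x -->
  <element name="xKey">
    <data type="string" key="x"/>
  </element>
  <!-- a yKey element is always a key in symbol-space y -->
  <element name="yKey">
    <data type="string" key="y"/>
  </element>
</choice>
```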
If a RELAX NG element has an attribute or child element with a namespace URI other than the RELAX NG namespace, then that attribute or element is ignored. Thus, you can add annotations to RELAX NG patterns simply by using an attribute or element in a separate namespace:
<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1" xmlns:a="http://www.example.com/annotation"> <zeroOrMore> <element name="card"> <a:documentation>Information about a single email address.</a:documentation> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>
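The same applies to attributes. In the following sketch, a:status is a made-up annotation attribute in the same annotation namespace. According to the rule above it is ignored, so the pattern matches exactly what it would match without it:

```xml
<element name="card"
         xmlns="http://relaxng.org/main/ns/0.1"
         xmlns:a="http://www.example.com/annotation"
         a:status="draft">
  <!-- a:status is not in the RELAX NG namespace, so it is ignored -->
  <element name="name"> <text/> </element>
  <element name="email"> <text/> </element>
</element>
```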
There is no prohibition against nesting grammar patterns. A
ref
pattern refers to the definition from the nearest
grammar
ancestor. However, by putting a
parent="true"
attribute on ref
, it is
possible to escape out of the current grammar and reference its parent
grammar.
Imagine the problem of writing a pattern for tables. The pattern
for tables only cares about the structure of tables; it doesn't care
about what goes inside a table cell. First, we create a RELAX NG pattern
table.rng
as follows:
<grammar> <define name="cell.content"> <notAllowed/> </define> <start> <element name="table"> <oneOrMore> <element name="tr"> <oneOrMore> <element name="td"> <ref name="cell.content"/> </element> </oneOrMore> </element> </oneOrMore> </element> </start> </grammar>
Patterns that include table.rng
must redefine
cell.content
. By using a nested grammar
pattern containing a ref
pattern with
parent="true"
, the including pattern can redefine
cell.content
to be a pattern defined in the including
pattern's grammar, thus effectively importing a pattern from the
parent grammar into the child grammar:
<grammar> <start> <element name="doc"> <zeroOrMore> <choice> <element name="p"> <ref name="inline"/> </element> <grammar> <include href="table.rng"> <define name="cell.content"> <ref name="inline" parent="true"/> </define> </include> </grammar> </choice> </zeroOrMore> </element> </start> <define name="inline"> <zeroOrMore> <choice> <text/> <element name="em"> <ref name="inline"/> </element> </choice> </zeroOrMore> </define> </grammar>
Of course, in a trivial case like this, there is no advantage in
nesting the grammars: we could simply have included
table.rng
within the outer grammar
element.
However, when the included grammar has many definitions, nesting it
avoids the possibility of name conflicts between the including grammar
and the included grammar.
RELAX NG does not require patterns to be "deterministic" or "unambiguous".
Suppose we wanted to write the email address book in HTML, but use class attributes to specify the structure.
<element name="html"> <element name="head"> <element name="title"> <text/> </element> </element> <element name="body"> <element name="table"> <attribute name="class"> <value>addressBook</value> </attribute> <oneOrMore> <element name="tr"> <attribute name="class"> <value>card</value> </attribute> <element name="td"> <attribute name="class"> <value>name</value> </attribute> <interleave> <text/> <optional> <element name="span"> <attribute name="class"> <value>givenName</value> </attribute> <text/> </element> </optional> <optional> <element name="span"> <attribute name="class"> <value>familyName</value> </attribute> <text/> </element> </optional> </interleave> </element> <element name="td"> <attribute name="class"> <value>email</value> </attribute> <text/> </element> </element> </oneOrMore> </element> </element> </element>
This would match an XML document such as:
<html> <head> <title>Example Address Book</title> </head> <body> <table class="addressBook"> <tr class="card"> <td class="name"> <span class="givenName">John</span> <span class="familyName">Smith</span> </td> <td class="email">js@example.com</td> </tr> </table> </body> </html>
but not
<html> <head> <title>Example Address Book</title> </head> <body> <table class="addressBook"> <tr class="card"> <td class="name"> <span class="givenName">John</span> <!-- Note the incorrect class attribute --> <span class="givenName">Smith</span> </td> <td class="email">js@example.com</td> </tr> </table> </body> </html>
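The address book pattern above relies on this freedom: the two span elements have the same name and are told apart only by their class attributes. A smaller sketch of a pattern that a DTD would reject as non-deterministic, written with placeholder element names x, a, b and c:

```xml
<element name="x">
  <choice>
    <!-- both alternatives begin with an a element, so the choice
         cannot be made by looking at the first child alone -->
    <group>
      <element name="a"> <empty/> </element>
      <element name="b"> <empty/> </element>
    </group>
    <group>
      <element name="a"> <empty/> </element>
      <element name="c"> <empty/> </element>
    </group>
  </choice>
</element>
```

Since RELAX NG does not require determinism, this pattern can be written as is, without first factoring out the common a element.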
The role of RELAX NG is simply to specify a class of documents, not to assist in interpretation of the documents belonging to the class. It does not change the infoset of the document. In particular, RELAX NG
Also, RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern.
concur pattern has been removed
string pattern has been replaced by the value pattern
anyString pattern has been renamed to text
data pattern can have parameters specified by param child elements
oneOrMoreTokens and zeroOrMoreTokens patterns have been added for matching whitespace-separated sequences of tokens
data pattern can have a key or keyRef attribute
replace and group values for the combine attribute have been removed
include element in a grammar may contain define elements that replace included definitions
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output encoding="iso-8859-1" method="html"/> <xsl:template match="/|*|@*|comment()"> <xsl:copy> <xsl:apply-templates select="@*"/> <xsl:apply-templates/> </xsl:copy> </xsl:template> <xsl:template match="@xml:space"/> <xsl:template match="h1"> <xsl:copy-of select="."/> </xsl:template> <xsl:template match="h2|h3|h4"> <xsl:copy> <a name="{translate(.,' ','_')}"/> <xsl:number format="1.1" count="div" level="multiple"/> <xsl:text> </xsl:text> <xsl:apply-templates/> </xsl:copy> </xsl:template> <xsl:template match="h2|h3|h4" mode="toc"> <xsl:apply-templates select="." mode="indent"/> <xsl:number format="1.1" count="div" level="multiple"/> <xsl:text> </xsl:text> <a href="#{translate(.,' ','_')}"> <xsl:apply-templates/> </a> <br/> </xsl:template> <xsl:template match="h2" mode="indent"></xsl:template> <xsl:template match="h3" mode="indent">  </xsl:template> <xsl:template match="h4" mode="indent">    </xsl:template> <xsl:template match="body"> <xsl:copy> <xsl:copy-of select="@*"/> <xsl:apply-templates select="*[not(self::div)]"/> <div> <h2>Table of contents</h2> <xsl:apply-templates mode="toc" select="div//h2|div//h3|div//h4"/> </div> <xsl:apply-templates select="div"/> </xsl:copy> </xsl:template> <xsl:template match="p[@class='abstract']"> <div class="abstract"> <h2>Abstract</h2> <p><xsl:apply-templates/></p> </div> </xsl:template> </xsl:stylesheet>
<?xml version="1.0"?> <!-- $Id: tutorial.xml,v 1.9 2001/05/24 15:43:25 jjc Exp $ --> <?xml-stylesheet type="text/xsl" href="toc.xsl"?> <html xml:space="preserve"> <head> <title>RELAX NG tutorial</title> </head> <body> <h1>RELAX NG<br/> Tutorial</h1> <p class="author"><b>Editor: </b><br/>     James Clark <<a href="mailto:jjc@jclark.com">jjc@jclark.com</a>><br/> <b>Date:</b><br/>     2001-05-24</p> <p>Copyright © 2001 OASIS</p> <p class="abstract">RELAX NG is a simple schema language for XML, based on <a href="http://www.xml.gr.jp/relax/">RELAX</a> and <a href="http://www.thaiopensource.com/trex/">TREX</a>. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document.</p> <div> <h2>Getting started</h2> <p>Consider a simple XML representation of an email address book:</p> <pre><![CDATA[<addressBook> <card> <name>John Smith</name> <email>js@example.com</email> </card> <card> <name>Fred Bloggs</name> <email>fb@example.net</email> </card> </addressBook>]]></pre> <p>The DTD would be as follows:</p> <pre><![CDATA[<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card (name, email)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> ]>]]></pre> <p>A RELAX NG pattern for this could be written as follows:</p> <pre><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>If the <code>addressBook</code> is required to be non-empty, then we can use <code>oneOrMore</code> instead of <code>zeroOrMore</code>:</p> <pre><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1"> <oneOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> 
</element> </element> </oneOrMore> </element>]]></pre> <p>Now let's change it to allow each <code>card</code> to have an optional <code>note</code> element.</p> <pre><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>]]></pre> <p>Note that the <code>text</code> pattern matches arbitrary text, including empty text. Note also that whitespace separating tags is ignored when matching against a pattern.</p> <p>All the elements specifying the pattern must be namespace qualified by the namespace URI:</p> <pre>http://relaxng.org/main/ns/0.1</pre> <p>The examples above use a default namespace declaration <code>xmlns="http://relaxng.org/main/ns/0.1"</code> for this. A namespace prefix is equally acceptable:</p> <pre><![CDATA[<rng:element name="addressBook" xmlns:rng="http://relaxng.org/main/ns/0.1"> <rng:zeroOrMore> <rng:element name="card"> <rng:element name="name"> <rng:text/> </rng:element> <rng:element name="email"> <rng:text/> </rng:element> </rng:element> </rng:zeroOrMore> </rng:element>]]></pre> <p>For the remainder of this document, the default namespace declaration will be left out of examples.</p> </div> <div> <h2>Choice</h2> <p>Now suppose we want to allow the <code>name</code> to be broken down into a <code>givenName</code> and a <code>familyName</code>, allowing an <code>addressBook</code> like this:</p> <pre><![CDATA[<addressBook> <card> <givenName>John</givenName> <familyName>Smith</familyName> <email>js@example.com</email> </card> <card> <name>Fred Bloggs</name> <email>fb@example.net</email> </card> </addressBook>]]></pre> <p>We can use the following pattern:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <element name="name"> <text/> </element> <group> <element 
name="givenName"> <text/> </element> <element name="familyName"> <text/> </element> </group> </choice> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>]]></pre> <p>This corresponds to the following DTD:</p> <pre><![CDATA[<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card ((name | (givenName, familyName)), email, note?)> <!ELEMENT name (#PCDATA)> <!ELEMENT email (#PCDATA)> <!ELEMENT givenName (#PCDATA)> <!ELEMENT familyName (#PCDATA)> <!ELEMENT note (#PCDATA)> ]>]]></pre> </div> <div> <h2>Attributes</h2> <p>Suppose we want the <code>card</code> element to have attributes rather than child elements. The DTD might look like this</p> <pre><![CDATA[<!DOCTYPE addressBook [ <!ELEMENT addressBook (card*)> <!ELEMENT card EMPTY> <!ATTLIST card name CDATA #REQUIRED email CDATA #REQUIRED> ]>]]></pre> <p>Just change each <code>element</code> pattern to an <code>attribute</code> pattern:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <attribute name="name"> <text/> </attribute> <attribute name="email"> <text/> </attribute> </element> </zeroOrMore> </element>]]></pre> <p>In XML, the order of attributes is traditionally not significant. RELAX NG follows this tradition. The above pattern would match both</p> <pre><![CDATA[<card name="John Smith" email="js@example.com"/>]]></pre> <p>and</p> <pre><![CDATA[<card email="js@example.com" name="John Smith"/>]]></pre> <p>In contrast, the order of elements is significant. 
The pattern</p> <pre><![CDATA[<element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element>]]></pre> <p>would <em>not</em> match:</p> <pre><![CDATA[<card><email>js@example.com</email><name>John Smith</name></card>]]></pre> <p>Note that an <code>attribute</code> element by itself indicates a required attribute, just as an <code>element</code> element by itself indicates a required element. To specify an optional attribute, use <code>optional</code> just as with <code>element</code>:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <attribute name="name"> <text/> </attribute> <attribute name="email"> <text/> </attribute> <optional> <attribute name="note"> <text/> </attribute> </optional> </element> </zeroOrMore> </element>]]></pre> <p>The <code>group</code> and <code>choice</code> patterns can be applied to <code>attribute</code> elements in the same way they are applied to <code>element</code> patterns. For example, if we wanted to allow either a <code>name</code> attribute or both a <code>givenName</code> and a <code>familyName</code> attribute, we can specify this in the same way that we would if we were using elements:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <attribute name="name"> <text/> </attribute> <group> <attribute name="givenName"> <text/> </attribute> <attribute name="familyName"> <text/> </attribute> </group> </choice> <attribute name="email"> <text/> </attribute> </element> </zeroOrMore> </element>]]></pre> <p>There are no restrictions on how <code>element</code> elements and <code>attribute</code> elements can be combined. 
For example, the following pattern would allow a choice of elements and attributes independently for both the <code>name</code> and the <code>email</code> part of a <code>card</code>:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <choice> <element name="name"> <text/> </element> <attribute name="name"> <text/> </attribute> </choice> <choice> <element name="email"> <text/> </element> <attribute name="email"> <text/> </attribute> </choice> </element> </zeroOrMore> </element>]]></pre> <p>As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:</p> <pre><![CDATA[<card name="John Smith" email="js@example.com"/> <card email="js@example.com" name="John Smith"/> <card email="js@example.com"><name>John Smith</name></card> <card name="John Smith"><email>js@example.com</email></card> <card><name>John Smith</name><email>js@example.com</email></card>]]></pre> <p>However, it would not match</p> <pre><![CDATA[<card><email>js@example.com</email><name>John Smith</name></card>]]></pre> <p>because the pattern for <code>card</code> requires any <code>email</code> child element to follow any <code>name</code> child element.</p> <p>There is one difference between <code>attribute</code> and <code>element</code> patterns: <code><![CDATA[<text/>]]></code> is the default for the content of an <code>attribute</code> pattern, whereas an <code>element</code> pattern is not allowed to be empty. For example,</p> <pre><![CDATA[<attribute name="email"/>]]></pre> <p>is short for</p> <pre><![CDATA[<attribute name="email"> <text/> </attribute>]]></pre> <p>It might seem natural that</p> <pre><![CDATA[<element name="x"/>]]></pre> <p>matched an <code>x</code> element with no attributes and no content. 
However, this would make the meaning of empty content inconsistent between the <code>element</code> pattern and the <code>attribute</code> pattern, so RELAX NG does not allow the <code>element</code> pattern to be empty. A pattern that matches an element with no attributes and no children must use <code><![CDATA[<empty/>]]></code> explicitly:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="prefersHTML"> <empty/> </element> </optional> </element> </zeroOrMore> </element>]]></pre> </div> <div> <h2>Named patterns</h2> <p>For a non-trivial RELAX NG pattern, it is often convenient to be able to give names to parts of the pattern. Instead of</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>we can write</p> <pre><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <ref name="cardContent"/> </element> </zeroOrMore> </element> </start> <define name="cardContent"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </define> </grammar>]]></pre> <p>A <code>grammar</code> element has a single <code>start</code> child element, and zero or more <code>define</code> child elements. The <code>start</code> and <code>define</code> elements contain patterns. These patterns can contain <code>ref</code> elements that refer to patterns defined by any of the <code>define</code> elements in that <code>grammar</code> element. 
A <code>grammar</code> pattern is matched by matching the pattern contained in the <code>start</code> element.</p> <p>We can use the <code>grammar</code> element to write patterns in a style similar to DTDs:</p> <pre><![CDATA[<grammar> <start> <ref name="AddressBook"/> </start> <define name="AddressBook"> <element name="addressBook"> <zeroOrMore> <ref name="Card"/> </zeroOrMore> </element> </define> <define name="Card"> <element name="card"> <ref name="Name"/> <ref name="Email"/> </element> </define> <define name="Name"> <element name="name"> <text/> </element> </define> <define name="Email"> <element name="email"> <text/> </element> </define> </grammar>]]></pre> <p>Recursive references are allowed. For example</p> <pre><![CDATA[<define name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <element name="span"> <optional> <attribute name="style"/> </optional> <ref name="inline"/> </element> </choice> </zeroOrMore> </define>]]></pre> <p>However, recursive references must be within an <code>element</code>. Thus, the following is <em>not</em> allowed:</p> <pre><![CDATA[<define name="inline"> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <element name="span"> <optional> <attribute name="style"/> </optional> <ref name="inline"/> </element> </choice> <optional> <ref name="inline"/> </optional> </define>]]></pre> <p>A <code>start</code> element may also have a <code>name</code> attribute. This is a shorthand for a <code>define</code> with that <code>name</code> together with a <code>start</code> element referencing that definition. 
For example</p> <pre><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> </choice> </zeroOrMore> </start> </grammar>]]></pre> <p>is short for</p> <pre><![CDATA[<grammar> <start> <ref name="inline"/> </start> <define name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> </choice> </zeroOrMore> </define> </grammar>]]></pre> </div> <div> <h2>Datatyping</h2> <p>RELAX NG does not have any system of datatypes built in. Rather it expects to partner with a datatyping vocabulary, such as Part 2 of the W3C's XML Schema language. RELAX NG implementations may differ in the datatyping vocabularies they support. You must pick a datatyping vocabulary that is supported by the implementation you plan to use.</p> <p>The <code>data</code> pattern matches a string that represents a value of a named datatype. The <code>type</code> attribute contains the qualified name of the datatype. For example, if a RELAX NG implementation supported the built-in datatypes of the W3C's XML Schema Language, you could use:</p> <pre><![CDATA[<element name="number" xmlns:xsd="http://www.w3.org/2001/XMLSchema-datatypes"> <data type="xsd:integer"/> </element>]]></pre> <p>The <code>data</code> can use an <code>ns</code> attribute to specify explicitly the namespace URI of the datatype, instead of using a prefix within the value of the <code>type</code> attribute.</p> <pre><![CDATA[<element name="number"> <data type="integer" ns="http://www.w3.org/2001/XMLSchema-datatypes"/> </element>]]></pre> <p>If the children of an element or an attribute match a <code>data</code> pattern, then complete content of the element or attribute must match that <code>data</code> pattern. It is not permitted to have a pattern which allows part of the content to match a <code>data</code> pattern, and another part to match another pattern. 
For example, the following pattern is <em>not</em> allowed:</p> <pre><![CDATA[<element name="bad"> <data type="xsd:int"/> <element name="note"> <text/> </element> </element>]]></pre> <p>However, this would be fine:</p> <pre><![CDATA[<element name="ok"> <data type="xsd:int"/> <attribute name="note"> <text/> </attribute> </element>]]></pre> <p>Note that this restriction does not apply to the <code>text</code> pattern.</p> <p>Datatypes may have parameters. For example, a string datatype may have a parameter controlling the length of the string. The parameters applicable to any particular datatype are determined by the datatyping vocabulary. Parameters are specified by adding one or more <code>param</code> elements as children of the <code>data</code> element. For example, the following constrains the <code>email</code> element to contain a string at most 127 characters long:</p> <pre><![CDATA[<element name="email"> <data type="xsd:string"> <param name="maxLength">127</param> </data> </element>]]></pre> </div> <div> <h2>Enumerations</h2> <p>Many markup vocabularies have attributes whose value is constrained to be one of set of specified values. The <code>value</code> pattern matches a string that has a specified value. For example,</p> <pre><![CDATA[<element name="card"> <attribute name="name"/> <attribute name="email"/> <attribute name="preferredFormat"> <choice> <value>html</value> <value>text</value> </choice> </attribute> </element>]]></pre> <p>allows the <code>preferredFormat</code> attribute to have the value <code>html</code> or <code>text</code>. This corresponds to the DTD</p> <pre><![CDATA[<!DOCTYPE card [ <!ELEMENT card EMPTY> <!ATTLIST card name CDATA #REQUIRED email CDATA #REQUIRED preferredFormat (html|text) #REQUIRED> ]>]]></pre> <p>The <code>value</code> pattern is not restricted to attribute values. 
For example, the following is allowed:</p> <pre><![CDATA[<element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <element name="preferredFormat"> <choice> <value>html</value> <value>text</value> </choice> </element> </element>]]></pre> <p>The prohibition against a <code>data</code> pattern's matching only part of the content of an element also applies to <code>value</code> patterns.</p> <p>By default, the <code>value</code> pattern will consider the string in the pattern to match the string in the document if the two strings are the same after the whitespace in both strings is normalized. Whitespace normalization strips leading and trailing white-space characters, and collapses sequences of one or more white-space characters to a single space character. This corresponds to the behaviour of an XML parser for an attribute that is declared as other than CDATA. Thus the above pattern will match any of</p> <pre><![CDATA[<card name="John Smith" email="js@example.com" preferredFormat="html"/>]]><br/> <![CDATA[<card name="John Smith" email="js@example.com" preferredFormat=" html "/>]]></pre> <p>The way that the <code>value</code> pattern compares the pattern string with the document string can be controlled by specifying a <code>type</code> attribute specifying a datatype. The <code>type</code> attribute contains a qualified name identifying the datatype. The pattern string matches the document string if they both represent the same value of the specified datatype. Thus, whereas the <code>data</code> pattern matches an arbitrary value of a datatype, the <code>value</code> pattern matches a specific value of a datatype.</p> <p>RELAX NG provides two builtin datatypes that are useful with the <code>value</code> pattern. These datatypes are specified by using an unprefixed name as the value of the <code>type</code> attribute. The two builtin datatypes are <code>string</code> and <code>token</code>. 
The builtin datatype <code>token</code> corresponds to the default comparison behavior of the <code>value</code> pattern. The builtin datatype <code>string</code> compares strings without any normalization (other than that performed by XML). For example,</p> <pre><![CDATA[<element name="card"> <attribute name="name"/> <attribute name="email"/> <attribute name="preferredFormat"> <choice> <value type="string">html</value> <value type="string">text</value> </choice> </attribute> </element>]]></pre> <p>will <em>not</em> match</p> <pre><![CDATA[<card name="John Smith" email="js@example.com" preferredFormat=" html "/>]]></pre> </div> <div> <h2>Lists</h2> <p>The <code>oneOrMoreTokens</code> and <code>zeroOrMoreTokens</code> patterns match a whitespace-separated sequence of tokens; they each contain a pattern that the individual tokens must match. For example, the <code>extension-element-prefixes</code> attribute in XSLT contains a whitespace-separated list of zero or more namespace prefixes, where each namespace prefix is either an NCName or the special value <code>#default</code>:</p> <pre><![CDATA[<attribute name="extension-element-prefixes"> <zeroOrMoreTokens> <choice> <data type="xsd:NCName"/> <value>#default</value> </choice> </zeroOrMoreTokens> </attribute>]]></pre> <p>The <code>oneOrMoreTokens</code> and <code>zeroOrMoreTokens</code> patterns must not contain <code>element</code> or <code>attribute</code> patterns.</p> </div> <div> <h2>Interleaving</h2> <p>The <code>interleave</code> pattern allows child elements to occur in any order. 
For example, the following would allow the <code>card</code> element to contain the <code>name</code> and <code>email</code> elements in any order:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <interleave> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </interleave> </element> </zeroOrMore> </element>]]></pre> <p>The pattern is called <code>interleave</code> because of how it works with patterns that match more than one element. Suppose we want to write a pattern for the HTML <code>head</code> element which requires exactly one <code>title</code> element, at most one <code>base</code> element and zero or more <code>style</code>, <code>script</code>, <code>link</code> and <code>meta</code> elements and suppose we are writing a <code>grammar</code> pattern that has one definition for each element. Then we could define the pattern for <code>head</code> as follows:</p> <pre><![CDATA[<define name="head"> <element name="head"> <interleave> <ref name="title"/> <optional> <ref name="base"/> </optional> <zeroOrMore> <ref name="style"/> </zeroOrMore> <zeroOrMore> <ref name="script"/> </zeroOrMore> <zeroOrMore> <ref name="link"/> </zeroOrMore> <zeroOrMore> <ref name="meta"/> </zeroOrMore> </interleave> </element> </define>]]></pre> <p>Suppose we had a <code>head</code> element that contained a <code>meta</code> element, followed by a <code>title</code> element, followed by a <code>meta</code> element. 
This would match the pattern because it is an interleaving of a sequence of two <code>meta</code> elements, which match the child pattern</p> <pre><![CDATA[ <zeroOrMore> <ref name="meta"/> </zeroOrMore>]]></pre> <p>and a sequence of one <code>title</code> element, which matches the child pattern</p> <pre><![CDATA[ <ref name="title"/>]]></pre> <p>The semantics of the <code>interleave</code> pattern are that a sequence of elements matches an <code>interleave</code> pattern if it is an interleaving of sequences that match the child patterns of the <code>interleave</code> pattern. Note that this is different from the <code>&amp;</code> connector in SGML: <code>A* &amp; B</code> matches the sequence of elements <code>A A B</code> or the sequence of elements <code>B A A</code> but not the sequence of elements <code>A B A</code>, whereas the corresponding <code>interleave</code> pattern matches all three.</p> <p>One special case of <code>interleave</code> is very common: interleaving <code><![CDATA[<text/>]]></code> with a pattern <var>p</var> represents a pattern that matches what <var>p</var> matches but also allows characters to occur as children. The <code>mixed</code> element is a shorthand for this.</p> <pre>&lt;mixed&gt; <var>p</var> &lt;/mixed&gt;</pre> <p>is short for</p> <pre>&lt;interleave&gt; &lt;text/&gt; <var>p</var> &lt;/interleave&gt;</pre> </div> <div> <h2>Modularity</h2> <p>The <code>include</code> element can be used to allow a pattern to be divided amongst multiple files. The <code>include</code> element has a required <code>href</code> attribute that specifies the URL of a file to be included in place of the <code>include</code> element.</p> <div> <h3>Including patterns</h3> <p>The <code>include</code> element can be used as a pattern. In this case, it will match if the pattern contained in the specified URL matches. 
Suppose, for example, you have a RELAX NG pattern that matches HTML inline content stored in <code>inline.rng</code>:</p> <pre><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> <!-- etc --> </choice> </zeroOrMore> </start> </grammar>]]></pre> <p>Then we could allow the <code>note</code> element to contain inline HTML markup by using <code>include</code> as follows:</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <include href="inline.rng"/> </element> </optional> </element> </zeroOrMore> </element>]]></pre> <p>For another example, suppose you have two RELAX NG patterns stored in files <code>pattern1.rng</code> and <code>pattern2.rng</code>. Then the following is a pattern that will match anything matched by either of those patterns:</p> <pre><![CDATA[<choice> <include href="pattern1.rng"/> <include href="pattern2.rng"/> </choice>]]></pre> </div> <div> <h3>Merging grammars</h3> <p>The <code>include</code> element is also allowed as a child of a <code>grammar</code> pattern. In this case the specified URL must contain a <code>grammar</code> pattern, and the included <code>grammar</code> will be merged with the including <code>grammar</code>.</p> <p>Normally, duplicate definitions (two definitions with the same name) result in an error. 
However, <code>define</code> elements may be put inside the <code>include</code> element to indicate that they are to replace definitions in the included <code>grammar</code> pattern.</p> <p>Suppose the file <code>addressBook.rng</code> contains the following grammar pattern:</p> <pre><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <empty/> </define> </grammar>]]></pre> <p>Another pattern could customize <code>addressBook.rng</code> as follows:</p> <pre><![CDATA[<grammar> <include href="addressBook.rng"> <define name="card.local"> <optional> <element name="note"> <text/> </element> </optional> </define> </include> </grammar>]]></pre> <p>This would be equivalent to:</p> <pre><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <optional> <element name="note"> <text/> </element> </optional> </define> </grammar>]]></pre> <p>which is equivalent to</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <optional> <element name="note"> <text/> </element> </optional> </element> </zeroOrMore> </element>]]></pre> <p>It is also possible to combine together duplicate definitions from separate files by adding a <code>combine</code> attribute to the <code>define</code> elements. The <code>combine</code> attribute specifies how the definitions should be combined; it may have the value <code>choice</code> or <code>interleave</code>. 
For example, we could have written our customization as:</p> <pre><![CDATA[<grammar> <include href="addressBook.rng"/> <define name="card.local" combine="choice"> <!-- no optional element needed this time --> <element name="note"> <text/> </element> </define> </grammar>]]></pre> <p>This would be equivalent to:</p> <pre><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <choice> <empty/> <element name="note"> <text/> </element> </choice> </define> </grammar>]]></pre> <p>This has the same meaning as before, since an optional pattern is equivalent to a choice between the pattern and empty.</p> <p>We could also have used <code>combine="interleave"</code> here:</p> <pre><![CDATA[<grammar> <include href="addressBook.rng"/> <define name="card.local" combine="interleave"> <optional> <element name="note"> <text/> </element> </optional> </define> </grammar>]]></pre> <p>This would be equivalent to:</p> <pre><![CDATA[<grammar> <start> <element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> <ref name="card.local"/> </element> </zeroOrMore> </element> </start> <define name="card.local"> <interleave> <empty/> <optional> <element name="note"> <text/> </element> </optional> </interleave> </define> </grammar>]]></pre> <p>This has the same meaning as before, since adding an <code>empty</code> pattern to the content of an <code>interleave</code> pattern does not make any difference to what the <code>interleave</code> pattern matches.</p> <p>@@@ Add example of combine="interleave" with attributes.</p> <p>The <code>notAllowed</code> pattern never matches anything. 
Just as adding <code>empty</code> to a <code>group</code> makes no difference, so adding <code>notAllowed</code> to a <code>choice</code> makes no difference. It is typically used in a definition that is referenced in a <code>choice</code> element to allow an including pattern to specify additional choices. For example, suppose a RELAX NG pattern <code>inline.rng</code> provides a pattern for inline content, which allows <code>bold</code> and <code>italic</code> elements arbitrarily nested:</p> <pre><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> <ref name="local.inline"/> </choice> </zeroOrMore> </start> <define name="local.inline"> <notAllowed/> </define> </grammar>]]></pre> <p>Another RELAX NG pattern could use <code>inline.rng</code> and add <code>code</code> and <code>em</code> to the set of inline elements as follows:</p> <pre><![CDATA[<grammar> <include href="inline.rng"> <define name="local.inline"> <choice> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </define> </include> <start> <element name="doc"> <zeroOrMore> <element name="p"> <ref name="inline"/> </element> </zeroOrMore> </element> </start> </grammar>]]></pre> <p>We could instead have used <code>combine="choice"</code>. 
In this case, <code>inline.rng</code> would need to separate out the choices as a separate definition:</p> <pre><![CDATA[<grammar> <start name="inline"> <zeroOrMore> <ref name="inline.class"/> </zeroOrMore> </start> <define name="inline.class"> <choice> <text/> <element name="bold"> <ref name="inline"/> </element> <element name="italic"> <ref name="inline"/> </element> </choice> </define> </grammar>]]></pre> <p>and the customization would add to those choices:</p> <pre><![CDATA[<grammar> <include href="inline.rng"/> <start> <element name="doc"> <zeroOrMore> <element name="p"> <ref name="inline"/> </element> </zeroOrMore> </element> </start> <define name="inline.class" combine="choice"> <choice> <element name="code"> <ref name="inline"/> </element> <element name="em"> <ref name="inline"/> </element> </choice> </define> </grammar>]]></pre> </div> </div> <div> <h2>Namespaces</h2> <p>RELAX NG is namespace-aware. Thus, it considers an element or attribute to have both a local name and a namespace URI which together constitute the name of that element or attribute.</p> <div> <h3>Using the <code>ns</code> attribute</h3> <p>The <code>element</code> pattern uses an <code>ns</code> attribute to specify the namespace URI of the elements that it matches. For example</p> <pre><![CDATA[<element name="foo" ns="http://www.example.com"> <empty/> </element>]]></pre> <p>would match any of</p> <pre><![CDATA[<foo xmlns="http://www.example.com"/>]]><br/> <![CDATA[<e:foo xmlns:e="http://www.example.com"/>]]><br/> <![CDATA[<example:foo xmlns:example="http://www.example.com"/>]]></pre> <p>but not any of</p> <pre><![CDATA[<foo/>]]><br/> <![CDATA[<e:foo xmlns:e="http://WWW.EXAMPLE.COM"/>]]><br/> <![CDATA[<example:foo xmlns:example="http://www.example.net"/>]]></pre> <p>A value of an empty string for the <code>ns</code> attribute indicates a null or absent namespace URI (just as with the <code>xmlns</code> attribute). 
Thus, the pattern</p> <pre><![CDATA[<element name="foo" ns=""> <empty/> </element>]]></pre> <p>matches any of</p> <pre><![CDATA[<foo xmlns=""/>]]><br/> <![CDATA[<foo/>]]></pre> <p>but not any of</p> <pre><![CDATA[<foo xmlns="http://www.example.com"/>]]><br/> <![CDATA[<e:foo xmlns:e="http://www.example.com"/>]]></pre> <p>It is tedious and error-prone to specify the <code>ns</code> attribute on every <code>element</code>, so RELAX NG allows it to be defaulted. If an <code>element</code> pattern does not specify an <code>ns</code> attribute, then it defaults to the value of the <code>ns</code> attribute of the nearest ancestor that has an <code>ns</code> attribute, or the empty string if there is no such ancestor. Thus</p> <pre><![CDATA[<element name="addressBook"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>is equivalent to</p> <pre><![CDATA[<element name="addressBook" ns=""> <zeroOrMore> <element name="card" ns=""> <element name="name" ns=""> <text/> </element> <element name="email" ns=""> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>and</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card"> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>is equivalent to</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <element name="name" ns="http://www.example.com"> <text/> </element> <element name="email" ns="http://www.example.com"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>The <code>attribute</code> pattern also takes an <code>ns</code> attribute. However, there is a difference in how it defaults. 
This is because of the fact that the XML Namespaces Recommendation does not apply the default namespace to attributes. If an <code>ns</code> attribute is not specified on the <code>attribute</code> pattern, then it defaults to the empty string. Thus</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card"> <attribute name="name"/> <attribute name="email"/> </element> </zeroOrMore> </element>]]></pre> <p>is equivalent to</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <attribute name="name" ns=""/> <attribute name="email" ns=""/> </element> </zeroOrMore> </element>]]></pre> <p>and so will match</p> <pre><![CDATA[<addressBook xmlns="http://www.example.com"> <card name="John Smith" email="js@example.com"/> </addressBook>]]></pre> <p>or</p> <pre><![CDATA[<example:addressBook xmlns:example="http://www.example.com"> <example:card name="John Smith" email="js@example.com"/> </example:addressBook>]]></pre> <p>but not</p> <pre><![CDATA[<example:addressBook xmlns:example="http://www.example.com"> <example:card example:name="John Smith" example:email="js@example.com"/> </example:addressBook>]]></pre> <p>To match this last example, the <code>attribute</code> patterns must specify <code>global="true"</code>:</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card"> <attribute name="name" global="true"/> <attribute name="email" global="true"/> </element> </zeroOrMore> </element>]]></pre> <p>This is equivalent to:</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <attribute name="name" ns="http://www.example.com"/> <attribute name="email" ns="http://www.example.com"/> </element> </zeroOrMore> </element>]]></pre> <p>Thus, specifying <code>global="true"</code> on an <code>attribute</code> pattern makes the 
<code>ns</code> attribute default in the same way that it does on an <code>element</code> pattern.</p> <p>The <code>ns</code> attribute is allowed on any element in a RELAX NG pattern. The <code>global</code> attribute is allowed only on an <code>attribute</code> pattern.</p> </div> <div> <h3>Qualified names</h3> <p>When a pattern matches elements and attributes from multiple namespaces, using the <code>ns</code> attribute would require repeating namespace URIs in different places in the pattern. This is error-prone and hard to maintain, so RELAX NG also allows the <code>element</code> and <code>attribute</code> patterns to use a prefix in the value of the <code>name</code> attribute to specify the namespace URI. In this case, the prefix specifies the namespace URI to which that prefix is bound by the namespace declarations in scope on the <code>element</code> or <code>attribute</code> pattern. Thus</p> <pre><![CDATA[<element name="e:addressBook" xmlns:e="http://www.example.com"> <zeroOrMore> <element name="e:card"> <element name="e:name"> <text/> </element> <element name="e:email"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>is equivalent to</p> <pre><![CDATA[<element name="addressBook" ns="http://www.example.com"> <zeroOrMore> <element name="card" ns="http://www.example.com"> <element name="name" ns="http://www.example.com"> <text/> </element> <element name="email" ns="http://www.example.com"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> <p>If a prefix is specified in the value of the <code>name</code> attribute of an <code>element</code> or <code>attribute</code> pattern, then that prefix determines the namespace URI of the elements or attributes that will be matched by that pattern, regardless of the value of any <code>ns</code> attribute.</p> <p>Note that the XML default namespace (as specified by the <code>xmlns</code> attribute) is not used in determining the namespace URI of elements and attributes that 
<code>element</code> and <code>attribute</code> patterns match.</p> </div> </div> <div> <h2>Name classes</h2> <p>Normally, the name of the element to be matched by an <code>element</code> element is specified by a <code>name</code> attribute. An <code>element</code> element can instead start with an element specifying a <i>name-class</i>. In this case, the <code>element</code> pattern will only match an element if the name of the element is a member of the name-class. The simplest name-class is <code>anyName</code>, which any name at all is a member of, regardless of its local name and its namespace URI. For example, the following pattern matches any well-formed XML document:</p> <pre><![CDATA[<grammar> <start name="anyElement"> <element> <anyName/> <zeroOrMore> <choice> <attribute> <anyName/> </attribute> <text/> <ref name="anyElement"/> </choice> </zeroOrMore> </element> </start> </grammar>]]></pre> <p>The <code>nsName</code> name-class contains any name with the namespace URI specified by the <code>ns</code> attribute, which defaults in the same way as the <code>ns</code> attribute on the <code>element</code> pattern.</p> <p>The <code>choice</code> name-class matches any name that is a member of any of its child name-classes.</p> <p>The <code>not</code> name-class contains any name that is not a member of the child name-class.</p> <p>For example</p> <pre><![CDATA[<element name="card" ns="http://www.example.com"> <zeroOrMore> <attribute> <not> <choice> <nsName/> <nsName ns=""/> </choice> </not> </attribute> </zeroOrMore> <text/> </element>]]></pre> <p>would allow the <code>card</code> element to have any number of namespace-qualified attributes provided that they were qualified with a namespace other than that of the <code>card</code> element.</p> <p>Note that an <code>attribute</code> pattern matches a single attribute even if it has a name-class that contains multiple names. 
To match zero or more attributes, the <code>zeroOrMore</code> element must be used.</p> <p>The <code>difference</code> name-class contains any name that is a member of the first child name-class, but not a member of any of the following name-classes. The <code>not</code> name-class is, in fact, a shorthand for <code>difference</code>:</p> <pre>&lt;not&gt; <var>name-class</var> &lt;/not&gt;</pre> <p>is short for</p> <pre>&lt;difference&gt; &lt;anyName/&gt; <var>name-class</var> &lt;/difference&gt;</pre> <p>The <code>name</code> name-class contains a single name. The content of the <code>name</code> element specifies the name in the same way as the <code>name</code> attribute of the <code>element</code> pattern. The <code>ns</code> attribute specifies the namespace URI in the same way as the <code>element</code> pattern.</p> <p>Some schema languages have a concept of <i>lax</i> validation, where an element or attribute is validated against a definition only if there is one. We can implement this concept in RELAX NG with name classes that use <code>difference</code> and <code>name</code>. Suppose, for example, we wanted to allow an element to have any attribute with a qualified name, but we still wanted to ensure that if there was an <code>xml:space</code> attribute, it had the value <code>default</code> or <code>preserve</code>. 
It wouldn't work to use:</p> <pre><![CDATA[<element name="example"> <zeroOrMore> <attribute> <anyName/> </attribute> </zeroOrMore> <optional> <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute> </optional> </element>]]></pre> <p>because an <code>xml:space</code> attribute with a value other than <code>default</code> or <code>preserve</code> would match</p> <pre><![CDATA[ <attribute> <anyName/> </attribute>]]></pre> <p>even though it did not match</p> <pre><![CDATA[ <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute>]]></pre> <p>The solution is to use <code>name</code> together with <code>difference</code>:</p> <pre><![CDATA[<element name="example"> <zeroOrMore> <attribute> <difference> <anyName/> <name>xml:space</name> </difference> </attribute> </zeroOrMore> <optional> <attribute name="xml:space"> <choice> <value>default</value> <value>preserve</value> </choice> </attribute> </optional> </element>]]></pre> <p>Note that the <code>define</code> element cannot contain a name-class; it can only contain a pattern.</p> </div> <div> <h2>Cross references</h2> <p>RELAX NG generalizes the ID/IDREF feature of XML. A <code>data</code> pattern may have either a <code>key</code> or a <code>keyRef</code> attribute. A <code>data</code> pattern with a <code>key</code> attribute behaves like an XML ID; a <code>data</code> pattern with a <code>keyRef</code> attribute behaves like an XML IDREF. Whereas XML has a single symbol-space of IDs and IDREFs, RELAX NG has an unlimited number of named symbol-spaces. The value of the <code>key</code> or <code>keyRef</code> is an unprefixed name identifying the symbol-space. An element or attribute that matches a <code>data</code> pattern with a <code>key</code> attribute is called a <i>key</i>; an element or attribute that matches a <code>data</code> pattern with a <code>keyRef</code> attribute is called a <i>key-reference</i>. 
A document is invalid if it has two distinct keys in the same symbol-space with the same value; it is also invalid if it contains a key-reference that does not have a corresponding key in the same symbol-space in the same document with the same value.</p> <p>Whereas in XML IDs and IDREFs must be names, in RELAX NG keys and key-references may have any datatype; whether an element or attribute is a key or key-reference is orthogonal to its datatype. The values of keys and key-references are compared using the datatype specified by the <code>data</code> pattern. All <code>data</code> patterns sharing the same symbol space must specify the same value for the <code>type</code> attribute.</p> <p>For example, suppose a document contains <code>termref</code> elements referencing defined terms:</p> <pre><![CDATA[<element name="termref"> <data type="token" keyRef="term"/> </element>]]></pre> <p>For each such defined term, there is a corresponding <code>dt</code>, <code>dd</code> pair in a <code>glossary</code> element:</p> <pre><![CDATA[<element name="glossary"> <zeroOrMore> <element name="dt"> <data type="token" key="term"/> </element> <element name="dd"> <text/> </element> </zeroOrMore> </element>]]></pre> <p>The above example uses the builtin <code>token</code> datatype introduced in the <a href="#Enumerations">Enumerations</a> section.</p> <p>It must be possible to determine for any element or attribute whether it is a key or key reference and, if so, the symbol space of the key or key reference, by examining just the name of the element or attribute and the names of the ancestors of that element or attribute. 
For example, it is not permitted to have the pattern:</p> <pre><![CDATA[<element name="bad"> <choice> <data type="string" key="x"/> <data type="string" key="y"/> </choice> </element>]]></pre> </div> <div> <h2>Annotations</h2> <p>If a RELAX NG element has an attribute or child element with a namespace URI other than the RELAX NG namespace, then that attribute or element is ignored. Thus, you can add annotations to RELAX NG patterns simply by using an attribute or element in a separate namespace:</p> <pre><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/main/ns/0.1" xmlns:a="http://www.example.com/annotation"> <zeroOrMore> <element name="card"> <a:documentation>Information about a single email address.</a:documentation> <element name="name"> <text/> </element> <element name="email"> <text/> </element> </element> </zeroOrMore> </element>]]></pre> </div> <div> <h2>Nested grammars</h2> <p>There is no prohibition against nesting grammar patterns. A <code>ref</code> pattern refers to the definition from the nearest <code>grammar</code> ancestor. However, by putting a <code>parent="true"</code> attribute on <code>ref</code>, it is possible to escape out of the current grammar and reference its parent grammar.</p> <p>Imagine the problem of writing a pattern for tables. The pattern for tables only cares about the structure of tables; it doesn't care about what goes inside a table cell. First, we create a RELAX NG pattern <code>table.rng</code> as follows:</p> <pre><![CDATA[<grammar> <define name="cell.content"> <notAllowed/> </define> <start> <element name="table"> <oneOrMore> <element name="tr"> <oneOrMore> <element name="td"> <ref name="cell.content"/> </element> </oneOrMore> </element> </oneOrMore> </element> </start> </grammar>]]></pre> <p>Patterns that include <code>table.rng</code> must redefine <code>cell.content</code>. 
By using a nested <code>grammar</code> pattern containing a <code>ref</code> pattern with <code>parent="true"</code>, the including pattern can redefine <code>cell.content</code> to be a pattern defined in the including pattern's grammar, thus effectively importing a pattern from the parent grammar into the child grammar:</p> <pre><![CDATA[<grammar> <start> <element name="doc"> <zeroOrMore> <choice> <element name="p"> <ref name="inline"/> </element> <grammar> <include href="table.rng"> <define name="cell.content"> <ref name="inline" parent="true"/> </define> </include> </grammar> </choice> </zeroOrMore> </element> </start> <define name="inline"> <zeroOrMore> <choice> <text/> <element name="em"> <ref name="inline"/> </element> </choice> </zeroOrMore> </define> </grammar>]]></pre> <p>Of course, in a trivial case like this, there is no advantage in nesting the grammars: we could simply have included <code>table.rng</code> within the outer <code>grammar</code> element. However, when the included grammar has many definitions, nesting it avoids the possibility of name conflicts between the including grammar and the included grammar.</p> </div> <div> <h2>Non-restrictions</h2> <p>RELAX NG does not require patterns to be "deterministic" or "unambiguous".</p> <p>Suppose we wanted to write the email address book in HTML, but use class attributes to specify the structure.</p> <pre><![CDATA[<element name="html"> <element name="head"> <element name="title"> <text/> </element> </element> <element name="body"> <element name="table"> <attribute name="class"> <value>addressBook</value> </attribute> <oneOrMore> <element name="tr"> <attribute name="class"> <value>card</value> </attribute> <element name="td"> <attribute name="class"> <value>name</value> </attribute> <interleave> <text/> <optional> <element name="span"> <attribute name="class"> <value>givenName</value> </attribute> <text/> </element> </optional> <optional> <element name="span"> <attribute name="class"> 
<value>familyName</value> </attribute> <text/> </element> </optional> </interleave> </element> <element name="td"> <attribute name="class"> <value>email</value> </attribute> <text/> </element> </element> </oneOrMore> </element> </element> </element>]]></pre> <p>This would match an XML document such as:</p> <pre><![CDATA[<html> <head> <title>Example Address Book</title> </head> <body> <table class="addressBook"> <tr class="card"> <td class="name"> <span class="givenName">John</span> <span class="familyName">Smith</span> </td> <td class="email">js@example.com</td> </tr> </table> </body> </html>]]></pre> <p>but not</p> <pre><![CDATA[<html> <head> <title>Example Address Book</title> </head> <body> <table class="addressBook"> <tr class="card"> <td class="name"> <span class="givenName">John</span> <!-- Note the incorrect class attribute --> <span class="givenName">Smith</span> </td> <td class="email">js@example.com</td> </tr> </table> </body> </html>]]></pre> </div> <div> <h2>Non-features</h2> <p>The role of RELAX NG is simply to specify a class of documents, not to assist in interpretation of the documents belonging to the class. It does not change the infoset of the document. 
In particular, RELAX NG</p> <ul> <li>does not allow defaults for attributes to be specified</li> <li>does not allow entities to be specified</li> <li>does not allow notations to be specified</li> <li>does not specify whether white-space is significant</li> </ul> <p>Also RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern.</p> </div> <div> <h2>Differences from TREX</h2> <ol> <li>the <code>concur</code> pattern has been removed</li> <li>the <code>string</code> pattern has been replaced by the <code>value</code> pattern</li> <li>the <code>anyString</code> pattern has been renamed to <code>text</code></li> <li>the namespace URI is different</li> <li>pattern elements must be namespace qualified</li> <li>anonymous datatypes have been removed</li> <li>the <code>data</code> pattern can have parameters specified by <code>param</code> child elements</li> <li><code>oneOrMoreTokens</code> and <code>zeroOrMoreTokens</code> patterns have been added for matching whitespace-separated sequences of tokens</li> <li>the <code>data</code> pattern can have a <code>key</code> or <code>keyRef</code> attribute</li> <li>the <code>replace</code> and <code>group</code> values for the <code>combine</code> attribute have been removed</li> <li>an <code>include</code> element in a grammar may contain <code>define</code> elements that replace included definitions</li> </ol> </div> </body> </html>