[relax-ng] Annotations in the non-XML syntax

The aspect of the non-XML syntax that has caused me by far the most
difficulty is annotations.  I've finally come up with a design that I feel
reasonably happy with.  Attached is a description of the syntax using this
new design.  I've also implemented this, but the code isn't yet ready for
public consumption.

Some of the things I would commend to you about this new syntax are:

- The annotations applicable to a syntactic object appear in a consistent
position (immediately preceding the object)

- There is a nice similarity with C# where annotations also occur in square
brackets before an object

- All annotations expressible in the XML syntax are expressible in the
non-XML syntax (actually there's one case that isn't yet handled, which I
describe below)

- Annotation attributes are written the same wherever they occur

- Annotation elements are written the same wherever they occur

- Square brackets are used in two contexts, but the uses are harmonious: in
each case, the square brackets contain attributes followed by content

- Just as a sequence of definitions is allowed without any connector, so a
sequence of annotation elements is allowed at the definition level without
any connector
- Just as adjacent patterns or name classes in a group require a connector
(e.g. |), so annotation elements that are siblings of patterns or name
classes require a connector (>)

- The relationship between annotation attributes and annotation elements
that occur at the top level outside square brackets is harmonious with the
relationship between those that occur within square brackets: annotation
attributes are followed by annotation elements without any intervening
connector, and they will end up as attributes and initial children of the
same parent element

- The implementation doesn't have to cope with parsing embedded XML

On the negative side, it's a bit harder to parse:

- In several cases, two tokens of lookahead are required: in several
contexts, when you see a name you have to look ahead to see whether there's a
following "[" in order to know how to proceed

- There's one case where an arbitrary amount of lookahead is required: in
order to determine whether a file contains a sequence of definitions or a
pattern, you may have to look ahead past an annotation in square brackets,
which can consist of arbitrarily many tokens (however, this is easily
implementable in JavaCC without hackery)
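
To make the arbitrary-lookahead case concrete, here is a minimal Python
sketch. It is not the JavaCC implementation mentioned above. It assumes the
input has already been split into a list of token strings, with each literal
as a single token; the function names and the token representation are
hypothetical. It skips one leading annotation in square brackets and then
inspects the next two tokens to guess whether the file holds a sequence of
definitions or a pattern. It does not validate the input.

```python
# Keywords from the grammar; an annotation element at the definition
# level must not start with one of these.
KEYWORDS = {
    "attribute", "default", "datatypes", "element", "empty",
    "externalRef", "grammar", "include", "inherit", "list",
    "mixed", "namespace", "notAllowed", "parent", "start",
    "string", "text", "token",
}
ASSIGN_OPS = {"=", "|=", "&="}


def skip_annotation(tokens, i):
    """If tokens[i] is "[", return the index just past its matching "]".

    Otherwise return i unchanged. The bracketed annotation may contain
    arbitrarily many tokens, including nested brackets.
    """
    if i >= len(tokens) or tokens[i] != "[":
        return i
    depth = 0
    while i < len(tokens):
        if tokens[i] == "[":
            depth += 1
        elif tokens[i] == "]":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced '[' in annotation")


def top_level_is_grammar(tokens):
    """Return True for a sequence of definitions, False for a pattern."""
    i = skip_annotation(tokens, 0)
    if i >= len(tokens):
        return True  # nothing left: treat as an empty grammar
    first = tokens[i]
    following = tokens[i + 1] if i + 1 < len(tokens) else None
    if first == "include":
        return True
    if following in ASSIGN_OPS:
        # A definition, or a prefixed annotation attribute before a grammar.
        return True
    # A non-keyword name followed by "[" is an annotation element.
    return following == "[" and first not in KEYWORDS
```

The keyword check matters because a pattern such as element [ ... ] foo
{ ... } also has "[" as its second token.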

There's one kind of annotation that is possible in the XML syntax that
still cannot be expressed in the non-XML syntax: annotations that attach to
an <except> element, more specifically annotations that occur as attributes,
initial child elements or following siblings of <except> elements (in <data>,
<nsName> or <anyName>).

I'm not wedded to the choice of ">" as the connector for connecting patterns
and name-classes to following sibling annotation elements.  If you think
another character would be preferable, please say so.


Title: A Non-XML Syntax for RELAX NG

This document describes a non-XML syntax for RELAX NG (a schema language for XML). The design goals of this syntax are:

The syntax is similar to the type syntax in the XQuery 1.0 Formal Semantics W3C Working Draft.


The syntax is defined by the following BNF:

topLevel ::= decl* topLevelBody

topLevelBody ::=
  pattern
  | prefixedAnnotationAttribute* grammar

decl ::=
  "namespace" identifier "=" (literal | "inherit")
  | "default" "namespace" identifier? "=" (literal | "inherit")
  | "datatypes" identifier "=" literal

pattern ::=
  particle
  | particle ("|" particle)+
  | particle ("," particle)+
  | particle ("&" particle)+
  | exceptParticle

particle ::= annotations? primary followAnnotations occurrence?

exceptParticle ::=
  annotations? datatypeName params? "-" annotations? primary followAnnotations

primary ::=
  "(" pattern ")"
  | "element" nameClass "{" pattern "}"
  | "attribute" nameClass "{" pattern "}"
  | "mixed" "{" pattern "}"
  | "empty"
  | "notAllowed"
  | "text"
  | "list" "{" pattern "}"
  | datatypeName params?
  | datatypeName? datatypeValue
  | "grammar" "{" grammar "}"
  | ref
  | "parent" ref
  | "externalRef" literal inherit?

occurrence ::= ("*" | "+" | "?") followAnnotations

nameClass ::=
  basicNameClass followAnnotations
  | basicNameClass followAnnotations ("|" basicNameClass followAnnotations)+
  | openNameClass "-" basicNameClass followAnnotations

basicNameClass ::=
  annotations? QName
  | openNameClass
  | annotations? "(" nameClass ")"

openNameClass ::= annotations? (nsName | anyName)

ref ::= identifierNotKeyword

datatypeName ::= CName | "string" | "token"

datatypeValue ::= literal

params ::= "{" (annotations? identifier "=" literal)+ "}"

grammar ::= (definition | include | annotationElementNotKeyword)*

definition ::= annotations? subject ("=" | "|=" | "&=") pattern

subject ::= "start" | identifierNotKeyword

include ::= annotations? "include" literal inherit? includeBody?

includeBody ::= "{" (definition | annotationElementNotKeyword)* "}"

inherit ::= "inherit" "=" identifier

followAnnotations ::= (">" annotationElement)*

annotations ::= "[" prefixedAnnotationAttribute* annotationElement* "]"

annotationAttribute ::= (identifier | CName) "=" literal

prefixedAnnotationAttribute ::= CName "=" literal

annotationElement ::= (identifier | CName) annotationElementBody

annotationElementNotKeyword ::=
  (identifierNotKeyword | CName) annotationElementBody

annotationElementBody ::=
  "[" annotationAttribute* (annotationElement | literal)* "]"

identifierNotKeyword ::= identifier - keyword

identifier ::= NCName | escapedIdentifier

keyword ::=
  "attribute" | "default" | "datatypes" | "element" | "empty"
  | "externalRef" | "grammar" | "include" | "inherit" | "list" 
  | "mixed" | "namespace" | "notAllowed" | "parent" | "start"
  | "string" | "text" | "token" 

CName ::= NCName ":" NCName
escapedIdentifier ::= "\" NCName
literal ::= '"' ([^"] | '""')* '"' | "'" ([^'] | "''")* "'"
nsName ::= NCName ":*"
anyName ::= "*"
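
As a rough illustration of the literal rule above, the following Python sketch decodes and encodes literals in which the delimiting quote is escaped by doubling it. The function names are hypothetical, and the sketch assumes the token has already been isolated by a lexer.

```python
def decode_literal(token):
    """Return the string value of a literal token such as "a ""b"" c"."""
    if len(token) < 2 or token[0] not in "\"'" or token[-1] != token[0]:
        raise ValueError("not a quoted literal: %r" % token)
    quote = token[0]
    body = token[1:-1]
    # Inside the literal, the delimiter may only appear doubled.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("unescaped quote inside literal: %r" % token)
    return body.replace(quote * 2, quote)


def encode_literal(text):
    """Write text as a double-quoted literal, doubling embedded quotes."""
    return '"' + text.replace('"', '""') + '"'
```

With these definitions, decode_literal(encode_literal(s)) returns s for any string s.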

Comments start with a # and continue to the end of the line.

element is defined in the XML 1.0 Recommendation; NCName is defined in the XML Namespaces Recommendation.

Note that keywords are case-sensitive. To use a keyword as the name of a definition, the keyword must be escaped with \. It is not necessary to escape a keyword that is used as the name of an element, attribute or datatype parameter.
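
A tool that writes the non-XML syntax needs to apply this escaping rule when it emits definition names. The Python sketch below shows one way to do so; the function name is hypothetical, and the keyword set is copied from the keyword rule in the grammar.

```python
KEYWORDS = {
    "attribute", "default", "datatypes", "element", "empty",
    "externalRef", "grammar", "include", "inherit", "list",
    "mixed", "namespace", "notAllowed", "parent", "start",
    "string", "text", "token",
}


def definition_name(name):
    """Return name as it should be written where a definition is named.

    Keywords are case-sensitive, so only exact matches are escaped.
    """
    return "\\" + name if name in KEYWORDS else name
```

For instance, a definition named element would be written \element, while Element needs no escape.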

Mapping to RELAX NG Syntax

The correspondence between the non-XML syntax and RELAX NG's XML syntax is shown by the following tables.


Non-XML Syntax                RELAX NG Syntax
p1 | p2                       <choice> p1 p2 </choice>
p1 , p2                       <group> p1 p2 </group>
p1 & p2                       <interleave> p1 p2 </interleave>
p*                            <zeroOrMore> p </zeroOrMore>
p+                            <oneOrMore> p </oneOrMore>
p?                            <optional> p </optional>
(p)                           p
element QName { p }           <element name="QName"> p </element>
element nameClass { p }       <element> nameClass p </element>
attribute QName { p }         <attribute name="QName"> p </attribute>
attribute nameClass { p }     <attribute> nameClass p </attribute>
empty                         <empty/>
notAllowed                    <notAllowed/>
text                          <text/>
mixed { p }                   <mixed> p </mixed>
list { p }                    <list> p </list>
identifierNotKeyword          <ref name="identifierNotKeyword"/>
\identifier                   <ref name="identifier"/>
externalRef "uri"             <externalRef href="uri"/>
parent identifier             <parentRef name="identifier"/>
grammar { defs }              <grammar> defs </grammar>
"string"                      <value>string</value>
string                        <data type="string"/>
token                         <data type="token"/>
prefix:localName              <data type="localName" datatypeLibrary="uri"/>
prefix:localName "string"     <value type="localName" datatypeLibrary="uri">string</value>
prefix:localName - p          <data type="localName" datatypeLibrary="uri"><except> p </except></data>
prefix:localName { params }   <data type="localName" datatypeLibrary="uri"> params </data>
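
The connector and occurrence rows of this table amount to two small lookup tables. The Python sketch below is only an illustration of that idea, with hypothetical function names; it assumes the operands have already been translated to XML strings.

```python
# Connector and occurrence operators, mapped to RELAX NG element names.
CONNECTORS = {"|": "choice", ",": "group", "&": "interleave"}
OCCURRENCES = {"*": "zeroOrMore", "+": "oneOrMore", "?": "optional"}


def wrap(element, children):
    """Wrap already-translated child patterns in the named element."""
    return "<%s> %s </%s>" % (element, " ".join(children), element)


def combine(connector, patterns):
    """Translate p1 <connector> p2 ... given the translated operands."""
    return wrap(CONNECTORS[connector], patterns)


def repeat(occurrence, pattern):
    """Translate p<occurrence> given the translated operand."""
    return wrap(OCCURRENCES[occurrence], [pattern])
```

For example, repeat("?", combine("|", ["<empty/>", "<text/>"])) gives the translation of (empty | text)?.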

Name classes

Non-XML Syntax            RELAX NG Syntax
QName                     <name>QName</name>
prefix:*                  <nsName ns="uri"/>
prefix:* - nameClass      <nsName ns="uri"><except> nameClass </except></nsName>
*                         <anyName/>
* - nameClass             <anyName><except> nameClass </except></anyName>
nameClass1 | nameClass2   <choice> nameClass1 nameClass2 </choice>
(nameClass)               nameClass


Non-XML Syntax            RELAX NG Syntax
localName = "string"      <param name="localName">string</param>


Non-XML Syntax              RELAX NG Syntax
identifierNotKeyword = p    <define name="identifierNotKeyword"> p </define>
identifierNotKeyword |= p   <define name="identifierNotKeyword" combine="choice"> p </define>
identifierNotKeyword &= p   <define name="identifierNotKeyword" combine="interleave"> p </define>
start = p                   <start> p </start>
\identifier = p             <define name="identifier"> p </define>
include "uri"               <include href="uri"/>
include "uri" { defs }      <include href="uri"> defs </include>


A datatypes declaration declares a prefix used in a QName identifying a datatype. For example,

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
element height { xsd:double }

A namespace declaration declares a prefix used in a QName specifying the name of an element or attribute. For example,

namespace rng = "http://relaxng.org/ns/structure/1.0"
element rng:text { empty }

A default namespace declaration declares the namespace used for unprefixed names specifying the name of an element (but not of an attribute). For example,

default namespace = "http://example.com"
element foo { attribute bar { string } }

is equivalent to

namespace ex = "http://example.com"
element ex:foo { attribute bar { string } }

A default namespace declaration may have a prefix as well. For example,

default namespace ex = "http://example.com"

is equivalent to

default namespace = "http://example.com"
namespace ex = "http://example.com"
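
The way prefixed and unprefixed names pick up a namespace can be sketched in Python as follows. This is an illustration under simplifying assumptions: the declarations are supplied as a dictionary from prefix to URI plus a default namespace URI, inherit is not modelled, and the function name is hypothetical.

```python
def resolve_name(name, kind, namespaces, default_ns):
    """Return (namespace URI, local name) for an element or attribute name.

    namespaces maps declared prefixes to URIs; default_ns is the URI from
    the default namespace declaration. Both are supplied by the caller.
    """
    if kind not in ("element", "attribute"):
        raise ValueError("kind must be 'element' or 'attribute'")
    prefix, sep, local = name.partition(":")
    if sep:
        # Prefixed names always use the declared prefix.
        return (namespaces[prefix], local)
    if kind == "element":
        # Unprefixed element names take the default namespace.
        return (default_ns, name)
    # Unprefixed attribute names are never affected by the default.
    return ("", name)
```

With default namespace ex = "http://example.com", both foo and ex:foo resolve to the same element name, while an unprefixed attribute bar stays in no namespace.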

The URI may be empty. This makes the prefix stand for the absent namespace URI. This is necessary for specifying a name class that matches any name with an absent namespace URI. For example:

namespace local = ""
element foo { attribute * - local:* { string }* }

is equivalent to

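<element xmlns="http://relaxng.org/ns/structure/1.0" name="foo">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName ns=""/>
        </except>
      </anyName>
      <data type="string"/>
    </attribute>
  </zeroOrMore>
</element>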

RELAX NG has the feature that if a file does not specify an ns attribute then the ns attribute can be inherited from the including file. To support this feature, the keyword inherit can be specified in place of the namespace URI in a namespace declaration. For example,

default namespace this = inherit
element foo { element * - this:* { string }* }

is equivalent to

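<element xmlns="http://relaxng.org/ns/structure/1.0" name="foo">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName/>
        </except>
      </anyName>
      <data type="string"/>
    </element>
  </zeroOrMore>
</element>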

In addition, the include and externalRef patterns can specify inherit = prefix to specify the namespace to be inherited by the referenced file. For example,

namespace x = "http://www.example.com"
externalRef "foo.rng" inherit = x

is equivalent to

<externalRef href="foo.rng" ns="http://www.example.com"
             xmlns="http://relaxng.org/ns/structure/1.0"/>

In the absence of an inherit parameter on include or externalRef, the default namespace will be inherited by the referenced file.

In the absence of a default namespace declaration, a declaration of

default namespace = inherit

is assumed.


RELAX NG supports two kinds of annotation: element annotations and attribute annotations. In this non-XML syntax, attribute annotations are written in a similar way to the XML syntax. For example, xml:lang = "en". Element annotations are written using the syntax

elementName [ attributesAndContent ]

where elementName is the QName of the element and attributesAndContent is a list of attributes followed by a list of elements and literals.

Annotations are attached in one of the following ways:

For example,

namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"

[ a:documentation [ "Represents a foo" ] ]
element foo
  [ a:defaultValue = "42" ]
  attribute bar { text }?,

turns into

<element name="foo"
  <a:documentation>Represents a foo</a:documentation>
    <attribute a:defaultValue="42" name="bar">

Here's another example using the RelaxNGCC annotations:

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace c = "http://www.xml.gr.jp/xmlns/relaxngcc"

[ c:class="sample1" ]
start =
  element team {
    element player {
      attribute number {
        [ c:alias="number" ]
        xsd:positiveInteger > c:java [ "System.out.println(number);" ]
      element name {
        [ c:alias="name" ]
        text > c:java [ "System.out.println(name);" ]

turns into

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
  <start c:class="sample1">
    <element name="team">
        <element name="player">
          <attribute name="number">
            <data c:alias="number" type="positiveInteger"/>
          <element name="name">
            <text c:alias="name"/>

Open issues

div element

The non-XML syntax cannot represent the div element.

Namespace declarations and value

There is a problem in translating a schema such as

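<element xmlns="http://relaxng.org/ns/structure/1.0" name="foo"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <value type="QName" xmlns:bar="http://example.com/1">bar:baz</value>
    <value type="QName" xmlns:bar="http://example.com/2">bar:baz</value>
  </choice>
</element>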

into the non-XML syntax. Although this can be translated, for example, into

namespace bar1 = "http://example.com/1"
namespace bar2 = "http://example.com/2"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

element foo { xsd:QName "bar1:baz" | xsd:QName "bar2:baz" }

doing so requires that the translator have knowledge of the QName datatype.
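
To show what that knowledge amounts to, here is a hypothetical Python sketch of the rewriting step. It assumes the translator knows the value is a QName, has the prefix bindings in scope at the value element, and has already chosen a unique prefix for each namespace URI in the output; none of these names come from an actual implementation.

```python
def rewrite_qname_value(value, in_scope, output_prefixes):
    """Rewrite the prefix of a QName-typed value for the non-XML syntax.

    in_scope maps prefixes to URIs as bound at the <value> element;
    output_prefixes maps each URI to the prefix declared for it in the
    translated schema. Unprefixed values are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if not sep:
        return value
    uri = in_scope[prefix]
    return output_prefixes[uri] + ":" + local
```

Applied to the two value elements above with output prefixes bar1 and bar2, this yields "bar1:baz" and "bar2:baz". A translator that treated the value as an opaque string could not do this.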

James Clark

