relax-ng message

Subject: [relax-ng] Re: [xml-dev] DTDs, W3C Schemas, RELAX NG, Schematron?
From: John Cowan <jcowan@reutershealth.com>
To: clbullar@ingr.com ("Bullard, Claude L (Len)")
Date: Thu, 23 May 2002 14:24:37 -0400 (EDT)

"Bullard, Claude L (Len)" scripsit:

> Please elaborate on how RELAX is more powerful than XSD.

1) RNG allows nondeterministic content models.  A content model like
((X+,A*,Y+,A*) | (X+, A*) | (Y+, A*) | empty) need not be painstakingly
rewritten as ((X+,A*,(Y+,A*)?) | (Y+,A*)?).
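
The equivalence of the two content models in point 1 can be checked
mechanically.  What follows is a minimal sketch, assuming each model is
transcribed as a regular expression over the one-letter symbols X, A and
Y; the names NONDETERMINISTIC, DETERMINISTIC and same_language are
illustrative only, and the check is exhaustive only up to a bounded length.

```python
import re
from itertools import product

# Content models from point 1, transcribed as regexes over X, A, Y.
# The trailing empty alternative stands for "empty".
NONDETERMINISTIC = re.compile(r"X+A*Y+A*|X+A*|Y+A*|")
DETERMINISTIC = re.compile(r"X+A*(Y+A*)?|(Y+A*)?")

def same_language(max_len=7):
    """Return True if both models accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for symbols in product("XAY", repeat=n):
            s = "".join(symbols)
            first = NONDETERMINISTIC.fullmatch(s) is not None
            second = DETERMINISTIC.fullmatch(s) is not None
            if first != second:
                return False
    return True

print(same_language())
```

This compares the two forms only on short inputs; it is a sanity check on
the hand rewrite, not a proof.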

2) The SGML DTD connector &, meaning "both operands in either order",
was removed from XML DTDs on grounds of implementation complexity.
XSD restored it, but *only* at the top level of an element content model.
RNG (which pronounces it "interleave") allows it at any level, and with
extended meaning:  in SGML, (A & B*) means either A followed by any
number of Bs, or any number of Bs followed by A; but in RNG it means any
number of Bs with an A somewhere, either before, after, or interleaved.
(There are two modest restrictions: a given element name can't appear
on both sides of &, and neither can character data.)
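
The difference between the SGML and RNG readings of (A & B*) in point 2
can be sketched with two regular expressions over a string of child
element names.  This is only an illustration, assuming one letter per
child element; SGML_AND and RNG_INTERLEAVE are made-up names.

```python
import re

# SGML reading of (A & B*): A followed by Bs, or Bs followed by A.
SGML_AND = re.compile(r"AB*|B*A")
# RNG interleave of A with B*: a single A anywhere among the Bs.
RNG_INTERLEAVE = re.compile(r"B*AB*")

for children in ("ABB", "BBA", "BAB", "BB"):
    sgml_ok = SGML_AND.fullmatch(children) is not None
    rng_ok = RNG_INTERLEAVE.fullmatch(children) is not None
    print(children, sgml_ok, rng_ok)
```

Only "BAB" separates the two readings: the A sits between Bs, which the
interleave reading accepts and the SGML reading rejects.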

3) RNG treats attributes and child elements uniformly insofar as possible.
Thus a content model like (attribute id {xsd:ID} | element id {xsd:ID})
allows an attribute named id or a child element named id but not both.
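
The "attribute or child element, but not both" rule in point 3 can be
imitated by hand as follows.  This is a rough sketch using Python's
xml.etree.ElementTree; the function name has_exactly_one_id is
hypothetical, and the xsd:ID datatype check is deliberately left out.

```python
import xml.etree.ElementTree as ET

def has_exactly_one_id(xml_text):
    """Accept an id attribute or a single id child element, but not both."""
    elem = ET.fromstring(xml_text)
    as_attribute = "id" in elem.attrib
    as_children = len(elem.findall("id"))  # direct children named id
    if as_attribute:
        return as_children == 0
    return as_children == 1

print(has_exactly_one_id('<item id="a1"/>'))
print(has_exactly_one_id('<item id="a1"><id>a2</id></item>'))
```

In RNG this logic lives in the pattern itself; the sketch shows what a
validator has to decide, not how any particular validator decides it.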

4) Because RNG has a general notion of pattern that can be used
anywhere it makes sense (obviously elements inside attributes and the
like are meaningless), context-free and context-sensitive definitions
are accomplished by the same mechanism, and are free of arbitrary
restrictions.  Thus one can write a rule that refers to some attributes
and/or some child elements, and plug it almost anywhere into another
content pattern.

5) RNG allows a (simple) datatype to be modified not only by facet
restriction, but also by explicit extension and exception.  One can write
'xsd:integer - "0"' to mean a nonzero integer (although unfortunately "00"
will still validate), or 'xsd:integer | "Inf" | "+Inf" | "-Inf"' when a
value may be integral or infinite.
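
Why "00" slips through in point 5 can be seen in a small model.  The
sketch below assumes that the excluded literal "0" is compared as a
string-like token rather than as an integer value, which is consistent
with the remark above; nonzero_by_value is a hypothetical stricter
variant for contrast, and the lexical check for integers is simplified.

```python
import re

# Simplified lexical form of an integer: optional sign, then digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def is_integer(text):
    return INTEGER_LEXICAL.fullmatch(text.strip()) is not None

def nonzero_by_token(text):
    """Integer, minus the literal token "0" (string comparison)."""
    return is_integer(text) and text.strip() != "0"

def nonzero_by_value(text):
    """Hypothetical variant: exclude anything whose integer value is 0."""
    return is_integer(text) and int(text) != 0

def integer_or_infinity(text):
    """Integer extended with the three literal spellings of infinity."""
    return is_integer(text) or text.strip() in {"Inf", "+Inf", "-Inf"}

print(nonzero_by_token("00"), nonzero_by_value("00"))
```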

6) RNG wildcard names can likewise be excepted from: one can allow any
attribute from the "foo" namespace except "foo:bar" thus: "attribute
(foo:* - foo:bar) { text }".  Such an element or attribute has a content
model like any other, not a mere processing model as in XSD.
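
The name class in point 6 amounts to a predicate over (namespace, local
name) pairs.  A minimal sketch, assuming names are already resolved to a
namespace URI and a local part; name_class and the example URI are
invented for illustration.

```python
def name_class(namespace, excluded_local_names):
    """Build a predicate for 'any name in namespace except the listed ones'."""
    excluded = frozenset(excluded_local_names)

    def matches(ns_uri, local_name):
        return ns_uri == namespace and local_name not in excluded

    return matches

FOO = "http://example.com/foo"  # hypothetical URI bound to the prefix foo
foo_except_bar = name_class(FOO, ["bar"])

print(foo_except_bar(FOO, "baz"))
print(foo_except_bar(FOO, "bar"))
```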

7) An RNG schema may include another RNG schema textually, while overriding
specified definientia with new definitions potentially quite unrelated
to the old ones.  Likewise, including a single definition from another
schema is easy.  A partial definition in another schema can be extended
either as an alternative or as an interleave (see above).

8) RNG has a concept of the start element, corresponding to the DOCTYPE
declaration's notion of the root element.

9) RNG is closed under union and intersection: given two document classes
described by RNG schemas, one can mechanically construct an RNG schema
which describes documents appearing in both classes (the intersection),
and another which describes documents appearing in either class (the
union).

10) RNG allows schemas to reference datatype libraries other than the
XSD datatypes.  The default library includes only "string" and "token"
datatypes, but the full XSD datatype library is available in existing
validators.  There is a standard (Java) interface for plug-in libraries.

11) RNG lists consist of non-whitespace separated by whitespace, and the
non-whitespace need not be all of the same (simple) type.  A pattern like
"list { (xsd:integer, xsd:string)+ }" is perfectly valid, and matches
content like "2 foo 5 bar".
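
What the list pattern in point 11 accepts can be spelled out by hand.
This is a sketch only, assuming the value is split on whitespace and that
the string member of each pair accepts any token; matches_pairs is a
made-up name and the integer check is simplified.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")

def matches_pairs(value):
    """Accept one or more (integer, string) pairs separated by whitespace."""
    tokens = value.split()
    if not tokens or len(tokens) % 2 != 0:
        return False
    # Tokens at even positions must be integers; the others may be anything.
    return all(INTEGER.fullmatch(tok) is not None for tok in tokens[0::2])

print(matches_pairs("2 foo 5 bar"))
print(matches_pairs("2 foo 5"))
```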

12) The location of character data can be individually constrained.
The content model (A, text, B, C) allows character data between A and B,
but not between B and C.
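
The constraint in point 12 can be checked by hand with Python's
xml.etree.ElementTree, where text after a child element is stored in that
child's tail.  A minimal sketch, assuming whitespace-only text between
elements is ignored; blank and matches_a_text_b_c are illustrative names.

```python
import xml.etree.ElementTree as ET

def blank(text):
    """True for missing or whitespace-only character data."""
    return text is None or text.strip() == ""

def matches_a_text_b_c(xml_text):
    """Mimic (A, text, B, C): character data is allowed only after A."""
    parent = ET.fromstring(xml_text)
    children = list(parent)
    if [child.tag for child in children] != ["A", "B", "C"]:
        return False
    a, b, c = children
    # a.tail (between A and B) is unconstrained; everything else is blank.
    return blank(parent.text) and blank(b.tail) and blank(c.tail)

print(matches_a_text_b_c("<p><A/>hello<B/><C/></p>"))
print(matches_a_text_b_c("<p><A/><B/>oops<C/></p>"))
```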

13) RNG respects namespaces, but there is no such rule as "one namespace for
one schema".  A schema can define elements and attributes from multiple
namespaces, and a single namespace can be described by multiple schemas
simultaneously.


Some of these things may be feasible in XSD, and if so, attribute their
presence here to my ignorance of XSD.

With very, very few exceptions, if a syntactically valid RNG pattern
makes any sense at all, it is also semantically valid.  RNG can't cope
with a simple datatype directly followed by child elements or another simple
datatype in a single element or attribute value, but then neither can XSD.

What does RNG lack?  Mainly, it has no notion of keys or other identity
constraints, except for DTD-compatible ID and IDREF(S).  RNG validators
also don't provide any sort of PSVI, and indeed only a subset of legal
RNG schemas would be able to produce type tags at all (validity is always
decidable, but just *which* rule(s) were applicable may not always be).

-- 
John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_