relax-ng-comment message

Subject: Types produced from regular models are sets
From: "Bob Foster" <bob@objfac.com>
To: <relax-ng-comment@lists.oasis-open.org>
Date: Mon, 21 Jul 2003 15:30:13 -0500

Lattice, I don't know, but I have been thinking that RELAX NG types should
be treated as sets. Perhaps this is obvious, but I haven't seen it, so I'll
ramble on for a bit.

The motivation for this is to construct a type system that respects the
regular model from which types must be derived and isn't ashamed of
non-determinism. Since RELAX NG models are closed under union, which last
time I looked was a commutative operator, ambiguity resolution by ordering
(a la W3C XML Schema) isn't well-founded theoretically and requires the
validator to maintain declaration order in inherently unordered patterns.
While understandable within the confines of a single pattern expression, the
choice of a single type out of a number of equally valid types is likely to
seem "random" to users when patterns are ambiguous at the element level.

In this note, I'm going to stick to "atomic" types because they are simpler
and RELAX NG doesn't have a concept of named types for non-text values. To
illustrate by example, given:

element e { xsd:integer | xsd:boolean }

and the input <e>1</e>, which matches both xsd:integer and xsd:boolean, the
type of e is the set,

{xsd:integer, xsd:boolean}

which, it should go without saying, is unordered and contains no duplicates.
It is also reasonable to view these types as patterns with (only) choice
operators, as long as the set properties are honored by the implementation.
(The pattern view may be easier to extend to complex/hedge types, but I'm
not going to go there.)
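
As a rough sketch of the example above in Python: the regular expressions
are simplified stand-ins for the xsd:integer and xsd:boolean lexical spaces,
and the names LEXICAL, type_set and E_TYPES are hypothetical, not part of
any RELAX NG implementation.

```python
import re

# Simplified lexical spaces (assumption: surrounding whitespace is
# stripped before matching, which is what these XSD datatypes do).
LEXICAL = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
}


def type_set(text, candidates):
    """Return the set of candidate type names whose lexical space contains text."""
    value = text.strip()
    return frozenset(
        name for name in candidates if LEXICAL[name].fullmatch(value)
    )


# element e { xsd:integer | xsd:boolean }
E_TYPES = ("xsd:integer", "xsd:boolean")

if __name__ == "__main__":
    # <e>1</e> matches both alternatives, so both names are reported.
    print(sorted(type_set("1", E_TYPES)))
```

Using a frozenset gives the unordered, duplicate-free behavior for free, so
the order of E_TYPES has no effect on the result.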

Each member of a type set names a set of strings; the type set describes the
union of the member sets. Neither uniquely names its set.

We can define the subtype (<=) relation in a conventional way: A type is a
subtype of a base type if every valid value of the subtype is a valid value
of the base. Equivalently, if the set of strings permitted by a type is a
subset of the set allowed by a base type, the type is a subtype of the base.
Then, trivially, if two types are subtypes of each other, they are
equivalent (identify the same set of strings), and each member of a type set
is a subtype of the type set. Given two type sets, one is a subtype of the
other if every member of the one is a subtype of the other. Every atomic
type is a subtype of {text}.
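
The relation can be sketched in Python under one loud assumption: since the
string sets are generally infinite, the check below relies on a hand-written
table of known atomic subtype facts (ATOMIC_BASES, hypothetical) instead of
comparing the sets themselves. A member counts as covered when it is a
subtype of some single member of the other type set, which is sufficient but
not necessary for the set-inclusion definition.

```python
# Hypothetical table of direct subtype facts between atomic types: the
# strings allowed by each key are also allowed by each listed base.
ATOMIC_BASES = {
    "xsd:int": {"xsd:integer"},
    "xsd:integer": {"xsd:decimal"},
    "xsd:decimal": {"text"},
    "xsd:boolean": {"text"},
    "text": set(),
}


def atomic_subtype(a, b):
    """True if atomic type a <= b, following the table transitively."""
    if a == b or b == "text":  # every atomic type is a subtype of text
        return True
    return any(atomic_subtype(base, b) for base in ATOMIC_BASES.get(a, ()))


def subtype(ts1, ts2):
    """True if every member of type set ts1 is a subtype of a member of ts2."""
    return all(any(atomic_subtype(a, b) for b in ts2) for a in ts1)


def equivalent(ts1, ts2):
    """Two type sets are equivalent if each is a subtype of the other."""
    return subtype(ts1, ts2) and subtype(ts2, ts1)
```

For instance, subtype({"xsd:integer"}, {"xsd:integer", "xsd:boolean"}) holds
because each member of a type set is a subtype of the type set, while the
reverse query fails in this sketch.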

This is not an object-oriented definition; subtypes may or may not have an
"isa" relationship with their base types. For example, it is not true that
an integer "isa" decimal even though every integer value may also be a
valid decimal value. This is a semantic issue outside the scope of the type
system (as it is in object-oriented languages, except for exhortation).

Note that in this interpretation, a list type is a sequence/array of sets.

Assuming there were an application API to receive it, the (atomic) type of
each element and attribute in an input stream (that has an atomic type) can
be constructed during validation by forming the union of the types of each
successfully matched pattern. (This seems easy to build as a byproduct of
the derivative with respect to text. I don't know about other
implementations.)

Since the type decorations of nodes would be sets, an API like that of SAX2
for attributes seems a poor fit, and not just because single names are
inappropriate. Instead, if an application is written to have different
behaviors depending on the discovered type of the input, it would be more
useful if the API allowed the application to ask if a particular type is a
subtype of the node type. For example, given the pattern:

element color { "red"|"green"|"blue"| list { part, part, part }}
...
part = xsd:int { minInclusive="0" maxInclusive="255" }

the application could ask if the input is a token and if so do a table
lookup, otherwise if it is a list consisting of three xsd:int values,
construct a color. The rarer application that needs to know the entire
set of valid types could be provided a list of names.
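
A sketch of how such an application might look in Python. Everything here is
hypothetical: color_type stands in for whatever a validator would report,
has_type is the simplest possible query (plain membership), the RGB values
in NAMED_COLORS are illustrative, and a list type is modeled as a tuple of
type sets, following the note above that a list type is a sequence of sets.

```python
import re

# Illustrative lookup table for the three literal tokens in the pattern.
NAMED_COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

# list { part, part, part } modeled as a sequence of three type sets.
INT_TRIPLE = (frozenset({"xsd:int"}),) * 3


def is_part(s):
    """True if s is an integer literal in the range 0..255."""
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None and 0 <= int(s) <= 255


def color_type(text):
    """Stand-in for the validator: the type set of a color element's text."""
    value = text.strip()
    members = set()
    if value in NAMED_COLORS:
        members.add("token")
    parts = value.split()
    if len(parts) == 3 and all(is_part(p) for p in parts):
        members.add(INT_TRIPLE)
    return frozenset(members)


def has_type(node_type, query):
    """Ask whether the queried type is one of the node's types."""
    return query in node_type


def to_color(text):
    """Dispatch on the discovered type, as the application would."""
    node_type = color_type(text)
    if has_type(node_type, "token"):
        return NAMED_COLORS[text.strip()]
    if has_type(node_type, INT_TRIPLE):
        return tuple(int(p) for p in text.split())
    raise ValueError("not a valid color: %r" % text)
```

The application never inspects the whole set; it only asks the two questions
it cares about, in the order it prefers.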

Bob Foster