Subject: Re: Issue: overlapping with XML Schema Part 2
From: James Clark <jjc@jclark.com>
To: trex@lists.oasis-open.org
Date: Sun, 06 May 2001 18:56:35 +0700

Here's my current thinking on how we should modify the way TREX handles
datatypes to avoid overlap between TREX and the datatype system.

TREX's model of a datatype system should be as follows:

1.  A datatype is identified by a name (a QName) and a set of named
parameters.  A parameter is named by an NCName.  The value of a parameter is
a string.  A collection of parameters cannot have two parameters with the
same name.

2. A datatype provides two operations:

(a) Is a string a representation of a member of the datatype?

(b) Are two strings equivalent (i.e. do they represent the same value)?
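
A minimal sketch of this model in Python, for illustration only.  The class
and method names (IntDatatype, is_member, same_value) are hypothetical and
not part of TREX or XML Schema, and the sketch handles only an integer-like
datatype with a single maxInclusive parameter:

```python
import re
from typing import Mapping, Optional


class IntDatatype:
    """Hypothetical integer-like datatype: a name plus named string parameters."""

    _LEXICAL = re.compile(r"[+-]?[0-9]+")

    def __init__(self, name: str, params: Optional[Mapping[str, str]] = None):
        self.name = name                  # a QName, e.g. "xsd:integer"
        self.params = dict(params or {})  # parameter name -> string value; a dict keeps names unique

    def _value(self, s: str) -> Optional[int]:
        """Map a string to the value it represents, or None if it is not a member."""
        s = s.strip()
        if not self._LEXICAL.fullmatch(s):
            return None
        v = int(s)
        max_inclusive = self.params.get("maxInclusive")
        if max_inclusive is not None and v > int(max_inclusive):
            return None
        return v

    def is_member(self, s: str) -> bool:
        """Operation (a): is the string a representation of a member?"""
        return self._value(s) is not None

    def same_value(self, a: str, b: str) -> bool:
        """Operation (b): do two strings represent the same value?"""
        va, vb = self._value(a), self._value(b)
        return va is not None and va == vb
```

With this shape, "1" and "01" are different strings but equivalent under
operation (b), which is the distinction that keys and constant matching rely on.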

When XML Schema Part 2 is the datatype system, TREX should use:

1. its collection of builtin datatypes

2. its facets (these would be treated as named parameters).

TREX should provide builtin support for the following ideas from XML Schema:

1. union: this would turn into <choice>

2. list: this would turn into oneOrMore or zeroOrMore (or some variant
thereof)

3. restriction: this would turn into parameters to the datatype

4. enumeration, or equivalently a pattern that matches a constant of a
particular datatype (TREX already needs the operation of equivalence between
strings for keys)

Thus I would propose the following:

1. Get rid of anonymous datatypes

2. Extend <data> to support parameters

3. Get rid of <string>

4. Extend <data> to support matching against a constant of a particular
datatype

5. Add one builtin datatype, with one builtin parameter (whiteSpace).

6. Rename <anyString/> to <text/> or <chars/>

7.  Provide some way to specify a white space separated list of tokens
matching some pattern, perhaps by extending oneOrMore and zeroOrMore.
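
As a rough sketch of what item 7 would mean operationally (Python, with
hypothetical names; the min_count argument stands in for the difference
between oneOrMore and zeroOrMore, and token_ok stands in for whatever
pattern each token must match):

```python
import re
from typing import Callable

# XML white space is space, tab, carriage return and line feed.
_TOKEN = re.compile(r"[^ \t\r\n]+")


def matches_token_list(text: str, token_ok: Callable[[str], bool], min_count: int) -> bool:
    """Split text on XML white space and check every token against token_ok."""
    tokens = _TOKEN.findall(text)
    return len(tokens) >= min_count and all(token_ok(t) for t in tokens)


def is_int(token: str) -> bool:
    """Stand-in per-token pattern: an optionally signed run of ASCII digits."""
    return re.fullmatch(r"[+-]?[0-9]+", token) is not None
```

Here matches_token_list("1 2  3", is_int, 1) plays the role of a oneOrMore
list of integers, and passing 0 as min_count would also accept the empty string.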

The extended <data> element could look something like this:

<element name="data">
   <optional>
      <choice>
         <attribute name="key">
            <data type="xsd:NCName"/>
         </attribute>
         <attribute name="keyRef">
            <data type="xsd:NCName"/>
         </attribute>
      </choice>
   </optional>
   <attribute name="type">
      <data type="xsd:QName"/>
   </attribute>
   <optional>
      <attribute name="ns">
         <data type="xsd:anyURI"/>
      </attribute>
   </optional>
   <zeroOrMore>
      <element name="param">
         <attribute name="name">
            <data type="xsd:NCName"/>
         </attribute>
         <anyString/>
      </element>
   </zeroOrMore>
   <optional>
      <element name="choice">
         <oneOrMore>
            <element name="value">
               <anyString/>
            </element>
         </oneOrMore>
      </element>
   </optional>
</element>

For example, instead of:

<choice>
   <string whiteSpace="preserve">foo</string>
   <string whiteSpace="preserve">bar</string>
</choice>

you would have:

<data type="string">
   <choice>
      <value>foo</value>
      <value>bar</value>
   </choice>
</data>

Instead of:

<xsd:simpleType trex:role="datatype">
   <xsd:restriction baseType="xsd:int">
      <xsd:enumeration value="1"/>
      <xsd:enumeration value="2"/>
   </xsd:restriction>
</xsd:simpleType>

you would have:

<data type="xsd:int">
   <choice>
      <value>1</value>
      <value>2</value>
   </choice>
</data>
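
To make the intended semantics concrete, here is a small Python sketch of how
a validator might match text against such a pattern.  It is an assumption
about how the proposal could be implemented, not a description of any
existing TREX processor: the function names are hypothetical, only
type="xsd:int" is handled, and the 32-bit range check of xsd:int is omitted.
Each <value> is compared by value rather than by string:

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

_INT = re.compile(r"[+-]?[0-9]+")


def int_value(s: str) -> Optional[int]:
    """Return the integer a string represents, or None if it is not an integer."""
    s = s.strip()
    return int(s) if _INT.fullmatch(s) else None


def matches_data(pattern_xml: str, text: str) -> bool:
    """Match text against a <data> pattern with an optional <choice> of <value>s."""
    data = ET.fromstring(pattern_xml)
    if data.get("type") != "xsd:int":
        raise NotImplementedError("this sketch only understands xsd:int")
    v = int_value(text)
    if v is None:
        return False          # not a member of the datatype at all
    choice = data.find("choice")
    if choice is None:
        return True           # no constants listed: any member matches
    # Compare values, not strings, so "02" matches <value>2</value>.
    return any(int_value(e.text or "") == v for e in choice.findall("value"))
```

For the pattern above, "1" and "02" would match while "3" and "foo" would not.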

Instead of:

<xsd:simpleType trex:role="datatype">
   <xsd:restriction baseType="xsd:positiveInteger">
      <xsd:maxInclusive value="100"/>
   </xsd:restriction>
</xsd:simpleType>

you would have:

<data type="xsd:positiveInteger">
   <param name="maxInclusive">100</param>
</data>

I am suggesting we use the more semantically neutral notion of a parameter
to a datatype in place of XML Schema's notion of restriction, because I
think XML Schema's handling of restriction is broken.  XML Schema Part 2
claims that a constraining facet restricts the value space of a
datatype.  But in several cases, it doesn't seem to me that facets really
are doing this:

1. whiteSpace

2. pattern (it says it applies to the value space, but I think this is
unimplementable except for datatypes derived from string; the only way I can
see to implement it is to treat it as restricting the lexical space)

3. totalDigits, fractionDigits (values in the value space of decimal don't
contain information about the totalDigits and the fractionDigits; they just
specify the mathematical value)
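
The third point can be illustrated with any arbitrary-precision decimal type.
The following Python sketch is only an illustration (fraction_digits is a
hypothetical helper, not something defined by XML Schema): two strings that
represent the same decimal value differ in their number of fraction digits,
so that count can only be read off the lexical form.

```python
from decimal import Decimal


def fraction_digits(lexical: str) -> int:
    """Count digits after the decimal point in a plain decimal literal."""
    _, _, frac = lexical.strip().partition(".")
    return len(frac)


a, b = "1.0", "1.00"

# The two strings represent the same mathematical value...
same_value = Decimal(a) == Decimal(b)

# ...but a digit count is a property of each string, not of that value.
digits = (fraction_digits(a), fraction_digits(b))
```

Here same_value is true while the two digit counts are 1 and 2, which is why
a facet like fractionDigits looks like a constraint on the lexical space.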

James