relax-ng message

Subject: Re: Common annotations first draft
From: James Clark <jjc@jclark.com>
To: Murata Makoto <mura034@attglobal.net>, relax-ng@lists.oasis-open.org
Date: Sat, 04 Aug 2001 10:34:01 +0700

> I think that we should agree on publication of this document in our next
> teleconference.

I don't know whether we will be able to settle enough issues by next week to
allow publication as a committee document.

> The most controversial part of this document is the scope.  Which
> requirements does this address?  I would like to incorporate something
> about requirements/scopes/goals in our first document about annotations.

This is an important question to ask.  My one-sentence statement of the
goals would be as follows:

"The goal of this specification is to facilitate transition from XML 1.0
DTDs to RELAX NG by providing annotations for some of the features of XML
1.0 DTDs that are not provided by RELAX NG."

I think we should firmly limit the scope to XML 1.0 DTD features; otherwise
the floodgates are open to including any random feature.  I would include
the a:documentation element as an XML 1.0 feature, in that it corresponds to
XML 1.0 DTD comments: a:documentation is the element/attribute markup
counterpart of DTD comments, just as the RELAX NG defined elements are the
element/attribute markup counterpart of other DTD markup declarations.
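
As a rough sketch of that correspondence: a DTD comment placed next to an
element declaration would be carried over as an a:documentation child of the
matching pattern. The element name "title", the documentation text, and both
namespace URIs below are illustrative assumptions (the URI for the a prefix
in particular is a placeholder, not something we have agreed on).

```xml
<!-- Sketch only: "title" and both namespace URIs are assumptions. -->
<element name="title"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://example.org/annotations">
  <!-- This text is what the DTD would have carried in a comment. -->
  <a:documentation>The title of the document.</a:documentation>
  <text/>
</element>
```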

If we adopt this approach to stating our goals, then I think we need to
explain why we aren't supporting other DTD features.  For example, we should
say why we aren't supporting entity declarations and notation declarations.

> 2. Goals
>
> 1) General
>
> - It shall be straightforward to convert DTDs with default values,
>   ID/IDREF/IDREFS to a RELAX NG with annotations.

I think this is the key point.  I would add "DTDs with comments" to the list
of what we are trying to support.
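
To make the intended conversion concrete, here is a minimal sketch. The
attribute name a:defaultValue is hypothetical (only a:attributeType and
a:documentation have been mentioned so far), the element "note" is invented
for illustration, and both namespace URIs are assumptions.

```xml
<!-- DTD being converted (shown here only as a comment):
     <!ELEMENT note (#PCDATA)>
     <!ATTLIST note lang CDATA "en"
                    id   ID    #IMPLIED>
     Sketch only: a:defaultValue and both namespace URIs are assumptions. -->
<element name="note"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://example.org/annotations">
  <!-- The DTD comment on the declaration becomes documentation. -->
  <a:documentation>A short note.</a:documentation>
  <optional>
    <!-- The DTD default "en" becomes an annotation on the attribute. -->
    <attribute name="lang" a:defaultValue="en"/>
  </optional>
  <optional>
    <!-- The DTD attribute type ID becomes an annotation as well. -->
    <attribute name="id" a:attributeType="ID"/>
  </optional>
  <text/>
</element>
```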

> - Use of default values and ID/IDREF/IDREFS shall not be confused by
>   ambiguous grammars.  When a grammar is ambiguous, it is not possible
>   to uniquely determine an <attribute> pattern for each attribute and an
>   <element> pattern for each element.  (Note:  This makes everything
>   hard.)

I think this is another important issue, but I think it needs a bit of expansion.
To start with, I think we also need to mention the fact that something may
be unambiguous, but lookahead or multiple passes may be required to do the
assignment.  I also wouldn't quite describe dealing with ambiguity as a
goal. It's more of a fact: it is possible to have a RELAX NG grammar that is
ambiguous, so the annotation spec has to address this possibility by
imposing some sort of restrictions.  Where we need a set of goals is in
guiding our choice of restrictions. For example,

1. It must be possible to straightforwardly implement the restrictions.
2. Processing of the instance should not require lookahead or multiple
passes.
3. The restrictions should be statically checkable (i.e. the schema should be
checkable independent of any instance).
4. The restrictions should not be any more restrictive than necessary.
5. The transformed infoset should be XML 1.0 compatible: it must be an
infoset that could have been produced by a validating XML 1.0 parser for
some DTD.

Goal 4 conflicts with goals 1 and 2 to some extent, and we need to use our
collective judgement to balance them.
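
As an illustration of the lookahead case mentioned above, here is a sketch
of a pattern that is unambiguous but where the default for x on <a> cannot
be assigned until the following sibling has been seen: it is "1" if <b>
follows and "2" if <c> follows. The attribute name a:defaultValue is
hypothetical, the element names are invented, and both namespace URIs are
assumptions.

```xml
<!-- Sketch only: a:defaultValue and both namespace URIs are assumptions. -->
<element name="root"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://example.org/annotations">
  <choice>
    <group>
      <element name="a">
        <optional>
          <!-- Default applies only when <a> is followed by <b>. -->
          <attribute name="x" a:defaultValue="1"/>
        </optional>
        <empty/>
      </element>
      <element name="b"><empty/></element>
    </group>
    <group>
      <element name="a">
        <optional>
          <!-- Default applies only when <a> is followed by <c>. -->
          <attribute name="x" a:defaultValue="2"/>
        </optional>
        <empty/>
      </element>
      <element name="c"><empty/></element>
    </group>
  </choice>
</element>
```

A restriction that satisfies goal 2 would have to reject a schema like this
one, even though every instance has exactly one possible assignment.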

> 2) Attributes
>
> - Attributes can be defaultable but elements cannot be defaultable.
>
> - It should be possible to examine default values against datatypes.
>
> - When a defaultable attribute is missing in an information set, it
>   should be possible to change the information set by adding default
>   values.
>
> - When a defaultable attribute is missing in an information set, it
>   should be possible for application programs to use default values.
>   For example, it should be possible to generate Java classes from
>   RELAX NG grammars with default value annotation and embed default
>   values in the Java classes.  Such Java classes do not require
>   changed information sets.

I would remove this section, because I think our goals with respect to
attribute defaults should be derivable from our general goals, specifically
XML 1.0 compatibility.

I'm not clear what your 2nd bullet means.

> 3) ID/IDREF/IDREFS
>
> - Elements can have attributes as identifiers.  Two elements in an
>   element collection (e.g., all elements of the same tag name) can be
>   distinguished by their identifier attributes.
>
> - Elements can have subelements as identifiers.  Two elements in an
>   element collection can be distinguished by their identifier
>   subelements.
>
> - Different collections of elements have different symbol spaces or
>   ID/IDREF tables.
>
> - A unique element in an element collection can be referenced by
>   specifying its identifier by an attribute or element.  (IDREF)
>
> - A sequence of elements in an element collection can be referenced by
>   specifying their identifiers by an attribute or element.  (IDREFS)
>
> - Application programs shall be able to determine which element is
>   referenced by examining an identifier element or attribute.
>
> - Comparison of identifiers is done by using datatype information.
>
> - Multi-part keys are outside the scope of this specification.
>
> - Scoped keys are outside the scope of this specification.

As with the previous section, I would remove this section, because I think
our goals with respect to ID/IDREF/IDREFS should be derivable from our
general goals, specifically XML 1.0 compatibility.

The goals you've stated go far beyond XML 1.0.  I think we should stick to
XML 1.0.

It is vital for true compatibility that an "annotation processor" can modify
the infoset to make the [attribute type] of attribute info items be ID,
IDREF or IDREFS.  If an annotation processor makes such changes, then I
think we must comply with goal 5 above.  For example, if a valid document
has an IDREF with normalized value "1", then I think there must be an ID
with normalized value "1" as well: it is not sufficient for there to be an
ID with normalized value "1.0".  This means that comparison of identifiers
should be done exactly as in XML 1.0, and should NOT take into account
datatype information (which I am opposed to in any case, because it goes
beyond XML 1.0).
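
A small instance sketch of the "1" versus "1.0" case. The element and
attribute names are invented, and the schema is assumed to annotate id with
a:attributeType="ID" and ref with a:attributeType="IDREF".

```xml
<!-- Sketch only: names are hypothetical; id is assumed to be annotated
     as ID and ref as IDREF in the schema. -->
<doc>
  <item id="1.0"/>
  <!-- Compared as strings, "1" and "1.0" differ, so this reference has
       no matching ID, even though the two would be equal if they were
       compared as decimal values. -->
  <pointer ref="1"/>
</doc>
```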

If we were going to go beyond XML 1.0, the first thing I would want to do
would be to make the ambiguity constraint context sensitive (as in the
last RELAX NG draft).  I think the lack of context sensitivity is a far more
serious limitation than being limited to the "token" datatype.  For example,
if in the schema you have

<grammar>
  <start>
    <ref name="any"/>
  </start>
  <define name="any">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="any"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>

and somewhere else in the schema you have:

<element name="foo">
  <attribute name="id" a:attributeType="ID"/>
</element>

you will have an error.
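
To make the conflict concrete, here is a sketch of an instance fragment,
under my reading of how a context-insensitive constraint would treat it.
The element name "doc" and the attribute value are invented for
illustration.

```xml
<!-- Sketch only: "doc" and the id value are hypothetical. -->
<doc>
  <!-- Judged by names alone, the id attribute on foo can match either
       <attribute><anyName/></attribute> (which has no ID annotation) or
       the annotated <attribute name="id" .../> pattern, so there is no
       single attribute type to assign to it. -->
  <foo id="x1"/>
</doc>
```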

We need to get this annotations spec out rapidly, and I think the only way
we will be able to do that is to limit ourselves narrowly to XML 1.0.
Perhaps the spec should be called "RELAX NG DTD Compatibility Annotations"
to emphasize this.

James