Subject: Re: Common annotations first draft
From: Murata Makoto <mura034@attglobal.net>
To: relax-ng@lists.oasis-open.org
Date: Sat, 04 Aug 2001 13:19:32 +0900
James Clark wrote:

> > I think that we should agree on publication of this document in our next
> > teleconference.
> 
> I don't know whether we will be able to settle enough issues by next week to
> allow publication as a committee document.

I think that even publication as a personal document should be discussed at the TC.

> This is an important question to ask.  My one-sentence statement of the
> goals would be as follows:
> 
> "The goal of this specification is to facilitate transition from XML 1.0
> DTDs to RELAX NG by providing annotations for some of the features of XML
> 1.0 DTDs that are not provided by RELAX NG."

I like this.
 
> I think we should firmly limit the scope to XML 1.0 DTD features; otherwise
> the floodgates are open to including any random feature.  I would include
> the a:documentation element as an XML 1.0 feature, in that it corresponds to
> XML 1.0 DTD comments: a:documentation is the element/attribute markup
> counterpart of DTD comments, just as the RELAX NG defined elements are the
> element/attribute markup counterpart of other DTD markup declarations.

Good.

> If we adopt this approach to stating our goals, then I think we need to
> explain why we aren't supporting other DTD features.  For example, we should
> say why we aren't supporting entity declarations and notation declarations.

I agree.  Why don't we support replacement of skipped entity references with 
a mixed sequence?  (BTW, I certainly do not want to support entity and notation 
declarations.)

A very basic principle of RELAX Core is that the information set is not touched 
at all.  Are we deciding to change the information set through annotations (e.g., 
default values)?  If so, why don't we also support entities?
 
> > 2. Goals
> >

> > - Use of default values and ID/IDREF/IDREFS shall not be confused by
> >   ambiguous grammars.  When a grammar is ambiguous, it is not possible
> >   to uniquely determine an <attribute> pattern for each attribute and an
> >   <element> pattern for each element.  (Note:  This makes everything
> >   hard.)
> 
> I think this is another important issue, but I think it needs a bit of expansion.

Definitely yes.

> To start with, I think we also need to mention the fact that something may
> be unambiguous, but lookahead or multiple passes may be required to do the
> assignment.  I also wouldn't quite describe dealing with ambiguity as a
> goal. It's more of a fact: it is possible to have a RELAX NG grammar that is
> ambiguous, so the annotation spec has to address this possibility by
> imposing some sort of restrictions.  Where we need a set of goals is in
> guiding our choice of restrictions. For example,
> 
> 1. It must be possible to straightforwardly implement the restrictions.
> 2. Processing of the instance should not require lookahead or multiple
> passes.
> 3. The restrictions should be statically checkable (i.e., the schema should
> be checkable independent of any instance).
> 4. The restrictions should not be any more restrictive than necessary.
> 5. The transformed infoset should be XML 1.0 compatible: it must be an
> infoset that could have been produced by a validating XML 1.0 parser for
> some DTD.
> 
> 4 conflicts with 1-2 to some extent, and we need to use our collective
> judgement to balance them.

I like this list.  It is a very nice thing to have in our spec.

> > 2) Attributes
> >
> > - Attributes can be defaultable but elements cannot be defaultable.
> >
> > - It should be possible to examine default values against datatypes.
> >
> > - When a defaultable attribute is missing in an information set, it
> >   should be possible to change the information set by adding default
> >   values.
> >
> > - When a defaultable attribute is missing in an information set, it
> >   should be possible for application programs to use default values.
> >   For example, it should be possible to generate Java classes from
> >   RELAX NG grammars with default value annotation and embed default
> >   values in the Java classes.  Such Java classes do not require
> >   changed information sets.
> 
> I would remove this section, because I think our goals with respect to
> attribute defaults should be derivable from our general goals, specifically
> XML 1.0 compatibility.

I think that the last one is important and is not derivable from our general goals.  
In the case of XML, it is clear that everything is done by the XML processor.  
But we are not forced to do everything with the RELAX NG processor.  (This also 
relates to my concern about the conformance section.)  In fact, this is the main 
motivation for creating this list.  What is the appropriate layering?
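
To make the layering question more concrete, here is a minimal Python sketch 
of the two options in the last two bullets of my list.  It assumes a 
hypothetical table of defaults collected from default-value annotations; the 
names DEFAULTS, add_defaults and get_attr are illustrative only and are not 
part of any proposal.

```python
from typing import Dict, Tuple

# Hypothetical defaults table: (element name, attribute name) -> default value,
# as might be collected from default-value annotations in a grammar.
DEFAULTS: Dict[Tuple[str, str], str] = {
    ("list", "type"): "bullet",
}

def add_defaults(element: str, attrs: Dict[str, str]) -> Dict[str, str]:
    """Thin-layer option: return a copy of the attributes with missing
    defaultable attributes filled in (the information set is changed)."""
    result = dict(attrs)
    for (elem, attr), value in DEFAULTS.items():
        if elem == element and attr not in result:
            result[attr] = value
    return result

def get_attr(element: str, attrs: Dict[str, str], name: str) -> str:
    """Application option (e.g. generated classes with embedded defaults):
    the information set is left untouched, and the default is used only when
    the application asks for a missing attribute. Raises KeyError if the
    attribute is missing and has no default."""
    if name in attrs:
        return attrs[name]
    return DEFAULTS[(element, name)]

original: Dict[str, str] = {}
print(add_defaults("list", original))      # {'type': 'bullet'}
print(original)                            # {}
print(get_attr("list", original, "type"))  # bullet
```

Both give the application the same value; the difference is only which layer 
does the work and whether the information set is modified.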

> I'm not clear what your 2nd bullet means.

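I mean that a default value should be checkable against the datatype of its 
attribute.  For example, "true" is not a good default for an integer attribute.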
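
A minimal sketch of such a check, assuming only two W3C XML Schema datatypes 
with simplified lexical rules (LEXICAL_PATTERNS and default_is_valid are 
hypothetical names; a real implementation would delegate to the datatype 
library, and str.split only approximates XML whitespace collapsing):

```python
import re

# Simplified lexical spaces for two XML Schema datatypes.
LEXICAL_PATTERNS = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "boolean": re.compile(r"true|false|1|0"),
}

def default_is_valid(datatype: str, default: str) -> bool:
    """Check a default value against the lexical space of its datatype.
    Both datatypes collapse whitespace, so surrounding spaces are ignored."""
    collapsed = " ".join(default.split())
    return LEXICAL_PATTERNS[datatype].fullmatch(collapsed) is not None

print(default_is_valid("integer", "42"))    # True
print(default_is_valid("integer", "true"))  # False
```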

> > 3) ID/IDREF/IDREFS
> >
> > - Elements can have attributes as identifiers.  Two elements in an
> >   element collection (e.g., all elements of the same tag name) can be
> >   distinguished by their identifier attributes.
> >
> > - Elements can have subelements as identifiers.  Two elements in an
> >   element collection can be distinguished by their identifier
> >   subelements.
> >
> > - Different collections of elements have different symbol spaces or
> >   ID/IDREF tables.
> >
> > - A unique element in an element collection can be referenced by
> >   specifying its identifier by an attribute or element.  (IDREF)
> >
> > - A sequence of elements in an element collection can be referenced by
> >   specifying their identifiers by an attribute or element.  (IDREFS)
> >
> > - Application programs shall be able to determine which element is
> >   referenced by examining an identifier element or attribute.
> >
> > - Comparison of identifiers is done by using datatype information.
> >
> > - Multi-part keys are outside the scope of this specification.
> >
> > - Scoped keys are outside the scope of this specification.
> 
> As with the previous section, I would remove this section, because I think
> our goals with respect to ID/IDREF/IDREFS should be derivable from our
> general goals, specifically XML 1.0 compatibility.

Again, which should be covered by application programs, which should be covered 
by a thin layer between the RELAX NG processor and application programs, and 
which should be covered by the RELAX NG processor?  To discuss layering, I 
think that we need a more concrete list.

> The goals you've stated go far beyond XML 1.0.  I think we should stick to
> XML 1.0.

That is fine to me.  In fact, I agree very strongly.

> It is vital for true compatibility that an "annotation processor" can modify
> the infoset to make the [attribute type] of attribute info items be ID,
> IDREF or IDREFS.

This is an interesting observation.  But even when something is declared as CDATA 
in accompanying DTDs, should the RELAX NG processor override it?

>  If an annotation processor makes such changes, then I
> think we must comply with goal 5 above.  For example, if a valid document
> has an IDREF with normalized value "1", then I think there must be an ID
> with normalized value "1" as well: it is not sufficient for there to be an
> ID with normalized value "1.0".  This means that comparison of identifiers
> should be done exactly as in XML 1.0, and should NOT take into account
> datatype information (which I am opposed to in any case, because it goes
> beyond XML 1.0).
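
To check my understanding of this, here is a minimal sketch of an annotation 
processor that sets [attribute type] and then resolves IDREFs by exact string 
comparison.  The toy infoset model and all names in it are hypothetical, and 
it does not check that values match the XML Name production.

```python
from typing import Dict, List, Tuple

# A toy attribute information item: (element name, attribute name, normalized value).
Attr = Tuple[str, str, str]
# Hypothetical annotations: (element name, attribute name) -> "ID", "IDREF" or "IDREFS".
Annotations = Dict[Tuple[str, str], str]

def assign_types(attrs: List[Attr], annotations: Annotations) -> List[Tuple[Attr, str]]:
    """Pair each attribute with an [attribute type]; unannotated ones stay CDATA."""
    return [(a, annotations.get((a[0], a[1]), "CDATA")) for a in attrs]

def dangling_idrefs(typed: List[Tuple[Attr, str]]) -> List[str]:
    """Return IDREF/IDREFS tokens that match no ID.

    Matching is exact string equality on normalized values, as in XML 1.0;
    no datatype information is consulted."""
    ids = {a[2] for a, t in typed if t == "ID"}
    refs: List[str] = []
    for a, t in typed:
        if t == "IDREF":
            refs.append(a[2])
        elif t == "IDREFS":
            refs.extend(a[2].split(" "))
    return [r for r in refs if r not in ids]

annotations = {("foo", "id"): "ID", ("bar", "ref"): "IDREF"}
typed = assign_types([("foo", "id", "1.0"), ("bar", "ref", "1")], annotations)
print(dangling_idrefs(typed))  # ['1']
```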

Then, we might want to impose restrictions on datatypes.  For example, 
we can attach ID/IDREF/IDREFS annotations only when the datatype is "token".
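
A minimal sketch of why "token" is safe and a numeric datatype is not.  It uses 
Python's Decimal as a stand-in for xsd:decimal (an approximation: Decimal also 
accepts forms such as "1e3" that xsd:decimal does not) and str.split to 
approximate whitespace collapsing; the function names are hypothetical.

```python
from decimal import Decimal

def token_value(lexical: str) -> str:
    """Approximate xsd:token value: whitespace collapsed, otherwise the string itself."""
    return " ".join(lexical.split())

def decimal_value(lexical: str) -> Decimal:
    """Approximate xsd:decimal value: the number denoted by the lexical form."""
    return Decimal(lexical.strip())

# Datatype equality of the ID "1.0" and the IDREF "1":
print(decimal_value("1.0") == decimal_value("1"))  # True: disagrees with XML 1.0
print(token_value("1.0") == token_value("1"))      # False: agrees with XML 1.0
```

With "token", value equality reduces to string equality of the collapsed 
values, which is what XML 1.0 compares.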

> For example, if in the schema you
> have
> 
> <grammar>
>   <start>
>     <ref name="any"/>
>   </start>
>   <define name="any">
>     <element>
>       <anyName/>
>       <zeroOrMore>
>         <choice>
>           <attribute>
>             <anyName/>
>           </attribute>
>           <text/>
>           <ref name="any"/>
>         </choice>
>       </zeroOrMore>
>     </element>
>   </define>
> </grammar>
> 
> and somewhere else in the schema you have:
> 
> <element name="foo">
>   <attribute name="id" a:attributeType="ID"/>
> </element>
> 
> you will have an error.

True.  I am still happy.
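
As far as I can see, this error can be detected from the schema alone (goal 3).  
A deliberately coarse sketch, assuming each element pattern is reduced to an 
element name (or <anyName/>) plus a map from attribute names (or <anyName/>) to 
attribute types, with unannotated attributes treated as CDATA.  It ignores 
nsName, except and reachability, and all names in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ANY = None  # hypothetical marker standing for <anyName/>

@dataclass
class ElementPattern:
    name: Optional[str]                   # element name, or ANY
    attributes: Dict[Optional[str], str]  # attribute name (or ANY) -> attribute type

def overlaps(a: Optional[str], b: Optional[str]) -> bool:
    """Two name classes overlap if either is ANY or the names are equal."""
    return a is ANY or b is ANY or a == b

def conflicts(patterns: List[ElementPattern]) -> List[Tuple[int, int]]:
    """Return index pairs of element patterns that could match the same element
    and the same attribute while assigning it different attribute types.
    Only the schema is examined, never an instance."""
    found = []
    for i, p in enumerate(patterns):
        for j in range(i + 1, len(patterns)):
            q = patterns[j]
            if not overlaps(p.name, q.name):
                continue
            if any(overlaps(an, bn) and at != bt
                   for an, at in p.attributes.items()
                   for bn, bt in q.attributes.items()):
                found.append((i, j))
    return found

any_pattern = ElementPattern(ANY, {ANY: "CDATA"})  # the "any" define above
foo_pattern = ElementPattern("foo", {"id": "ID"})  # <element name="foo">
print(conflicts([any_pattern, foo_pattern]))  # [(0, 1)]
```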

> We need to get this annotations spec out rapidly, and I think the only way
> we will be able to do that is to limit ourselves narrowly to XML 1.0.
> Perhaps the spec should be called "RELAX NG DTD Compatibility Annotations"
> to emphasize this.

I like this proposal very much.

Cheers,

Makoto