relax-ng message

Subject: Re: [relax-ng] Limitation in the compact syntax
From: James Clark <jjc@jclark.com>
To: John Cowan <jcowan@reutershealth.com>
Date: Sun, 09 Jun 2002 13:29:04 +0700
>> Round-tripping (compact -> xml -> compact) while preserving special
>> syntax for RNG elements embedded in annotations would be tricky: you
>> would have to figure out the maximal subtree that corresponded to a
>> syntactically legal pattern (or name class?) and then use compact
>> syntax for such subtrees, recursively handling annotations inside such
>> subtrees.
>
> *Perfect* round-tripping would require that, yes.  (Note that XML > RNC >
> XML already doesn't work correctly.)

I assume you are referring to the lack of local namespace declarations in 
the compact syntax. I think that is a very different kind of imperfection 
to the one you are contemplating. XML -> compact -> XML works just fine 
except for perverse XML documents that use prefixes inconsistently.  More 
importantly, such perverse XML documents can be cured of their perversion 
by a person/tool that understands the semantics of datatypes and 
annotations.

I think the possibility of a compact -> XML -> compact round-trip is a key 
part of the value proposition of the dual XML/compact syntaxes.  It allows 
you to have your cake and eat it too.  You get all the advantages of a 
pleasant, human-readable compact syntax, but you retain the ability to 
apply general-purpose XML processing tools whenever you wish.  For example 
you can use XSLT to transform a schema in the compact syntax by 
transforming to the XML syntax, applying XSLT and transforming back to the 
compact syntax.

> I would be content if conversion
> from XML syntax to compact syntax didn't regenerate embedded RNG within
> annotations

I don't think this is acceptable.  That means that you effectively lose the
ability to use XML tools to manipulate the compact syntax whenever you use
the embedded RNG in annotations feature.

After thinking about this some more, I think there is a fundamental
conceptual barrier to regenerating embedded RNG in annotations during XML
-> compact syntax conversion.  The problem is that the compact syntax is
not a representation of the XML syntax; rather, it is a representation of
the same semantics that are represented by the XML syntax.  For example,
the compact syntax represents both

  <element name="foo"><empty/></element>

and

  <element><name>foo</name><empty/></element>

by

  element foo { empty }

and all of

  <choice><ref name="foo"/></choice>
  <group><ref name="foo"/></group>
  <interleave><ref name="foo"/></interleave>

by

  (foo)
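
The many-to-one mapping above can be sketched with a toy converter.  This
is only an illustration in Python, assuming just the handful of elements
used in the examples and ignoring namespaces; to_compact and convert are
hypothetical helpers, not part of any RELAX NG tool.

```python
import xml.etree.ElementTree as ET

def to_compact(node):
    """Toy XML -> compact conversion for a tiny subset of RELAX NG."""
    tag = node.tag
    children = list(node)
    if tag == "empty":
        return "empty"
    if tag == "ref":
        return node.get("name")
    if tag == "element":
        # The name may be given as an attribute or as a <name> child.
        if node.get("name") is not None:
            name = node.get("name")
        else:
            name = children[0].text.strip()
            children = children[1:]
        body = ", ".join(to_compact(c) for c in children)
        return "element %s { %s }" % (name, body)
    if tag in ("choice", "group", "interleave"):
        # With a single child the connector never appears in the output.
        sep = {"choice": " | ", "group": ", ", "interleave": " & "}[tag]
        return "(" + sep.join(to_compact(c) for c in children) + ")"
    raise ValueError("unsupported element: " + tag)

def convert(xml_text):
    return to_compact(ET.fromstring(xml_text))

element_forms = [
    '<element name="foo"><empty/></element>',
    '<element><name>foo</name><empty/></element>',
]
wrapper_forms = [
    '<choice><ref name="foo"/></choice>',
    '<group><ref name="foo"/></group>',
    '<interleave><ref name="foo"/></interleave>',
]

# Each set collapses to a single compact string.
element_results = {convert(x) for x in element_forms}
wrapper_results = {convert(x) for x in wrapper_forms}
print(element_results)
print(wrapper_results)
```

Both sets end up with exactly one member, which is the point: the compact
form does not record which XML spelling it came from.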

Now, RELAX NG defines no semantics for elements and attributes occurring in 
annotations. They are just elements and attributes, no more.  This includes 
elements from the RELAX NG namespace.  A correct translation from the XML 
syntax to the compact syntax must ensure that the annotations after 
translation represent exactly the same elements and attributes as the 
annotations before translation.  If it doesn't do that, it is making 
assumptions about the semantics of annotations that the RELAX NG 
specification provides no basis for.  Translating annotations in the XML 
syntax into the embedded RNG syntax in the compact syntax would not 
preserve elements and attributes because the compact syntax is not a 
representation of elements and attributes but a representation of an 
abstract structure which can have multiple representations as elements and 
attributes.
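
To make the loss concrete, here is a small sketch of a round trip.  It
assumes a toy subset (a single element pattern with empty content) and
two hypothetical helpers, to_compact and from_compact; a real converter
would be far more involved, but would face the same choice of which XML
spelling to emit.

```python
import re
import xml.etree.ElementTree as ET

def to_compact(xml_text):
    """Toy converter: handles only <element> with <empty/> content."""
    node = ET.fromstring(xml_text)
    name = node.get("name")
    if name is None:
        name = node.find("name").text.strip()
    return "element %s { empty }" % name

def from_compact(rnc_text):
    """Toy inverse: always emits the name as an attribute."""
    match = re.fullmatch(r"element (\w+) \{ empty \}", rnc_text)
    if match is None:
        raise ValueError("unsupported compact syntax: " + rnc_text)
    return '<element name="%s"><empty/></element>' % match.group(1)

def canonical(xml_text):
    # Canonical XML (Python 3.8+) so that only real differences in
    # elements and attributes count, not serialization details.
    return ET.canonicalize(xml_text)

# Imagine this element sitting inside an annotation.
original = '<element><name>foo</name><empty/></element>'
round_tripped = from_compact(to_compact(original))

# True only if the same elements and attributes came back.
preserved = canonical(original) == canonical(round_tripped)
print(round_tripped)
print(preserved)
```

The pattern means the same thing before and after, but as annotation
content it is no longer the same elements and attributes.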

The underlying problem is that your proposed feature is going beyond what 
is expressible in the XML syntax.  By using the embedded RNC syntax in
annotations you are effectively making an assertion about the semantics of
the annotation: that it's OK to manipulate it as if it had the semantics of
RELAX NG.  But the XML syntax provides no way to make such an assertion.

Another complication with this embedding idea is that there is not just one 
kind of thing that one might want to embed.  One might wish to embed any of 
the following:

- a complete schema, along with declarations
- a pattern
- a name class
- a sequence of grammar components (things that can occur in a grammar)

You cannot automatically distinguish between these: in particular, the same
sequence of tokens can be interpreted both as a name class and as a pattern.
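
As an illustration of that ambiguity, the sketch below reads the token
sequence "foo | bar" both ways.  The functions as_pattern and
as_name_class are hypothetical and handle only identifiers joined by "|";
they exist just to show that the two readings yield different XML.

```python
def split_names(tokens):
    """Drop the '|' connectors and keep the identifiers."""
    return [t for t in tokens if t != "|"]

def as_pattern(tokens):
    """Read each identifier as a reference to a named pattern."""
    names = split_names(tokens)
    refs = "".join('<ref name="%s"/>' % n for n in names)
    return "<choice>%s</choice>" % refs if len(names) > 1 else refs

def as_name_class(tokens):
    """Read each identifier as a name in a name class."""
    names = split_names(tokens)
    items = "".join("<name>%s</name>" % n for n in names)
    return "<choice>%s</choice>" % items if len(names) > 1 else items

tokens = "foo | bar".split()
pattern_xml = as_pattern(tokens)
name_class_xml = as_name_class(tokens)
print(pattern_xml)
print(name_class_xml)
```

Nothing in the tokens themselves says which reading is intended; only the
surrounding context in a schema does, and an annotation provides none.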

>> I think this would add significant complexity to an implementation and
>> more  importantly to the specification, without providing much practical
>> benefit.
>
> I think it will be very important for any kind of layered protocol over
> RNG.  I agree that ooRNG as written does not require this, but it does
> not follow that there will never be such things

Let's suppose ooRNG had been done differently, so that we had rng:grammar 
elements containing os:class elements that in turn contained rng:* 
elements.  Is that the kind of thing you are thinking of?  In such a case, 
the ooRNG schema might well conform to the RNG schema.  However, it
wouldn't semantically be RELAX NG.  If I fed it to jing -c, it wouldn't
work.  It would need to be translated first.  I think this is out of scope.

I also think that if you are doing something like ooRNG, then the syntax
you get using embedded RNG in annotations would not be at all 
user-friendly.  It would be much better to design a proper, integrated 
syntax such as I illustrated for ooRNG.

I remain of the opinion that this would be hugely complicated for both the 
specification and the implementation but would be of little practical value.

James