
Subject: The relaxng-compact processing instruction
From: James Clark <jjc@jclark.com>
To: RELAX NG List <relax-ng@lists.oasis-open.org>
Date: Sun, 07 Sep 2003 13:39:49 +0700

One possibility for a mechanism for associating a schema with a document 
is to have a processing instruction that directly contains a compact 
syntax schema.   For the sake of discussion, I will use a target name of 
relaxng-compact for this.  If a user wants simply to reference an 
external schema, they can just do:

<?relaxng-compact external "myschema.rnc"?>
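As a rough illustration (a sketch, not part of the proposal itself), an application could pick such a reference out of the prolog with Python's standard xml.dom.minidom. The helper name schema_from_prolog is hypothetical, and the sketch only looks at processing instructions that precede the document element:

```python
from xml.dom import minidom

TARGET = "relaxng-compact"  # target name used for discussion in this message


def schema_from_prolog(xml_text):
    """Return the data of the first relaxng-compact PI that precedes the
    document element, or None if there is none (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # only the prolog is considered
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == TARGET):
            return node.data
    return None


print(schema_from_prolog('<?relaxng-compact external "myschema.rnc"?>\n<doc/>'))
# prints: external "myschema.rnc"
```

Nothing beyond an ordinary XML parser is needed to find the schema; what the application then does with the compact syntax text is a separate step.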

But they can also easily make customizations, as is sometimes done in the 
internal subset.  For a one-off schema, they could even put the entire 
schema in the processing instruction.

Lexically, this works rather well. Processing instructions can contain 
anything other than ?>, and that sequence is unlikely to occur in a compact 
schema.  The compact syntax provides its own character escaping 
mechanism which works well with the fact that processing instructions 
don't have a character escaping mechanism.  With the compact syntax 
inside a processing instruction:

- there's one and only one way to escape characters
- there's a way to include any schema, no matter what character sequences 
it contains
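The escaping argument can be sketched in Python. The compact syntax replaces each \x{...} escape with the character it denotes before the schema is tokenized, so a ?> that does occur in a schema can be written as ?\x{3E} without changing what the schema means. Both helper names are hypothetical, and unescape handles only the plain \x{hex} form:

```python
import re


def to_pi(schema, target="relaxng-compact"):
    """Wrap compact syntax text in a processing instruction (hypothetical helper).

    "?>" would end the PI early, so its ">" is written with the compact
    syntax escape \\x{3E}; a compact syntax parser reads it back as ">".
    """
    body = schema.replace("?>", "?\\x{3E}")
    return "<?" + target + " " + body + "?>"


def unescape(text):
    """Expand \\x{hex} escapes, as a compact syntax parser would before tokenizing."""
    return re.sub(r"\\x\{([0-9A-Fa-f]+)\}",
                  lambda m: chr(int(m.group(1), 16)), text)


schema = 'element note { attribute tag { "a?>b" }, text }'
pi = to_pi(schema)
print(pi)
# <?relaxng-compact element note { attribute tag { "a?\x{3E}b" }, text }?>
```

Since the replacement only ever turns ">" into an escape, it cannot create a new ?> sequence, and expanding the escapes restores the original schema text exactly.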

We also get to take advantage of the XML encoding declaration mechanism, 
which works well with the fact that the compact syntax doesn't provide an 
encoding declaration syntax.
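A small Python check of the encoding point, using the standard expat binding: the characters inside the processing instruction are decoded according to the document's encoding declaration (ISO-8859-1 in this made-up example), so the embedded schema needs no encoding syntax of its own.

```python
import xml.parsers.expat

# A made-up document encoded as ISO-8859-1; the schema in the PI uses a
# non-ASCII element name.
text = ('<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        '<?relaxng-compact element café { text }?>\n'
        '<café>x</café>')
document = text.encode("iso-8859-1")

pis = []
parser = xml.parsers.expat.ParserCreate()  # encoding taken from the XML declaration
parser.ProcessingInstructionHandler = lambda target, data: pis.append((target, data))
parser.Parse(document, True)

print(pis)  # [('relaxng-compact', 'element café { text }')]
```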

One of the arguments against processing instructions is that they don't 
expose the internal structure of their contents as XML, but that doesn't 
really apply here since the whole point of the compact syntax is to be 
an alternative non-XML syntax.

Another thing I like about this is that there's very little to specify. 
There are basically only two things:

i) what target name to use
ii) where the processing instruction can go

To keep things simple, I think the processing instruction should go 
before the document element; I don't want to get into specifying schemas 
for subtrees.  So I think the reasonable possibilities are:

a) anywhere in the prolog except after the DOCTYPE
b) anywhere in the prolog, but in the document entity
c) anywhere in the prolog, including the external subset and external 
parameter entities

(c) creates some intriguing possibilities.  For example, the XHTML folks 
have, as far as I understand it, a couple of problems in moving from 
DTDs to schemas.  One big problem is that they need to be able to use 
character entities. Another is that they have lots of different profiles 
with the same namespace URI: they have a tradition of using the DOCTYPE 
public id to identify which profile is in use. XHTML conformance 
requires a particular DOCTYPE external id; browsers tweak their 
rendering behaviour based on particular DOCTYPEs.  With (c), you can 
create a DTD that contains just internal general parsed ENTITY 
declarations and a processing instruction containing a compact syntax 
schema.  Users can continue to create documents just as before, 
including character entities and a DOCTYPE declaration.  The OASIS 
catalog mechanism could be used to switch between different versions of 
the DOCTYPE depending on the context: perhaps one without the processing 
instruction, one with the processing instruction and another 
approximating the RELAX NG schema with a complete DTD.
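Option (c) can be sketched in Python as well. Here a dict that maps system identifiers to DTD text stands in for catalog resolution (an assumption for the sketch; a real system would use an OASIS catalog resolver), and the DTD names and contents are made up. Expat is told to read the external subset, and the parser it creates for that subset inherits the handlers already set, including the processing instruction handler.

```python
import xml.parsers.expat

# Hypothetical DTD variants that a catalog might choose between.
DTD_WITH_SCHEMA = ('<!ENTITY nbsp "&#160;">\n'
                   '<?relaxng-compact external "xhtml-profile.rnc"?>\n')
DTD_ENTITIES_ONLY = '<!ENTITY nbsp "&#160;">\n'

DOCUMENT = b'<!DOCTYPE html SYSTEM "xhtml.dtd">\n<html>a&nbsp;b</html>'


def schema_pis(document, catalog):
    """Collect relaxng-compact PI data, including PIs in the external subset.

    `catalog` maps system identifiers to DTD text (a stand-in for a catalog).
    """
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.SetParamEntityParsing(
        xml.parsers.expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)

    def pi(target, data):
        if target == "relaxng-compact":
            found.append(data.strip())

    def external_ref(context, base, system_id, public_id):
        # context is None when expat asks for the external DTD subset
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(catalog[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ProcessingInstructionHandler = pi
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(document, True)
    return found


print(schema_pis(DOCUMENT, {"xhtml.dtd": DTD_WITH_SCHEMA}))
print(schema_pis(DOCUMENT, {"xhtml.dtd": DTD_ENTITIES_ONLY}))
```

Swapping the dict entry plays the role of the catalog picking a DTD version: the document itself, DOCTYPE and character entity references included, stays exactly the same.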

James