relax-ng message

Subject: Relationship among our data model, patterns, and XML 1.0
From: Murata Makoto <mura034@attglobal.net>
To: relax-ng@lists.oasis-open.org
Date: Mon, 11 Jun 2001 23:20:14 +0900

Murata Makoto wrote:

> I have two completely different answers.  One of them requires whole-sale 
> reengineering and I will write more about it in a different thread.

Note: This mail is based on a productive discussion with Hosoya-san.
All good ideas are his and all mistakes are mine.


1. The data model, patterns, and XML 1.0

First, we consider the relationship among our data model, patterns,
and XML 1.0.  There are three positions.

1.1 Position 1

Our data model is different from XML 1.0.  That is, attributes can
contain subordinate elements, and an element can have multiple
attributes of the same name.  For example,

	foo="<bar/>"

and

	<... bar="1" bar="2">

are fine.

Our patterns represent sets of trees permitted by our data model.  Thus, 

	<attribute name="foo">
	  <element name="bar"><empty/></element>
	</attribute> 
and

	<element name="foo">
	  <attribute name="bar"/>
	  <attribute name="bar"/>
	</element> 

are fine.

The validation algorithm MUST work even when it receives foo="<bar/>"
or <... bar="1" bar="2">.  (I believe that JTREX does work.)
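
As a rough illustration of the Position 1 data model, here is a minimal
Python sketch.  The names Element, Value, and attribute_names are
hypothetical and come from neither TREX nor JTREX.  The only point is
that attributes are kept as a list of pairs rather than a name-to-value
map, so duplicate names and element-valued attributes can both be
represented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Element:
    name: str
    # A list of (name, value) pairs, not a dict, so that the same
    # attribute name may occur more than once (Position 1 only).
    attributes: List[Tuple[str, "Value"]] = field(default_factory=list)
    children: List["Element"] = field(default_factory=list)


# In this sketch an attribute value is either text or a list of elements.
Value = Union[str, List[Element]]

# foo="<bar/>": an attribute whose value holds a subordinate element.
with_element = Element("e", attributes=[("foo", [Element("bar")])])

# <... bar="1" bar="2">: two attributes of the same name.
with_duplicates = Element("e", attributes=[("bar", "1"), ("bar", "2")])


def attribute_names(e: Element) -> List[str]:
    """Return attribute names in stored order, duplicates included."""
    return [name for name, _ in e.attributes]
```

Under Position 2 or 3, neither with_element nor with_duplicates could
be built at all, since the data model would coincide with XML 1.0.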

When we consider the equivalence or subset relationship of patterns, we 
compare the extension of patterns in our data model rather than that 
in XML 1.0.  In other words, 

	<attribute name="foo">
	  <optional><element name="bar"><empty/></element></optional>
	</attribute>

and

	<attribute name="foo"/> 

are different.  Likewise,

	<element name="foo"><attribute name="bar"/></element> 
and

	<element name="foo">
	  <attribute name="bar"/>
	  <optional><attribute name="bar"/></optional>
	</element> 

are different.


1.2 Position 2
	
Our data model is the same as XML 1.0.  That is, attributes containing
subordinate elements are disallowed, and an element cannot have
multiple attributes of the same name.  For example, foo="<bar/>" and
<... bar="1" bar="2"> cannot be represented in our data model.

Our patterns MUST NOT generate trees disallowed by XML 1.0.  Thus, 

	<attribute name="foo">
	  <element name="bar"><empty/></element>
	</attribute> 

is an illegal pattern.  Moreover, 

	<attribute name="foo">
	  <optional><element name="bar"><empty/></element></optional>
	</attribute> 

is also an illegal pattern.  For the same reason, 

	<element name="foo">
	  <attribute name="bar"/>
	  <attribute name="bar"/>
	</element>

is illegal and 

	<element name="foo">
	  <attribute name="bar"/>
	  <optional><attribute name="bar"/></optional>
	</element>

is also illegal.

Our validation algorithm does not work when it receives foo="<bar/>"
and <... bar="1" bar="2">.  Such data are simply broken (as broken as
<foo><bar></foo></bar>).
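
To make "simply broken" concrete, the following sketch feeds such
inputs to an ordinary XML 1.0 parser (Python's standard
xml.etree.ElementTree, used here purely as an example of a conforming
parser).  The helper name is_well_formed is hypothetical.  The data
never reach a validator, because the parser rejects them first.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if an XML 1.0 parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = [
    '<foo bar="1"/>',            # ordinary element
    '<foo bar="1" bar="2"/>',    # duplicate attribute name
    '<e foo="<bar/>"/>',         # "<" inside an attribute value
    '<foo><bar></foo></bar>',    # improperly nested tags
]

results = [is_well_formed(s) for s in samples]
```

Only the first sample is accepted.  The other three are rejected as
not well-formed before any pattern matching could take place.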

When we consider the equivalence or subset relationship of patterns,
we compare the extension of patterns in XML 1.0.


1.3 Position 3
	
Our data model is the same as XML 1.0.  Attributes containing
subordinate elements are disallowed.  An element cannot have multiple
attributes of the same name.  For example, foo="<bar/>" and
<... bar="1" bar="2"> cannot be represented in our data model.

Our patterns MAY generate trees disallowed by XML 1.0.  Thus, 

	<attribute name="foo">
	  <element name="bar"><empty/></element>
	</attribute> 

is useless but not illegal.  Moreover, 

	<attribute name="foo">
	  <optional><element name="bar"><empty/></element></optional>
	</attribute> 

is legal and is not entirely useless.  For the same reason,

	<element name="foo">
	  <attribute name="bar"/>
	  <attribute name="bar"/>
	</element>
and

	<element name="foo">
	  <attribute name="bar"/>
	  <optional><attribute name="bar"/></optional>
	</element>

are legal.

Our validation algorithm does not work when it receives foo="<bar/>"
or <... bar="1" bar="2">.

When we consider the equivalence or subset relationship of patterns,
we compare the extension of patterns in XML 1.0.  In other words,

	<element name="foo">
	  <attribute name="bar"/>
	  <optional><attribute name="bar"/></optional>
	</element>

and

	<element name="foo">
	  <attribute name="bar"/>
	</element>

are equivalent.
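
The difference between comparing extensions in our data model and in
XML 1.0 can be sketched for small, finite attribute patterns.  The
Python below is only an illustration under strong assumptions: it
handles just attribute, optional, and group, it ignores attribute
values, and every function name in it is hypothetical.

```python
from itertools import product


def attribute(name):
    """Extension: exactly one attribute with this name."""
    return {(name,)}


def optional(p):
    """Extension of p, plus the empty collection."""
    return p | {()}


def group(*patterns):
    """Combine one choice from each sub-pattern; order is irrelevant."""
    out = set()
    for combo in product(*patterns):
        out.add(tuple(sorted(n for part in combo for n in part)))
    return out


def xml10_extension(p):
    """Keep only collections without duplicate attribute names."""
    return {names for names in p if len(set(names)) == len(names)}


p1 = group(attribute("bar"), optional(attribute("bar")))
p2 = group(attribute("bar"))

differ_in_position_1 = p1 != p2
equal_in_xml10 = xml10_extension(p1) == xml10_extension(p2)
```

Here p1 additionally generates the collection with two bar attributes,
so the two patterns differ under Position 1.  Once collections that
XML 1.0 disallows are discarded, both reduce to a single bar
attribute.  Enumeration like this does not scale to patterns with
<oneOrMore>, whose extensions are infinite.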


1.4 Discussion

TREX adopts position 1.  This position is consistent.  By allowing 

<oneOrMore>
 <group>
   <attribute><anyName/></attribute>
   <element><anyName/><empty/></element>
 </group>
</oneOrMore>,

TREX is more expressive than the class of string/tree regular 
languages (this example requires counting).   I think that 
this expressiveness is excessive, but JTREX does not have any 
problems in validation.
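
To see why this example requires counting, consider what a matcher
has to decide.  The sketch below assumes the Position 1 data model,
assumes that attributes are unordered with respect to child elements,
and assumes that the attribute values and child element names are
unconstrained.  The function name matches_counting_example is
hypothetical.

```python
from typing import List


def matches_counting_example(attribute_names: List[str],
                             empty_children: List[str]) -> bool:
    """True if the content could come from one or more repetitions of
    (one attribute of any name, one empty element of any name)."""
    n = len(attribute_names)
    # Each repetition contributes exactly one attribute and one child,
    # so the two counts must agree and be at least one.
    return n >= 1 and n == len(empty_children)
```

Comparing two unbounded counts for equality is exactly what a finite
automaton over strings or trees cannot do.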

As for testing of the equivalence or subset relationship of patterns,
I cannot think of any algorithms.  Such testing might be simply
impossible.

Position 2 is also consistent, and is in perfect alignment with XML
1.0.  Although this position is my favorite, it requires a lot of
changes to <attribute>.  I will introduce the required changes in the
next section.

Position 3 is inconsistent.  One consequence of this inconsistency is
that it is very difficult to test the equivalence or subset
relationship of patterns.  For example,

<oneOrMore>
  <attribute name="a"/>
  <attribute name="b"/>
</oneOrMore>

and 

<group>
  <optional>
    <attribute name="a"/>
  </optional>
  <optional>
    <attribute name="b"/>
  </optional>
</group>

should be equivalent.  I have no idea what algorithm could detect
such equivalence, and I think that this is an ugly problem.


2.  Attribute declarations in Position 2

In position 2, we CANNOT allow 

	<oneOrMore>
	  <attribute><anyName/></attribute>
	</oneOrMore>, 

since it generates multiple attributes of the same name, 
for example <... a="" a="" a="" a="">.   In general,
<oneOrMore> and <zeroOrMore> MUST NOT contain <attribute>.

Instead, we need a construct which generates a collection of
NON-COLLIDING attributes.  XML Schema already has such a construct.  I
was planning to introduce such a construct to RELAX Core.

I would suggest 

	<multipleAttributes ....>

Its syntax is the same as that of <attribute>.  However, the name class
in a <multipleAttributes> MUST generate an infinite set of names.  If
not, it is a syntax error.
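
One possible reading of <multipleAttributes> is sketched below in
Python.  This is an assumption-laden illustration, not a proposal for
the actual semantics: namespaces are ignored, a name class is modelled
as either a finite set of names or the complement of one, attribute
values are not checked, and NameClass, ANY_NAME, and
multiple_attributes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class NameClass:
    names: FrozenSet[str]
    negated: bool = False  # True means "every name except `names`"

    def contains(self, name: str) -> bool:
        return (name in self.names) != self.negated

    def is_infinite(self) -> bool:
        # In this model only a complemented finite set is infinite.
        return self.negated


ANY_NAME = NameClass(frozenset(), negated=True)


def multiple_attributes(nc: NameClass) -> Callable[[Iterable[str]], bool]:
    """Build a matcher for a collection of non-colliding attributes."""
    if not nc.is_infinite():
        raise ValueError("name class must generate an infinite set of names")

    def matches(attribute_names: Iterable[str]) -> bool:
        names = list(attribute_names)
        no_collision = len(set(names)) == len(names)
        return no_collision and all(nc.contains(n) for n in names)

    return matches
```

With this reading, multiple_attributes(ANY_NAME) accepts a and b
together but rejects two attributes named a, and a finite name class
is refused when the pattern is constructed.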

Now, all problems about <oneOrMore>..<attribute>...</oneOrMore> are
gone.  The validation algorithm sketched in my mail [1] works
perfectly well.

I have not fully considered algorithms for checking the equivalence or
subset relationship of patterns.  Rather, I am counting on Hosoya-san.
I am very optimistic, since name classes are closed under boolean operations.
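
The closure property can be illustrated with a small Python sketch.
It covers name classes only, not the pattern-level algorithm itself,
and it assumes a simplified model without namespaces in which a name
class is a finite set of names or the complement of one.  All function
names are hypothetical.

```python
from typing import FrozenSet, Tuple

# (negated, names): (False, S) means "exactly the names in S";
# (True, S) means "every name except those in S".
NameClass = Tuple[bool, FrozenSet[str]]

ANY_NAME: NameClass = (True, frozenset())


def name(n: str) -> NameClass:
    return (False, frozenset({n}))


def complement(a: NameClass) -> NameClass:
    return (not a[0], a[1])


def intersection(a: NameClass, b: NameClass) -> NameClass:
    (na, sa), (nb, sb) = a, b
    if not na and not nb:
        return (False, sa & sb)
    if not na and nb:
        return (False, sa - sb)
    if na and not nb:
        return (False, sb - sa)
    return (True, sa | sb)


def union(a: NameClass, b: NameClass) -> NameClass:
    # De Morgan: a or b == not (not a and not b)
    return complement(intersection(complement(a), complement(b)))


def is_empty(a: NameClass) -> bool:
    return not a[0] and not a[1]


def is_subset(a: NameClass, b: NameClass) -> bool:
    # a is a subset of b exactly when a and (not b) is empty.
    return is_empty(intersection(a, complement(b)))
```

Because every operation returns a value of the same form, subset and
equivalence of name classes reduce to an emptiness test in this model.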


[1] http://lists.oasis-open.org/archives/relax-ng/200106/msg00118.html

Cheers,

Makoto
Follow-Ups:
- Re: Relationship among our data model, patterns, and XML 1.0
  - From: James Clark <jjc@jclark.com>