relax-ng message

Subject: Guarded normal form (part 2)
From: Murata Makoto <mura034@attglobal.net>
To: trex@lists.oasis-open.org
Date: Sat, 05 May 2001 21:51:36 +0900
Guarded normal form (part 2)

MURATA Makoto

6. Equivalence and subset relationship

Given two schemas, we sometimes want to examine if they permit the
same set of documents.  Or, we might want to examine if the set of
documents permitted by the first is a subset of that by the second.
Query language and programming languages for XML require such 
operations.

In DTDs, we can compare attribute list declarations and then compare
content models.  Comparison of content models is not strightfoward,
since (a?, b?) is identical to ((a, b?)?), for example.  However, we
can compare content models by creating a string automaton from a
content model.  (We also need determinization and minimization of
string automata).

In schema languages mimicking hedge regular grammars, testing
equivalence or subset relationship requires hedge automata.  One
straightfoward algorithm converts hedge automata to binary tree
automata, and determinizes and minimizes them.  An interesting
algorithm developed in the XDuce project does NOT detemrinimizes
binary tree automata.

JamesC has unified attribute list declarations and content models in a
single pattern.  Although this unificaiton is very powerful in
capturing interdependencies of attributes and elements, examining
equivalence or subset relationship becomes much more difficult.

For example, 

<element name="bar">
  <attribute name="foo"/>
  <element name="bar"><anyString/></element>
</element>

and 

<element name="bar">
  <attribute name="foo"/>
  <element name="bar"><anyString/></element>
</element>

are equivalent.

Moreover, 

<element name="bar">
  <oneOrMore>  <attribute name="foo"/>  </oneOrMore>
</element>

and 

<element name="bar">
  <attribute name="foo"/>
</element>

are equivalent.

GNF normalization makes it easy to test equivalence and subset
relationship of patterns.  After GNF normalization, we can apply
techniques developped for hedge automata (and binary tree automata).


7. Non-lazy construction of hedge automata

GNF normalization allows non-lazy construction of hedge automata.
Intuitively, guards represent terminal symbols and clauses
collectively represent hedge automata.

After examining a start tag, we can identify a set of terminal symbols
for that start tag, and then continue evaluation of a hedge automata.
In fact, such implementation is very similar to some implementations
of RELAX Core.


8. Combinatorial explosion

Alghouth GNF has many advantages, normalization may lead 
to combinatorial explosion.  For example,  consider the 
following pattern.

<group>
<choice>
  <element name="a"><anyString/></element>
  <attribute name="a"/>
</choice>

<choice>
  <element name="a"><anyString/></element>
  <attribute name="a"/>
</choice>

...

<choice>
  <element name="z"><anyString/></element>
  <attribute name="z"/>
</choice>

</group>

(Note: <interleave> is probably more common.)

The first (or second) <choice> specify that we can have either a
subelement "a" (resp., "b) or attribute "a" (resp. "b").  This 
pattern has 26 such <choice> patterns.  Such patterns may arise 
very frequently.

To convert this pattern to the GNF, we need a guard-clause pair for
each subset of {@a, @b, ..., @z}.  The number of these subsets is
2^26!


9.  Discussion

I once wrote:

> On the other hand, some constructs of TREX make it hard to actually 
> create tree automata in advance.  I am concerned, since creation of tree 
> automata is required for type inference (note: XML Query people plan to 
> incorporate type inference into their query language).  I am hoping that 
> introduction of reasonable restrictions will make it possible to construct 
> tree automata in advance.

My observataions are shown below:

1) GNF normalization allows use of all techniques developed for tree 
   automata.

2) We need restrictions on <oneOrMore>.

3) If we freely use <attribute> in content models, GNF normalization 
   will explode.
   
Cheers,

Makoto