OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

relax-ng message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]


Subject: Guarded normal form (part 1)


Guarded normal form (part 1)

MURATA Makoto

1. Introduction

This note proposes a normal form called "Guarded Normal Form" (GNF for
short).  Motivationes for GNF are as below:

	- Designing restrictions on patterns

	- Testing the equivalence and subset relationship 
	  of patterns

	- Validation implementation based on non-lazy construction 
	  of hedge automata (i.e., RELAX-like implementation).

Although GNF does have some advantages, there is bad news: 
GNF normalization may lead to combinatory explosion.


2. Definition


We first introduce guards and clauses.  A guard is either a (possibly
empty) sequence of <attribute> elements or an <oneOrMore> element
containing a single <attribute> element.  Examples are shown below:

	- <attribute name="foo"/>

	- <attribute name="foo"/> followed by <attribute name="bar"/>

	- <oneOrMore><attribute name="foo"/></oneOrMore>

A clause is a (possibly empty) sequence of patterns which do not
contain <attribute> elements as imidiate or non-imediate subordinates.

A pattern is said to be in GNF if it is a content model (a subordinate
of an <element>) in Jame's normal form
(http://lists.oasis-open.org/archives/trex/200104/msg00046.html) and
and is either

	<group>guard  clause</group> 
or

	<choice>
	  <group>guard   clause</group> 
	  <group>guard   clause</group> 
	  ...
	</choice>

For example, the following content model is in GNF.

<choice>
  <group>  <!-- (@foo, EMPTY) -->
    <attribute name="foo"/>
    <empty/>
  </group>

  <group>  <!--  (@foo, @bar, hoge) -->
    <attribute name="foo"/>
    <attribute name="bar"/>
    <ref name="hoge"/>
  </group>
</choice>

Patterns do not always have a GNF.  For example, the following pattern
does not have a guarded normal form.

	<oneOrMore>
	  <attribute><anyName/></attribute>
	  <ref name="foo"/>
	</oneOrMore>

In general, when an <oneOrMore> element contains both <attribute> and
<ref>, it does not have a GNF.

This example pattern requires counting of attributes and subordinate
elements, which is beyond the descriptive power of automata.  One
could argue that such patterns are mal-formed.


3. Advanced example

In the primitive syntax, an element ( <group>, <interleave>, <choice>,
etc.) has two subordinates.  However, we abuse the primtive syntax and
allow an element to have any number of subordinates.  We also
abbreviate <group> elements.

1) DTD-like content model

((@a, a)?, (@b, b)?, (@c, c)?)

Subordinate element "a" is present only when the attribute "a" is
present.  And so forth.

2) A pattern

<group>
  <optional><attribute name="a"/><ref name="a"/></optional>
  <optional><attribute name="b"/><ref name="b"/></optional>
  <optional><attribute name="b"/><ref name="c"/></optional>
</group>

3) An equivalent pattern in GNF

<choice>
  <group>
    <attribute name="a"/>
    <attribute name="b"/>
    <attribute name="c"/>
    <ref name="a"/>
    <ref name="b"/>
    <ref name="c"/>
  </group>
  
  <group>
    <attribute name="a"/>
    <attribute name="b"/>
    <ref name="a"/>
    <ref name="b"/>
  </group>
  
  <group>
    <attribute name="a"/>
    <attribute name="c"/>
    <ref name="a"/>
    <ref name="c"/>
  </group>
  
  <group>
    <attribute name="a"/>
    <ref name="a"/>
  </group>
  
  <group>
    <attribute name="b"/>
    <attribute name="c"/>
    <ref name="b"/>
    <ref name="c"/>
  </group>
  
  <group>
    <attribute name="b"/>
    <ref name="b"/>
  </group>
  
  <group>
    <attribute name="c"/>
    <ref name="c"/>
  </group>
  
  <group>
    <empty/>
  </group>
</choice>


4. Normalization algorithm

A GNF pattern can be assumed as a set of (guard, clause) pairs.

We first introcude a recursive program gnf.  We first 
consider trivial cases, namely <empty/>, <ref name="a">, 
and <attribute ...>...</attribute>.

1) <empty/>

gnf(<empty/>) = { (an empty-sequence, <empty/>) }

The GNF of <empty/> is <group><empty/></group>.

2) <ref name="a"/>

gnf(<ref name="a"/>) = { (an empty-sequence, <ref name="a"/>) }

The GNF of <ref name="a"/> is <group><ref name="a"/></group>.

Other simple cases such as <data> and <string> are very similar, 
and we omit them in this note.

3) <attribute ...>...</attribute>

gnf(<attribute ..>...</attribute>)
= {(<attribute ..>...</attribute>, an empty-sequence)}

4) <choice>....</choice>

Now, concentrate on important primitives, namely <choice>, <group>,
<interleave>, and <oneOrMore>

First, <choice> is straightforward.

gnf( <choice>p1 p2</choice> ) =  gnf(p1) \cup gnf(p2)

For example:

gnf(<choice><empty/><ref name="a"/></choice>) = 
 { (an empty-sequence, <empty/>) , 
   (an empty-sequence, <ref name="a"/>) }

In fact, the GNF of this pattern is:

<choice>
  <group><empty/></group>
  <group><ref name="a"/></group>
</choice>

5) <group>..</group>

Next, we consider <group> and <interleave>.  We use ":" to 
denote concatenation of <attribute> sequences.

gnf( <group>p1 p2</group> ) = 
    { (g1:g2, <group>c1 c2</group>)  |
          (g1, c1) \in gnf(p1),
          (g2, c2) \in gnf(p2)
    }

6) <interleave> ... </interleave>

gnf( <interleave>p1 p2</interleave> ) = 
    { (g1:g2, <interleave>c1 c2</interleave>)  |
          (g1, c1) \in gnf(p1),
          (g2, c2) \in gnf(p2)
    }

For example, 

gnf(<group><ref name="a"/><ref name="a"/></group>) = 
 {(an empty-sequence, <group><ref name="a"/><ref name="a"/></group>) }

and	

gnf(<interleave><ref name="a"/><ref name="a"/></interleave>) = 
 {(an empty-sequence, <interleave><ref name="a"/><ref name="a"/></interleave>) }

7) <oneOrMore>....</oneOrMore>

As for <oneOrMore>, we need a restriction.  The subordinate of a
<oneOrMore> pattern is either a single <attribute> pattern or an
<attribute>-free pattern.

case 1: a single <attribute> pattern 

gnf(<oneOrMore><attribute ..../></oneOrMore> )  = 
  {(<oneOrMore><attribute ..../></oneOrMore>, <empty/>)}

case 2: an <attribute>-free pattern

When a pattern p does not contain any <attribute> element, guards in
its GNF are empty.  Thus, we can assume that 

	gnf(p) = { (an empty-sequence, c) }

(If gnf(p) contain more than one pair, we can merge them by using
<choice>.)

Then, we have:

gnf(<oneOrMore>p </oneOrMore>)  =
  {(an empty-sequence, <oneOrMore>c</oneOrMore>)}


Once the function gnf is well-defined, it is very easy to define GNF
normalization.  We only have to create <group>guard clause</group> for
each pair and wrap such pairs with <choice>, if necessary.


5. Restrictions on patterns

JamesC's normalization can already be used to examine subordinates of
<attribute>.  GNF allows further restrictions on patterns.

1) subordinates of <oneOrMore>

The subordinates of each <oneOrMore> pattern has either a single
<attribute> pattern or a <attribute>-free pattern.  Otherwise, 
the pattern does not have a GNF.

2) duplicate <attribute>

Given a pattern p, it has duplicate <attribute> when 
some guard in its GNF has more than one <attribute> pattern 
of the same name.

For example, consider

<group>
  <choice>
    <attribute name="foo"/>
    <ref name="a"/>
  </choice>
  <choice>
    <attribute name="foo"/>
    <ref name="b"/>
  </choice>
</group>

Its GNF is:

<choice>
  <group>
    <attribute name="foo"/>
    <attribute name="foo"/>
  </group>
  <group>
    <attribute name="foo"/>
    </ref name="a"/>
  </group>
  <group>
    <attribute name="foo"/>
    </ref name="b"/>
  </group>
  <group>
    </ref name="a"/>
    </ref name="b"/>
  </group>
</choice>

Since the first <group> contains two occurrences of <attribute
name="foo"/>, this pattern has duplicate attributes.


To be continued...

Makoto


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]


Powered by eList eXpress LLC