[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Subject: Guarded normal form (part 1)
Guarded normal form (part 1) MURATA Makoto 1. Introduction This note proposes a normal form called "Guarded Normal Form" (GNF for short). Motivationes for GNF are as below: - Designing restrictions on patterns - Testing the equivalence and subset relationship of patterns - Validation implementation based on non-lazy construction of hedge automata (i.e., RELAX-like implementation). Although GNF does have some advantages, there is bad news: GNF normalization may lead to combinatory explosion. 2. Definition We first introduce guards and clauses. A guard is either a (possibly empty) sequence of <attribute> elements or an <oneOrMore> element containing a single <attribute> element. Examples are shown below: - <attribute name="foo"/> - <attribute name="foo"/> followed by <attribute name="bar"/> - <oneOrMore><attribute name="foo"/></oneOrMore> A clause is a (possibly empty) sequence of patterns which do not contain <attribute> elements as imidiate or non-imediate subordinates. A pattern is said to be in GNF if it is a content model (a subordinate of an <element>) in Jame's normal form (http://lists.oasis-open.org/archives/trex/200104/msg00046.html) and and is either <group>guard clause</group> or <choice> <group>guard clause</group> <group>guard clause</group> ... </choice> For example, the following content model is in GNF. <choice> <group> <!-- (@foo, EMPTY) --> <attribute name="foo"/> <empty/> </group> <group> <!-- (@foo, @bar, hoge) --> <attribute name="foo"/> <attribute name="bar"/> <ref name="hoge"/> </group> </choice> Patterns do not always have a GNF. For example, the following pattern does not have a guarded normal form. <oneOrMore> <attribute><anyName/></attribute> <ref name="foo"/> </oneOrMore> In general, when an <oneOrMore> element contains both <attribute> and <ref>, it does not have a GNF. This example pattern requires counting of attributes and subordinate elements, which is beyond the descriptive power of automata. One could argue that such patterns are mal-formed. 3. Advanced example In the primitive syntax, an element ( <group>, <interleave>, <choice>, etc.) has two subordinates. However, we abuse the primtive syntax and allow an element to have any number of subordinates. We also abbreviate <group> elements. 1) DTD-like content model ((@a, a)?, (@b, b)?, (@c, c)?) Subordinate element "a" is present only when the attribute "a" is present. And so forth. 2) A pattern <group> <optional><attribute name="a"/><ref name="a"/></optional> <optional><attribute name="b"/><ref name="b"/></optional> <optional><attribute name="b"/><ref name="c"/></optional> </group> 3) An equivalent pattern in GNF <choice> <group> <attribute name="a"/> <attribute name="b"/> <attribute name="c"/> <ref name="a"/> <ref name="b"/> <ref name="c"/> </group> <group> <attribute name="a"/> <attribute name="b"/> <ref name="a"/> <ref name="b"/> </group> <group> <attribute name="a"/> <attribute name="c"/> <ref name="a"/> <ref name="c"/> </group> <group> <attribute name="a"/> <ref name="a"/> </group> <group> <attribute name="b"/> <attribute name="c"/> <ref name="b"/> <ref name="c"/> </group> <group> <attribute name="b"/> <ref name="b"/> </group> <group> <attribute name="c"/> <ref name="c"/> </group> <group> <empty/> </group> </choice> 4. Normalization algorithm A GNF pattern can be assumed as a set of (guard, clause) pairs. We first introcude a recursive program gnf. We first consider trivial cases, namely <empty/>, <ref name="a">, and <attribute ...>...</attribute>. 1) <empty/> gnf(<empty/>) = { (an empty-sequence, <empty/>) } The GNF of <empty/> is <group><empty/></group>. 2) <ref name="a"/> gnf(<ref name="a"/>) = { (an empty-sequence, <ref name="a"/>) } The GNF of <ref name="a"/> is <group><ref name="a"/></group>. Other simple cases such as <data> and <string> are very similar, and we omit them in this note. 3) <attribute ...>...</attribute> gnf(<attribute ..>...</attribute>) = {(<attribute ..>...</attribute>, an empty-sequence)} 4) <choice>....</choice> Now, concentrate on important primitives, namely <choice>, <group>, <interleave>, and <oneOrMore> First, <choice> is straightforward. gnf( <choice>p1 p2</choice> ) = gnf(p1) \cup gnf(p2) For example: gnf(<choice><empty/><ref name="a"/></choice>) = { (an empty-sequence, <empty/>) , (an empty-sequence, <ref name="a"/>) } In fact, the GNF of this pattern is: <choice> <group><empty/></group> <group><ref name="a"/></group> </choice> 5) <group>..</group> Next, we consider <group> and <interleave>. We use ":" to denote concatenation of <attribute> sequences. gnf( <group>p1 p2</group> ) = { (g1:g2, <group>c1 c2</group>) | (g1, c1) \in gnf(p1), (g2, c2) \in gnf(p2) } 6) <interleave> ... </interleave> gnf( <interleave>p1 p2</interleave> ) = { (g1:g2, <interleave>c1 c2</interleave>) | (g1, c1) \in gnf(p1), (g2, c2) \in gnf(p2) } For example, gnf(<group><ref name="a"/><ref name="a"/></group>) = {(an empty-sequence, <group><ref name="a"/><ref name="a"/></group>) } and gnf(<interleave><ref name="a"/><ref name="a"/></interleave>) = {(an empty-sequence, <interleave><ref name="a"/><ref name="a"/></interleave>) } 7) <oneOrMore>....</oneOrMore> As for <oneOrMore>, we need a restriction. The subordinate of a <oneOrMore> pattern is either a single <attribute> pattern or an <attribute>-free pattern. case 1: a single <attribute> pattern gnf(<oneOrMore><attribute ..../></oneOrMore> ) = {(<oneOrMore><attribute ..../></oneOrMore>, <empty/>)} case 2: an <attribute>-free pattern When a pattern p does not contain any <attribute> element, guards in its GNF are empty. Thus, we can assume that gnf(p) = { (an empty-sequence, c) } (If gnf(p) contain more than one pair, we can merge them by using <choice>.) Then, we have: gnf(<oneOrMore>p </oneOrMore>) = {(an empty-sequence, <oneOrMore>c</oneOrMore>)} Once the function gnf is well-defined, it is very easy to define GNF normalization. We only have to create <group>guard clause</group> for each pair and wrap such pairs with <choice>, if necessary. 5. Restrictions on patterns JamesC's normalization can already be used to examine subordinates of <attribute>. GNF allows further restrictions on patterns. 1) subordinates of <oneOrMore> The subordinates of each <oneOrMore> pattern has either a single <attribute> pattern or a <attribute>-free pattern. Otherwise, the pattern does not have a GNF. 2) duplicate <attribute> Given a pattern p, it has duplicate <attribute> when some guard in its GNF has more than one <attribute> pattern of the same name. For example, consider <group> <choice> <attribute name="foo"/> <ref name="a"/> </choice> <choice> <attribute name="foo"/> <ref name="b"/> </choice> </group> Its GNF is: <choice> <group> <attribute name="foo"/> <attribute name="foo"/> </group> <group> <attribute name="foo"/> </ref name="a"/> </group> <group> <attribute name="foo"/> </ref name="b"/> </group> <group> </ref name="a"/> </ref name="b"/> </group> </choice> Since the first <group> contains two occurrences of <attribute name="foo"/>, this pattern has duplicate attributes. To be continued... Makoto
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Powered by eList eXpress LLC