[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Subject: Guarded normal form (part 2)
Guarded normal form (part 2) MURATA Makoto 6. Equivalence and subset relationship Given two schemas, we sometimes want to examine if they permit the same set of documents. Or, we might want to examine if the set of documents permitted by the first is a subset of that by the second. Query language and programming languages for XML require such operations. In DTDs, we can compare attribute list declarations and then compare content models. Comparison of content models is not strightfoward, since (a?, b?) is identical to ((a, b?)?), for example. However, we can compare content models by creating a string automaton from a content model. (We also need determinization and minimization of string automata). In schema languages mimicking hedge regular grammars, testing equivalence or subset relationship requires hedge automata. One straightfoward algorithm converts hedge automata to binary tree automata, and determinizes and minimizes them. An interesting algorithm developed in the XDuce project does NOT detemrinimizes binary tree automata. JamesC has unified attribute list declarations and content models in a single pattern. Although this unificaiton is very powerful in capturing interdependencies of attributes and elements, examining equivalence or subset relationship becomes much more difficult. For example, <element name="bar"> <attribute name="foo"/> <element name="bar"><anyString/></element> </element> and <element name="bar"> <attribute name="foo"/> <element name="bar"><anyString/></element> </element> are equivalent. Moreover, <element name="bar"> <oneOrMore> <attribute name="foo"/> </oneOrMore> </element> and <element name="bar"> <attribute name="foo"/> </element> are equivalent. GNF normalization makes it easy to test equivalence and subset relationship of patterns. After GNF normalization, we can apply techniques developped for hedge automata (and binary tree automata). 7. Non-lazy construction of hedge automata GNF normalization allows non-lazy construction of hedge automata. Intuitively, guards represent terminal symbols and clauses collectively represent hedge automata. After examining a start tag, we can identify a set of terminal symbols for that start tag, and then continue evaluation of a hedge automata. In fact, such implementation is very similar to some implementations of RELAX Core. 8. Combinatorial explosion Alghouth GNF has many advantages, normalization may lead to combinatorial explosion. For example, consider the following pattern. <group> <choice> <element name="a"><anyString/></element> <attribute name="a"/> </choice> <choice> <element name="a"><anyString/></element> <attribute name="a"/> </choice> ... <choice> <element name="z"><anyString/></element> <attribute name="z"/> </choice> </group> (Note: <interleave> is probably more common.) The first (or second) <choice> specify that we can have either a subelement "a" (resp., "b) or attribute "a" (resp. "b"). This pattern has 26 such <choice> patterns. Such patterns may arise very frequently. To convert this pattern to the GNF, we need a guard-clause pair for each subset of {@a, @b, ..., @z}. The number of these subsets is 2^26! 9. Discussion I once wrote: > On the other hand, some constructs of TREX make it hard to actually > create tree automata in advance. I am concerned, since creation of tree > automata is required for type inference (note: XML Query people plan to > incorporate type inference into their query language). I am hoping that > introduction of reasonable restrictions will make it possible to construct > tree automata in advance. My observataions are shown below: 1) GNF normalization allows use of all techniques developed for tree automata. 2) We need restrictions on <oneOrMore>. 3) If we freely use <attribute> in content models, GNF normalization will explode. Cheers, Makoto
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Powered by eList eXpress LLC