[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Subject: Alternative approach for key/keyref
Having seen 7.4 of the spec, I would like to point out another possibility for keys and keyrefs. Most proposals for keys utilize simplified path expressions. Advanced features (e.g., multi-part keys) can be developed on top of such path expressions. Path expressions are easy to evaluate. Validators can easily validate documents. More importantly, *application programs* can easily find out which element is a key. However, use of such path expressions together with RELAX NG is a challenge. We have to ensure datatypes of keys are unambiguous; that is, for each key, any element/attribute of this key is required to have a unique datatype. How can we do this for RELAX NG grammars, which may provide more than one interpretation per document? We can examine datatype ambiguity by rewriting grammars [1]. Given a grammar and a simple path expression, we create a grammar such that (1) it accepts exactly what the original grammar accepts and (2) some of its named patterns are marked, and (3) an element matches a marked named pattern if and only if that element satisfies the path expression. Intuitively speaking, the new grammar simulates the original grammar (in the bottom-up manner) as well as the path expression (in the top-down manner). Formally put, we rewrite grammars while constructing a DFA from path expressions. DFA construction is done by computing derivatives. We introduce a new named pattern for a named pattern and a state of the DFA. We demonstrate such rewriting. Consider a path expression "/doc/foo@number" and a grammar as below: <grammar ...> <start><ref name="doc"/></start> <define name="doc"> <element name="doc"> <ref name="foo"/> <ref name="bar"/> </element> </define> <define name="foo"> <element name="foo"> <attribute name="number"> <data type="xsd:short"/> </attribute> <zeroOrMore> <ref name="foo1"/> </zeroOrMore> </element> </define> <define name="foo1"> <element name="foo"> <attribute name="number"/> <zeroOrMore> <ref name="foo1"/> </zeroOrMore> </element> </define> <define name="bar"> <element name="bar"> <ref name="foo1"/> </element> </define> </grammar> First, we have a start state, say #0. Our <start> has only one child <ref name="doc"/>. There is only one <define> for "doc", which has <element name="doc"> ...</element> as the child. The derivative of /doc/foo@number by "doc" is foo@number. This derivative corresponds to a new state, say #1. By combining the non-terminal in the original grammr "doc" and #1, we have a new non-terminal "doc#1". A new <start> is: <start><ref name="doc#1"/></start> Now consider the content model of the first <define> element. It contains <ref name="foo"/> and <ref name="bar"/>. The element names for these two named patterns are "foo" and "bar". The derivative of foo@number by foo is @number. This derivative corresponds to a new state, say #2. The derivative of /foo@number by bar is the empty set, which corresponds to a deadend state, say #3. Thus, we have a new <define> as below: <define name="doc#1"> <element name="doc"> <ref name="foo#2"/> <ref name="bar#3"/> </element> </define> Now, we consider the content model of the second <define> element. It has <ref name="foo1"/> and <attribute name="number">. There is only one <define> for "foo1", which has <element name="foo"> ...</element> as the child. The derivative of @number by "foo" is the empty set, which corresponds to #3. <define name="foo#2"> <element name="foo"> <attribute name="number"> <data type="xsd:short"/> <!-- match --> </attribute> <zeroOrMore> <ref name="foo1#3"/> </zeroOrMore> </element> </define> The derivative of @number by @number is the empty pattern, which is a final state. Thus, <attribute name="number"> in the above <define> matches the given path expression. Now, we consider the content model of the third <define> element. As does the previous <define>, it has <ref name="foo1"/> and <attribute name="number">. The derivative of the empty set by "foo" (which is the element name for "foo1") is the empty set (#3). <define name="foo1#3"> <element name="foo"> <key =" <attribute name="number"/> <zeroOrMore> <ref name="foo1#3"/> </zeroOrMore> </element> </define> Since the derivative of the emptyset by <attribute name="number"/> is the empty set, this <attribute> does not match the given path expression. Now, we consider bar#3. The <define> element for the non-terminal "bar" has <element name="bar">. The content model has <ref name="foo2"/>. The derivateive of the empty set by "foo" (which is the element name for "foo2") is the empty set (#3). Thus, we reference to a named pattern "foo1#3", which is already defined. Therefore, we have a new grammar as below: <grammar...> <start><ref name="doc#1"/></start> <define name="doc#1"> <element name="doc"> <ref name="foo#2"/> <ref name="bar#3"/> </element> </define> <define name="foo#2"> <element name="foo"> <attribute name="number"> <!-- match --> <data type="xsd:short"/> </attribute> <zeroOrMore> <ref name="foo1#3"/> </zeroOrMore> </element> </define> <define name="foo1#3"> <element name="foo"> <key =" <attribute name="number"/> <zeroOrMore> <ref name="foo1#3"/> </zeroOrMore> </element> </define> <define name="bar#3"> <element name="bar"> <ref name="foo1#3"/> </element> </define> </grammar> We have found that only one <attribute> in this grammar matches the given pattern. Thus, the datatype for "doc/foo@number" is unique. Suppose that we have changed the original grammar. We replace <ref name="foo1"> in the second <define> with <ref name="foo">. Then, from the path expression and the modified grammar, we get a grammar as below. Observe that this rewriting introduces two named patterns, namely foo#2 and foo#3. The former matches the pattern, while the latter does not. <grammar...> <start><ref name="doc#1"/></start> <define name="doc#1"> <element name="doc"> <ref name="foo#2"/> <ref name="bar#3"/> </element> </define> <define name="foo#2"> <element name="foo"> <attribute name="number"> <!-- match --> <data type="xsd:short"/> </attribute> <zeroOrMore> <ref name="foo#3"/> </zeroOrMore> </element> </define> <define name="foo#3"> <element name="foo"> <attribute name="number"> <data type="xsd:short"/> </attribute> <zeroOrMore> <ref name="foo#3"/> </zeroOrMore> </element> </define> <define name="foo1#3"> <element name="foo"> <key =" <attribute name="number"/> <zeroOrMore> <ref name="foo1#3"/> </zeroOrMore> </element> </define> <define name="bar#3"> <element name="bar"> <ref name="foo1#3"/> </element> </define> </grammar> [1] Makoto Murata, Extended Path Expressions for XML, PODS 2001. http://www.trl.ibm.com/projects/xml/hedge/murataPODS01final.pdf Cheers, Makoto
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]
Powered by eList eXpress LLC