Subject: General concern about inter-grammar references
I have been thinking about the conversion from W3C XML Schema and SOX to RELAX NG. One of the problems I found is that those languages and RELAX NG differ in their reference mechanisms.

RELAX NG
--------

In RELAX NG, a grammar constitutes the basic unit of references: any reference can be made as long as its source (<ref name="..."/>) and its target (<define name="..."/>) are in the same grammar.

Grammars can then be organized into a tree structure. A grammar can have multiple child grammars, but at most one parent grammar.

    parent <--      --> child

    G1 -+- G2 --- G4
        |
        +- G3 -+- G5
               |
               +- G5'

Sometimes one grammar is loaded several times, and therefore the same grammar can appear more than once (like G5 and G5'). Conceptually these are treated as different grammars, although a smart implementation may exploit their equality to achieve higher performance.

Inter-grammar references are allowed only if the source and the target are directly connected. So a reference between G1 and G2 is OK, but one between G2 and G3 is not.

There is a further restriction. A grammar can refer to its parent grammar as many times as it wants, but the parent grammar can refer to its child grammar only once. So there is an asymmetry here.

This asymmetry can be partially solved by using <withGrammar>. But the resolution is only partial, because you cannot write patterns like this:

    <group>
      <choice>
        <ref name="G2:foo"/>  <!-- reference to "foo" in G2 -->
        <ref name="G3:bar"/>
      </choice>
      <choice>
        <ref name="G2:foo"/>
        <ref name="G3:bar"/>
      </choice>
    </group>

And references between G2 and G3, or between G5 and G4, are still not allowed.

W3C XML Schema / SOX
--------------------

In these languages, there is no basic unit of references. They do have the concept of a "schema", but the "schema" doesn't impose any restriction on references. An inter-schema reference can be made between any schemata.
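
As an illustration of such an inter-schema reference in W3C XML Schema, here is a minimal sketch. The namespaces http://example.org/a and http://example.org/b, the file name b.xsd, and the element names "order" and "item" are all hypothetical and chosen only for this example.

```xml
<!-- Hypothetical schema "a" that refers to an element declared in schema "b". -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://example.org/b"
           targetNamespace="http://example.org/a">

  <!-- Make the declarations of the other schema available. -->
  <xs:import namespace="http://example.org/b" schemaLocation="b.xsd"/>

  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- The QName b:item stands for the pair
             (http://example.org/b, item). -->
        <xs:element ref="b:item"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Nothing here depends on where the two schemas sit relative to each other; the reference is resolved purely through the namespace URI and the local name.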
A schema is designated by a URI, and a reference is made through a (URI, local) pair, which is usually represented by a QName.

         S1
        /|\
       / | \
    S2 --+-- S3
       \ | /
        \|/
         S4

So unlike RELAX NG, references are not tied to the tree structure.

Consequence(1)
--------------

Let's try to convert those languages into RELAX NG. Due to the above difference, we cannot convert each "schema" into its own grammar. As a result, we are forced to create one monolithic grammar that contains all definitions. This makes name collisions highly likely. Typically this results in pattern names like "{http://example.org/.../}bar".

Another problem is how to assemble the necessary files. Say schema A references a definition in B. We cannot write <include href="B.rng"/> in A.rng, because C.rng might have a reference to B, too. If both A and C have references to B and we write <include href="B.rng"/> in both A.rng and C.rng, it causes a collision because B.rng is included twice. So converted RELAX NG files cannot contain <include> statements. Instead, you have to create a hub file yourself that includes all the necessary files, which is very tiresome work.

Consequence(2)
--------------

Forget about the conversion and consider three grammars: ext1, ext2, and base. "base" contains a set of definitions. "ext1" and "ext2" add some extra functionality to the "base" module.

    base.rng
      <define name="foo">
        <element name="base">...</element>
      </define>

    ext1.rng
      <define name="foo" combine="choice">
        <element name="ext1">...</element>
      </define>

    ext2.rng
      <define name="foo" combine="choice">
        <element name="ext2">...</element>
      </define>

Now what you want is to let "ext1" and "ext2" each be usable by itself. That is, you don't want to write

    <include href="base.rng"/>
    <include href="ext1.rng"/>

to use "ext1". Using "ext1" should be possible with just

    <include href="ext1.rng"/>

Using "ext2" should be possible in the same way.
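
For "ext1" to be usable on its own, ext1.rng would presumably have to pull in base.rng itself. The following is only a sketch of what that might look like: the file names are the ones used above, namespace declarations are omitted as in the other snippets, and <empty/> is a placeholder I am assuming for the element content that is elided with "..." above.

```xml
<!-- ext1.rng: a self-contained variant (sketch, not a tested grammar) -->
<grammar>
  <!-- Bring in the base definitions so that users need only this file. -->
  <include href="base.rng"/>

  <!-- Extend the "foo" pattern from base with one more alternative. -->
  <define name="foo" combine="choice">
    <element name="ext1">
      <empty/>  <!-- placeholder for the elided content -->
    </element>
  </define>
</grammar>
```

If ext2.rng were written the same way, a grammar that includes both files would reach base.rng through two separate includes, which is the double-inclusion collision described under Consequence(1).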
Also, you want to write

    <include href="ext1.rng"/>
    <include href="ext2.rng"/>

to use both. But this cannot be done, due to the restriction imposed on inter-grammar references. I think Jeni Tennison has given us similar feedback regarding XHTML m12n.

Conclusion
----------

It might be useful to name an included grammar and refer to it by that name.

    <?xml version="1.0"?>
    <grammar>
      <grammarRef name="table" href="..."/>
      <grammarRef name="list" href="..."/>

      <define name="foo">
        <choice>
          <ref name="xyz" grammar="table"/>
          <ref name="abc" grammar="list"/>
        </choice>
      </define>
    </grammar>

If there are multiple <grammarRef> elements with the same name, only the first one is loaded and the rest are ignored.

I don't know if this proposal works well...

regards,
--
Kohsuke KAWAGUCHI                          +1 650 786 0721
Sun Microsystems                   kohsuke.kawaguchi@sun.com