[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: A mechanism for handling inclusion/exclusion
A mechanism for handling inclusion/exclusion 11 October, 2003 MURATA Makoto [FAMILY Given] 1. Introduction Inclusion/exclusion of SGML addresses an important problem, although it was never inherited by any other schema languages. This note proposes an alternative mechanism inspired by assertion grammars [1]. Suppose that we want to disallow <a>, <em>, and <span> to have immediate or non-immediate subordinate <a>, <em>, and <span>, respectively. A schema (in the compact syntax) for this constraint is shown below: a = element a { ([ancestor::em], (([ancestor::span], text) | ([not(ancestor::span)], mixed{span*}))) | ([not(ancestor::em)], (([ancestor::sp], mixed{em*}) | ([not(ancestor::sp)], mixed{(span|em)*})))} span = element span { ([ancestor::a], (([ancestor::em], text) | ([not(ancestor::em)], mixed{em*}))) | ([not(ancestor::a)], (([ancestor::em], mixed{a*}) | ([not(ancestor::em)], mixed{(em|a)*})))} em = element em { ([ancestor::span], (([ancestor::a], text) | ([not(ancestor::a)], mixed{a*}))) | ([not(ancestor::span)], (([ancestor::a], mixed{span*}) | ([not(ancestor::a)], mixed{(a|span)*})))} This can be more compact as below: a = element a { mixed{ (([not(ancestor::em)], em) | ([not(ancestor::span)], span))* } } span = element span { mixed{ (([not(ancestor::a)], a) | ([not(ancestor::em)], em))* } } em = element em { mixed{ (([not(ancestor::span)], span) | ([not(ancestor::a)], a))* } } 2. Syntax 1) Compact syntax We only have to allow "[" aSimpleXPathExpression "]" as a pattern, where aSimpleXPathExpression is ancestor::QName or not(ancestor::QName) . Note: Is this too restrictive? 2) XML syntax We only have to allow <context exp="aSimpleXPathExpression"/> as a pattern. 3. Restrictions Just like we imposed some restrictions on attribute patterns, we probably need some restrictions. However, we can allow (([not(ancestor::a)], a) | ([not(ancestor::em)], em))* since it is easy to convert this to ([not(ancestor::a)],[not(ancestor::em)], (a | em)*) | ([not(ancestor::a)],[ancestor::em], a*) | ([ancestor::a],[not(ancestor::em)], em*) | ([ancestor::a],[ancestor::em], empty) 4. Validator implementation 1) derivative-based We only have to make a <context> pattern nullable if the specified path expression is satisfied. 2) automaton-based We only have to rewrite a <path> transition as an null transition if the specified path expression is satisfied. 5. Converstion to RELAX NG V1 Schemas with <context> patterns can be rewrriten as schemas without <context> patterns. The idea is to use match-identifying automata [2]. 1) simplification (a variation of the normalization of RNG V1) 2) construction of path automaton We create a path (string) automaton from all of the path regular expressions p1, p2, ..., pm in the given schema. We then create a path automaton M such that subsets Q1, Q2, ..., Qm of the set Q of states captures p1, p2, ..., pm, respectively. 3) rewriting the simplified schema We introduce a new <define> for each (n, q) pair, where n is a non-terminal (i.e., the value of the name attribute of <define>) and q is a state of the path automaton. The non-terminal of this new <define> is denoted n[q]. A rewritten schema uses n[q] rather than n. Every n in content models is replaced by the choice of n[q1], n[q2], ..., n[qn]. For each <define name="n">...</define>, we introduce <define name="n[q1]">...</define>, <define name="n[q2]">...</define>, and <define name="n[qn]">...</define>. Observe that states in this schema identify which path expresssion is satisfied. Thus, we can statically evaluate <context> patterns, thus providing a RNG V1 schema. [1] Raggett, http://www.w3.org/People/Raggett/dtdgen/Docs/, 1999 {2] Murata, Extended Path Expressions for XML, PODS 2001 -- MURATA Makoto <eb2m-mrt@asahi-net.or.jp>
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]