OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

relax-ng message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: A mechanism for handling inclusion/exclusion


A mechanism for handling inclusion/exclusion

11 October, 2003

MURATA Makoto [FAMILY Given]

1. Introduction

Inclusion/exclusion of SGML addresses an important problem, although 
it was never inherited by any other schema languages.  This note proposes 
an alternative mechanism inspired by assertion grammars [1].

Suppose that we want to disallow <a>, <em>, and <span> to have
immediate or non-immediate subordinate <a>, <em>, and <span>,
respectively.  A schema (in the compact syntax) for this constraint is
shown below:

a = element a {
      ([ancestor::em], (([ancestor::span], text) | 
                 ([not(ancestor::span)], mixed{span*}))) | 
      ([not(ancestor::em)], (([ancestor::sp], mixed{em*}) |
                      ([not(ancestor::sp)], mixed{(span|em)*})))}

span = element span {
  ([ancestor::a], (([ancestor::em], text) |
            ([not(ancestor::em)], mixed{em*}))) | 
  ([not(ancestor::a)], (([ancestor::em], mixed{a*}) |
                 ([not(ancestor::em)], mixed{(em|a)*})))}

em = element em {
  ([ancestor::span], (([ancestor::a], text) |
               ([not(ancestor::a)], mixed{a*}))) |
  ([not(ancestor::span)], (([ancestor::a], mixed{span*}) |
                    ([not(ancestor::a)], mixed{(a|span)*})))}

This can be more compact as below:

a =
  element a {
    mixed{
      (([not(ancestor::em)], em) | ([not(ancestor::span)], span))*
    }
  }

span =
  element span {
    mixed{
      (([not(ancestor::a)], a) | ([not(ancestor::em)], em))*
    }
  }

em =
  element em {
    mixed{
      (([not(ancestor::span)], span) | ([not(ancestor::a)], a))*
    }
  }


2. Syntax

1) Compact syntax

We only have to allow "[" aSimpleXPathExpression "]" as a pattern,
where aSimpleXPathExpression is

  ancestor::QName

or

  not(ancestor::QName)

.

Note:  Is this too restrictive?

2) XML syntax

We only have to allow 

  <context exp="aSimpleXPathExpression"/>

as a pattern.


3. Restrictions

Just like we imposed some restrictions on attribute patterns, 
we probably need some restrictions.  However, we can allow

      (([not(ancestor::a)], a) | ([not(ancestor::em)], em))*

since it is easy to convert this to 

([not(ancestor::a)],[not(ancestor::em)],  (a | em)*) |
([not(ancestor::a)],[ancestor::em],       a*)        | 
([ancestor::a],[not(ancestor::em)],       em*)       |
([ancestor::a],[ancestor::em],            empty)      


4.  Validator implementation

1) derivative-based

We only have to make a <context> pattern nullable if the specified path
expression is satisfied.

2) automaton-based

We only have to rewrite a <path> transition as an null transition if
the specified path expression is satisfied.


5.  Converstion to RELAX NG V1

Schemas with <context> patterns can be rewrriten as schemas 
without <context> patterns.  The idea is to use match-identifying 
automata [2].

1) simplification (a variation of the normalization of RNG V1)

2) construction of path automaton

We create a path (string) automaton from all of the path regular
expressions p1, p2, ..., pm in the given schema.  We then create a
path automaton M such that subsets Q1, Q2, ..., Qm of the set Q of
states captures p1, p2, ..., pm, respectively.

3) rewriting the simplified schema

We introduce a new <define> for each (n, q) pair, where n is a
non-terminal (i.e., the value of the name attribute of <define>) and q
is a state of the path automaton.  The non-terminal of this new
<define> is denoted n[q].

A rewritten schema uses n[q] rather than n.  Every n in content models
is replaced by the choice of n[q1], n[q2], ..., n[qn].  For each
<define name="n">...</define>, we introduce <define
name="n[q1]">...</define>, <define name="n[q2]">...</define>, and
<define name="n[qn]">...</define>.

Observe that states in this schema identify which path expresssion 
is satisfied.  Thus, we can statically evaluate <context> patterns, 
thus providing a RNG V1 schema.



[1] Raggett, http://www.w3.org/People/Raggett/dtdgen/Docs/, 1999

{2] Murata, Extended Path Expressions for XML, PODS 2001
-- 
MURATA Makoto <eb2m-mrt@asahi-net.or.jp>



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]