[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]

*Subject*: **Re: [relax-ng] Algebraic characterization of RELAX NG patterns**

*From*:**James Clark <jjc@jclark.com>***To*: Murata Makoto <mura034@attglobal.net>, relax-ng@lists.oasis-open.org*Date*: Mon, 22 Oct 2001 10:15:00 +0700

Thanks for taking the time to explain this. I found it quite hard to understand this. I *think* I do now understand it. Let me check my understanding by attempting a more informal, intuitive explanation. Hopefully, Murata-san will correct me if I have misunderstood. Suppose we have some string regular expression R. We can consider this to define an equivalence relation over strings, where two strings are equivalent if they always affect the match of R in exactly the same way. More precisely, two strings x and y are equivalent if and only if for all strings pref and suff, either both pref + x + suff and pref + y + suff match R or neither match R (+ means string concatentation). Now we can consider the abstract structure whose members are equivalence classes of strings under this equivalence relation, and which has a single operation corresponding to concatenation on the underlying strings. This abstract structure will have the properties of a "monoid" i.e. there is an identity (corresponding to the equivalence class containing the empty string) and the operation is associative. We also have a mapping from strings to this abstract structure: each string maps to the equivalence class that contains it. Suppose we have a language that is not describable by a regular expression, e.g. the classic example of the set of strings with balanced parentheses. We can consider the monoid for this in the same way as with the regular expression. Since the string with N open parentheses will affect a match differently from the string with N+1 open parentheses, the monoid in this case will have an infinite number of members. In fact, a language having a corresponding monoid that is finite is exactly equivalent to the language being regualar. Now to extend this to RELAX NG we have to deal with (a) attributes (b) the fact that we have an infinite set of possible attribute/element names To deal with (b), we want to set things up so that we are dealing with a finite alphabet just as in the string regular expression case. To do this, instead of dealing particular names we deal with classes of names that are equivalent as far as a RELAX NG schema is concerned. For example, if we have <element name="foo"> <element name="bar"> <empty/> </element> <zeroOrMore> <element> <anyName/> <empty/> </element> </zeroOrMore> </element> then we have three classes of names: 1. foo 2. bar 3. anything other than foo or bar Thus we can treat this as if we had three names, for example, foo, bar, baz; in dealing with a particular instance, we map any name that isn't foo or bar onto baz. This brings on to (a). Attributes differ from elements in two ways: 1. Order isn't significant 2. You can't affect two attributes with the same name There's no problem dealing with 1. But our solution to (b) has an impact on 2. The solution to (b) makes us deal with classes of names rather than individual names. Although we cannot have two attributes with the same name, we can have two attributes whose name is from the same class. So what happens to 2? This is where the proposed restriction comes in: if we require that all infinite name classes are in one or more, then we can replace 2 by 2'. The number of occurrences isn't significant (i.e. if you have one attribute from a particular name class, having another attribute from the same name class won't affect the match) This is consistent with the solution to (b). --On 21 October 2001 22:55 +0900 Murata Makoto <mura034@attglobal.net> wrote: > Algebraic characterization of RELAX NG patterns > > 2001 10/21 > Murata Makoto > EB2M-MRT@asahi-net.or.jp > > 1. Introduction > > I have argued that infinite-nameclass <attribute> should be allowed only > when it is a descendant of <oneOrMore>. With this restriction, we have > a nice characterization theorem of RELAX NG patterns. > > 2. Regular languages and syntactic monoids > > We first consider regular languages. It is well known that regular > expressions and finite automata exactly capture regular languages. > Meanwhile, another interesting characterization is given by syntactic > monoids. > > In preparation, let $\Sigma$ be a finite set. The set of strings over > $\Sigma$ is denoted by $\Sigma^*$. > > Theorem 1: > > A subset $L$ of $\Sigma^*$ is regular if and only if > $L$ is the inverse image of $X$ under $\phi$, where > > > - $N$ is a finite monoid; that is, > > - $N$ is a finite set, > > - $N$ has a binary operation $\circ$, > > - $\circ$ has an identity $0$ (i.e., > $0 \circ n = n \circ 0 = n$ for every $n$ > in $N$), and > > - $\circ$ is associative; that is, > $(s_1 \circ s_2) \circ s_3 = s_1 \circ (s_2 \circ s_3)$, > > - $\phi$ is a homomorphism from $\Sigma^*$ to $N$; that is, > $\phi(u v) = \phi(u) \circ \phi(v)$ for every $u, v$ in $\Sigma^*$, > and > > - $X$ is a subset of $N$. > > > 3. Generalization for RELAX NG patterns > > For sake of simplicity, we ignore <text>, <list>, <data>, <value>. We > also assume that any pattern is allowed as the child of <start>. > > In prepartion, let $\Delta$ be a finite set of disjoint name classes > for attributes, and let $\Sigma$ be a finite set of disjoint name > classes for elements. > > We can nicely generalize Theorem 1 as below. > > Theorem 2: > > A subset $L$ of $(2 ^ \Delta) \times \Sigma ^*$ is described by a > RELAX NG pattern over $\Delta$ and $\Sigma$ if and only if $L$ is the > inverse image of $X$ under $\xi \oplus \phi$, where > > - $M$ is a finite set such that > > - $M$ has a binary operation $+$, > > - $+$ has an identity $0$ (i.e., > $0 + m = m + 0 = m$ for every $m$ > in $M$), > > - $+$ is associative; that is, > $(m_1 + m_2) + m_3 = m_1 + (m_2 + m_3)$, > and > > - $+$ is commutative; that is, > $m_1 + m_2 = m_2 + m_1$, and > > - $+$ is idempotent; that is, $m_1 + m_1 = m_1$ > > - $\xi$ is a homomorphism from $2^\Delta$ to $M$; that is, > $\xi(\Delta_1 \cup \Delta_2) = \xi(\Delta_1) + \xi(\Delta_2)$ > for every subset $\Delta_1, \Delta_2$ of $\Sigma^*$ > > - $N$ is a finite monoid; that is, > > - $N$ is a finite set, > > - $N$ has a binary operation $\circ$, > > - $\circ$ has an identity $1$ (i.e., > $1 \circ n = n \circ 1 = n$ for every $n$ > in $N$), and > > - $\circ$ is associative; that is, > $(n_1 \circ n_2) \circ n_3 = n_1 \circ (n_2 \circ n_3)$. > > - $\phi$ is a homomorphism from $\Sigma^*$ to $N$, that is, > $\phi(u v) = \phi(u) \circ \phi(v)$ for every $u, v$ in $\Sigma^*$ > > - $\xi \oplus \phi$ is a mapping from $2^\Delta \times \Sigma^*$ > to $M \times N$ defined as $\xi \oplus \phi(\Delta_1, u) = > (\xi(\Delta_1), \phi(, u))$ > > - $X$ is a subset of $M \times N$. > > > Note: Observe that idempotency does not hold, if we allow > infinite-nameclass <attribute> without an ascending <oneOrMore>, > > ---- > Murata Makoto EB2M-MRT@asahi-net.or.jp > > ---------------------------------------------------------------- > To subscribe or unsubscribe from this elist use the subscription > manager: <http://lists.oasis-open.org/ob/adm.pl> > >

**Follow-Ups**:**Re: [relax-ng] Algebraic characterization of RELAX NG patterns***From:*MURATA Makoto <EB2M-MRT@asahi-net.or.jp>

**References**:**[relax-ng] Algebraic characterization of RELAX NG patterns***From:*Murata Makoto <mura034@attglobal.net>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [Elist Home]

Powered by eList eXpress LLC