OASIS Mailing List Archives

relax-ng message

Subject: Maybe it's too late, but...

As for the list of tokens...

My implementation experiment shows that the following syntax is very
easy to implement and is more expressive.

  <token> any RELAX NG pattern except element/attribute </token>

For example, things like

    <data type="xsd:integer" />


  <!-- foo cannot have element/attribute descendants -->
  <ref name="foo" />

By using this proposal, the current <zeroOrMoreToken> P
</zeroOrMoreToken> is expressed as


From an implementation viewpoint, the residual of a <token> pattern
by a string S is defined as

function residual( <token> P </token>,  S ) {
   Let {t1,t2,..., tn} be tokenization of S.
   if( residual( residual( residual( P, t1 ), t2 ) ..., tn ) == <empty/> )
      return <empty/>
   else
      return <notAllowed/>
}

Therefore, <token> has minimal impact on the complexity of the spec
(and of implementations).
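
To make the definition above concrete, here is a minimal Python sketch of
the fold over tokens. Everything in it is an assumption made for
illustration: the class names, the tiny pattern subset (data, group, choice,
oneOrMore), whitespace tokenization, and a simplified xsd:integer check.
It also tests the final residual for nullability instead of comparing it
literally with <empty/>, so that repetition patterns are accepted; that
substitution is mine, not part of the proposal.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Empty: pass

@dataclass(frozen=True)
class NotAllowed: pass

@dataclass(frozen=True)
class Data:
    type: str  # only "xsd:integer" is understood in this sketch

@dataclass(frozen=True)
class Group:
    first: object
    second: object

@dataclass(frozen=True)
class Choice:
    left: object
    right: object

@dataclass(frozen=True)
class OneOrMore:
    body: object

def group(p, q):
    # Simplifying constructor: notAllowed absorbs, empty is the identity.
    if isinstance(p, NotAllowed) or isinstance(q, NotAllowed):
        return NotAllowed()
    if isinstance(p, Empty):
        return q
    if isinstance(q, Empty):
        return p
    return Group(p, q)

def choice(p, q):
    # Simplifying constructor: notAllowed is the identity for choice.
    if isinstance(p, NotAllowed):
        return q
    if isinstance(q, NotAllowed):
        return p
    return Choice(p, q)

def nullable(p):
    # True if the pattern accepts "no more tokens".
    if isinstance(p, Empty):
        return True
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, OneOrMore):
        return nullable(p.body)
    return False

def is_integer(tok):
    # Simplified lexical check: optional sign followed by ASCII digits.
    body = tok[1:] if tok[:1] in ("+", "-") else tok
    return body.isascii() and body.isdigit()

def residual(p, tok):
    # Residual of pattern p after consuming one token.
    if isinstance(p, Data):
        ok = p.type == "xsd:integer" and is_integer(tok)
        return Empty() if ok else NotAllowed()
    if isinstance(p, Choice):
        return choice(residual(p.left, tok), residual(p.right, tok))
    if isinstance(p, Group):
        r = group(residual(p.first, tok), p.second)
        if nullable(p.first):
            r = choice(r, residual(p.second, tok))
        return r
    if isinstance(p, OneOrMore):
        return group(residual(p.body, tok), choice(p, Empty()))
    return NotAllowed()  # <empty/> and <notAllowed/> accept no token

def token_residual(p, s):
    # Split s on whitespace, fold residual over the tokens, then check the rest.
    rest = reduce(residual, s.split(), p)
    return Empty() if nullable(rest) else NotAllowed()
```

With these definitions, token_residual(Group(Data("xsd:integer"),
Data("xsd:integer")), "1 2") yields Empty(), while the same pattern against
"1" or "1 x" yields NotAllowed().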

I think the following are the problems with the current <oneOrMoreToken>.

- First, I considered treating <oneOrMoreToken> as the XSD list
  datatype. But that is difficult because of patterns
  like <oneOrMoreToken><ref name="..."/></oneOrMoreToken>

- Then I thought of the possibility to implement a datatype that keeps
  a pattern as its body. In this way, <oneOrMoreToken> can be
  implemented as

function residual( <oneOrMoreToken> P </oneOrMoreToken>,  S ) {
   Let {t0,t1,..., tn-1} be tokenization of S.
   for( i=0; i<n; i++ )
     if( residual( P, ti ) != <empty/> )
       return <notAllowed/>
   return <empty/>
}

- Then I found that there is really no reason to prohibit a sequence of
  data inside <oneOrMoreToken>. And in fact it is useful. The above
  implementation can correctly handle
      <data type="xsd:integer"/>
  The reason a sequence of data has to be prohibited elsewhere is that
  we can't know how to split one big character sequence into
  sub-sequences. But as you see, oneOrMoreToken knows how to split
  them. So in fact there is no problem.

- By the same reasoning, there is no reason to prohibit plain
  <oneOrMore> within <oneOrMoreToken>. That implies <oneOrMoreToken>
  does not necessarily have to implement the "one-or-more" semantics.
  Instead, it can simply split one big string into sub-sequences.

- This observation leads me to this proposal.
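
The splitting argument in the list above can be illustrated with a small
Python sketch. The helper names are hypothetical, and whitespace
tokenization is an assumption here. Without a delimiter, a string matched
against two consecutive integers can be cut in several equally valid
places; once the string is tokenized first, there is exactly one split.

```python
def candidate_splits(s):
    # Every way to cut s into two non-empty pieces when no delimiter guides us.
    return [(s[:i], s[i:]) for i in range(1, len(s))]

def token_split(s):
    # Whitespace tokenization: the boundaries are fixed by the input itself.
    return s.split()

# Two consecutive integers against "123": both cuts are plausible.
ambiguous = candidate_splits("123")

# The same data written as a token list has a single reading.
unambiguous = token_split("12 3")
```

Here ambiguous holds two candidate pairs, ("1", "23") and ("12", "3"),
whereas unambiguous is the single list ["12", "3"].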

E-Mail: kohsukekawaguchi@yahoo.com
