Subject: Maybe it's too late, but...
As for the list of tokens...
My implementation experiments show that the following syntax is very
easy to implement and is more expressive.
<token>
any RELAX NG pattern except element/attribute.
</token>
For example, things like
<token>
<oneOrMore>
<data type="xsd:integer" />
<value>cm</value>
</oneOrMore>
</token>
or
<token>
<!-- foo cannot have element/attribute descendants -->
<ref name="foo" />
</token>
By using this proposal, the current <zeroOrMoreToken> P
</zeroOrMoreToken> is expressed as
<token>
<zeroOrMore>
P
</zeroOrMore>
</token>
From an implementation viewpoint, the residual of a <token> pattern
by a string S is defined as
function residual( <token> P </token>, S ) {
Let {t1,t2,..., tn} be tokenization of S.
if( residual( residual( residual( P, t1 ), t2 ) ..., tn ) == <empty/> )
return <empty/>
else
return <notAllowed/>
}
Therefore, <token> has minimal impact on the complexity of the spec
(and of implementations).
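
As a rough illustration of the residual above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the class names, the whitespace tokenization, and the tiny pattern algebra (only xsd:integer is understood as a data type). It also accepts when the final residual is nullable rather than strictly equal to <empty/>, which is a slightly looser test than the pseudocode states.

```python
import re
from dataclasses import dataclass
from functools import reduce

# Hypothetical mini pattern algebra; not taken from any RELAX NG library.
@dataclass(frozen=True)
class Empty: pass
@dataclass(frozen=True)
class NotAllowed: pass
@dataclass(frozen=True)
class Data:
    type: str
@dataclass(frozen=True)
class Value:
    text: str
@dataclass(frozen=True)
class Group:
    first: object
    second: object
@dataclass(frozen=True)
class Choice:
    left: object
    right: object
@dataclass(frozen=True)
class OneOrMore:
    body: object

EMPTY, NOT_ALLOWED = Empty(), NotAllowed()

def nullable(p):
    """True if p matches the empty token sequence."""
    if isinstance(p, Empty):
        return True
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, OneOrMore):
        return nullable(p.body)
    return False

def group(a, b):
    """Build a group, simplifying away <empty/> and <notAllowed/>."""
    if NOT_ALLOWED in (a, b):
        return NOT_ALLOWED
    if a == EMPTY:
        return b
    return a if b == EMPTY else Group(a, b)

def choice(a, b):
    """Build a choice, simplifying away <notAllowed/> and duplicates."""
    if a == NOT_ALLOWED:
        return b
    return a if b == NOT_ALLOWED or a == b else Choice(a, b)

def residual(p, t):
    """Residual of pattern p by a single token t."""
    if isinstance(p, Data):
        ok = p.type == "xsd:integer" and re.fullmatch(r"[+-]?[0-9]+", t)
        return EMPTY if ok else NOT_ALLOWED
    if isinstance(p, Value):
        return EMPTY if p.text == t else NOT_ALLOWED
    if isinstance(p, Group):
        head = group(residual(p.first, t), p.second)
        tail = residual(p.second, t) if nullable(p.first) else NOT_ALLOWED
        return choice(head, tail)
    if isinstance(p, Choice):
        return choice(residual(p.left, t), residual(p.right, t))
    if isinstance(p, OneOrMore):
        return group(residual(p.body, t), choice(p, EMPTY))
    return NOT_ALLOWED  # Empty and NotAllowed consume nothing

def token_residual(body, s):
    """Residual of <token> body </token> by the whole string s."""
    rest = reduce(residual, s.split(), body)  # fold over the tokens of s
    return EMPTY if nullable(rest) else NOT_ALLOWED

# <token><oneOrMore> integer, "cm" </oneOrMore></token>
lengths = OneOrMore(Group(Data("xsd:integer"), Value("cm")))
print(token_residual(lengths, "10 cm 20 cm") == EMPTY)
print(token_residual(lengths, "10 cm 20") == NOT_ALLOWED)
```

The only part specific to <token> is token_residual: it splits the string and folds the ordinary per-token residual over the result.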
I think the following are the problems with the current <oneOrMoreToken>.
- First, I considered parsing <oneOrMoreToken> as the list datatype
of XSD. But that is difficult because of patterns
like <oneOrMoreToken><ref name="..."/></oneOrMoreToken>
- Then I considered implementing a datatype that keeps
a pattern as its body. That way, <oneOrMoreToken> can be
implemented as
function residual( <oneOrMoreToken> P </oneOrMoreToken>, S ) {
Let {t0,t1,..., tn-1} be tokenization of S.
for( i=0; i<n; i++ )
if( residual( P, ti ) != <empty/> )
return <notAllowed/>
return <empty/>
}
- Then I found that there is really no reason to prohibit a sequence of
data inside <oneOrMoreToken>, and in fact it is useful. The above
implementation can correctly handle
<oneOrMoreToken>
<group>
<data type="xsd:integer"/>
<value>cm</value>
</group>
</oneOrMoreToken>
We have to prohibit a sequence of data because we can't know
how to split one big character sequence into sub-sequences.
But as you can see, <oneOrMoreToken> knows how to split it, so in fact
there is no problem.
- By the above reasoning, there is no reason to prohibit plain
<oneOrMore> within <oneOrMoreToken>. That implies <oneOrMoreToken>
does not necessarily have to implement the "one-or-more" semantics.
Instead, it can simply split one big string into sub-sequences.
- This observation leads me to this proposal.
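
The per-token check from the second item above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, tokenization is assumed to be a whitespace split, and rejecting an empty token list is my reading of "one-or-more" rather than something the pseudocode spells out.

```python
import re
from typing import Callable

def one_or_more_token(matches: Callable[[str], bool], s: str) -> bool:
    """Accept s if it has at least one token and every token matches."""
    tokens = s.split()      # assumed tokenization: split on whitespace
    if not tokens:          # assumption: "one-or-more" rejects no tokens
        return False
    return all(matches(t) for t in tokens)

def is_integer(t: str) -> bool:
    """Stand-in for <data type="xsd:integer"/> in this sketch."""
    return re.fullmatch(r"[+-]?[0-9]+", t) is not None

print(one_or_more_token(is_integer, "1 2 3"))
print(one_or_more_token(is_integer, "1 x 3"))
```

Here the body pattern is reduced to a yes/no test on a single token, which is the "datatype that keeps a pattern as its body" view.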
regards,
----------------------
K.Kawaguchi
E-Mail: kohsukekawaguchi@yahoo.com