Subject: Maybe it's too late, but...
As for the list of tokens...

My implementation experiment shows that the following syntax is very easy to implement and has greater expressiveness:

  <token>
    any RELAX NG pattern except element/attribute.
  </token>

For example, things like

  <token>
    <oneOrMore>
      <data type="xsd:integer" />
      <value>cm</value>
    </oneOrMore>
  </token>

or

  <token>
    <!-- foo cannot have element/attribute descendants -->
    <ref name="foo" />
  </token>

With this proposal, the current

  <zeroOrMoreToken> P </zeroOrMoreToken>

is expressed as

  <token> <zeroOrMore> P </zeroOrMore> </token>

From the viewpoint of implementations, the residual of a <token> pattern by a string token S is defined as

  function residual( <token> P </token>, S ) {
    Let {t1,t2,...,tn} be the tokenization of S.
    if( residual( residual( residual( P, t1 ), t2 ) ..., tn ) == <empty/> )
      return <empty/>
    else
      return <notAllowed/>
  }

Therefore, <token> has minimal impact on the complexity of the spec (and of implementations).

I think the following are the problems of the current <oneOrMoreToken>.

- First, I thought of the possibility of parsing <oneOrMoreToken> as the list datatype of XSD. But that is difficult because of patterns like

    <oneOrMoreToken><ref name="..."/></oneOrMoreToken>

- Then I thought of the possibility of implementing a datatype that keeps a pattern as its body. In this way, <oneOrMoreToken> can be implemented as

    function residual( <oneOrMoreToken> P </oneOrMoreToken>, S ) {
      Let {t0,t1,...,tn-1} be the tokenization of S.
      for( i=0; i<n; i++ )
        if( residual( P, ti ) != <empty/> )
          return <notAllowed/>
      return <empty/>
    }

- Then I found that there is really no reason to prohibit a sequence of data inside <oneOrMoreToken>. And in fact it is useful. The above implementation can correctly handle

    <oneOrMoreToken>
      <group>
        <data type="xsd:integer"/>
        <value>cm</value>
      </group>
    </oneOrMoreToken>

  The reason why we have to prohibit a sequence of data is that we cannot know how to split one big character sequence into sub-sequences. But as you see, oneOrMoreToken knows how to split them. So in fact there is no problem.

- By the above reasoning, there is no reason to prohibit plain <oneOrMore> within <oneOrMoreToken>. That implies <oneOrMoreToken> does not necessarily implement the "one-or-more" semantics. Instead, it can simply split one big string into sub-sequences.

- This observation leads me to this proposal.

regards,
----------------------
K.Kawaguchi
E-Mail: kohsukekawaguchi@yahoo.com
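
The <token> residual described in the message can be tried out with a small Python sketch. It is only an illustration under several assumptions that are not in the original message: the class names (Empty, NotAllowed, Value, Data, Group, Choice, OneOrMore), the helpers nullable, choice and group, and the is_integer check standing in for xsd:integer are all hypothetical. It splits S on whitespace, and it accepts when the final residual is nullable rather than literally equal to <empty/>, which seems the natural reading once <oneOrMore> is allowed inside <token>.

```python
import re
from dataclasses import dataclass
from typing import Callable


class Pattern:
    """Base class for the toy pattern model (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Pattern):
    pass


@dataclass(frozen=True)
class NotAllowed(Pattern):
    pass


@dataclass(frozen=True)
class Value(Pattern):
    text: str


@dataclass(frozen=True)
class Data(Pattern):
    accepts: Callable[[str], bool]


@dataclass(frozen=True)
class Group(Pattern):
    first: Pattern
    second: Pattern


@dataclass(frozen=True)
class Choice(Pattern):
    left: Pattern
    right: Pattern


@dataclass(frozen=True)
class OneOrMore(Pattern):
    body: Pattern


def choice(a, b):
    # Simplifying constructor: notAllowed is the identity of choice.
    if isinstance(a, NotAllowed):
        return b
    if isinstance(b, NotAllowed):
        return a
    return Choice(a, b)


def group(a, b):
    # Simplifying constructor: notAllowed absorbs, empty is the identity.
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return Group(a, b)


def nullable(p):
    # True when p can match the empty sequence of tokens.
    if isinstance(p, Empty):
        return True
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    if isinstance(p, OneOrMore):
        return nullable(p.body)
    return False


def residual(p, tok):
    # What remains of pattern p after consuming one token.
    if isinstance(p, Value):
        return Empty() if p.text == tok else NotAllowed()
    if isinstance(p, Data):
        return Empty() if p.accepts(tok) else NotAllowed()
    if isinstance(p, Choice):
        return choice(residual(p.left, tok), residual(p.right, tok))
    if isinstance(p, Group):
        r = group(residual(p.first, tok), p.second)
        if nullable(p.first):
            r = choice(r, residual(p.second, tok))
        return r
    if isinstance(p, OneOrMore):
        return group(residual(p.body, tok), choice(Empty(), p))
    return NotAllowed()  # Empty and NotAllowed cannot consume a token


def token_residual(body, s):
    # residual( <token> P </token>, S ): chain residuals over the tokens of S.
    p = body
    for tok in s.split():
        p = residual(p, tok)
    return Empty() if nullable(p) else NotAllowed()


def is_integer(tok):
    # Rough stand-in for xsd:integer (assumption, not a full datatype check).
    return re.fullmatch(r"[+-]?[0-9]+", tok) is not None


# <token><oneOrMore><data type="xsd:integer"/><value>cm</value></oneOrMore></token>
lengths = OneOrMore(Group(Data(is_integer), Value("cm")))
print(token_residual(lengths, "10 cm 20 cm"))
print(token_residual(lengths, "10 cm 20"))
```

In this sketch the first call yields the empty pattern and the second yields notAllowed, because the trailing "20" leaves a <value>cm</value> still pending.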