Subject: Re: Formal semantics of <text/>
> I think this is problematic for other
> schema-applications, because number of tokens that <text/> can match
> depends on the context.

I don't understand the problem you perceive with the current definition.

> Is it possible to change the definition of <text/> so that it can match
> at most one token.

I would prefer not to. <text/> is designed to support mixed content. Whereas with simple content:

  <foo>123</foo>

it is natural to think of the content of foo as a single string, in mixed content, I think the more natural conceptual model is that the content consists of a sequence of characters. In something like:

  <p>This is <em>mixed</em> content</p>

the string "This is " is no more a significant unit than "This " or "is". The fact that the characters occurring before the <em> are coalesced into a single string is in this context really just an artefact of the formal semantics and of the implementation. I don't like restricting <text/> to a single token, because it makes the clumping of characters into "text nodes" significant, and <text/> is designed for circumstances in which such clustering is not significant. I think of the semantics of <text/> as matching zero or more *characters*.

Another issue is with whitespace. Consider:

  <p>Open the <a href="#file">file</a> <var>f</var>.</p>

Consider the space between the "</a>" and the "<var>". This space is probably significant for the user. However, in other cases whitespace separating tags is not significant. This is a problem with a long history in SGML. TREX solves the problem by ensuring that the validation outcome for mixed content is the same whether or not whitespace such as in the above case is stripped.
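To make the point about clumping concrete, here is a rough sketch in Python. It is not part of TREX or of any TREX implementation; the helper names (text_chunks, strip_whitespace_chunks, matches_text) are hypothetical, and matches_text is only a toy model of "<text/> matches zero or more characters". It shows that the number of text chunks in the <p> example depends on whether whitespace-only chunks are stripped, while a character-based reading of <text/> gives the same answer either way.

```python
import xml.etree.ElementTree as ET

# The mixed-content example from the message.
DOC = '<p>Open the <a href="#file">file</a> <var>f</var>.</p>'


def text_chunks(elem):
    """Return the text chunks that are direct children of elem, in order."""
    chunks = []
    if elem.text:
        chunks.append(elem.text)
    for child in elem:
        # ElementTree stores text following a child element in child.tail.
        if child.tail:
            chunks.append(child.tail)
    return chunks


def strip_whitespace_chunks(chunks):
    """Drop chunks that consist only of whitespace."""
    return [c for c in chunks if c.strip()]


def matches_text(chunks):
    """Toy model (an assumption, not TREX code): <text/> read as
    'zero or more characters' accepts any number of text chunks."""
    return all(isinstance(c, str) for c in chunks)


p = ET.fromstring(DOC)
unstripped = text_chunks(p)
stripped = strip_whitespace_chunks(unstripped)

print(len(unstripped), unstripped)
print(len(stripped), stripped)
print(matches_text(unstripped), matches_text(stripped))
```

The chunk count changes once the whitespace-only chunk between "</a>" and "<var>" is removed, which is why any definition that counts tokens makes the outcome sensitive to how text happens to be clumped.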
With your change, the following pattern would match the above example:

  <element name="p">
    <interleave>
      <group>
        <element name="a">...</element>
        <element name="var">...</element>
      </group>
      <group>
        <text/>
        <text/>
      </group>
    </interleave>
  </element>

whereas it would not match:

  <p>Open the <a href="#file">file</a>_<var>f</var>.</p>

In other words, the semantics would become appropriate only if you assume that the space is insignificant, whereas currently the semantics are equally appropriate whether you regard the space as significant or insignificant.

James