OASIS Mailing List Archives

relax-ng message


Subject: Re: Formal semantics of <text/>

> I think this is problematic for other
> schema-applications, because number of tokens that <text/> can match
> depends on the context.

I don't understand the problem you perceive with the current definition.

> Is it possible to change the definition of <text/> so that it can match
> at most one token.

I would prefer not to. <text/> is designed to support mixed content.
Whereas with simple content:


it is natural to think of the content of foo as a single string, in mixed
content, I think the more natural conceptual model is that the content
consists of a sequence of characters.  In something like:

  <p>This is <em>mixed</em> content</p>

the string "This is " is no more a significant unit than "This " or "is".
The fact that the characters occurring before the <em> are coalesced into a
single string is in this context really just an artefact of the formal
semantics and of the implementation.

I don't like restricting <text/> to a single token, because it makes the
clumping of characters into "text nodes" significant, and <text/> is
designed for circumstances in which such clustering is not significant.  I
think of the semantics of <text/> as matching zero or more *characters*.
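
To make the point about clumping concrete, here is a rough sketch in Python. It is not TREX or any real validator: the list-of-chunks model, the `TEXT` marker and the `matches` function are all made up for illustration. Content is modelled as a list of text chunks and element markers, and the same content is split into text chunks in two different ways.

```python
# Toy model (an assumption for illustration, not a real TREX API):
# children are text chunks (str) or ("elem", name) tuples.
TEXT = object()  # stands in for a <text/> pattern


def matches(pattern, children, one_token=False):
    """Match a flat sequence pattern against a list of children.

    one_token=False: TEXT consumes any run of adjacent text chunks,
                     i.e. it behaves like "zero or more characters".
    one_token=True:  TEXT consumes at most one text chunk.
    """
    i = 0
    for p in pattern:
        if p is TEXT:
            if one_token:
                if i < len(children) and isinstance(children[i], str):
                    i += 1
            else:
                while i < len(children) and isinstance(children[i], str):
                    i += 1
        elif i < len(children) and children[i] == ("elem", p):
            i += 1
        else:
            return False
    return i == len(children)


pattern = [TEXT, "em", TEXT]

# The same characters, clumped into text chunks in two different ways.
one_chunk = ["This is ", ("elem", "em"), " content"]
two_chunks = ["This ", "is ", ("elem", "em"), " content"]

print(matches(pattern, one_chunk), matches(pattern, two_chunks))
print(matches(pattern, one_chunk, one_token=True),
      matches(pattern, two_chunks, one_token=True))
```

With the character-based reading both splittings give the same answer. With the one-token reading the answer depends on how the processor happened to clump the characters, which is exactly what I want to avoid.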

Another issue is with whitespace.  Consider:

  <p>Open the <a href="#file">file</a> <var>f</var>.</p>

Consider the space between the "</a>" and the "<var>".  This space is
probably significant for the user.  However, in other cases whitespace
separating tags is not significant.  This is a problem with a long history
in SGML.  TREX solves the problem by ensuring that the validation outcome
for mixed content is the same whether or not whitespace such as in the above
case is stripped.  With your change, the following pattern would match the
above example:

<element name="p">
      <element name="a">...</element>
      <element name="var">...</element>

whereas it would not match:

 <p>Open the <a href="#file">file</a>_<var>f</var>.</p>

In other words, the semantics would become appropriate only if you assume
that the space is insignificant, whereas currently the semantics are equally
appropriate whether you regard the space as significant or insignificant.
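
The whitespace argument can be sketched the same way. Again this is only a toy model in Python, not TREX: `strip_ws`, `match_mixed` and `match_one_token` are hypothetical helpers, and the pattern used for the one-token case (text, then a, then var, then text) is an assumed stand-in rather than a pattern taken from this message.

```python
# Toy model (an assumption for illustration, not a real TREX API):
# children are text chunks (str) or ("elem", name) tuples.
TEXT = "#text"  # marker standing in for <text/>


def elem(name):
    return ("elem", name)


def strip_ws(children):
    """Drop whitespace-only text chunks, as a stripping processor might."""
    return [c for c in children
            if not (isinstance(c, str) and c.strip() == "")]


def match_mixed(names, children):
    """Modelled current semantics: text may occur anywhere in mixed
    content, so only the sequence of child elements is checked."""
    return [c[1] for c in children if not isinstance(c, str)] == names


def match_one_token(pattern, children):
    """Modelled proposed semantics: each TEXT matches at most one chunk."""
    i = 0
    for p in pattern:
        if p == TEXT:
            if i < len(children) and isinstance(children[i], str):
                i += 1
        elif i < len(children) and children[i] == elem(p):
            i += 1
        else:
            return False
    return i == len(children)


spaced = ["Open the ", elem("a"), " ", elem("var"), "."]
underscored = ["Open the ", elem("a"), "_", elem("var"), "."]
pattern = [TEXT, "a", "var", TEXT]  # assumed stand-in pattern

# One-token model: the outcome depends on stripping, and "_" is rejected.
print(match_one_token(pattern, strip_ws(spaced)))       # True
print(match_one_token(pattern, spaced))                 # False
print(match_one_token(pattern, strip_ws(underscored)))  # False

# Mixed-content model: same outcome whether or not whitespace is stripped.
print(match_mixed(["a", "var"], spaced))                # True
print(match_mixed(["a", "var"], strip_ws(spaced)))      # True
print(match_mixed(["a", "var"], underscored))           # True
```

In the one-token model the space between the two elements is accepted only because it was stripped first, while any other character in the same position is rejected. In the mixed-content model the outcome does not depend on whether stripping happened.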

