OASIS Mailing List Archives


office-comment message



Subject: RE: [office-comment] <text:s> element (etc) needs to be removed (ODF 1.2)


Patrick hi

>  I think you are missing the interaction of these elements with the
> white-space handling rules for paragraphs declared in 5.1.1. There are
> specific white-space rules for paragraphs and other elements. More
> specifically the rule is "collapse" so simply using additional "space
> characters" as you say, will not provide the desired result.

I hadn't missed that - but that is the other side of this bad penny :-)
I've only time for one comment per day!

I take it the intention of this construct is for situations where
somebody might want to "pretty print" an ODF instance without fouling-up
the whitespace. It is a common problem, but the ODF solution is a dumb
one.

What ODF is doing is trying to re-define how whitespace is represented
and handled in XML.

(BTW, I'm not entirely clear whether the "collapse" you mention refers
just to rendering, or whether an ODF processor must actually modify the
instance. Which is it? Or both?)

> Granted, this is one of those cases where I personally would prefer
> draconian rules that force users into good behavior using styles but
> successful applications (and I suspect formats) accommodate the desires
> of users, even ones that we don't experience ourselves. ;-)

Yes, well, a user searching for two contiguous space characters might
expect to find them with an ordinary string function; but ODF represents
the extra spaces as markup rather than as Unicode space characters, which
is not interoperable.

> Would the "confusion" you sense be reduced if we added text that
> specifically mentions the representation of space, tab, line-break by
> elements as avoiding the application of the white-space handling rule?

If the user wants to represent a space character, they should use a
space character. If a renderer wants to collapse those spaces, it is
free to do so. Remember that XML processors will faithfully pass all
whitespace in element content through (though, as per an email of a few
days ago, the ODF spec wrongly characterises XML whitespace handling;
maybe the same misconceptions were at the root of this text:s feature).
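To make the point about XML processors concrete, here is a minimal sketch using Python's standard-library ElementTree parser (chosen purely for illustration; any conforming XML parser behaves the same way for this input). It uses a plain, non-ODF `<p>` element and shows that a run of spaces in element content reaches the application untouched.

```python
import xml.etree.ElementTree as ET

# Three space characters between "a" and "b" in ordinary element content.
doc = ET.fromstring("<p>a   b</p>")

# The parser does not collapse the run of spaces; the application sees all three.
print(repr(doc.text))
```

Any collapsing of those spaces is therefore a decision made by the consuming application, not by the XML layer.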

Consider the fragment string "a   b" (that is with THREE space
characters between the "a" and the "b").

In ODF this is represented as:

<text:p>a <text:s text:c="2"/>b</text:p>

This is bad. If I ask (using XPath/XQuery etc.) for the string value of
the element I get "a b" (i.e. only one space). It is bad practice to
force processors to understand your schema's semantics for no reason
when, as you will see, there is a much more natural solution to this
problem ...
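The cost described above can be sketched in a few lines. This is an illustration under stated assumptions: it uses Python's ElementTree, the ODF text namespace URI `urn:oasis:names:tc:opendocument:xmlns:text:1.0`, and two hypothetical helpers. `string_value` concatenates all descendant text nodes, as an XPath string value does, while `expand_spaces` is a schema-aware reader that substitutes `text:c` spaces for each `<text:s>` (defaulting to one space when `text:c` is absent). It deliberately ignores other ODF elements such as `<text:tab>` or `<text:line-break>`.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The ODF 1.2 encoding of "a   b" (three spaces), as quoted in the message.
ODF_FRAGMENT = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    'a <text:s text:c="2"/>b'
    "</text:p>"
)

def string_value(elem):
    """Generic, schema-unaware string value: all descendant text nodes joined."""
    return "".join(elem.itertext())

def expand_spaces(elem):
    """Hypothetical schema-aware reader: each text:s becomes text:c spaces (default 1).
    Only text:s is handled; other ODF elements are simply recursed into."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEXT_NS}}}s":
            parts.append(" " * int(child.get(f"{{{TEXT_NS}}}c", "1")))
        else:
            parts.append(expand_spaces(child))
        parts.append(child.tail or "")
    return "".join(parts)

p = ET.fromstring(ODF_FRAGMENT)
print(repr(string_value(p)))   # generic view loses two of the spaces
print(repr(expand_spaces(p)))  # only ODF-aware code recovers the intended text
```

The generic reading yields `'a b'`; recovering `'a   b'` requires code that knows the `<text:s>` semantics, which is exactly the burden the message objects to.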

If you really want to preserve a distinction between collapsed and
preserved whitespace in such contexts, then redesign the text:s element
to use actual content and not an attribute value. So,

<text:p>a <text:s>  </text:s>b</text:p>

At least then the text nodes of your XML correspond to what you actually
want to represent (and in this case you would document <text:s> as being
an environment in which whitespace collapsing shall not occur). In this
example the XPath/XQuery string value of the <text:p> element is "a   b" - in
other words, it is correct!
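For comparison, a minimal sketch of the proposed content-based `<text:s>` (a hypothetical redesign, not ODF 1.2 as specified), again using Python's ElementTree and the ODF text namespace URI as an assumption. A generic string-value computation with no knowledge of ODF now returns the intended text, and an ordinary substring search finds the double space.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Proposed redesign: the preserved spaces are real character content of text:s.
PROPOSED = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    "a <text:s>  </text:s>b"
    "</text:p>"
)

p = ET.fromstring(PROPOSED)

# Generic string value: all descendant text nodes joined, no ODF knowledge needed.
value = "".join(p.itertext())
print(repr(value))
print("  " in value)  # a plain substring search finds the two contiguous spaces
```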

For possible bonus points, remember that the XML Rec. specifies the
xml:space attribute (with the values 'default' and 'preserve') as a
non-home-grown way of representing these semantics:

<text:p xml:space='default'>a <text:s xml:space='preserve'>  </text:s>b</text:p>

Personally, I'd be happy if the semantics were just documented, though.
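A small sketch of how a consumer might honour xml:space, assuming Python's ElementTree (which reports the attribute under the XML namespace `http://www.w3.org/XML/1998/namespace`) and a hypothetical helper `effective_space`. XML 1.0 defines only the values `default` and `preserve`, and the setting is inherited by descendant elements unless overridden, so the helper walks the tree and passes the current mode down.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The paragraph permits default whitespace handling; text:s asks for preservation.
FRAGMENT = (
    f'<text:p xmlns:text="{TEXT_NS}" xml:space="default">'
    "a <text:s xml:space='preserve'>  </text:s>b"
    "</text:p>"
)

def effective_space(elem, inherited="default"):
    """Hypothetical helper: yield (local name, effective xml:space) for each element.
    An element without xml:space inherits the value in force on its parent."""
    mode = elem.get(XML_SPACE, inherited)
    yield elem.tag.split("}")[-1], mode
    for child in elem:
        yield from effective_space(child, mode)

root = ET.fromstring(FRAGMENT)
print(list(effective_space(root)))
```

This gives the renderer a standard signal for where collapsing must not occur, without hiding the spaces in an attribute.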

> They are currently subclauses of the white-space handling clause. That
> seems self-evident to me but I am way too close to the text to be a
> reliable judge on that score.

IMO <text:s>, as it is, has to go.

- Alex.

