OASIS Mailing List Archives

office message


Subject: Re: [office] Fwd: ODF spec question (white-space processing)

Hello all,

Michael Brauer wrote:
>>>>>>So  <text:p>         foo</text:p>
>>>>>>will have only one word and zero spaces in Writer.
>>>>>>I expect it to have 1 space and one word.
>> Note that this would indeed be "collapsing" (to a single space), instead of removing.
>> The spec does talk about collapsing, not about removing.
>Well, the sentence you are refering to continues with "they [white-space 
>characters] are collapsed, in other words they are processed in the same way 
>that [HTML4] processes them"
>The term "collapsed" may be a little bit unprecise here, but the essential 
>information is that they are processed as in HTML.

If a sentence says: "A, in other words B" and A and B are actually not
the same then the sentence is broken, not a little bit unprecise.

The only precise thing in that sentence is the term "collapsed". It is
well defined and used in several XML-related specs. It seems that
everybody outside of OOo & some people inside OOo (such as myself)
have all come to identical conclusions of what it might mean.

Also, 'A' is the definition and 'B' an adjunct as explanation. I
personally cannot really see how the adjunct would automatically get
precedence over the actual definition in 'A'. 

>>>Well, the intention behind the white-space processing rules is to allow 
>>>authors to pretty-print paragraph text. HTML is used as an archetype here, 
>>>because its rules do work very well in practice. It may be that we could find 
>>>some better wording for the relation of the OpenDocument white space 
>>>processing rules to HTML, but IMHO it is consistent with the HTML 
>>>specification to ignore white space characters at the paragraph start.
>> But HTML doesn't do that, and therefore OpenDocument shouldn't do it either.
>What do you think HTML is not doing? 

There is a difference between layout and content. For HTML, the
difference isn't that relevant (except e.g. for scripting), but for
OpenDocument it is rather vital.

HTML 4 does not remove whitespace from content. Ever.

It's just like, say, invisible sections: The mere fact that they are
not displayed does not mean one can just drop them from the document,
even if that looks just the same. For the same reason, one cannot take
display rules from one spec as a reason to modify content in another,
even if the latter spec is referenced from the former.

OpenDocument does remove whitespace: It specifies that whitespace
should be collapsed. There is no rule that the beginning of a paragraph
should be treated specially.

The OpenOffice.org implementation apparently introduces a third type
of behaviour: Collapse whitespaces and additionally remove the
whitespace at the beginning of paragraph elements, in such a fashion
that it more or less matches the layout result of HTML. I don't see
how the OpenDocument spec could possibly be interpreted to support
this behaviour.
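
To make the difference concrete, here is a minimal Python sketch of the
two readings discussed in this thread. The function names are
hypothetical, and the code is only an illustration of the two
behaviours as described here; it is not taken from the spec text or
from either implementation.

```python
import re

# White-space characters as XML defines them: space, tab, CR, LF.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text: str) -> str:
    """Reading 1: replace every run of white space with a single space."""
    return _WS_RUN.sub(" ", text)


def collapse_and_strip_leading(text: str) -> str:
    """Reading 2 (assumed OOo-like behaviour): collapse, then also drop
    white space at the very start of the paragraph."""
    return collapse(text).lstrip(" ")


paragraph = "         foo"  # content of <text:p>         foo</text:p>
print(repr(collapse(paragraph)))                    # ' foo'
print(repr(collapse_and_strip_leading(paragraph)))  # 'foo'
```

Under the first reading the paragraph keeps one space and one word;
under the second it keeps only the word.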

(Oh, and just for amusement I'd like to mention the HTML rule of
optionally ignoring whitespace immediately after start elements.
According to HTML, the layout results of OOo AND KOffice would BOTH be
correct. I'm fairly certain we don't want that.)

>And if I try,
><p>    Foo</p>
>in my Mozilla, it does not display a space character in front of the "Foo" - 
>just like in OpenDocument. Is your Mozilla behaving different?

In HTML/Mozilla, the first text character of that paragraph is
whitespace. If one accesses the first character in the paragraph
through JavaScript or some other DOM access, then the result will be a
whitespace character. Same in OpenDocument. In OpenOffice.org, the
first text character would be 'F'. Which is indeed a quite different
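
The content-versus-layout point can be checked with any DOM API. Below
is a small sketch using Python's xml.dom.minidom as a stand-in for the
browser DOM (an assumption: it is an XML parser, not Mozilla's HTML
parser, but it likewise leaves the character data untouched).

```python
from xml.dom.minidom import parseString

# Parse the paragraph from the quoted example as a tiny XML document.
doc = parseString("<p>    Foo</p>")

# The paragraph's only child is a text node holding the raw content.
text_node = doc.documentElement.firstChild
first_char = text_node.data[0]

print(repr(text_node.data))  # '    Foo'
print(repr(first_char))      # ' '
```

The leading spaces are still part of the content, regardless of how a
renderer chooses to lay them out.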

>In any case, I think it is very convenient to give authors the possibility to 
>add a line break behind the opening tag of a paragraph without influencing 
>the layout of the document. Paragraph start tags may get very long. I 
>personally wouln't like it if I would have to add a paragraph's first word 
>always immediately behind the start tag.

If you really think so, you should propose a modification to the spec
that would introduce this behaviour.

Michael, the point of a spec is to give a reasonably unambiguous
definition of how the format is supposed to work. The current spec
contains a very sensible, compact, and rather unambiguous rule that
whitespace is collapsed. Rather unfortunately, it also contains the
'in other words' part which was originally meant as an explanation, but
in fact introduces something different, thereby making the spec
anything but unambiguous, as proven by this discussion.

For the various reasons given in this and previous posts, I propose
that the 'in other words' part simply be removed from the spec.
That should fix the problem with different interpretations in a very
easy, understandable, and concise manner, and in perfect accordance
with the original intentions.

